
Async Inference

Submit long-running inference jobs and poll for results without blocking.


Submit a job

curl -X POST http://localhost:8000/predict/async \
  -H "X-API-Key: dev-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "my_model", "version": "v1", "data": {"features": [1,2,3]}}'
# → {"job_id": "550e8400-e29b-41d4-a716-446655440000"}

The endpoint returns immediately with a job_id; the inference itself runs in the background.
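The same request can be issued from Python. The following is a minimal sketch using only the standard library; the endpoint, headers, and payload shape are taken from the curl example above, while the helper names `build_submit_request` and `submit_job` are hypothetical and not part of the service.

```python
import json
import urllib.request


def build_submit_request(base_url, api_key, model, version, features):
    """Build (but do not send) the POST request for /predict/async."""
    payload = {"model": model, "version": version, "data": {"features": features}}
    return urllib.request.Request(
        url=f"{base_url}/predict/async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit_job(request, timeout=10.0):
    """Send the request and return the job_id from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["job_id"]
```

With a server running locally, `submit_job(build_submit_request("http://localhost:8000", "dev-key", "my_model", "v1", [1, 2, 3]))` would return the job_id string. Building the request separately from sending it keeps the payload easy to inspect without a live server.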


Poll for status

curl -H "X-API-Key: dev-key" \
  http://localhost:8000/predict/async/550e8400-e29b-41d4-a716-446655440000

Response when complete:

{
  "job_id": "550e8400-...",
  "status": "succeeded",
  "result": {"label": "positive", "score": 0.97},
  "created_at": "2026-05-13T12:00:00Z",
  "finished_at": "2026-05-13T12:00:01Z"
}

Job statuses

Status     Meaning
pending    Queued, not yet picked up
running    Executing
succeeded  Complete; result is populated
failed     Error; error_message is populated
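Since succeeded and failed are the only statuses that carry a final outcome, a client can poll until it sees one of them. The sketch below assumes exactly that; the names `fetch_job` and `wait_for_job`, the one-second interval, and the 60-second cap are illustrative choices, not values defined by the service.

```python
import json
import time
import urllib.request

# Statuses after which the job will not change again (assumed from the table).
TERMINAL_STATUSES = {"succeeded", "failed"}


def fetch_job(base_url, api_key, job_id, timeout=10.0):
    """GET /predict/async/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/predict/async/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_job(fetch, interval=1.0, max_wait=60.0, sleep=time.sleep):
    """Call fetch() until the job reaches a terminal status or max_wait elapses."""
    waited = 0.0
    while True:
        job = fetch()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= max_wait:
            raise TimeoutError(f"job still {job['status']} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A typical call would be `wait_for_job(lambda: fetch_job("http://localhost:8000", "dev-key", job_id))`. The caller still has to check whether the returned status is succeeded (read result) or failed (read error_message).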

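Batch async

To submit several inputs for the same model and version in one request, pass them as a list under items: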

curl -X POST http://localhost:8000/predict/async/batch \
  -H "X-API-Key: dev-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "my_model", "version": "v1", "items": [{"features": [1,2,3]}, {"features": [4,5,6]}]}'

With Redis (production)

When REDIS_URL is set, jobs are enqueued for a separate arq worker process. Inference then runs outside the API server, and you can scale horizontally by starting more workers.

# Start the worker
arq app.infra.queue.worker.WorkerSettings

Without Redis (development)

When REDIS_URL is not set, jobs run as asyncio tasks on the API server's event loop, so no worker process is needed. This mode is not recommended for production.
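To illustrate what running jobs as asyncio tasks can look like, here is a minimal in-process sketch. It is not the server's actual implementation: the `jobs` dictionary, `submit`, and `_run_job` are hypothetical names, and the job record only mirrors the status, result, and error_message fields described above.

```python
import asyncio
import uuid

jobs = {}  # job_id -> job record, held in memory only
_tasks = set()  # strong references so running tasks are not garbage-collected


async def _run_job(job_id, predict, payload):
    """Run one job and record its outcome in the jobs dictionary."""
    job = jobs[job_id]
    job["status"] = "running"
    try:
        job["result"] = await predict(payload)
        job["status"] = "succeeded"
    except Exception as exc:
        job["error_message"] = str(exc)
        job["status"] = "failed"


def submit(predict, payload):
    """Schedule predict(payload) on the running event loop and return a job_id at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"job_id": job_id, "status": "pending"}
    task = asyncio.create_task(_run_job(job_id, predict, payload))
    _tasks.add(task)
    task.add_done_callback(_tasks.discard)
    return job_id
```

In a design like this, the job records live in process memory and the tasks share the event loop with request handling, so a restart loses all jobs and a CPU-bound predict call stalls other requests. Those are plausible reasons to prefer a separate worker in production.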


See the Async Jobs concept page for architecture details.