Inference Endpoints

All endpoints in this section require an X-API-Key with the predict scope.

POST /predict

Synchronous single inference. Blocks until result is ready.

Request:

{"model": "echo", "version": "v1", "data": <any>}

version is optional; omit it to let the routing rules select the version.
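A minimal sketch of building this request in Python using only the standard library. The base URL `https://api.example.com` and the helper name `build_predict_request` are hypothetical, and sending X-API-Key as an HTTP header is an assumption. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical; replace with your deployment


def build_predict_request(api_key, model, data, version=None):
    """Build (but do not send) a POST /predict request."""
    payload = {"model": model, "data": data}
    # version is optional: leave it out so the routing rules pick a version.
    if version is not None:
        payload["version"] = version
    return urllib.request.Request(
        BASE_URL + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumed to be passed as an HTTP header
        },
        method="POST",
    )


req = build_predict_request("my-key", "echo", {"text": "hello"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send it, pass the request to `urllib.request.urlopen(req)` and parse the response body as JSON to read `result`.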

Response 200:

{"result": <any>}

Errors: 400 model/input error · 413 payload too large · 429 rate limit · 500 execution error
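One way a client might interpret these status codes is sketched below. The helper name `describe_error` is hypothetical, and treating only 429 as retryable is an assumption; the reference does not specify retry behaviour.

```python
# Meanings taken from the error list above.
ERROR_MEANINGS = {
    400: "model/input error",
    413: "payload too large",
    429: "rate limit",
    500: "execution error",
}

# Assumption: only rate-limit responses are worth retrying after a delay.
RETRYABLE = {429}


def describe_error(status_code):
    """Return (meaning, should_retry) for an HTTP status code."""
    meaning = ERROR_MEANINGS.get(status_code, "unexpected status")
    return meaning, status_code in RETRYABLE


for code in (400, 413, 429, 500):
    print(code, describe_error(code))
```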

POST /predict/batch

Synchronous batch. All items run through the same model.

Request:

{"model": "echo", "version": "v1", "items": [<item>, ...]}

items must have at least one element.

Response 200:

{"results": [<result>, ...]}

Results are in the same order as items.
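The two rules above (non-empty items, order-preserving results) can be sketched client-side as follows. The helper names are hypothetical, and treating version as optional here mirrors /predict and is an assumption. The response body is simulated rather than fetched.

```python
def build_batch_payload(model, items, version=None):
    """Build the JSON body for POST /predict/batch."""
    items = list(items)
    if not items:
        # Mirrors the documented rule: items must have at least one element.
        raise ValueError("items must have at least one element")
    payload = {"model": model, "items": items}
    if version is not None:  # assumption: optional, as for /predict
        payload["version"] = version
    return payload


def pair_results(items, response_body):
    """Match each item with its result, relying on the documented ordering."""
    results = response_body["results"]
    if len(results) != len(items):
        raise ValueError("expected one result per item")
    return list(zip(items, results))


payload = build_batch_payload("echo", ["a", "b"], version="v1")
simulated_response = {"results": ["a", "b"]}  # stand-in for a real response
print(pair_results(payload["items"], simulated_response))
```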

POST /predict/async

Submit a job. The call returns immediately with a job_id, without waiting for the result.

Request: same shape as /predict (data field).

Response 200:

{"job_id": "550e8400-..."}

GET /predict/async/{job_id}

Poll job status.

Response 200:

{
  "job_id": "550e8400-...",
  "status": "succeeded",
  "model": "echo",
  "version": "v1",
  "created_at": "2026-04-27T15:00:00+00:00",
  "result": <value or null>,
  "error_message": <string or null>
}

Status	Meaning
`created`	Record created, not yet queued
`pending`	Queued, waiting for worker
`running`	Worker executing
`succeeded`	Done — `result` populated
`failed`	Failed — `error_message` populated
`cancelled`	Cancelled

Errors: 404 job not found
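A polling loop over these statuses might look like the sketch below. It assumes that succeeded, failed, and cancelled are terminal states, which the table implies but does not state outright. The `fetch_status` callable is a hypothetical stand-in for a function that performs GET /predict/async/{job_id} and returns the parsed JSON, so the loop can be exercised without a network.

```python
import time

# Assumption: these statuses never change once reached.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


# Simulated status sequence standing in for real HTTP responses.
fake_responses = iter([
    {"job_id": "job-1", "status": "pending", "result": None, "error_message": None},
    {"job_id": "job-1", "status": "running", "result": None, "error_message": None},
    {"job_id": "job-1", "status": "succeeded", "result": "hi", "error_message": None},
])
job = wait_for_job(lambda _id: next(fake_responses), "job-1", interval=0)
print(job["status"], job["result"])
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to test.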

POST /predict/async/batch

Submit a batch job asynchronously.

Request: same shape as /predict/batch (items field).

Response 200:

{"job_id": "550e8400-..."}

Once the job's status is succeeded, the result field returned by GET /predict/async/{job_id} is a list in the same order as items.
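As an illustration, a client could turn a status record for a batch job into item/result pairs as sketched here. The helper name `batch_results` and the choice of exceptions are hypothetical; the status record is simulated.

```python
def batch_results(items, job):
    """Pair submitted items with results from a job status record."""
    status = job["status"]
    if status == "failed":
        # error_message is populated for failed jobs.
        raise RuntimeError(job.get("error_message") or "job failed")
    if status != "succeeded":
        raise ValueError(f"job is {status}; result is not available")
    # result is a list in the same order as the submitted items.
    return list(zip(items, job["result"]))


items = ["a", "b", "c"]
simulated_job = {
    "job_id": "job-1",
    "status": "succeeded",
    "result": ["a", "b", "c"],
    "error_message": None,
}
for item, result in batch_results(items, simulated_job):
    print(item, "->", result)
```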