First Deployment
This guide walks through making your first inference request using the built-in echo model.
1. Start the server
2. Check available models
curl -H "X-API-Key: dev-key" http://localhost:8000/models
# -> {"models": [{"name": "echo", "version": "v1"}, {"name": "echo", "version": "v2"}]}
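The same check can be scripted. The following is a minimal Python sketch using only the standard library; it assumes the `/models` response has the shape shown above. The function names `build_models_request`, `fetch_models`, and `model_versions` are illustrative and not part of the server or any client library.

```python
import json
import urllib.request

# Base URL and key are taken from the curl example; adjust for your setup.
BASE_URL = "http://localhost:8000"
API_KEY = "dev-key"


def build_models_request(base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) the GET /models request."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"X-API-Key": api_key},
    )


def model_versions(payload):
    """Group version strings by model name from a /models response body."""
    versions = {}
    for entry in payload["models"]:
        versions.setdefault(entry["name"], []).append(entry["version"])
    return versions


def fetch_models():
    """Send the request and return {model_name: [versions]} (needs a running server)."""
    with urllib.request.urlopen(build_models_request()) as resp:
        return model_versions(json.load(resp))
```

With the server running, `fetch_models()` should report both versions of the `echo` model.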
3. Run synchronous inference
curl -X POST http://localhost:8000/predict \
-H "X-API-Key: dev-key" \
-H "Content-Type: application/json" \
-d '{"model": "echo", "version": "v1", "data": "hello"}'
# -> {"result": "hello"}
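For callers that prefer Python over curl, the request can be sketched with the standard library. This assumes the endpoint, headers, and JSON fields shown in the curl command; `build_predict_request` and `predict` are hypothetical helper names, not an official client.

```python
import json
import urllib.request


def build_predict_request(data, model="echo", version="v1",
                          base_url="http://localhost:8000", api_key="dev-key"):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"model": model, "version": version, "data": data})
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body.encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def predict(data, **kwargs):
    """Send the request and return the "result" field (needs a running server)."""
    with urllib.request.urlopen(build_predict_request(data, **kwargs)) as resp:
        return json.load(resp)["result"]
```

Against the echo model, `predict("hello")` should return the input unchanged. Building the request separately from sending it makes the payload easy to inspect without a server.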
4. Run batch inference
curl -X POST http://localhost:8000/predict/batch \
-H "X-API-Key: dev-key" \
-H "Content-Type: application/json" \
-d '{"model": "echo", "version": "v1", "items": ["a", "b", "c"]}'
# -> {"results": ["a", "b", "c"]}
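A batch response is most useful when each result is matched back to its input. The sketch below assumes that `results` comes back in the same order as `items`, which the echo output above suggests but the guide does not state outright. `build_batch_request` and `pair_results` are illustrative names.

```python
import json
import urllib.request


def build_batch_request(items, model="echo", version="v1",
                        base_url="http://localhost:8000", api_key="dev-key"):
    """Build (but do not send) a POST /predict/batch request."""
    body = json.dumps({"model": model, "version": version, "items": list(items)})
    return urllib.request.Request(
        f"{base_url}/predict/batch",
        data=body.encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def pair_results(items, response):
    """Pair each input with its result, assuming order is preserved."""
    results = response["results"]
    if len(results) != len(items):
        # A length mismatch means the ordering assumption cannot hold.
        raise ValueError(f"expected {len(items)} results, got {len(results)}")
    return list(zip(items, results))
```

The length check is a cheap guard: if the server ever drops or adds entries, pairing by position would silently misattribute results.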
5. Submit an async job
# Submit
curl -X POST http://localhost:8000/predict/async \
-H "X-API-Key: dev-key" \
-H "Content-Type: application/json" \
-d '{"model": "echo", "version": "v1", "data": "hello"}'
# -> {"job_id": "550e8400-..."}
# Poll status
curl -H "X-API-Key: dev-key" http://localhost:8000/predict/async/<job_id>
# -> {"job_id": "...", "status": "succeeded", "result": "hello", ...}
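Polling by hand works for a quick test; in a script it is usually wrapped in a loop with a timeout. The sketch below assumes the status endpoint and the `status` field shown above. Only `"succeeded"` appears in this guide, so treating `"failed"` as a terminal status is an assumption, and `fetch_job` and `wait_for_job` are hypothetical helpers.

```python
import json
import time
import urllib.request

# "succeeded" comes from the guide; "failed" is an assumed terminal status.
TERMINAL_STATUSES = {"succeeded", "failed"}


def fetch_job(job_id, base_url="http://localhost:8000", api_key="dev-key"):
    """GET the job status document (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/predict/async/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval=0.5, timeout=30.0,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a server or real delays. In normal use, `wait_for_job(job_id)["result"]` would return the echoed input once the job succeeds.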
Next: CLI Quickstart (deploy your own model in one command).