Inference Engine¶
Production-grade, task-agnostic ML inference backend. Serve any trained model over HTTP without changing the engine's core.
Get Started¶
New to Inference Engine? Start with Installation and run your first inference in minutes.
Deploying a model? Follow the Deploying a Model guide or use the CLI for one-command deployment.
Configuring for production? See Production Deployment and Environment Variables.
Understanding the system? Read Architecture and Request Lifecycle.
Documentation Sections¶
| Section | What you'll find |
|---|---|
| Quickstart | Install, run, first request |
| Guides | Task-based workflows |
| CLI | LLM-assisted deployment and repair |
| Concepts | Architecture, pipeline, routing, jobs |
| API Reference | Endpoint schemas and status codes |
| Configuration | All environment variables |
| Integrations | Redis, Postgres, Triton, ONNX, Docker |
| Observability | Metrics, logs, tracing |
| Development | Local setup, testing, contributing |
| Reference | Quick-lookup tables |