Inference Engine

Production-grade, task-agnostic ML inference backend. Serve any trained model over HTTP without changing the engine's core.

Get Started

New to Inference Engine? Start with Installation and run your first inference in minutes.

Deploying a model? Follow the Deploying a Model guide or use the CLI for one-command deployment.

Want to understand the system? Read Architecture and Request Lifecycle.

Section	What you'll find
Quickstart	Install, run, first request
Guides	Task-based workflows
CLI	LLM-assisted deployment and repair
Concepts	Architecture, pipeline, routing, jobs
API Reference	Endpoint schemas and status codes
Configuration	All environment variables
Integrations	Redis, Postgres, Triton, ONNX, Docker
Observability	Metrics, logs, tracing
Development	Local setup, testing, contributing
Reference	Quick-lookup tables