Inference Engine

Production-grade, task-agnostic ML inference backend. Serve any trained model over HTTP without changing the engine's core.


Get Started

New to Inference Engine? Start with Installation and run your first inference in minutes.

Deploying a model? Follow the Deploying a Model guide or use the CLI for one-command deployment.

Configuring for production? See Production Deployment and Environment Variables.

Understanding the system? Read Architecture and Request Lifecycle.


Documentation Sections

| Section | What you'll find |
|---|---|
| Quickstart | Install, run, first request |
| Guides | Task-based workflows |
| CLI | LLM-assisted deployment and repair |
| Concepts | Architecture, pipeline, routing, jobs |
| API Reference | Endpoint schemas and status codes |
| Configuration | All environment variables |
| Integrations | Redis, Postgres, Triton, ONNX, Docker |
| Observability | Metrics, logs, tracing |
| Development | Local setup, testing, contributing |
| Reference | Quick-lookup tables |