CLI Overview

The inference engine ships with a CLI for deploying trained model artifacts without writing boilerplate by hand.

Installation

uv sync --extra cli   # or: pip install -e ".[cli]"

| Variable | Required | Description |
| --- | --- | --- |
| `GROQ_API_KEY` | Yes | API key for LLM code generation |
| `INFERENCE_ENGINE_LLM_MODEL` | No | Override the default model (`llama-3.3-70b-versatile`) |

Variables are loaded automatically from a `.env` file in the project root.
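
As a rough illustration of how the two variables relate, the sketch below resolves them from an environment mapping. The helper name `resolve_llm_settings` is hypothetical and not part of the CLI; only the variable names and the default model come from this page.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "llama-3.3-70b-versatile"  # default model named on this page


def resolve_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the API key and model name from `env`."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        # GROQ_API_KEY is required, so fail early with a clear message.
        raise RuntimeError("GROQ_API_KEY is not set")
    # INFERENCE_ENGINE_LLM_MODEL is optional; fall back to the default.
    model = env.get("INFERENCE_ENGINE_LLM_MODEL") or DEFAULT_MODEL
    return {"api_key": api_key, "model": model}
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.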

| Command | Description |
| --- | --- |
| `inference-engine deploy <artifact>` | Deploy a trained artifact: inspect, generate, validate, write |
| `inference-engine fix <model-dir>` | Fix a broken existing pipeline definition |

See `deploy.md` and `fix.md` for the full reference.

Only `load()` and `predict()` are ever generated. The pipeline structure, pre/postprocessors, and definition file template are fixed.
No files are written until validation passes and the user confirms.
The CLI only writes under `models/` and patches `app/config/routing.py`. It never modifies engine internals.
`deploy` only writes files; it does not reload a running server. Restart the server after deploying to load the new model.
When generation fails after all retries, a scaffold `definition.py` is written with `# TODO` comments instead of raising an error.
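
The scaffold fallback can be pictured as a retry loop that never raises. The sketch below is an assumption about the control flow, not the CLI's actual implementation: `generate_definition`, `SCAFFOLD`, the two callbacks, and the retry count are hypothetical names chosen for illustration, and the `load()` and `predict()` signatures in the scaffold are guesses.

```python
from typing import Callable, Tuple

# Hypothetical scaffold; the real template and signatures may differ.
SCAFFOLD = '''\
def load(model_dir):
    # TODO: load the trained artifact from model_dir
    raise NotImplementedError


def predict(model, inputs):
    # TODO: run inference and return predictions
    raise NotImplementedError
'''


def generate_definition(
    generate: Callable[[], str],
    validate: Callable[[str], bool],
    max_retries: int = 3,
) -> Tuple[str, bool]:
    """Return (source, validated); fall back to SCAFFOLD instead of raising."""
    for _ in range(max_retries):
        try:
            candidate = generate()
        except Exception:
            continue  # a failed generation call counts as one used attempt
        if validate(candidate):
            return candidate, True
    # All attempts failed: hand back the TODO scaffold rather than an error.
    return SCAFFOLD, False
```

The boolean in the return value lets a caller tell a validated definition apart from the scaffold, for example to print a warning before writing the file.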