Project Structure

Figure: Layer dependency diagram

inference-engine/
├── app/
│   ├── adapters/
│   │   └── http/               # FastAPI routes, middleware, schemas, deps
│   ├── services/               # Orchestration: PredictionService, AsyncInferenceService
│   ├── domain/
│   │   ├── models/             # BaseModel and model implementations
│   │   ├── pipelines/          # InferencePipeline
│   │   ├── processing/         # Preprocessors and postprocessors
│   │   ├── validation/         # Validators
│   │   ├── jobs/               # Job dataclass, JobStatus, JobStore interface
│   │   ├── registry/           # ModelRegistry
│   │   ├── loading/            # LocalModelLoader, S3ModelLoader
│   │   └── definitions/        # Built-in model definitions (echo)
│   ├── execution/              # InferenceExecutor, OnnxExecutor, TritonExecutor, ExecutionPolicy
│   ├── infra/
│   │   ├── jobs/               # SQLiteJobStore, PostgresJobStore
│   │   └── queue/              # ArqJobQueue, arq worker
│   ├── config/                 # routing.py, execution.py, sla.py
│   ├── security/               # Auth, rate limiting
│   ├── core/                   # Metrics, logging, tracing
│   └── cli/                    # deploy and fix commands
├── models/                     # Auto-discovered model definitions
├── model_artifacts/            # Binary artifacts (weights, pickles, ONNX files)
├── tests/
├── docs/
├── mkdocs.yml
├── pyproject.toml
├── docker-compose.yml
├── Dockerfile
└── dev.sh

Layer rules

  • domain/ has no imports from services/, adapters/, or infra/
  • services/ has no imports from adapters/
  • infra/ is the only layer that imports storage and queue SDKs (asyncpg, arq, boto3)
  • adapters/http/ is the only layer that imports FastAPI/Pydantic
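Rules like these are easiest to keep when a test checks them. The following is a minimal sketch of such a check, not part of the project: it parses each file with the standard-library `ast` module and reports imports that cross a forbidden boundary. The names `FORBIDDEN_LAYERS`, `RESTRICTED_PACKAGES`, `check_source`, and `check_tree` are hypothetical, and the sketch assumes internal imports are absolute and rooted at `app` (for example `from app.domain.jobs import Job`); relative imports are ignored.

```python
import ast
from pathlib import Path

# Assumed encoding of the first two rules: layer -> app layers it must not import.
FORBIDDEN_LAYERS = {
    "domain": {"services", "adapters", "infra"},
    "services": {"adapters"},
}

# Assumed encoding of the last two rules: third-party package -> only path allowed to import it.
RESTRICTED_PACKAGES = {
    "asyncpg": "infra",
    "arq": "infra",
    "boto3": "infra",
    "fastapi": "adapters/http",
    "pydantic": "adapters/http",
}


def imported_modules(source: str) -> set[str]:
    """Return dotted names for every absolute import in the given source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Include the imported name so `from app import services` is also caught.
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def check_source(rel_path: str, source: str) -> list[str]:
    """Check one file. rel_path is POSIX-style and relative to app/."""
    layer = rel_path.split("/")[0]
    violations = set()
    for module in imported_modules(source):
        parts = module.split(".")
        if parts[0] == "app" and len(parts) > 1:
            if parts[1] in FORBIDDEN_LAYERS.get(layer, set()):
                violations.add(f"{rel_path}: {layer}/ must not import from {parts[1]}/")
        owner = RESTRICTED_PACKAGES.get(parts[0])
        if owner is not None and not rel_path.startswith(owner + "/"):
            violations.add(f"{rel_path}: only {owner}/ may import {parts[0]}")
    return sorted(violations)


def check_tree(app_dir: Path) -> list[str]:
    """Check every .py file under app_dir and return all violations found."""
    violations = []
    for path in sorted(app_dir.rglob("*.py")):
        rel_path = path.relative_to(app_dir).as_posix()
        violations.extend(check_source(rel_path, path.read_text(encoding="utf-8")))
    return violations
```

One way to use it is a test that asserts `check_tree(Path("app")) == []`, so a violating import fails CI instead of relying on review. Dedicated tools exist for this kind of contract; the sketch only shows the idea.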