Request Lifecycle
Synchronous inference

Returns {"result": ...} to the client.
Asynchronous inference

GET /predict/async/{job_id}
└─ JobService.get_job(job_id) → Job → PredictAsyncStatusResponse
Middleware execution order
Middleware runs in reverse registration order in Starlette. Effective order per request:
AuthMiddleware → RateLimitMiddleware → PayloadGuardMiddleware → Route handler
A request that fails auth never reaches the rate limiter.
Graceful shutdown
Lifespan shutdown
├─ cpu_executor._executor.shutdown(wait=True) ← drains all in-flight futures
├─ gpu_executor._executor.shutdown(wait=True)
└─ lru_cache cleared for all dep singletons