Distributed Tracing

File: app/core/tracing.py

The engine instruments every inference request with an OpenTelemetry span. The SDK is optional — a no-op tracer is used when it is not installed, so no code changes are needed to run without tracing.


Setup

Tracing is initialised once in create_app():

from app.core.tracing import setup_tracing
setup_tracing(app)

When OTEL_EXPORTER_OTLP_ENDPOINT is set and opentelemetry-sdk is installed, spans are exported via OTLP/gRPC to that endpoint (for example, a local Jaeger or OTLP collector). Otherwise they are collected locally.
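
The export decision described above can be sketched roughly as follows. This is an illustration only, not the contents of app/core/tracing.py: the helper names sdk_installed and choose_export_mode and the returned labels are hypothetical, and probing for the SDK with importlib is an assumption about how availability might be checked.

```python
import importlib.util
import os


def sdk_installed() -> bool:
    """Return True if the opentelemetry.sdk package can be located."""
    try:
        return importlib.util.find_spec("opentelemetry.sdk") is not None
    except ModuleNotFoundError:
        # Raised when the parent package "opentelemetry" itself is missing.
        return False


def choose_export_mode(env=None, has_sdk=None) -> str:
    """Hypothetical helper: pick OTLP/gRPC export or local collection."""
    env = os.environ if env is None else env
    has_sdk = sdk_installed() if has_sdk is None else has_sdk
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    # Both conditions must hold before spans are exported over OTLP/gRPC.
    if endpoint and has_sdk:
        return "otlp-grpc"
    return "local"


if __name__ == "__main__":
    print(choose_export_mode())
```

Passing env and has_sdk explicitly keeps the decision easy to test without touching the real environment.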


Environment variables

Variable                     Description
OTEL_EXPORTER_OTLP_ENDPOINT  OTLP/gRPC endpoint, e.g. http://localhost:4317
OTEL_SERVICE_NAME            Service name in traces (default: inference-engine)
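
As a rough sketch of how these two variables could be resolved, the snippet below reads them from a mapping and applies the documented default. TracingConfig and load_tracing_config are hypothetical names, and treating an empty value as unset is an assumption rather than documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TracingConfig:
    """Hypothetical container for the two settings; not part of the engine."""
    endpoint: Optional[str]
    service_name: str


def load_tracing_config(env: Optional[Mapping[str, str]] = None) -> TracingConfig:
    """Read the tracing variables, falling back to the documented default."""
    env = os.environ if env is None else env
    # Assumption: an empty string is treated the same as an unset variable.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None
    service_name = env.get("OTEL_SERVICE_NAME") or "inference-engine"
    return TracingConfig(endpoint=endpoint, service_name=service_name)


if __name__ == "__main__":
    print(load_tracing_config())
```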

Span attributes

Every inference span (inference:{model}:{version}) carries:

Attribute    Value
model        Model name
version      Model version
tenant_id    Tenant from the API key
request_id   X-Request-ID header value
latency_ms   End-to-end latency in milliseconds

Exceptions are recorded on the span via span.record_exception().
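
To make the span shape concrete, here is a minimal sketch that builds the span name, sets the five attributes, and records exceptions. It uses a stand-in RecordingSpan class instead of a real OpenTelemetry span so that it runs without the SDK. RecordingSpan and inference_span are hypothetical names, and the engine's actual instrumentation may differ.

```python
import time
from contextlib import contextmanager


class RecordingSpan:
    """Stand-in span that stores what it is given (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.exceptions = []

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def record_exception(self, exception):
        self.exceptions.append(exception)


@contextmanager
def inference_span(model, version, tenant_id, request_id):
    """Hypothetical helper: open a span named inference:{model}:{version}."""
    span = RecordingSpan(f"inference:{model}:{version}")
    span.set_attribute("model", model)
    span.set_attribute("version", version)
    span.set_attribute("tenant_id", tenant_id)
    span.set_attribute("request_id", request_id)
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Record the failure on the span, then let it propagate unchanged.
        span.record_exception(exc)
        raise
    finally:
        # Latency is set on both the success and the failure path.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        span.set_attribute("latency_ms", elapsed_ms)


if __name__ == "__main__":
    with inference_span("resnet", "2", "tenant-a", "req-1") as span:
        pass
    print(span.name, sorted(span.attributes))
```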


Installing the SDK

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc opentelemetry-instrumentation-fastapi

Without these packages the engine runs normally — all tracing calls are no-ops.


Using the tracer in custom code

from app.core.tracing import get_tracer

tracer = get_tracer()
with tracer.start_as_current_span("my-custom-span") as span:
    span.set_attribute("key", "value")
    # ... do work

get_tracer() always returns a usable object — either a real OTel tracer or the built-in no-op.
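
The fallback behaviour can be pictured with the sketch below: try to import the OpenTelemetry API, and return a do-nothing tracer when the import fails. NoOpSpan, NoOpTracer and get_tracer_sketch are hypothetical names chosen for illustration. The built-in no-op in app/core/tracing.py is not shown in this document and may be structured differently.

```python
from contextlib import contextmanager


class NoOpSpan:
    """Accepts the span calls used in this document and ignores them."""

    def set_attribute(self, key, value):
        return None

    def record_exception(self, exception):
        return None


class NoOpTracer:
    """Mimics the one tracer method used above: start_as_current_span."""

    @contextmanager
    def start_as_current_span(self, name):
        yield NoOpSpan()


def get_tracer_sketch(name="inference-engine"):
    """Return a real OTel tracer if the API is importable, else a no-op."""
    try:
        from opentelemetry import trace
    except ImportError:
        return NoOpTracer()
    return trace.get_tracer(name)


if __name__ == "__main__":
    tracer = get_tracer_sketch()
    with tracer.start_as_current_span("my-custom-span") as span:
        span.set_attribute("key", "value")
```

Because both objects expose the same context-manager method, calling code does not need to check which one it received.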