# LLM Observability

Observability for Large Language Model (LLM) applications requires specialized tooling to track token usage, latency, multi-step reasoning chains, and costs. This guide covers how to instrument your LLM applications with modern observability platforms.

## Why LLM Observability Matters
LLM applications present unique monitoring challenges:

- Variable costs: Each request has different token usage and costs
- Long latency: Responses can take seconds to minutes
- Complex workflows: Multi-step reasoning, tool use, and retrieval
- Non-determinism: Same input can produce different outputs
- Quality assessment: Correctness is harder to evaluate automatically
## OpenTelemetry Basics

OpenTelemetry provides a vendor-neutral standard for collecting telemetry data:

- Traces: Request flows through distributed systems
- Metrics: Numerical measurements over time
- Logs: Discrete events with context
### Key Concepts

#### Spans and Traces

A trace represents a request’s journey through your system. It’s composed of spans, where each span represents a unit of work (e.g., an API call, database query, or LLM inference). Spans can be nested to show parent-child relationships, making it easy to see which operations take the most time.
#### Context Propagation
OpenTelemetry automatically propagates context across service boundaries, so you can trace a request through multiple microservices and see the complete picture.
#### Attributes and Events
You can attach attributes (key-value pairs) to spans to add context, like model name, prompt length, or token count. Events mark specific points in time within a span.
## Observability Platforms

This module demonstrates three observability platforms for LLM applications:

### 1. AgentOps
AgentOps specializes in monitoring AI agents:

- Automatic tracking of OpenAI calls
- Agent workflow visualization
- Cost tracking per session
- Simple integration with no code changes
### 2. LangSmith

LangSmith from LangChain provides detailed tracing:

- Detailed trace trees for complex chains
- Prompt and response logging
- Evaluation and testing tools
- Dataset management
### 3. OpenLLMetry

OpenLLMetry (Traceloop) uses OpenTelemetry standards:

- OpenTelemetry compatible (works with any backend)
- Automatic instrumentation for popular frameworks
- Custom workflow decorators
- Self-hosted option
## Example Applications

The module includes two reference applications.

### Text-to-SQL Application

Location: `llm-apps/sql_app.py`

This application demonstrates observability across all three platforms.
### AI Scientist Paper Reviewer

Location: `llm-apps/reviewer.py`

This application demonstrates observability for a complex multi-step reasoning workflow:
- Paper parsing
- Multiple review iterations
- Ensemble evaluation
- Reflection and refinement
## Environment Setup

### Required Environment Variables

Each platform reads its credentials from environment variables.
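A small startup check can catch missing credentials before the first request fails. The variable names below are the usual defaults for each platform; treat them as assumptions and verify against each platform's docs:

```python
import os

# Usual variable names per platform (verify against current docs).
REQUIRED = {
    "OPENAI_API_KEY": "OpenAI (LLM provider)",
    "AGENTOPS_API_KEY": "AgentOps",
    "LANGSMITH_API_KEY": "LangSmith",
    "TRACELOOP_API_KEY": "OpenLLMetry / Traceloop",
}

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

if missing_vars():
    print("Missing environment variables:", ", ".join(missing_vars()))
```

Self-hosted OpenLLMetry setups also use `TRACELOOP_BASE_URL` to point at the collector (see Troubleshooting below).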
### Running the Examples

With the environment variables set, each reference application can be run directly with Python.
## Custom Instrumentation
Custom spans let you track specific operations that automatic instrumentation misses.

### Workflow Decorators

Use decorators to automatically track functions without wrapping every call site by hand; OpenLLMetry's `@workflow` and `@task` decorators are one example.

## Comparing Platforms
| Feature | AgentOps | LangSmith | OpenLLMetry |
|---|---|---|---|
| Auto-instrumentation | ✅ | ✅ | ✅ |
| Cost tracking | ✅ | ✅ | ❌ |
| Custom backends | ❌ | ❌ | ✅ |
| Evaluation tools | ⚠️ Basic | ✅ Advanced | ❌ |
| Self-hosted | ❌ | ⚠️ Paid | ✅ |
| OpenTelemetry | ❌ | ❌ | ✅ |
| LangChain integration | ⚠️ | ✅ Native | ✅ |
## Best Practices

### Sample Strategically
Don’t trace every request in high-volume production. Use sampling to reduce overhead while maintaining visibility.
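One simple approach is deterministic head sampling keyed on the trace ID, so every span in a trace shares the same keep/drop decision. A stdlib sketch (the rate and IDs are illustrative):

```python
import hashlib

def should_trace(trace_id: str, sample_rate: float = 0.1) -> bool:
    """Deterministic sampling: hash the trace ID into [0, 1) and compare."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Because the decision is a pure function of the trace ID, sampled traces stay complete across services, and you can raise the rate for traffic you care about (e.g., errors) without coordinating state.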
### Add Context
Include user IDs, session IDs, model versions, and other metadata as span attributes for easier debugging.
### Monitor Costs
Track token usage and costs per request, user, or feature to optimize spending.
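Per-request cost is just token counts multiplied by per-token prices. The prices below are placeholders, since real prices vary by model and change over time:

```python
# Placeholder prices in USD per 1K tokens; look up real ones for each model.
PRICES = {
    "example-model": {"input": 0.005, "output": 0.015},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute the dollar cost of one request from its token usage."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1000
```

Attaching the result as a span attribute lets you aggregate spend per user, session, or feature directly in your observability backend.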
### Set Alerts
Configure alerts for high latency, error rates, or cost spikes to catch issues early.
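Alert rules can start as simple threshold checks before graduating to a full alerting system; the metric names and limits below are illustrative:

```python
def check_alerts(metrics: dict, thresholds: dict) -> list:
    """Return the names of metrics that exceed their configured limit."""
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0) > limit]

alerts = check_alerts(
    {"p95_latency_s": 12.0, "error_rate": 0.01, "hourly_cost_usd": 4.0},
    {"p95_latency_s": 10.0, "error_rate": 0.05, "hourly_cost_usd": 25.0},
)  # only p95 latency is over its limit
```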
## Troubleshooting

### Traces not appearing

- Check that `TRACELOOP_BASE_URL` is set correctly
- Verify the OpenTelemetry collector is running
- Ensure your application has network access to the collector
- Check for initialization errors in application logs
### Missing span data
- Verify the SDK version is compatible
- Check that instrumentation is initialized before creating the client
- Ensure exceptions aren’t silently caught
### High overhead
- Enable sampling to reduce trace volume
- Disable detailed logging in production
- Use batch exporters instead of synchronous ones
## Next Steps

### Install SigNoz

Set up SigNoz as your observability backend for OpenTelemetry traces.