Observability Agent
Observability is crucial for monitoring and debugging AI agents in production. This lesson demonstrates how to implement comprehensive observability using AWS Strands with Langfuse integration and OpenTelemetry tracing.

Why Observability Matters
- Performance Monitoring: Track response times, token usage, and costs
- Debugging: Identify and fix issues quickly
- Quality Assurance: Monitor response quality and accuracy
- Compliance: Maintain audit trails for regulations
Use Cases
Performance Monitoring
- Response time tracking: Monitor how long each interaction takes
- Token usage monitoring: Track costs and efficiency metrics
- Error rate analysis: Identify and debug failed requests
- Resource utilization: Monitor system performance
Debugging and Troubleshooting
- Distributed tracing: Follow requests through the entire system
- Error tracking: Identify where and why failures occur
- Log aggregation: Centralized logging for easier debugging
- Session tracking: Monitor user interactions over time
Business Intelligence
- Usage analytics: Understand how users interact with your agent
- Cost analysis: Track and optimize operational costs
- Quality metrics: Monitor response quality and user satisfaction
- Custom metrics: Track business-specific KPIs
Security and Compliance
- Audit trails: Track all agent interactions for compliance
- Security monitoring: Detect suspicious patterns or attacks
- Data privacy: Ensure sensitive data is handled properly
- Access control: Monitor who is using the system
Key Concepts
OpenTelemetry Integration
OpenTelemetry provides standardized observability by automatically instrumenting your agent with:

- Distributed tracing: Complete request flows
- Metrics collection: Performance and usage data
- Log correlation: Links logs to specific traces
Langfuse Monitoring
Langfuse provides a comprehensive observability platform with:

- Trace Visualization: See complete request flows from input to output
- Session Tracking: Monitor conversation history and context
- Performance Metrics: Response times, token usage, and costs
- Custom Dashboards: Business-specific monitoring views
Trace Attributes
Custom attributes provide context for monitoring:

| Attribute | Description | Example |
|---|---|---|
| `session.id` | Unique session identifier | `"user-session-123"` |
| `user.id` | User identification | `"user@example.com"` |
| `langfuse.tags` | Categorization tags | `["production", "restaurant-bot"]` |
Monitoring Metrics
Key metrics to track fall into four groups: Performance, Cost, Quality, and Usage. Performance metrics include:

- Response Time: Latency per interaction
- Throughput: Requests per second
- Queue Length: Pending requests
Implementation
Step 1: Install Dependencies
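The original dependency list is not shown in this copy of the lesson; a plausible install covers the Strands SDK and dotenv support (exact package set is an assumption):

```shell
# Install the Strands Agents SDK and .env loading support
pip install strands-agents python-dotenv
```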
Step 2: Set Up Environment
Create a `.env` file with the required credentials:
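A minimal `.env` might look like the following; the key values are placeholders for your own Langfuse project keys, and `LANGFUSE_HOST` points at the Langfuse Cloud region you signed up for:

```bash
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
```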
Step 3: Configure OpenTelemetry
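The configuration code is missing from this copy; a sketch of the usual setup follows, assuming Langfuse's OTLP endpoint convention (`/api/public/otel` with Basic auth built from your key pair) and the Strands `StrandsTelemetry` helper:

```python
import base64
import os

# Langfuse credentials (placeholders here; normally loaded from .env).
public_key = os.environ.get("LANGFUSE_PUBLIC_KEY", "pk-lf-...")
secret_key = os.environ.get("LANGFUSE_SECRET_KEY", "sk-lf-...")
host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")

# Langfuse's OTLP endpoint authenticates via Basic auth over the key pair.
token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"{host}/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"

# With the environment set, initialize the Strands OTLP exporter
# (uncomment once strands-agents is installed):
# from strands.telemetry import StrandsTelemetry
# StrandsTelemetry().setup_otlp_exporter()
```

This must run before the agent is created, or early spans will not be exported.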
Step 4: Create Agent with Observability
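A sketch of agent creation, passing the attributes from the Trace Attributes table above. The system prompt and user id are illustrative, and the import is guarded so the snippet degrades gracefully where `strands-agents` or model credentials are unavailable:

```python
# Attributes from the Trace Attributes table; every trace this agent
# emits will carry them.
trace_attributes = {
    "session.id": "user-session-123",
    "user.id": "user@example.com",  # placeholder user id
    "langfuse.tags": ["production", "restaurant-bot"],
}

try:
    from strands import Agent

    agent = Agent(
        system_prompt="You are a helpful restaurant assistant.",  # illustrative
        trace_attributes=trace_attributes,
    )
except Exception as exc:  # e.g. strands-agents not installed
    agent = None
    print(f"agent not created: {exc}")
```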
Step 5: Use the Agent
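Using the agent is then a plain call; each invocation produces a complete trace. This sketch repeats the creation from Step 4 so it stands alone, and the guard lets it degrade gracefully without the SDK or model credentials:

```python
prompt = "What vegetarian dishes do you recommend?"  # illustrative

try:
    from strands import Agent

    agent = Agent(
        system_prompt="You are a helpful restaurant assistant.",  # illustrative
        trace_attributes={"session.id": "user-session-123"},
    )

    # The call is traced end to end: input, reasoning steps, tool calls,
    # token usage, and latency are all exported to Langfuse.
    response = agent(prompt)
    print(response)
except Exception as exc:  # SDK missing or model credentials unavailable
    print(f"skipping live call: {exc}")
```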
Running the Example
Set up a Langfuse account:
- Sign up at cloud.langfuse.com
- Create a new project
- Copy your public and secret keys
Expected Output

After a run, your Langfuse dashboard should show:

- Full conversation trace
- Token usage statistics
- Response time metrics
- Custom tags and attributes
Langfuse Dashboard Views
Trace Visualization

Each trace shows:

- User input
- Agent reasoning steps
- Tool calls
- Final response
- Timing for each step
Session Tracking

Session views include:

- Conversation history
- User interactions over time
- Session metadata
- Performance metrics per session
Advanced Configuration
Custom Metrics
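The code for this section is missing from this copy; one common pattern is deriving a business metric (cost per request) from token counts and recording it through the OpenTelemetry metrics API. Prices, meter, and metric names below are invented for illustration:

```python
# Derive a per-request cost from token counts (prices are illustrative).
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

cost = request_cost(1200, 300)

try:
    # Record it as an OpenTelemetry histogram so it lands in your backend.
    from opentelemetry import metrics

    meter = metrics.get_meter("restaurant-bot")
    cost_hist = meter.create_histogram(
        "request.cost.usd", description="Model cost per request"
    )
    cost_hist.record(cost, {"session.id": "user-session-123"})
except ImportError:
    pass  # opentelemetry-api not installed; the cost value still stands
```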
Multiple Agents
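When several agents share one Langfuse project, distinct trace attributes keep their traces separable in the dashboard. A sketch (agent names and tags are made up, and the import is guarded):

```python
# Distinct attribute sets let you filter each agent's traces in Langfuse.
booking_attributes = {
    "agent.name": "booking-agent",
    "langfuse.tags": ["production", "restaurant-bot", "booking"],
}
menu_attributes = {
    "agent.name": "menu-agent",
    "langfuse.tags": ["production", "restaurant-bot", "menu"],
}

try:
    from strands import Agent

    booking_agent = Agent(trace_attributes=booking_attributes)
    menu_agent = Agent(trace_attributes=menu_attributes)
except Exception:
    pass  # strands-agents not installed; the attribute pattern is the point
```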
Error Tracking
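For error tracking, failures can be recorded on the active span so they surface in the trace view. A sketch using the OpenTelemetry trace API (the tracer and span names are illustrative, the failure is simulated, and the import is guarded):

```python
def flaky_tool() -> str:
    raise RuntimeError("upstream reservation API timed out")  # simulated failure

captured = None
try:
    from opentelemetry import trace

    tracer = trace.get_tracer("restaurant-bot")
    with tracer.start_as_current_span("reservation-lookup") as span:
        try:
            flaky_tool()
        except Exception as exc:
            # Attach the exception and mark the span as errored so the
            # failure is visible in the Langfuse trace view.
            span.record_exception(exc)
            span.set_status(trace.Status(trace.StatusCode.ERROR, str(exc)))
            captured = exc
except ImportError:
    # opentelemetry-api not installed; fall back to plain capture.
    try:
        flaky_tool()
    except Exception as exc:
        captured = exc

print(type(captured).__name__)  # prints RuntimeError either way
```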
Best Practices
- Use descriptive tags: Tag traces with environment, agent type, and use case
- Track user IDs: Associate traces with users for support and analytics
- Monitor costs: Set up alerts for unusual token usage or costs
- Set performance baselines: Establish normal response times to detect issues
- Archive old traces: Regularly clean up old trace data
- Use custom metrics: Track business-specific KPIs alongside technical metrics
Troubleshooting
Traces not appearing in Langfuse
- Verify `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are correct
- Check that `LANGFUSE_HOST` is set to the correct URL
- Ensure telemetry is set up before creating the agent
- Check network connectivity to Langfuse servers
High latency with tracing enabled
- Traces are exported asynchronously, so tracing itself should not add noticeable latency
- Check network connection to Langfuse
- Consider batching traces if volume is very high
- Verify OTLP exporter configuration
Missing trace attributes
- Ensure `trace_attributes` is set when creating the agent
- Verify attribute keys follow OpenTelemetry conventions
- Check that values are serializable (strings, numbers, booleans)
What You Learned
- How to set up OpenTelemetry for agent observability
- How to integrate Langfuse for tracing and monitoring
- How to track custom attributes and tags
- How to monitor performance, costs, and quality
- Best practices for production observability
Next Steps
You can now monitor and debug your agents in production! But what about safety? In the final lesson, you’ll learn how to implement guardrails to protect your agents from harmful inputs and outputs.

Lesson 08: Safety Guardrails
Learn how to implement safety measures and content filtering
Resources
Video Tutorial
Watch Lesson 07 on YouTube
Langfuse Documentation
Explore Langfuse features