Overview
The agent comparison pipeline:- Loads real customer service conversations
- Trains an intent classifier on the dataset
- Evaluates three agent architectures:
- SingleAgentRAG: Unified knowledge base approach
- MultiSpecialistAgents: Routing to specialized agents
- LangGraphAgent: Stateful workflow with validation
- Generates performance metrics and visualizations
- Creates interactive Mermaid diagrams for each architecture
- Produces an HTML comparison report
Source code
The complete example is available at:Quick start
Clone and run the example:- With API keys: Uses LiteLLM for real agent responses
- Without API keys: Uses mock responses (perfect for demos)
Pipeline structure
Agent architectures
1. SingleAgentRAG
Unified approach using a knowledge base:- Single unified approach for all query types
- Simple knowledge base lookup
- No query routing or specialization
- Works with or without LLM
2. MultiSpecialistAgents
Routing to specialized agents:- Keyword-based routing to specialists
- Four specialized agents (returns, billing, technical, general)
- Higher confidence due to specialization
- Specialized prompts per agent
3. LangGraphAgent
Stateful workflow with validation:- Four-stage workflow: analyze → classify → generate → validate
- Stateful processing with LangGraph
- Response quality validation
- Confidence adjustment based on complexity
Evaluation step
Visualization
The pipeline generates interactive Mermaid diagrams for each architecture:Results report
The final step generates an HTML comparison report:Key findings
The evaluation typically reveals:- SingleAgentRAG: Good baseline, simple to implement
- MultiSpecialistAgents: 5-10% higher confidence through specialization
- LangGraph: Best for complex workflows, built-in validation
- Latency: LangGraph slightly higher due to multi-stage workflow
- Token usage: Similar across architectures
- Confidence: MultiSpecialist and LangGraph outperform SingleAgent
Running with observability
Integrate Langfuse for cost and performance tracking:- Token usage per query
- Latency by architecture
- Cost breakdown
- LLM calls and responses
Next steps
Agent evaluation
Learn more about systematic agent evaluation
Deploying agents
Deploy agents as HTTP services
Framework integrations
Examples for 12+ agent frameworks
Orchestrating agents
Production agent orchestration patterns
