What You’ll Build
A customer support system that:
- Classifies query complexity using fast, cheap models
- Routes to appropriate models based on complexity
- Caches responses to reduce costs
- Tracks everything in Helicone for analysis and optimization
Prerequisites
- Node.js 18+ installed
- Vercel AI Gateway API key
- Helicone API key
- OpenAI and Anthropic API keys
Setup
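As a starting point, the keys listed under Prerequisites can be validated once at startup. The sketch below assumes they arrive as environment variables; the variable names are illustrative guesses, not names required by any of the services, so adjust them to your deployment.

```typescript
// Hypothetical environment variable names; rename to match your deployment.
const REQUIRED_KEYS = [
  "AI_GATEWAY_API_KEY",
  "HELICONE_API_KEY",
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type Config = Record<RequiredKey, string>;

// Reads the required keys from an env-like object and fails fast,
// reporting every missing key in a single error.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as Config;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` before handling any ticket.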
Implementation
Step 1: Query Classification
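A minimal sketch of the classification contract, assuming an OpenAI-style function-calling tool definition. The tool name `classify_query`, the three tier names, and the `reason` field are assumptions made for illustration. Only the schema and the validation of the returned arguments are shown; the model call itself is left out.

```typescript
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  complexity: Complexity;
  reason: string;
}

// Assumed tier names; use whatever tiers your routing needs.
const COMPLEXITY_LEVELS: readonly Complexity[] = ["simple", "moderate", "complex"];

// Tool definition in the OpenAI-style function-calling format.
const classifyTool = {
  type: "function",
  function: {
    name: "classify_query", // hypothetical tool name
    description: "Classify how complex a customer support query is.",
    parameters: {
      type: "object",
      properties: {
        complexity: { type: "string", enum: COMPLEXITY_LEVELS },
        reason: { type: "string" },
      },
      required: ["complexity", "reason"],
    },
  },
};

// Validates the JSON arguments string returned in the model's tool call.
function parseClassification(argumentsJson: string): Classification {
  const parsed: unknown = JSON.parse(argumentsJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Tool arguments must be a JSON object");
  }
  const { complexity, reason } = parsed as Record<string, unknown>;
  if (
    typeof complexity !== "string" ||
    COMPLEXITY_LEVELS.indexOf(complexity as Complexity) === -1
  ) {
    throw new Error(`Unexpected complexity: ${String(complexity)}`);
  }
  if (typeof reason !== "string") {
    throw new Error("Missing reason");
  }
  return { complexity: complexity as Complexity, reason };
}
```

Validating the arguments matters because a label outside the expected set would otherwise flow straight into routing.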
Use a small, fast model with tool calling for precise classification.
Step 2: Model Selection Strategy
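One way to express the routing is a lookup table from complexity tier to model. The model IDs and token limits below are placeholders chosen for illustration; substitute the identifiers your gateway actually exposes.

```typescript
type Complexity = "simple" | "moderate" | "complex";

interface ModelChoice {
  model: string;
  maxTokens: number;
}

// Placeholder model IDs and limits (assumptions, not recommendations).
const MODEL_BY_COMPLEXITY: Record<Complexity, ModelChoice> = {
  simple: { model: "gpt-4o-mini", maxTokens: 300 },
  moderate: { model: "gpt-4o", maxTokens: 600 },
  complex: { model: "claude-sonnet-4", maxTokens: 1200 },
};

// Returns the model configuration for a classified query.
function selectModel(complexity: Complexity): ModelChoice {
  return MODEL_BY_COMPLEXITY[complexity];
}
```

Keeping the mapping in one table makes it easy to change a tier's model after reviewing cost and latency data, without touching the request code.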
Route queries to the most appropriate model.
Step 3: Handle Support Tickets
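Tracing in Helicone is driven by request headers. The sketch below builds session and custom-property headers so that every request for a ticket is grouped under the ticket ID and tagged with its complexity. The session name, the step names, and the `TicketId` property are assumptions made for this example.

```typescript
type Complexity = "simple" | "moderate" | "complex";
type Step = "classify" | "respond"; // hypothetical step names

// Builds Helicone headers that group a ticket's requests into one session
// and tag them with custom properties for filtering.
function buildTraceHeaders(
  ticketId: string,
  step: Step,
  complexity?: Complexity
): Record<string, string> {
  const headers: Record<string, string> = {
    "Helicone-Session-Id": ticketId,
    "Helicone-Session-Name": "Support Ticket", // assumed session name
    "Helicone-Session-Path": `/${step}`,
    "Helicone-Property-TicketId": ticketId,
  };
  // Complexity is only known once the classification step has finished.
  if (complexity) {
    headers["Helicone-Property-Complexity"] = complexity;
  }
  return headers;
}
```

The classification request would use `buildTraceHeaders(ticketId, "classify")`, and the response request would pass the classified tier as the third argument. The returned object is sent as extra headers on each model request.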
Process tickets with full tracing.
Step 4: Add Retry Logic
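A generic retry helper with exponential backoff might look like the following. The defaults (three attempts, 500 ms base delay, 8 s cap) are arbitrary assumptions, and `withRetry` is a name introduced here rather than part of any SDK.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, waiting between failures.
// The zero-based attempt index is passed in so callers can react to retries.
async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  baseMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Skip the wait after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

Because the attempt index is passed to the operation, a caller can switch to a fallback model on later attempts instead of repeating the same request.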
Handle failures gracefully.
Complete Example
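Put it all together by chaining classification, model selection, traced response generation, and retries for each ticket.
Monitor in Helicone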
Once your assistant is running, view performance in your Helicone dashboard.
Filter by Complexity
Filter requests by the Complexity property to see:
- Average response time by complexity
- Cost per complexity tier
- Which models handle which query types
- Cache hit rates
Session View
Click on any ticket ID to see the complete flow:
- Classification request (cheap, fast)
- Response generation (model selected based on complexity)
- Any retry attempts
- Total cost for the entire ticket
Cost Analysis
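If you export request data for offline analysis, costs can be grouped by tier with a small aggregation. The `RequestRecord` shape below is hypothetical; map it from whatever fields your export contains.

```typescript
type Complexity = "simple" | "moderate" | "complex";

// Hypothetical record shape for an exported request.
interface RequestRecord {
  complexity: Complexity;
  costUsd: number;
}

interface TierSummary {
  requests: number;
  totalCostUsd: number;
  averageCostUsd: number;
}

// Sums and averages request cost per complexity tier.
function summarizeCosts(
  records: RequestRecord[]
): Record<Complexity, TierSummary> {
  const summary: Record<Complexity, TierSummary> = {
    simple: { requests: 0, totalCostUsd: 0, averageCostUsd: 0 },
    moderate: { requests: 0, totalCostUsd: 0, averageCostUsd: 0 },
    complex: { requests: 0, totalCostUsd: 0, averageCostUsd: 0 },
  };
  for (const record of records) {
    const tier = summary[record.complexity];
    tier.requests += 1;
    tier.totalCostUsd += record.costUsd;
    tier.averageCostUsd = tier.totalCostUsd / tier.requests;
  }
  return summary;
}
```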
Compare costs across complexity tiers.
Optimization Tips
Tune Classification
Monitor which queries are misclassified, then filter for incorrect classifications to improve your classifier.
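Once a sample of tickets has been reviewed by hand, the misclassifications can be summarized as an error rate plus a count of each confusion. This sketch assumes a hypothetical `ReviewedTicket` shape holding the predicted and the human-assigned tier.

```typescript
type Complexity = "simple" | "moderate" | "complex";

// Hypothetical shape: the classifier's label and a reviewer's label.
interface ReviewedTicket {
  predicted: Complexity;
  actual: Complexity;
}

interface MisclassificationReport {
  errorRate: number;
  // Keys look like "complex->simple": actual tier, then predicted tier.
  confusions: Record<string, number>;
}

// Counts how often, and in which direction, the classifier was wrong.
function misclassificationReport(
  tickets: ReviewedTicket[]
): MisclassificationReport {
  const confusions: Record<string, number> = {};
  let wrong = 0;
  for (const ticket of tickets) {
    if (ticket.predicted !== ticket.actual) {
      wrong += 1;
      const key = `${ticket.actual}->${ticket.predicted}`;
      confusions[key] = (confusions[key] || 0) + 1;
    }
  }
  const errorRate = tickets.length === 0 ? 0 : wrong / tickets.length;
  return { errorRate, confusions };
}
```

The direction of the error matters: a complex query labeled simple risks a poor answer, while a simple query labeled complex only costs more.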
Maximize Caching
Use temperature 0 and consistent prompts:
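A sketch of what "consistent prompts" can mean in practice: normalize whitespace before sending, fix the temperature at 0, and enable Helicone caching through request headers. The one-hour TTL and the function names are assumptions for this example.

```typescript
// Collapses whitespace so trivially different inputs produce the same prompt.
function normalizePrompt(prompt: string): string {
  return prompt.trim().replace(/\s+/g, " ");
}

// Builds the headers and body for a cache-friendly chat request.
function buildCachedRequest(
  model: string,
  systemPrompt: string,
  userQuery: string,
  ttlSeconds = 3600 // assumed TTL; tune per use case
) {
  return {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Cache-Control": `max-age=${ttlSeconds}`,
    },
    body: {
      model,
      temperature: 0, // deterministic settings keep cached answers representative
      messages: [
        { role: "system", content: normalizePrompt(systemPrompt) },
        { role: "user", content: normalizePrompt(userQuery) },
      ],
    },
  };
}
```

Whitespace normalization is only safe for plain-language queries. Skip it for inputs where whitespace is meaningful, such as pasted code or logs.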
Add Feedback Loop
Collect user ratings to track quality:
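The sketch below builds a feedback request for a single logged request. It assumes Helicone's request feedback endpoint accepts a boolean `rating` at the path shown; verify the URL and payload against the current Helicone API reference before relying on it. `buildFeedbackRequest` is a helper name introduced here.

```typescript
interface FeedbackRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a feedback call for one logged request.
// Assumption: endpoint path and { rating: boolean } payload as shown.
function buildFeedbackRequest(
  requestId: string,
  helpful: boolean,
  heliconeApiKey: string
): FeedbackRequest {
  return {
    url: `https://api.helicone.ai/v1/request/${encodeURIComponent(requestId)}/feedback`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${heliconeApiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ rating: helpful }),
  };
}
```

On Node.js 18+, the result can be sent with the built-in `fetch`, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, where `requestId` is the ID Helicone associated with the response being rated.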
Use Rate Limiting
Prevent abuse and control costs:
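As an illustration of per-user limiting, here is a small in-process fixed-window limiter. It is a generic sketch rather than Helicone's own rate limiting feature, and the class name and parameters are made up for this example.

```typescript
interface WindowState {
  start: number; // timestamp (ms) when the current window opened
  count: number; // requests seen in the current window
}

// Allows at most `limit` requests per user in each `windowMs` window.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the limiter can be tested without waiting.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    if (!current || now - current.start >= this.windowMs) {
      // First request, or the previous window has expired: open a new one.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}
```

State lives in memory, so this only works for a single process. A deployment with several instances would need a shared store or gateway-level limits instead.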
Production Checklist
Before deploying:
- Set up Helicone alerts for errors and spending
- Add rate limiting per user/session
- Implement retry logic with exponential backoff
- Enable caching with appropriate TTLs
- Add user feedback collection
- Configure logging for debugging
- Test fallback behavior
- Monitor classification accuracy
Next Steps
- Cost Tracking: Deep dive into cost optimization strategies
- Agent Tracing: Track more complex agent workflows
- Structured Outputs: Add function calling for tool use
- Caching Guide: Maximize cache hit rates
