Overview
The Router provides intelligent load balancing, fallbacks, and retries across multiple LLM deployments. This guide covers router-specific configuration.
Basic Router Setup
Python Configuration
YAML Configuration (for Proxy)
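For the proxy, the same deployments can be declared in the config file; `router_settings` keys mirror the Router constructor arguments. A sketch (deployment names and env-var references are placeholders):

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment
      api_key: os.environ/AZURE_API_KEY
      api_base: https://example.openai.azure.com

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
```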
Routing Strategies
simple-shuffle (Default)
Randomly selects from available deployments.
usage-based-routing
Respects TPM (tokens per minute) and RPM (requests per minute) limits.
latency-based-routing
Routes to the deployment with the lowest latency.
least-busy
Routes to the deployment with the fewest ongoing requests.
cost-based-routing
Routes to the cheapest deployment.
Fallback Configuration
Basic Fallbacks
Context Window Fallbacks
Retry Configuration
Basic Retries
Per-Error Retry Policy
Per-Model-Group Retry Policy
Cooldown Configuration
Caching Configuration
Redis Caching
In-Memory Caching
Model Aliases
Complete Production Example
Best Practices
- Use TPM/RPM limits: Always set limits to respect provider quotas
- Configure fallbacks: Have backup models for reliability
- Enable caching: Reduce costs and latency
- Monitor latency: Use latency-based routing in production
- Set appropriate timeouts: Balance responsiveness and success rate
- Use cooldowns: Prevent cascading failures
- Test retry policies: Ensure they match your use case
- Use model aliases: Abstract model names for easier updates