Overview
CLI Proxy API intelligently routes requests across multiple credentials to maximize availability and balance load. The routing system handles:
- Credential selection - Choosing which account to use
- Load balancing - Distributing requests evenly
- Quota management - Handling rate limits and daily quotas
- Automatic failover - Retrying with different credentials
- Model aliasing - Mapping model names
Routing Strategies
Two built-in strategies control credential selection:
Round-Robin (Default)
Distributes requests evenly across all available credentials:
config.yaml
Best suited for:
- Even distribution across accounts
- Maximizing total quota usage
- Avoiding concentration on single account
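As a rough illustration of the round-robin idea described above, the sketch below cycles through a fixed list of credential IDs. The `RoundRobin` type and its fields are hypothetical names chosen for this example; they are not taken from the project's source, and the real selector also has to account for credential state.

```go
package main

import "fmt"

// RoundRobin cycles through a fixed, non-empty list of credential IDs.
// Hypothetical type for illustration only.
type RoundRobin struct {
	creds []string
	next  int
}

// Pick returns the credential at the cursor and advances the cursor,
// wrapping around at the end of the list.
func (r *RoundRobin) Pick() string {
	c := r.creds[r.next]
	r.next = (r.next + 1) % len(r.creds)
	return c
}

func main() {
	rr := &RoundRobin{creds: []string{"a", "b", "c"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.Pick()) // a, b, c, then a again
	}
}
```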
Fill-First
Uses the first credential until it hits its quota, then moves to the next:
config.yaml
Best suited for:
- Staggering rolling-window limits (e.g., chat message caps)
- Minimizing active accounts
- Preserving specific accounts for peak times
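Fill-first can be sketched the same way: walk the list in order and return the first credential that still has quota. The `Credential` struct and its `Exhausted` flag are assumptions made for this example, standing in for whatever quota state the proxy actually tracks.

```go
package main

import "fmt"

// Credential is a hypothetical, simplified record for illustration.
type Credential struct {
	ID        string
	Exhausted bool // true once the credential has hit its quota
}

// PickFillFirst returns the first credential, in list order, that still has
// quota. The boolean is false when every credential is exhausted.
func PickFillFirst(creds []Credential) (string, bool) {
	for _, c := range creds {
		if !c.Exhausted {
			return c.ID, true
		}
	}
	return "", false
}

func main() {
	creds := []Credential{{ID: "a", Exhausted: true}, {ID: "b"}, {ID: "c"}}
	id, ok := PickFillFirst(creds)
	fmt.Println(id, ok) // b true
}
```

Unlike round-robin, later credentials stay untouched until the earlier ones run out, which is what staggers their rolling windows.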
Credential States
Each credential can be in one of four states:
Ready
The credential is available and can be selected by the routing strategy.
Cooldown
The credential exceeded its quota and is temporarily blocked:
sdk/cliproxy/auth/conductor.go
- Detect quota error (HTTP 429 or provider-specific message)
- Calculate backoff using an exponential strategy
- Enter cooldown for calculated duration
- Return to ready after cooldown expires
Blocked
Manually blocked via the Management API or attributes.
Disabled
Permanently disabled (e.g., deleted auth file).
Priority-Based Selection
Credentials can have priority levels:
~/.cli-proxy-api/[email protected]
~/.cli-proxy-api/[email protected]
- Priority 10 accounts selected first
- Priority 1 accounts used as fallback
- Priority 0 (default) used last
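One way to picture the priority ordering above: among the credentials that are currently ready, keep only those sharing the highest priority, and let the routing strategy choose within that group. The `Credential` struct and `TopPriority` function are hypothetical names for this sketch.

```go
package main

import "fmt"

// Credential is a hypothetical, simplified record for illustration.
type Credential struct {
	ID       string
	Priority int // higher is preferred; 0 is the default
	Ready    bool
}

// TopPriority returns the IDs of the ready credentials that share the
// highest priority. Lower priorities are only reached when every
// higher-priority credential is unavailable.
func TopPriority(creds []Credential) []string {
	var out []string
	best, found := 0, false
	for _, c := range creds {
		if !c.Ready {
			continue
		}
		switch {
		case !found || c.Priority > best:
			best, found = c.Priority, true
			out = []string{c.ID}
		case c.Priority == best:
			out = append(out, c.ID)
		}
	}
	return out
}

func main() {
	creds := []Credential{
		{ID: "vip", Priority: 10, Ready: false}, // in cooldown
		{ID: "std1", Priority: 1, Ready: true},
		{ID: "std2", Priority: 1, Ready: true},
		{ID: "spare", Priority: 0, Ready: true},
	}
	fmt.Println(TopPriority(creds)) // [std1 std2]
}
```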
Model Prefix Routing
Force specific credentials using model prefixes:
Configuring Prefixes
config.yaml
Using Prefixes
Force Prefix Mode
Require prefixes for all requests:
config.yaml
Model Aliasing
Map client model names to provider model names:
Global OAuth Aliases
config.yaml
API Key Aliases
config.yaml
Model Pools (Internal Failover)
Map multiple upstream models to the same alias:
config.yaml
- Client requests best-model
- Round-robin selects claude-3.5-sonnet
- If it fails before producing output → retry with gemini-pro
- If it fails again → retry with gpt-4
- If all fail → return error
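The pool walk-through above can be sketched as a loop that starts at the member chosen by round-robin and moves to the next member whenever a call fails before producing output. `TryPool`, `errUpstream`, and the `call` callback are hypothetical names; `call` stands in for an upstream request.

```go
package main

import (
	"errors"
	"fmt"
)

// errUpstream is a placeholder failure used by the demo below.
var errUpstream = errors.New("upstream failed before producing output")

// TryPool calls each pool member once, starting at index start, and returns
// the first successful output. If every member fails, it returns an error
// wrapping the last failure.
func TryPool(pool []string, start int, call func(model string) (string, error)) (string, error) {
	if len(pool) == 0 {
		return "", errors.New("empty pool")
	}
	var lastErr error
	for i := 0; i < len(pool); i++ {
		model := pool[(start+i)%len(pool)]
		out, err := call(model)
		if err == nil {
			return out, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("all %d pool members failed: %w", len(pool), lastErr)
}

func main() {
	pool := []string{"claude-3.5-sonnet", "gemini-pro", "gpt-4"}
	out, err := TryPool(pool, 0, func(model string) (string, error) {
		if model != "gpt-4" {
			return "", errUpstream // simulate two failures in a row
		}
		return "answer from " + model, nil
	})
	fmt.Println(out, err) // answer from gpt-4 <nil>
}
```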
Model Exclusion
Hide models from the model list:
OAuth Exclusions
config.yaml
API Key Exclusions
config.yaml
- model-name - Exact match
- prefix-* - Matches prefix-anything
- *-suffix - Matches anything-suffix
- *substring* - Matches any-substring-here
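The four pattern forms above are simple enough to match with the standard `strings` package. The sketch below is one possible matcher, not the project's implementation; `MatchPattern` is a hypothetical name, and wildcards in the middle of a pattern are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// MatchPattern reports whether model matches pattern, supporting four forms:
// exact, prefix-*, *-suffix, and *substring*.
func MatchPattern(pattern, model string) bool {
	lead := strings.HasPrefix(pattern, "*")
	trail := strings.HasSuffix(pattern, "*")
	core := strings.Trim(pattern, "*") // pattern text without the wildcards
	switch {
	case lead && trail:
		return strings.Contains(model, core)
	case trail:
		return strings.HasPrefix(model, core)
	case lead:
		return strings.HasSuffix(model, core)
	default:
		return model == pattern
	}
}

func main() {
	fmt.Println(MatchPattern("prefix-*", "prefix-anything"))          // true
	fmt.Println(MatchPattern("*-suffix", "anything-suffix"))          // true
	fmt.Println(MatchPattern("*substring*", "any-substring-here"))    // true
	fmt.Println(MatchPattern("model-name", "model-name-extended"))    // false
}
```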
Automatic Failover
When a request fails, CLI Proxy API automatically retries with a different credential.
Retry Configuration
config.yaml
Retry Logic
- Attempt 1: Credential A → Fails (quota exceeded)
- Attempt 2: Credential B → Fails (503 error)
- Attempt 3: Credential C → Fails (timeout)
- Attempt 4: Credential D → Success ✓
Status codes that trigger a retry:
- 403 - Forbidden
- 408 - Request Timeout
- 429 - Too Many Requests (quota)
- 500 - Internal Server Error
- 502 - Bad Gateway
- 503 - Service Unavailable
- 504 - Gateway Timeout
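The retry sequence and status codes above can be combined into a small loop: try credentials in order, stop at the first status that is not in the retryable set, and give up once an attempt limit is reached. `Send`, `retryable`, and the `do` callback are hypothetical names for this sketch; `do` stands in for the upstream call, and a timeout is modeled here as status 408.

```go
package main

import "fmt"

// retryable holds the status codes that lead to another attempt.
var retryable = map[int]bool{
	403: true, 408: true, 429: true, 500: true, 502: true, 503: true, 504: true,
}

// Send tries credentials in order until one returns a non-retryable status
// or maxAttempts is reached. It returns the last credential tried, its
// status, and the number of attempts made.
func Send(creds []string, maxAttempts int, do func(cred string) int) (cred string, status, attempts int) {
	for _, c := range creds {
		if attempts == maxAttempts {
			break
		}
		attempts++
		cred, status = c, do(c)
		if !retryable[status] {
			return
		}
	}
	return
}

func main() {
	// Simulated upstream responses per credential (assumed for the demo).
	responses := map[string]int{"A": 429, "B": 503, "C": 408, "D": 200}
	cred, status, n := Send([]string{"A", "B", "C", "D"}, 4, func(c string) int {
		return responses[c]
	})
	fmt.Println(cred, status, n) // D 200 4
}
```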
Quota Failover
Special handling for quota-related errors:
config.yaml
- When credential switching is set to true, quota errors trigger an immediate retry with the next credential.
- When the preview-model fallback is set to true, requests fall back to preview models.
Multi-Provider Routing
Some models are available from multiple providers:
config.yaml
- Filter to credentials offering the requested model
- Apply routing strategy within available credentials
- Round-robin across providers (not just accounts)
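The three steps above amount to a filter followed by the usual strategy. In the sketch below, `Candidates` narrows the credential list to those offering the requested model, and the caller then rotates over the result, which may span several providers. The struct, the provider labels, and the model names are placeholders invented for this example.

```go
package main

import "fmt"

// Credential is a hypothetical, simplified record for illustration.
type Credential struct {
	ID       string
	Provider string
	Models   []string
}

// Candidates returns the credentials that offer the requested model,
// preserving their original order.
func Candidates(creds []Credential, model string) []Credential {
	var out []Credential
	for _, c := range creds {
		for _, m := range c.Models {
			if m == model {
				out = append(out, c)
				break
			}
		}
	}
	return out
}

func main() {
	creds := []Credential{
		{ID: "g1", Provider: "provider-a", Models: []string{"model-x"}},
		{ID: "v1", Provider: "provider-b", Models: []string{"model-x", "model-y"}},
		{ID: "c1", Provider: "provider-c", Models: []string{"model-z"}},
	}
	pool := Candidates(creds, "model-x")
	// Round-robin over the filtered pool alternates between providers.
	for i := 0; i < 3; i++ {
		c := pool[i%len(pool)]
		fmt.Println(c.Provider, c.ID)
	}
}
```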
Request Metadata
Control routing via request metadata:
Pin to Specific Credential
This pins the request to the credential auth-id-123.
Track Selected Credential
Model Registry
The registry dynamically tracks which credentials can serve which models.
Streaming Bootstrap Retries
For streaming requests, retries happen before the first byte is sent:
config.yaml
- Attempt 1: Credential A → Error before streaming
- Attempt 2: Credential B → Starts streaming → Success
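The reason retries are limited to the bootstrap phase is that once a chunk has been forwarded to the client, the response can no longer be restarted on another credential. A minimal sketch of that rule, assuming a pull-style stream: read the first chunk from each credential in turn and commit to the first stream that produces one. `Chunks`, `Bootstrap`, and both error values are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errBoot          = errors.New("stream failed before first chunk")
	errNoCredentials = errors.New("no credentials to try")
)

// Chunks is a hypothetical pull-style stream: each call yields the next chunk.
type Chunks func() (string, error)

// Bootstrap opens a stream per credential until one yields a first chunk,
// making at most 1+maxRetries attempts. After it returns successfully the
// caller forwards first and then rest, with no further retries.
func Bootstrap(creds []string, maxRetries int, open func(cred string) Chunks) (cred, first string, rest Chunks, err error) {
	err = errNoCredentials
	for i, c := range creds {
		if i > maxRetries {
			break
		}
		next := open(c)
		first, err = next()
		if err == nil {
			return c, first, next, nil
		}
	}
	return "", "", nil, err
}

func main() {
	open := func(cred string) Chunks {
		if cred == "A" {
			return func() (string, error) { return "", errBoot }
		}
		return func() (string, error) { return "hello", nil }
	}
	cred, first, _, err := Bootstrap([]string{"A", "B"}, 1, open)
	fmt.Println(cred, first, err) // B hello <nil>
}
```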
Performance Considerations
Scheduler Optimization
The scheduler pre-builds selection views:
sdk/cliproxy/auth/scheduler.go
- O(1) credential selection (no sorting on hot path)
- Efficient priority handling
- Fast cooldown management
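To illustrate what "pre-building a selection view" can buy, the sketch below sorts credentials once when the view is built and keeps only the top-priority group, so each pick is an index lookup plus a cursor increment. This is a guess at the general technique, not a description of scheduler.go; `View`, `BuildView`, and `Cred` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Cred is a hypothetical, simplified record for illustration.
type Cred struct {
	ID       string
	Priority int
}

// View is a pre-built list of the top-priority credential IDs.
type View struct {
	top    []string
	cursor int
}

// BuildView does the sorting once, off the hot path. It would be rebuilt
// whenever credentials change state.
func BuildView(creds []Cred) *View {
	sorted := append([]Cred(nil), creds...) // copy so the input is not reordered
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	v := &View{}
	for _, c := range sorted {
		if c.Priority != sorted[0].Priority {
			break
		}
		v.top = append(v.top, c.ID)
	}
	return v
}

// Pick is O(1): no sorting or scanning at request time.
func (v *View) Pick() (string, bool) {
	if len(v.top) == 0 {
		return "", false
	}
	id := v.top[v.cursor]
	v.cursor = (v.cursor + 1) % len(v.top)
	return id, true
}

func main() {
	v := BuildView([]Cred{{"a", 1}, {"b", 10}, {"c", 10}})
	for i := 0; i < 3; i++ {
		id, _ := v.Pick()
		fmt.Println(id) // b, c, b
	}
}
```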
Concurrency
Routing decisions are lock-free for read paths:
sdk/cliproxy/auth/conductor.go
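One common way to get lock-free reads in Go is to publish an immutable snapshot through `sync/atomic` and have readers load it without taking a mutex. The sketch below shows that pattern under the assumption that something similar is meant here; it is not taken from conductor.go, and `Router`, `Snapshot`, `Publish`, and `Current` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is an immutable view of the ready credentials.
type Snapshot struct {
	Ready []string
}

// Router publishes snapshots atomically; readers never block.
type Router struct {
	snap atomic.Value // always holds a *Snapshot once published
}

// Publish stores a fresh snapshot. The slice is copied so later changes by
// the caller cannot affect readers.
func (r *Router) Publish(ready []string) {
	r.snap.Store(&Snapshot{Ready: append([]string(nil), ready...)})
}

// Current returns the latest snapshot, or nil before the first Publish.
func (r *Router) Current() *Snapshot {
	s, _ := r.snap.Load().(*Snapshot)
	return s
}

func main() {
	r := &Router{}
	r.Publish([]string{"a", "b"})
	fmt.Println(r.Current().Ready) // [a b]
}
```

Writers pay the cost of building a new snapshot on every state change, which suits a workload where selections far outnumber state changes.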
Debugging Routing
Enable debug logging:
config.yaml
Debug logs cover:
- Credential selection decisions
- Cooldown state changes
- Retry attempts
- Provider routing
Best Practices
Use round-robin for even distribution
Round-robin maximizes total quota usage by spreading load across all accounts evenly.
Use fill-first for rolling-window limits
Fill-first prevents hitting multiple accounts’ daily message limits simultaneously.
Set priorities for fallback accounts
Keep low-priority accounts as emergency backup when primary accounts hit quota.
Use prefixes for team isolation
Assign each team a prefixed credential pool to prevent quota conflicts.
Configure retry limits
Set max-retry-credentials to prevent excessive retry attempts that delay errors.
Next Steps
Configuration
Configure routing behavior
Model Mappings
Set up model aliases and pools
Providers
Learn about provider-specific features
Management API
Monitor routing via API