## Routing Strategies
Codex-LB supports two primary routing strategies:

### Usage-Weighted Routing

Routes requests to accounts based on remaining capacity.

How it works:

- Accounts with more remaining capacity receive more traffic
- Accounts near rate limits receive less traffic
- Weights are recalculated based on real-time usage

Best for:

- Maximizing throughput
- Avoiding rate limit errors
- Production environments with multiple accounts
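A minimal sketch of usage-weighted selection, assuming each account exposes a hypothetical `remaining` capacity field (not Codex-LB's actual schema):

```python
import random

def pick_usage_weighted(accounts, rng=random.random):
    """Pick an account with probability proportional to remaining capacity."""
    eligible = [a for a in accounts if a["remaining"] > 0]
    if not eligible:
        raise RuntimeError("no account with remaining capacity")
    total = sum(a["remaining"] for a in eligible)
    threshold = rng() * total       # uniform point in [0, total)
    cumulative = 0.0
    for account in eligible:
        cumulative += account["remaining"]
        if threshold < cumulative:  # falls in this account's slice
            return account
    return eligible[-1]             # guard against floating-point rounding
```

With this weighting, an account holding 90% of the remaining capacity receives roughly 90% of the traffic.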
### Round-Robin Routing

Distributes requests evenly across all available accounts.

How it works:

- Each account receives requests in rotation
- No weighting based on usage or capacity
- Simpler algorithm with less overhead

Best for:

- Testing and development
- Accounts with similar quotas
- Simpler deployment scenarios
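Round-robin can be sketched as a cursor over the account list that skips unavailable entries; the `status` field is an assumption for illustration:

```python
class RoundRobin:
    """Rotate through accounts in order, skipping unavailable ones."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.index = 0  # position of the next account to try

    def pick(self):
        for _ in range(len(self.accounts)):   # at most one full rotation
            account = self.accounts[self.index]
            self.index = (self.index + 1) % len(self.accounts)
            if account["status"] == "active":
                return account
        raise RuntimeError("no active account available")
```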
## Configuring Routing Strategy

### Select Routing Strategy

Choose your preferred routing strategy:
- Usage-weighted: Distributes traffic based on remaining capacity (recommended)
- Round-robin: Distributes traffic evenly across accounts
## Account Selection

### Eligible Accounts

For each incoming request, Codex-LB considers accounts that are:

- **Active status**: Account status is `active`
- **Not rate limited**: Account has not hit ChatGPT rate limits
- **Fresh tokens**: Access tokens are valid and not expired
- **Available quota**: Account has remaining usage capacity (for usage-weighted routing)
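The eligibility check can be sketched as a simple filter; the field names (`status`, `token_expires_at`, `remaining`) are assumptions for illustration, not Codex-LB's actual schema:

```python
import time

def eligible_accounts(accounts, now=None):
    """Return accounts that may receive traffic, per the criteria above."""
    now = time.time() if now is None else now
    return [
        a for a in accounts
        if a["status"] == "active"        # excludes rate_limited, paused, etc.
        and a["token_expires_at"] > now   # fresh, unexpired tokens
        and a.get("remaining", 1) > 0     # quota left (usage-weighted only)
    ]
```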
### Account Status Impact

Account status affects routing eligibility:

| Status | Eligible for Routing? | Notes |
|---|---|---|
| `active` | Yes | Normal operation |
| `rate_limited` | No | Temporarily excluded until limits reset |
| `quota_exceeded` | No | Excluded until quota resets |
| `paused` | No | Manually paused by admin |
| `deactivated` | No | Permanently excluded |
### Account Recovery

Accounts automatically recover from temporary states:

- **Rate limited**: After the rate limit window expires (typically 3-60 minutes)
- **Quota exceeded**: After the quota window resets (daily/weekly)
- **Token expired**: After automatic token refresh
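Recovery from the temporary states can be sketched as a timestamp check; `resets_at` is a hypothetical field marking the end of the rate-limit or quota window, and token refresh is assumed to happen elsewhere:

```python
def recover(account, now):
    """Return an account to active once its exclusion window has passed."""
    temporary = ("rate_limited", "quota_exceeded")
    if account["status"] in temporary and now >= account["resets_at"]:
        account["status"] = "active"
    return account
```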
## Model-Specific Restrictions

You can restrict which models an API key can access using the `allowed_models` field:

- Only listed models can be requested
- Other models return `403 Forbidden`
- An empty or `null` list allows all models

Use cases:

- Restrict expensive models to production keys
- Limit test keys to cheaper models
- Enforce compliance requirements
## Example Configurations

### Production Key (All Models)

### Development Key (Budget Models)

### Premium Key (Latest Models)
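The three key profiles above might look like the following sketch; the model names are placeholders, and `model_allowed` is a hypothetical helper showing how `allowed_models` is interpreted:

```python
# Placeholder model names -- substitute the models your deployment exposes.
production_key = {"allowed_models": None}             # null allows all models
development_key = {"allowed_models": ["model-mini"]}  # budget models only
premium_key = {"allowed_models": ["model-latest"]}    # latest models only

def model_allowed(key, model):
    """Empty or null allowed_models permits everything; otherwise allow-list."""
    allowed = key.get("allowed_models")
    return not allowed or model in allowed
```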
## Sticky Sessions

Sticky sessions ensure that requests with the same `prompt_cache_key` are routed to the same account.

How it works:

- Requests with the same `prompt_cache_key` are routed to the same account
- Improves prompt caching efficiency
- Reduces latency for multi-turn conversations

To enable:

- Dashboard Settings → "Sticky threads"
- Pass `prompt_cache_key` in requests

Benefits:

- Better prompt cache hit rates
- Lower costs for cached tokens
- Consistent experience for multi-turn conversations

Failover:

- If the sticky account becomes unavailable, requests are routed to another account
- Sticky sessions are reallocated if the account status changes
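A sketch of the sticky mapping with failover; the account shape and the `pick_account` fallback are assumptions for illustration:

```python
class StickyRouter:
    """Pin each prompt_cache_key to one account, reallocating on failure."""

    def __init__(self, pick_account):
        self.pick_account = pick_account  # fallback strategy (e.g. usage-weighted)
        self.sessions = {}                # prompt_cache_key -> account id

    def route(self, cache_key, accounts_by_id):
        account = accounts_by_id.get(self.sessions.get(cache_key))
        if account is None or account["status"] != "active":
            account = self.pick_account()            # reallocate sticky session
            self.sessions[cache_key] = account["id"]
        return account
```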
## Account Preferences

### Prefer Earlier Reset Accounts

How it works:

- Prioritizes accounts that will reset sooner
- Helps distribute usage across reset windows
- Reduces risk of all accounts hitting limits simultaneously

Best for:

- Managing accounts with different reset times
- Smoothing out traffic patterns
- Preventing simultaneous rate limit errors
## Load Balancer Behavior

### Selection Algorithm

#### 1. Filter Accounts

Identify accounts that are:

- Active status
- Not rate limited or quota exceeded
- Not paused or deactivated
- Holding valid, unexpired tokens

#### 2. Check Sticky Session

If sticky sessions are enabled and a `prompt_cache_key` is provided:

- Check if a sticky session exists for this key
- If yes, prefer that account (if available)
- If the account is unavailable, reallocate to another account

#### 3. Apply Routing Strategy

Usage-weighted:

- Calculate remaining capacity for each account
- Weight selection probability by remaining capacity
- Accounts with more capacity are more likely to be selected

Round-robin:

- Select the next account in rotation
- Skip unavailable accounts
- Continue rotation from the last selected account

#### 4. Apply Preferences

If "prefer earlier reset" is enabled:

- Sort accounts by reset time
- Prefer accounts that reset sooner
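The preference step amounts to a sort on a reset timestamp; `resets_at` is a hypothetical per-account field:

```python
def prefer_earlier_reset(accounts):
    """Order candidates so accounts whose windows reset soonest come first."""
    return sorted(accounts, key=lambda a: a["resets_at"])
```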
## Retry Logic

If a request fails, Codex-LB automatically retries with a different account.

### Detect Failure

A request fails due to:

- Rate limit error (429)
- Quota exceeded error (403)
- Token expiration (401)
- Network error

### Mark Account Status

Update the account status based on the error:

- `rate_limit_exceeded` → `rate_limited`
- `quota_exceeded` → `quota_exceeded`
- `insufficient_quota` → `quota_exceeded`
- Token errors → `deactivated` (if permanent)
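The detect-and-mark loop might look like this sketch; `send`, the error codes carried as exception messages, and the account schema are stand-ins for Codex-LB internals:

```python
STATUS_FOR_ERROR = {
    "rate_limit_exceeded": "rate_limited",
    "quota_exceeded": "quota_exceeded",
    "insufficient_quota": "quota_exceeded",
}

def send_with_retry(accounts, send):
    """Try each active account; mark failures and fall through to the next."""
    last_error = None
    for account in accounts:
        if account["status"] != "active":
            continue
        try:
            return send(account)
        except RuntimeError as exc:  # stand-in for an upstream API error
            account["status"] = STATUS_FOR_ERROR.get(str(exc), account["status"])
            last_error = exc
    raise RuntimeError("503: no available accounts") from last_error
```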
## Error Handling

### No Available Accounts

Error: `503 Service Unavailable`

Causes:

- All accounts are rate limited or quota exceeded
- All accounts are paused or deactivated
- No accounts have been added to the load balancer
- All accounts have expired or invalid tokens

Solutions:

- Check account status in the dashboard
- Wait for rate limits to reset
- Add more accounts to increase capacity
- Reactivate paused accounts
### Model Not Allowed

Error: `403 Forbidden`

Cause: The requested model is not in the API key's `allowed_models` list.

Solution: Update the API key's `allowed_models` or use a different model.
### Rate Limit Propagation

When an account hits a rate limit:

- Account status changes to `rate_limited`
- Account is excluded from routing
- Error details are recorded:
  - Error code (e.g., `rate_limit_exceeded`)
  - Error message from ChatGPT
  - Timestamp of failure
- Account automatically recovers after the rate limit window
## Monitoring

### Account Status

Monitor account status in the dashboard:

- **Active**: Available for routing
- **Rate limited**: Temporarily unavailable
- **Quota exceeded**: Quota exhausted
- **Paused**: Manually disabled
- **Deactivated**: Permanently disabled
### Usage Metrics

Track usage across accounts:

- **Total requests**: Number of requests routed to each account
- **Token usage**: Input/output/cached tokens per account
- **Error rate**: Percentage of failed requests per account
- **Remaining capacity**: Available quota for each account
### Rate Limit Headers

Response headers show account-level rate limits.

## Best Practices
### Account Management
- Multiple accounts: Add multiple accounts to increase capacity and reliability
- Diverse reset times: Add accounts at different times to stagger reset windows
- Monitor status: Check account status regularly and reactivate as needed
- Remove inactive: Delete deactivated accounts to reduce noise
### Routing Strategy

- Production: Use `usage_weighted` for optimal load distribution
- Development: Use `round_robin` for simplicity
- Sticky sessions: Enable for applications with prompt caching
- Prefer earlier reset: Enable for smoother traffic distribution
### Model Restrictions
- Budget control: Restrict expensive models to production keys
- Testing: Use cheaper models for development and testing
- Compliance: Enforce model restrictions for regulatory requirements
### Error Handling

- Implement retries: Client applications should retry on `503` errors
- Exponential backoff: Use exponential backoff for retries
- Fallback logic: Have fallback behavior when all accounts are unavailable
- Monitor alerts: Set up alerts for "no available accounts" errors
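On the client side, the retry advice above might look like this sketch; `do_request` is a hypothetical callable returning an HTTP status code and body:

```python
import time

def request_with_backoff(do_request, max_attempts=5, base_delay=0.5,
                         sleep=time.sleep):
    """Retry on 503 with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body  # still 503: hand off to fallback logic
```

The injectable `sleep` keeps the helper testable without real delays.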
## Advanced Configuration

### Custom Routing Logic

While Codex-LB provides built-in routing strategies, you can implement custom logic by:

- Monitoring account status via API
- Distributing requests across multiple Codex-LB instances
- Using external load balancers with health checks
### Account Pools

Organize accounts into pools for different use cases:

- **Pool A**: High-quota accounts for production
- **Pool B**: Lower-quota accounts for development
- **Pool C**: Specific accounts for certain models
### Geographic Distribution

Distribute accounts across regions for lower latency:

- Deploy Codex-LB instances in multiple regions
- Add accounts with tokens from the same region
- Route requests to the nearest instance
## Troubleshooting

### Uneven traffic distribution

Cause: Some accounts have much more capacity than others.

Solution:

- Use `usage_weighted` routing to automatically balance based on capacity
- Add more accounts with similar quotas
- Enable "prefer earlier reset" to distribute across reset windows
### Sticky sessions not working

Cause: `sticky_threads_enabled` is disabled or `prompt_cache_key` is not provided.

Solution:

- Enable sticky threads in settings
- Pass `prompt_cache_key` in the request body
- Verify the key is consistent across related requests
### Accounts frequently rate limited

Cause: Not enough accounts for the request volume.

Solution:

- Add more accounts to increase total capacity
- Implement client-side rate limiting
- Use API key rate limits to control usage
- Monitor usage patterns and adjust
### Request fails even with available accounts

Cause: Model restrictions, API key limits, or network errors.

Solution:

- Check the API key's `allowed_models` configuration
- Verify API key rate limits
- Check Codex-LB logs for detailed error messages
- Test with a simple request to isolate the issue
## Next Steps

- **Troubleshooting**: Diagnose and resolve common issues
- **API Reference**: Explore the complete API documentation