Overview
Codex-LB supports multiple load balancing strategies to distribute requests across your pooled accounts. The strategy you choose affects throughput, fairness, and quota utilization.
Available Strategies
Usage-Weighted (Default)
The `usage_weighted` strategy prioritizes accounts with the lowest usage percentage, ensuring even distribution of quota consumption.
How it works:
- Calculates current usage percentage for each account
- Sorts accounts by: secondary usage → primary usage → last selected time
- Selects the first account in that order (the lowest-usage account)

When to use:
- Default for most deployments - Balances load fairly across all accounts
- Quota-conscious workloads - Prevents early exhaustion of any single account
- Heterogeneous account mix - Works well with different plan types (Plus, Team, Enterprise)
Usage percentages are calculated from ChatGPT’s own rate limit windows (typically 5-hour windows for rate limits and weekly/monthly windows for quotas).
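The ordering above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `app/core/balancer/logic.py`. The `Account` dataclass, its field names, and `select_usage_weighted` are hypothetical, and usage is assumed to be available as a percentage for each window.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical account record; field names are illustrative only.
    account_id: str
    primary_used_pct: float    # usage percentage in the primary window
    secondary_used_pct: float  # usage percentage in the secondary window
    last_selected_at: float    # Unix timestamp of the last selection (0.0 = never)


def usage_weighted_key(acc: Account) -> tuple[float, float, float]:
    # Documented order: secondary usage, then primary usage,
    # then the oldest selection time as the final tie-breaker.
    return (acc.secondary_used_pct, acc.primary_used_pct, acc.last_selected_at)


def select_usage_weighted(accounts: list[Account]) -> Account:
    # The smallest key is the lowest-usage account.
    return min(accounts, key=usage_weighted_key)


pool = [
    Account("a", primary_used_pct=10.0, secondary_used_pct=40.0, last_selected_at=100.0),
    Account("b", primary_used_pct=80.0, secondary_used_pct=20.0, last_selected_at=200.0),
    Account("c", primary_used_pct=5.0, secondary_used_pct=20.0, last_selected_at=300.0),
]
print(select_usage_weighted(pool).account_id)  # c
```

Tuples compare lexicographically, so secondary usage dominates. Account `b` ranks ahead of `a` despite its much higher primary usage, and `last_selected_at` only breaks exact ties. Taking `min` picks the same account as sorting and then taking the first element.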
Round-Robin
The `round_robin` strategy rotates through accounts sequentially, selecting the least recently used account.
How it works:
- Tracks a `last_selected_at` timestamp for each account
- Sorts accounts by selection time (oldest first)
- Selects the account that was used longest ago

When to use:
- Testing and validation - Predictable distribution for debugging
- Uniform account types - When all accounts have identical quotas
- Low-utilization scenarios - When usage tracking overhead isn’t needed
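Here is a minimal least-recently-used sketch. It assumes each selection updates an in-memory `last_selected_at` field. The `Account` type and the `select_round_robin` helper are hypothetical. Passing an explicit `now` keeps the demo deterministic.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    # Hypothetical record: only the selection timestamp matters here.
    account_id: str
    last_selected_at: float = 0.0  # 0.0 means "never selected"


def select_round_robin(accounts: list[Account], now: Optional[float] = None) -> Account:
    # Least recently used first: the smallest timestamp wins.
    chosen = min(accounts, key=lambda acc: acc.last_selected_at)
    # Record the selection so the next call moves on to another account.
    chosen.last_selected_at = time.time() if now is None else now
    return chosen


pool = [Account("a"), Account("b"), Account("c")]
order = [select_round_robin(pool, now=float(t)).account_id for t in range(1, 7)]
print(order)  # ['a', 'b', 'c', 'a', 'b', 'c']
```

Ties go to the first account in list order, so a fresh pool is walked front to back. After that, the timestamps alone drive the rotation.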
Configuring Strategy
Via Dashboard
- Navigate to Settings tab
- Find Load Balancing section
- Select a routing strategy from the dropdown: `usage_weighted` (recommended) or `round_robin`
- Click Save Changes
Via API
Via Database
The strategy is stored in the `dashboard_settings` table.
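The snippet below is for illustration only. It uses Python's built-in `sqlite3` module against an in-memory database. The `routing_strategy` column, the single-row `id = 1` layout, and the use of SQLite are all assumptions. Check the real `dashboard_settings` schema before running a similar `UPDATE` against a live deployment.

```python
import sqlite3

# In-memory stand-in for the real database; this schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dashboard_settings (id INTEGER PRIMARY KEY, routing_strategy TEXT NOT NULL)"
)
conn.execute("INSERT INTO dashboard_settings (id, routing_strategy) VALUES (1, 'usage_weighted')")

# Switch the strategy; a parameterized query avoids quoting mistakes.
conn.execute("UPDATE dashboard_settings SET routing_strategy = ? WHERE id = 1", ("round_robin",))
conn.commit()

(strategy,) = conn.execute(
    "SELECT routing_strategy FROM dashboard_settings WHERE id = 1"
).fetchone()
print(strategy)  # round_robin
conn.close()
```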
Advanced Options
Prefer Earlier Reset Accounts
When enabled, this option adds an additional sorting dimension to prioritize accounts that will reset soonest.
How it works:
- Groups accounts by days until reset (bucketed)
- Within each bucket, applies usage-weighted sorting
- Prefers accounts resetting in 1 day over those resetting in 7 days
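Here is a sketch of the bucketed ordering. It assumes each account carries a `reset_at` timestamp and that buckets are whole days, rounded up. Both the field and the rounding rule are hypothetical. The source only states that accounts are grouped by days until reset.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Account:
    # Hypothetical fields; the real bucketing rule may differ.
    account_id: str
    secondary_used_pct: float
    primary_used_pct: float
    last_selected_at: float
    reset_at: float  # Unix timestamp when the quota window resets


def reset_bucket(acc: Account, now: float) -> int:
    # Whole days remaining until reset, rounded up (assumed rule).
    remaining = max(acc.reset_at - now, 0.0)
    return math.ceil(remaining / SECONDS_PER_DAY)


def prefer_earlier_reset_key(acc: Account, now: float) -> tuple:
    # Earlier-reset bucket first, then the usual usage-weighted order.
    return (
        reset_bucket(acc, now),
        acc.secondary_used_pct,
        acc.primary_used_pct,
        acc.last_selected_at,
    )


now = 0.0
pool = [
    Account("fresh", 5.0, 5.0, 0.0, reset_at=7 * SECONDS_PER_DAY),
    Account("expiring", 60.0, 30.0, 0.0, reset_at=0.5 * SECONDS_PER_DAY),
]
best = min(pool, key=lambda acc: prefer_earlier_reset_key(acc, now))
print(best.account_id)  # expiring
```

`expiring` wins despite its much higher usage because it falls in the 1-day bucket. Usage-weighted ordering only decides between accounts in the same bucket. The intuition is that unused quota on an account that resets soon is about to be lost anyway.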
Sticky Sessions
Sticky sessions bind a conversation thread to a specific account, ensuring continuity.
Configuration:
- The sticky key is derived from the `chatgpt-conversation-id` header
- The first request in a conversation selects an account using the configured strategy
- Subsequent requests in the same conversation reuse the same account
- If the pinned account becomes unavailable, a new account is selected
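The pinning flow can be sketched with a plain dictionary standing in for the `sticky_sessions` table. `pick_account` and the dictionary are hypothetical, and so is the choice to skip pinning when the header is absent. Header names are assumed to be lowercased already, since HTTP header names are case-insensitive.

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the sticky_sessions table:
# conversation id -> account id.
sticky_sessions: dict[str, str] = {}


def pick_account(
    headers: dict[str, str],
    available: set[str],
    select_new: Callable[[set[str]], str],
) -> str:
    key: Optional[str] = headers.get("chatgpt-conversation-id")
    if key is None:
        # No conversation id (assumed behavior): use the strategy, pin nothing.
        return select_new(available)
    pinned = sticky_sessions.get(key)
    if pinned is not None and pinned in available:
        # Later request in the same conversation: reuse the pinned account.
        return pinned
    # First request, or the pinned account is unavailable: select and (re)pin.
    chosen = select_new(available)
    sticky_sessions[key] = chosen
    return chosen


def first_sorted(accounts: set[str]) -> str:
    # Stand-in strategy for the demo: alphabetically first account.
    return sorted(accounts)[0]


hdrs = {"chatgpt-conversation-id": "conv-1"}
first = pick_account(hdrs, {"acct-a", "acct-b"}, first_sorted)
failover = pick_account(hdrs, {"acct-b"}, first_sorted)  # acct-a unavailable
reused = pick_account(hdrs, {"acct-a", "acct-b"}, first_sorted)
print(first, failover, reused)  # acct-a acct-b acct-b
```

After the failover the conversation stays on `acct-b` even once `acct-a` is available again, because the pin was rewritten.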
Sticky sessions are stored in the `sticky_sessions` table and survive server restarts.
Selection Algorithm
The complete selection algorithm is implemented in `app/core/balancer/logic.py:45-134`.
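The source of that file is not reproduced here. The hypothetical outline below shows how the pieces described on this page fit together: it filters out unavailable accounts, then applies the configured strategy. The `AccountStatus` members, the `Account` fields, and `NoAvailableAccountsError` are all assumptions. The error message mirrors the symptom under Troubleshooting. Sticky sessions and the prefer-earlier-reset option are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Literal

# Strategy names as documented on this page.
RoutingStrategy = Literal["usage_weighted", "round_robin"]


class AccountStatus(Enum):
    # Hypothetical members; the real enum in app/db/models.py may differ.
    ACTIVE = "active"
    RATE_LIMITED = "rate_limited"
    PAUSED = "paused"


@dataclass
class Account:
    # Hypothetical account record used only for this sketch.
    account_id: str
    status: AccountStatus
    primary_used_pct: float
    secondary_used_pct: float
    last_selected_at: float


class NoAvailableAccountsError(RuntimeError):
    """Raised when no account is eligible to serve the request."""


def usage_weighted_key(a: Account) -> tuple[float, float, float]:
    return (a.secondary_used_pct, a.primary_used_pct, a.last_selected_at)


def round_robin_key(a: Account) -> float:
    return a.last_selected_at


def select_account(accounts: list[Account], strategy: RoutingStrategy) -> Account:
    # Step 1: keep only accounts that can currently serve traffic.
    candidates = [a for a in accounts if a.status is AccountStatus.ACTIVE]
    if not candidates:
        raise NoAvailableAccountsError("No available accounts")
    # Step 2: order candidates by the configured strategy and take the first.
    if strategy == "usage_weighted":
        ordered = sorted(candidates, key=usage_weighted_key)
    elif strategy == "round_robin":
        ordered = sorted(candidates, key=round_robin_key)
    else:
        raise ValueError(f"unknown routing strategy: {strategy!r}")
    return ordered[0]


pool = [
    Account("a", AccountStatus.RATE_LIMITED, 0.0, 0.0, 0.0),
    Account("b", AccountStatus.ACTIVE, 50.0, 30.0, 10.0),
    Account("c", AccountStatus.ACTIVE, 20.0, 30.0, 20.0),
]
print(select_account(pool, "usage_weighted").account_id)  # c
print(select_account(pool, "round_robin").account_id)  # b
```

The sketch sorts the candidates, matching the O(n log n) cost listed under Request Latency. `min` with the same key would choose the same account in O(n).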
Performance Considerations
Usage Refresh Overhead
The usage-weighted strategy requires periodic usage data refresh:
- Refresh interval: configurable via `USAGE_REFRESH_INTERVAL_SECONDS` (default: 300s)
- API calls: one usage API call per account per interval
- Database writes: one `usage_history` row per account per interval
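To estimate the refresh cost for a given pool, multiply the number of accounts by the number of refresh intervals per day. The helper name is hypothetical. The rate of one call and one row per account per interval comes from the list above.

```python
SECONDS_PER_DAY = 86_400


def refresh_overhead_per_day(num_accounts: int, interval_seconds: int = 300) -> int:
    # One usage API call (and one usage_history row) per account per interval.
    # Floor division ignores a trailing partial interval.
    refreshes_per_day = SECONDS_PER_DAY // interval_seconds
    return num_accounts * refreshes_per_day


print(refresh_overhead_per_day(10))  # 2880
print(refresh_overhead_per_day(10, interval_seconds=600))  # 1440
```

With 10 accounts at the default 300s interval, each account refreshes 288 times per day. That comes to 2,880 usage API calls and 2,880 `usage_history` rows per day. Doubling the interval halves both numbers.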
Request Latency
The selection algorithm runs on every request:
- Usage-weighted: O(n log n) sorting, where n is the number of accounts
- Round-robin: O(n log n) sorting by timestamp
- Typical overhead: less than 1ms for pools of up to 100 accounts
Strategy Comparison
| Feature | Usage-Weighted | Round-Robin |
|---|---|---|
| Fairness | High - distributes evenly by quota | Medium - rotates evenly by request count |
| Quota awareness | Yes - prioritizes low-usage accounts | No - ignores current usage |
| Simplicity | Medium - requires usage tracking | High - only tracks selection time |
| Overhead | Low - cached usage data | Minimal - timestamp only |
| Best for | Production workloads with mixed accounts | Testing or homogeneous pools |
| Quota exhaustion risk | Low - balances consumption | High - can exhaust accounts unevenly |
Troubleshooting
Uneven Distribution
Symptom: Some accounts are heavily used while others sit idle.
Causes:
- Round-robin with heterogeneous account types
- Sticky sessions concentrating traffic
- Some accounts frequently rate-limited (check error logs)
All Accounts Exhausted
Symptom: “No available accounts” errors.
Causes:
- Insufficient total quota for the traffic volume
- All accounts hitting rate limits or quotas simultaneously
Solutions:
- Add more accounts to the pool
- Reduce traffic or implement request queuing
- Upgrade accounts to higher-tier plans
Selection Bias
Symptom: One account is always selected first.
Causes:
- The account has significantly lower usage than the others
- Clock skew affecting `last_selected_at` timestamps
Related Features
- Account Pooling - Understanding account states
- Usage Tracking - Monitor consumption patterns
- API Keys - Filter accounts by model support
Technical Reference
Key source files:
- `app/core/balancer/logic.py:45-134` - Selection algorithm
- `app/modules/proxy/load_balancer.py:54-138` - Load balancer integration
- `app/db/models.py:30-36` - AccountStatus enum
- `app/core/balancer/types.py:21` - RoutingStrategy type