Overview
Quota management tracks your usage across OpenAI Codex’s rate limit windows and proactively prevents hitting limits. The system monitors two quota windows:
- Primary window - Typically 2 hours
- Secondary window - Typically 7 days
Codex Quota Headers
Header Structure
OpenAI Codex returns quota information in response headers (lib/quota-probe.ts:90-147):
Parsed Quota Snapshot
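As a rough illustration of turning quota headers into a snapshot, the sketch below reads one set of headers per window. The header names (`x-codex-primary-used-percent` and so on), the `QuotaSnapshot` shape, and the `parseQuotaSnapshot` helper are assumptions made for this example, not a copy of lib/quota-probe.ts.

```typescript
// Hypothetical header prefix; verify against real responses before relying on it.
const HEADER_PREFIX = "x-codex";

interface QuotaWindow {
  usedPercent: number;          // clamped to 0-100
  windowMinutes: number | null; // window length, if reported
  resetAtMs: number | null;     // absolute reset time in epoch milliseconds
}

interface QuotaSnapshot {
  primary: QuotaWindow | null;
  secondary: QuotaWindow | null;
  capturedAtMs: number;
}

// Minimal header reader; the Fetch API `Headers` class satisfies this shape.
interface HeaderSource {
  get(name: string): string | null;
}

function readNumber(headers: HeaderSource, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

function parseWindow(
  headers: HeaderSource,
  which: "primary" | "secondary",
  nowMs: number,
): QuotaWindow | null {
  const base = `${HEADER_PREFIX}-${which}`;
  const usedPercent = readNumber(headers, `${base}-used-percent`);
  if (usedPercent === null) return null; // window not reported at all
  const resetAfterSeconds = readNumber(headers, `${base}-reset-after-seconds`);
  return {
    usedPercent: Math.min(100, Math.max(0, usedPercent)),
    windowMinutes: readNumber(headers, `${base}-window-minutes`),
    resetAtMs: resetAfterSeconds === null ? null : nowMs + resetAfterSeconds * 1000,
  };
}

function parseQuotaSnapshot(headers: HeaderSource, nowMs: number = Date.now()): QuotaSnapshot {
  return {
    primary: parseWindow(headers, "primary", nowMs),
    secondary: parseWindow(headers, "secondary", nowMs),
    capturedAtMs: nowMs,
  };
}
```

Storing the reset time as an absolute timestamp rather than a relative delay keeps the snapshot meaningful when it is read later.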
Quota Probing
Lightweight Quota Checks
Probe quota while consuming as few tokens as possible (lib/quota-probe.ts:326-414):
- Minimal input - “quota ping” text
- No reasoning - effort: 'none'
- Low verbosity - verbosity: 'low'
- Immediate cancellation - Stream cancelled after headers received
- No storage - store: false
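The probe settings listed above could be assembled roughly as follows. This is a sketch under assumptions: the nesting of `reasoning.effort` and `text.verbosity`, the `stream: true` flag, and the `buildProbeBody`, `probeQuotaHeaders`, and `FetchLike` names are illustrative and not taken from lib/quota-probe.ts. The transport is injected so the example does not depend on a particular runtime.

```typescript
interface ProbeBody {
  model: string;
  input: string;
  reasoning: { effort: "none" };
  text: { verbosity: "low" };
  stream: true;
  store: false;
}

// Builds the smallest request that still makes the server return quota headers.
function buildProbeBody(model: string): ProbeBody {
  return {
    model,
    input: "quota ping",           // minimal input
    reasoning: { effort: "none" }, // no reasoning
    text: { verbosity: "low" },    // low verbosity
    stream: true,                  // streaming, so the body can be dropped early
    store: false,                  // no storage
  };
}

interface ProbeHeaders {
  get(name: string): string | null;
}

// Structural subset of fetch; the global `fetch` can be passed where available.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{
  status: number;
  headers: ProbeHeaders;
  body: { cancel(): Promise<void> } | null;
}>;

async function probeQuotaHeaders(
  fetchImpl: FetchLike,
  url: string,
  accessToken: string,
  model: string,
): Promise<ProbeHeaders> {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${accessToken}`,
    },
    body: JSON.stringify(buildProbeBody(model)),
  });
  // Headers are available once the promise resolves; cancel the stream unread.
  await response.body?.cancel();
  return response.headers;
}
```

Cancelling the response body as soon as the headers arrive is what keeps the probe cheap: the quota information lives in the headers, so the generated output is never needed.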
Probe Strategies
- Passive Tracking
- Active Probing
- Parallel Probing
Passive Tracking is the default behavior: quota is extracted from normal request headers.
✅ No extra cost
✅ Real-time tracking
❌ Only updates during active use
Quota Tracking
Per-Model Quota Keys
Quotas are tracked per model family (lib/accounts/rate-limits.ts:8-24):
- Different models may have different rate limits
- Allows fine-grained rotation within model families
- Enables model-specific quota forecasting
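One way to picture a per-family quota key is a small helper that falls back to the family name when no specific model is given. The `buildQuotaKey` name, the `family:model` format, and the sample family and model strings are hypothetical and only illustrate the idea.

```typescript
type ModelFamily = string;

// Returns "family" for family-wide limits and "family:model" for model-specific ones.
function buildQuotaKey(family: ModelFamily, model?: string | null): string {
  const trimmed = model?.trim();
  return trimmed ? `${family}:${trimmed}` : family;
}

// Example keys (family and model names are placeholders).
const familyKey = buildQuotaKey("codex");
const modelKey = buildQuotaKey("codex", "example-model");
```

Keeping the family as the key prefix makes it easy to look up either the family-wide limit or a model-specific one from the same map.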
Rate Limit State
Each account tracks rate limits per quota key:
Rate Limit Detection
Parse rate limit headers from 429 responses (lib/accounts/rate-limits.ts:73-119):
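A minimal sketch of per-key rate limit state and 429 detection is shown below. It assumes the server sends the standard HTTP `Retry-After` header (either delta-seconds or an HTTP date); the actual headers read by lib/accounts/rate-limits.ts may differ. `RateLimitState`, `recordRateLimit`, `isRateLimited`, and the 60-second fallback are illustrative choices.

```typescript
// Quota key -> time at which the limit resets (epoch milliseconds).
type RateLimitState = Record<string, number | undefined>;

// Assumed fallback when a 429 carries no usable Retry-After value.
const DEFAULT_BACKOFF_MS = 60_000;

function retryAfterToMs(value: string | null, nowMs: number): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delta-seconds form
  const dateMs = Date.parse(trimmed);                       // HTTP-date form
  return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - nowMs);
}

// Records a limit for the key if the response was a 429; returns whether it did.
function recordRateLimit(
  state: RateLimitState,
  quotaKey: string,
  status: number,
  retryAfter: string | null,
  nowMs: number = Date.now(),
): boolean {
  if (status !== 429) return false;
  const waitMs = retryAfterToMs(retryAfter, nowMs) ?? DEFAULT_BACKOFF_MS;
  state[quotaKey] = nowMs + waitMs;
  return true;
}

// Reports whether the key is still limited, clearing entries whose reset time has passed.
function isRateLimited(
  state: RateLimitState,
  quotaKey: string,
  nowMs: number = Date.now(),
): boolean {
  const resetAt = state[quotaKey];
  if (resetAt === undefined) return false;
  if (resetAt <= nowMs) {
    delete state[quotaKey];
    return false;
  }
  return true;
}
```

Because expired entries are removed on lookup, no background timer is needed to clear a limit once its reset time passes.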
Preemptive Deferral
Quota Threshold Strategy
Avoid rate limits by rotating before hitting 100% usage:
- < 10% remaining - High priority rotation
- < 5% remaining - Mark account as unavailable
- < 1% remaining - Emergency cooldown
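The thresholds above map naturally onto a small classifier. The `QuotaAction` labels and the `classifyRemaining` function are hypothetical names for this sketch; only the 10%, 5%, and 1% cut-offs come from the list.

```typescript
type QuotaAction = "ok" | "rotate" | "unavailable" | "cooldown";

// Maps a used percentage (0-100) to the action suggested by the thresholds.
function classifyRemaining(usedPercent: number): QuotaAction {
  const remaining = 100 - Math.min(100, Math.max(0, usedPercent));
  if (remaining < 1) return "cooldown";    // emergency cooldown
  if (remaining < 5) return "unavailable"; // mark account as unavailable
  if (remaining < 10) return "rotate";     // high priority rotation
  return "ok";
}
```

Checking the strictest threshold first means each account receives exactly one action, even though a value below 1% also satisfies the looser conditions.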
Preemptive Quota Scheduler
The scheduler (lib/preemptive-quota-scheduler.ts) automatically rotates accounts:
Quota Display
Human-Readable Formatting
Quota windows are formatted for CLI display (lib/quota-probe.ts:206-300):
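As an illustration of this kind of formatting, the sketch below renders one window as a short line. The output layout (`2h 63% left (resets in 30m)`), the `DisplayWindow` shape, and the function names are assumptions for this example; the real CLI output may look different.

```typescript
interface DisplayWindow {
  usedPercent: number;
  windowMinutes: number | null;
  resetAtMs: number | null; // epoch milliseconds
}

// Turns a window length in minutes into a compact label such as "2h" or "7d".
function formatWindowLabel(windowMinutes: number | null): string {
  if (windowMinutes === null || windowMinutes <= 0) return "quota";
  if (windowMinutes % 1440 === 0) return `${windowMinutes / 1440}d`;
  if (windowMinutes % 60 === 0) return `${windowMinutes / 60}h`;
  return `${windowMinutes}m`;
}

function formatQuotaWindow(w: DisplayWindow, nowMs: number = Date.now()): string {
  const left = Math.round(100 - Math.min(100, Math.max(0, w.usedPercent)));
  const label = formatWindowLabel(w.windowMinutes);
  // Omit the reset hint when the reset time is unknown or already past.
  if (w.resetAtMs === null || w.resetAtMs <= nowMs) return `${label} ${left}% left`;
  const minutes = Math.ceil((w.resetAtMs - nowMs) / 60_000);
  return `${label} ${left}% left (resets in ${minutes}m)`;
}
```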
Dashboard View
Run codex auth to see quota status:
Rate Limit Recovery
Automatic Reset Tracking
Rate limits automatically clear after reset time:
Reset Time Parsing
Handles multiple header formats (lib/quota-probe.ts:69-88):
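The formats actually handled are not listed here, so the following sketch assumes four common ones: seconds from now, epoch seconds, epoch milliseconds, and a date string. The `parseResetTimeMs` name and the magnitude cut-offs used to tell the numeric forms apart are illustrative assumptions.

```typescript
// Normalizes a reset header value to an absolute time in epoch milliseconds.
function parseResetTimeMs(
  raw: string | null | undefined,
  nowMs: number = Date.now(),
): number | null {
  if (raw == null) return null;
  const text = raw.trim();
  if (text === "") return null;

  if (/^\d+(\.\d+)?$/.test(text)) {
    const n = Number(text);
    if (n >= 1e12) return n;       // large enough to be epoch milliseconds
    if (n >= 1e9) return n * 1000; // large enough to be epoch seconds
    return nowMs + n * 1000;       // otherwise treated as seconds from now
  }

  const parsed = Date.parse(text); // ISO 8601 or HTTP-date string
  return Number.isNaN(parsed) ? null : parsed;
}
```

Returning `null` for unparseable input lets the caller decide on a fallback instead of silently treating a bad header as "resets now".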
Quota Cache
Cache Persistence
Quota snapshots are cached to disk (lib/quota-cache.ts):
- Faster CLI commands (no probe needed)
- Quota visibility for idle accounts
- Reduced API calls
Cache Invalidation
Cache entries are invalidated:
- After 5 minutes (TTL)
- On a 429 rate limit response
- After a successful request (updated with fresh data)
- On manual refresh (codex auth check --live)
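The invalidation rules above can be sketched with a small in-memory cache. `QuotaCacheSketch` is a hypothetical class that leaves out disk persistence entirely; only the 5-minute TTL comes from the list. The clock is injectable so that expiry can be exercised without waiting.

```typescript
interface CachedQuota<T> {
  value: T;
  storedAtMs: number;
}

class QuotaCacheSketch<T> {
  private readonly entries = new Map<string, CachedQuota<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 5 * 60_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or null when it is missing or older than the TTL.
  get(accountId: string): T | null {
    const entry = this.entries.get(accountId);
    if (entry === undefined) return null;
    if (this.now() - entry.storedAtMs >= this.ttlMs) {
      this.entries.delete(accountId);
      return null;
    }
    return entry.value;
  }

  // Called after a successful request to store fresh data.
  set(accountId: string, value: T): void {
    this.entries.set(accountId, { value, storedAtMs: this.now() });
  }

  // Called on a 429 response or a manual refresh.
  invalidate(accountId: string): void {
    this.entries.delete(accountId);
  }
}
```

A persistent version would additionally serialize the entries to a file and reload them on startup, keeping the stored timestamps so the TTL still applies across runs.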
Wait Time Estimation
Calculate Minimum Wait
When all accounts are rate-limited, estimate wait time:
Wait Time Formatting
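A minimal sketch of both steps, estimating the shortest wait and formatting it, is given below. The `minWaitMs` and `formatWait` names and the output style (`1m 30s`, `1h 2m`) are assumptions for illustration.

```typescript
// Shortest time until any account's limit resets; null when no reset time is known.
function minWaitMs(
  resetTimesMs: Array<number | null>,
  nowMs: number = Date.now(),
): number | null {
  let best: number | null = null;
  for (const resetAt of resetTimesMs) {
    if (resetAt === null) continue;
    const wait = Math.max(0, resetAt - nowMs); // already-reset accounts wait 0 ms
    if (best === null || wait < best) best = wait;
  }
  return best;
}

// Formats a duration, rounding up to whole seconds and showing the two largest units.
function formatWait(ms: number): string {
  const totalSeconds = Math.ceil(Math.max(0, ms) / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```

Rounding up avoids telling the user to retry a fraction of a second before the limit has actually reset.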
Monitoring Commands
Check Quota Status
Forecast Next Account
Generate Quota Report
Best Practices
Monitor Primary Window
The 2-hour window fills fastest. Keep an eye on primary quota usage and add accounts before hitting limits.
Use Live Probes Sparingly
Each live probe consumes only a few tokens, but the cost adds up. Use passive tracking for normal operation and live probes for troubleshooting.
Set Up Multiple Accounts
Having 3-5 accounts provides good rotation headroom. More accounts mean more total quota.
Check After Rate Limits
If you hit a rate limit, run codex auth check --live to see which accounts are still available.
Related Concepts
Account Rotation
Learn how quota tracking influences account selection
Multi-Account OAuth
Understand how to authenticate multiple accounts
Commands Reference
View all quota-related commands
Settings Reference
Configure quota thresholds and behavior