Overview
Rate limits help you:
- Control costs: Cap spending per API key
- Prevent abuse: Limit usage to expected patterns
- Manage resources: Distribute quota across applications
- Enforce policies: Implement organizational usage policies
Limit Types
Codex-LB supports four types of rate limits:

Total Tokens
Limits the combined number of input and output tokens.

Input Tokens
Limits only the prompt (input) tokens.

Output Tokens
Limits only the completion (output) tokens.

Cost (USD)
Limits based on total cost in microdollars (1 USD = 1,000,000 microdollars). Cost calculations account for:
- Standard input/output token pricing
- Cached token discounts
- Reasoning token pricing (for o1 models)
Limit Windows
Rate limits can be applied over different time windows:

Daily
Resets every 24 hours from the time the limit was created or last reset.

Weekly
Resets every 7 days from the time the limit was created or last reset.

Monthly
Resets every 30 days from the time the limit was created or last reset.

Model-Specific Limits
You can apply different limits to different models using the `model_filter` field:
- Limits with `model_filter: null` apply to all models
- Limits with a specific model apply only to that model
- Model names must match exactly (case-sensitive)
- Multiple limits can exist for the same model
For example, given a $50/day limit on `gpt-4`, a $20/day limit on `gpt-4-turbo`, and a global $10/day limit:
- Requests to `gpt-4` are checked against the $50/day limit
- Requests to `gpt-4-turbo` are checked against the $20/day limit
- Requests to `gpt-3.5-turbo` are checked against the $10/day limit (global)
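The per-model scenario above can be sketched in code. This is an illustrative sketch, not Codex-LB's actual schema: the field names (`limit_type`, `max_value`, `model_filter`) are assumed from the "Configure Limit" section, and `limits_for` is a hypothetical helper showing the documented matching rules (exact, case-sensitive match; `None` matches every model):

```python
# Hypothetical limit configurations for the example above.
# Costs are in microdollars (1 USD = 1,000,000 microdollars).
limits = [
    {"limit_type": "cost_usd", "limit_window": "daily",
     "max_value": 50_000_000, "model_filter": "gpt-4"},        # $50/day
    {"limit_type": "cost_usd", "limit_window": "daily",
     "max_value": 20_000_000, "model_filter": "gpt-4-turbo"},  # $20/day
    {"limit_type": "cost_usd", "limit_window": "daily",
     "max_value": 10_000_000, "model_filter": None},           # $10/day, global
]

def limits_for(model):
    """Exact, case-sensitive match on model_filter; None matches all models."""
    return [l for l in limits if l["model_filter"] in (None, model)]
```

Note that because `model_filter: null` limits apply to all models, a `gpt-4` request is also checked against the global limit, not only its model-specific one.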
Combining Limits
You can configure multiple limits for a single API key. For example, a key's usage:
- Must not exceed 1M tokens per day
- Must not exceed $50 per day
- Must not exceed $1,000 per month
Creating Limits
Create or Edit API Key
When creating or editing an API key, add limits in the “Rate Limits” section.
Configure Limit
For each limit, specify:
- Limit Type: `total_tokens`, `input_tokens`, `output_tokens`, or `cost_usd`
- Limit Window: `daily`, `weekly`, or `monthly`
- Max Value: The maximum allowed value
- Model Filter (optional): Specific model to apply this limit to
Example Configurations
Basic Daily Token Limit
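The code for this example isn't shown in the source; a minimal sketch, assuming the field names from the "Configure Limit" list above (they may differ from Codex-LB's exact schema):

```python
# Hypothetical basic daily token limit; field names are assumed.
basic_daily_token_limit = {
    "limit_type": "total_tokens",
    "limit_window": "daily",
    "max_value": 1_000_000,   # 1M combined input + output tokens per day
    "model_filter": None,     # applies to all models
}
```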
Cost-Based Budget
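A sketch of a cost-based budget, again with assumed field names; recall that cost limits are expressed in microdollars:

```python
# Hypothetical $50/day budget (1 USD = 1,000,000 microdollars).
cost_budget = {
    "limit_type": "cost_usd",
    "limit_window": "daily",
    "max_value": 50_000_000,  # $50.00
    "model_filter": None,
}
```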
Model-Specific Limits
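A sketch of model-specific limits (assumed field names): a dedicated budget for an expensive model plus a smaller global limit covering everything else:

```python
# Hypothetical model-specific limits on one key.
model_limits = [
    {"limit_type": "cost_usd", "limit_window": "daily",
     "max_value": 50_000_000, "model_filter": "gpt-4"},  # $50/day for gpt-4 only
    {"limit_type": "cost_usd", "limit_window": "daily",
     "max_value": 10_000_000, "model_filter": None},     # $10/day, all models
]
```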
Separate Input/Output Limits
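A sketch of separate input and output token limits on one key (assumed field names), useful when prompts and completions should be budgeted independently:

```python
# Hypothetical separate prompt/completion limits.
io_limits = [
    {"limit_type": "input_tokens", "limit_window": "daily",
     "max_value": 800_000, "model_filter": None},   # prompt tokens
    {"limit_type": "output_tokens", "limit_window": "daily",
     "max_value": 200_000, "model_filter": None},   # completion tokens
]
```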
Usage Enforcement
Request Reservation
When a request arrives, Codex-LB performs the following steps.

Check Applicable Limits
Identify all limits that apply to the request:
- Limits with `model_filter: null`
- Limits with `model_filter` matching the requested model
Reserve Quota
For each applicable limit, reserve a portion of quota:
- Tokens: Reserve 8,192 tokens (typical request size)
- Cost: Reserve $2 (2,000,000 microdollars) based on estimated pricing
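The reservation flow above can be sketched as follows. This is an illustrative sketch, not Codex-LB's implementation: the `Limit` class and the `reserve`/`finalize` helpers are hypothetical, though the reservation amounts match the values documented above.

```python
# Hypothetical sketch of the request-reservation flow.
TOKEN_RESERVATION = 8_192      # typical request size, per the docs
COST_RESERVATION = 2_000_000   # $2 in microdollars, per the docs

class Limit:
    def __init__(self, limit_type, max_value, model_filter=None):
        self.limit_type = limit_type
        self.max_value = max_value
        self.model_filter = model_filter
        self.current_value = 0

def applicable(limits, model):
    """Global limits (model_filter None) plus exact-match model limits."""
    return [l for l in limits if l.model_filter in (None, model)]

def reserve(limits, model):
    """Reserve quota on every applicable limit; reject if any would be exceeded."""
    reserved = []
    for limit in applicable(limits, model):
        amount = COST_RESERVATION if limit.limit_type == "cost_usd" else TOKEN_RESERVATION
        if limit.current_value + amount > limit.max_value:
            for l, a in reserved:          # roll back partial reservations
                l.current_value -= a
            raise RuntimeError("429 Too Many Requests")
        limit.current_value += amount
        reserved.append((limit, amount))
    return reserved

def finalize(reserved, actual_by_type):
    """Swap each reservation for actual usage once the response completes."""
    for limit, amount in reserved:
        limit.current_value += actual_by_type.get(limit.limit_type, 0) - amount
```

Finalization is why usage can briefly look over- or under-counted while requests are in flight (see Troubleshooting below).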
If the reservation would push any applicable limit past its maximum, the request is rejected with 429 Too Many Requests.

Automatic Reset
Limits automatically reset when their time window expires:
- `current_value` resets to `0`
- `reset_at` advances by the window duration
- Pending requests can proceed once limits reset
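The automatic-reset rule can be sketched as follows; this is an assumption-laden illustration (the limit is modeled as a plain dict, and `maybe_reset` is a hypothetical helper), but the behavior matches the rule above: `current_value` returns to `0` and `reset_at` advances by the window duration.

```python
# Hypothetical sketch of automatic limit reset.
from datetime import datetime, timedelta, timezone

WINDOW_DURATION = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),  # 30 days, per the window definitions above
}

def maybe_reset(limit, now):
    """Reset a limit dict in place if its window has expired."""
    while limit["reset_at"] <= now:
        limit["current_value"] = 0
        limit["reset_at"] += WINDOW_DURATION[limit["limit_window"]]
    return limit
```

Because `reset_at` advances by a fixed duration from creation, windows do not align with calendar boundaries (see Troubleshooting below).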
Manual Reset
You can manually reset usage for an API key:
- Sets all `current_value` fields to `0`
- Updates `reset_at` to the next window boundary
- Immediately allows new requests
Monitoring Limits
Current Usage
View current usage in the dashboard for each limit.

Rate Limit Headers
All API responses include rate limit headers of the form:

`X-RateLimit-{Metric}-{LimitType}-{Window}`

Metrics:
- Limit: Maximum value for this limit
- Remaining: Remaining quota before hitting the limit
- Reset: Unix timestamp when the limit resets
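A client can read these headers back into structured data. A minimal sketch, assuming only the documented `X-RateLimit-{Metric}-{LimitType}-{Window}` naming and integer header values (the helper name is hypothetical):

```python
# Hypothetical parser for the documented rate-limit headers.
import re

HEADER_RE = re.compile(r"X-RateLimit-(Limit|Remaining|Reset)-(\w+)-(\w+)", re.I)

def parse_rate_limit_headers(headers):
    """Return {(limit_type, window): {metric: int}} from a header mapping."""
    limits = {}
    for name, value in headers.items():
        m = HEADER_RE.fullmatch(name)
        if m:
            metric, limit_type, window = m.groups()
            limits.setdefault((limit_type, window), {})[metric] = int(value)
    return limits
```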
Rate Limit Errors
When a limit is exceeded, the API returns 429 Too Many Requests.
Response Headers: The 429 response includes the rate limit headers described above.
Advanced Scenarios
Progressive Limits
Combine daily, weekly, and monthly limits for progressive enforcement:
- Can’t spend more than $50/day
- Can’t spend more than $300/week (even if under daily limits)
- Can’t spend more than $1,000/month
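The progressive setup above might be expressed as three cost limits on one key (a sketch with assumed field names; all three are enforced at once, so the tightest remaining budget wins):

```python
# Hypothetical progressive cost limits, in microdollars.
progressive_limits = [
    {"limit_type": "cost_usd", "limit_window": "daily",   "max_value": 50_000_000},     # $50/day
    {"limit_type": "cost_usd", "limit_window": "weekly",  "max_value": 300_000_000},    # $300/week
    {"limit_type": "cost_usd", "limit_window": "monthly", "max_value": 1_000_000_000},  # $1,000/month
]
```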
Tiered Model Access
Give different quotas to different model tiers.

Zero-Cost Testing
Use token limits without cost limits for testing.

Troubleshooting
Limits not enforcing
Cause: Limit configuration error or no applicable limits for the model.
Solution:
- Verify limit configuration in the dashboard
- Check that `model_filter` matches the requested model exactly
- Ensure at least one limit applies (either global or model-specific)
Usage higher than expected
Cause: Cached tokens, reasoning tokens, or streaming overhead.
Solution:
- Check `cached_input_tokens` in usage reports (cached tokens are cheaper but still counted)
- For o1 models, check `reasoning_tokens` (reasoning tokens cost more)
- Consider using `cost_usd` limits instead of token limits for accurate budget control
Limits resetting at wrong time
Cause: Reset time is calculated from limit creation, not calendar boundaries.
Solution:
- Check the `reset_at` timestamp in the limit details
- Manually reset the limit to align with the desired time
- Recreate the limit at the desired start time
Rate limit exceeded but usage shows available quota
Cause: Reserved quota from in-flight requests hasn’t been finalized.
Solution: Wait for in-flight requests to complete. Reserved quota is released or adjusted after responses complete.

Different limits for same model causing confusion
Cause: Multiple limits with overlapping `model_filter` values.
Solution: Be explicit with model filters:
- Use `null` for global limits
- Use specific model names for model-specific limits
- Avoid duplicate limit type + window + model filter combinations
Best Practices
Budget Control
- Use cost limits for direct budget enforcement
- Combine daily and monthly limits for progressive caps
- Set alerts at 80% and 90% usage thresholds
- Review usage weekly to adjust limits
Fair Usage
- Different keys for different apps to isolate usage
- Separate dev/staging/prod keys with appropriate limits
- Model-specific limits to control expensive model usage
- Monitor `last_used_at` to identify unused keys
Performance
- Token limits are faster to calculate than cost limits
- Fewer limits per key reduces overhead
- Global limits (no model filter) are faster than model-specific limits
Next Steps
Managing API Keys
Learn more about API key management
Model Routing
Configure how requests are routed to accounts