Quota Exceeded Behavior
When a provider returns a quota exceeded error, CLI Proxy API can automatically:
- Switch Project: Try another project/credential for the same provider
- Switch Preview Model: Fall back to preview/alternative model variants
Get Project Switching
/v0/management/quota-exceeded/switch-project
Request
Response
Whether to automatically switch to another project when quota is exceeded
Update Project Switching
/v0/management/quota-exceeded/switch-project
Request
Request Body
Response
Status of update operation
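The update call can be sketched in Python as follows. This is a minimal sketch, not the project's client: the base URL, the PUT method, and the Bearer authorization header are assumptions for illustration, and the JSON body shape relies on the value field mentioned under Error Responses. The helper only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumption: placeholder address of a locally running proxy.
BASE_URL = "http://localhost:8317"

SETTINGS = ("switch-project", "switch-preview-model")


def build_toggle_request(
    setting: str,
    enabled: bool,
    key: str,
    base_url: str = BASE_URL,
    method: str = "PUT",  # assumption: the docs excerpt does not state the method
) -> urllib.request.Request:
    """Build (but do not send) a request that updates one quota-exceeded toggle."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    url = f"{base_url}/v0/management/quota-exceeded/{setting}"
    body = json.dumps({"value": enabled}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: management key passed as a Bearer token.
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

Passing the returned object to urllib.request.urlopen would send it. The same helper covers the preview model endpoint by passing "switch-preview-model" as the setting.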
Get Preview Model Switching
/v0/management/quota-exceeded/switch-preview-model
Request
Response
Whether to automatically switch to preview model when quota is exceeded
Update Preview Model Switching
/v0/management/quota-exceeded/switch-preview-model
Request
Request Body
Response
Configuration File
These settings correspond to the quota-exceeded section in config.yaml.
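As a rough illustration of what that section might look like, the sketch below renders it from two booleans. The key names switch-project and switch-preview-model are an assumption inferred from the endpoint paths, and QuotaExceededConfig is a hypothetical helper, not part of CLI Proxy API.

```python
from dataclasses import dataclass


@dataclass
class QuotaExceededConfig:
    """Hypothetical in-memory view of the quota-exceeded section."""

    switch_project: bool
    switch_preview_model: bool

    def to_yaml(self) -> str:
        """Render the section as YAML text (key names are assumed)."""

        def flag(value: bool) -> str:
            return "true" if value else "false"

        return (
            "quota-exceeded:\n"
            f"  switch-project: {flag(self.switch_project)}\n"
            f"  switch-preview-model: {flag(self.switch_preview_model)}\n"
        )


if __name__ == "__main__":
    # Project failover on, preview model fallback off.
    print(QuotaExceededConfig(True, False).to_yaml(), end="")
```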
How Project Switching Works
When enabled and a quota exceeded error occurs:
- Request fails with quota error from provider
- CLI Proxy API identifies other credentials for the same provider
- Request is retried with the next available credential
- Process continues until success or all credentials exhausted
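The steps above can be sketched as a simple loop over credentials. This is an illustrative model only, not the proxy's implementation: QuotaExceeded, send, and send_with_project_switching are hypothetical names, and credentials are tried in list order here, whereas the real order depends on the routing strategy.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical marker for a provider quota error."""


def send_with_project_switching(
    credentials: Sequence[str],
    send: Callable[[str], T],
    switch_project: bool = True,
) -> T:
    """Try each credential in turn until one succeeds or all are exhausted."""
    if not credentials:
        raise ValueError("at least one credential is required")
    # With switching disabled, only the first credential is attempted.
    candidates = credentials if switch_project else credentials[:1]
    last_error: Optional[QuotaExceeded] = None
    for credential in candidates:
        try:
            return send(credential)
        except QuotaExceeded as err:
            last_error = err  # remember the failure and move on
    assert last_error is not None
    raise last_error
```

Only quota errors trigger the switch in this sketch; any other exception propagates to the caller unchanged.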
Example Scenario
With project switching enabled and credentials for Project A and Project B configured:
- Request uses Project A → Quota exceeded
- Automatically retry with Project B → Success
- Client receives response without error
How Preview Model Switching Works
When enabled and a quota exceeded error occurs:
- Request fails with quota error for specific model
- CLI Proxy API checks for preview/alternative variants:
  - gemini-2.5-pro → gemini-2.5-pro-preview
  - gemini-3-pro → gemini-3-pro-preview
- Request is retried with preview model
- Original model name is restored in response
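A minimal sketch of this fallback, under stated assumptions: the mapping holds only the two variants listed above, responses are modeled as plain dictionaries with a "model" key, and QuotaExceeded, send, and send_with_preview_fallback are hypothetical names rather than the proxy's own API.

```python
from typing import Callable, Dict

# The two model-to-preview pairs listed in the steps above.
PREVIEW_VARIANTS: Dict[str, str] = {
    "gemini-2.5-pro": "gemini-2.5-pro-preview",
    "gemini-3-pro": "gemini-3-pro-preview",
}


class QuotaExceeded(Exception):
    """Hypothetical marker for a provider quota error."""


def send_with_preview_fallback(
    model: str,
    send: Callable[[str], dict],
    switch_preview_model: bool = True,
) -> dict:
    """Retry once with the preview variant, then restore the original model name."""
    try:
        return send(model)
    except QuotaExceeded:
        preview = PREVIEW_VARIANTS.get(model)
        if not switch_preview_model or preview is None:
            raise  # no fallback available: surface the quota error
        response = dict(send(preview))  # copy so the caller's data is untouched
        response["model"] = model  # client sees the name it asked for
        return response
```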
Example Scenario
With preview model switching enabled:
- Request for gemini-2.5-pro → Quota exceeded
- Automatically retry with gemini-2.5-pro-preview → Success
- Response shows "model": "gemini-2.5-pro" (original)
Combined Behavior
Both settings can be enabled simultaneously for maximum availability:
- Try Project A with model-name
- Try Project A with model-name-preview (if preview model switching is enabled)
- Try Project B with model-name (if project switching is enabled)
- Try Project B with model-name-preview (if both are enabled)
- Continue until success or all options exhausted
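The ordering above can be modeled as a small function that lists the attempts for a given pair of flags. This is a sketch of the documented order only; attempt_order is a hypothetical helper, and the "-preview" suffix is a simplification of the variant lookup.

```python
from typing import List, Sequence, Tuple


def attempt_order(
    projects: Sequence[str],
    model: str,
    switch_project: bool,
    switch_preview_model: bool,
    preview_suffix: str = "-preview",  # simplification of the variant mapping
) -> List[Tuple[str, str]]:
    """Return the (project, model) pairs in the order they would be tried."""
    # Without project switching, only the first project is considered.
    candidates = projects if switch_project else projects[:1]
    attempts: List[Tuple[str, str]] = []
    for project in candidates:
        attempts.append((project, model))
        if switch_preview_model:
            attempts.append((project, model + preview_suffix))
    return attempts


if __name__ == "__main__":
    for project, name in attempt_order(["A", "B"], "model-name", True, True):
        print(f"Project {project} with {name}")
```

With both flags off the list contains a single attempt, which matches the behavior of returning the quota error to the client immediately.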
Disable All Quota Handling
To return quota errors immediately to clients, disable both settings.
Get Current Settings
Retrieve both settings in a single call using the main config endpoint.
Use Cases
High Availability Setup
Enable both features for maximum uptime.
Strict Quota Monitoring
Disable both features to track quota usage.
Project-Level Failover Only
Enable project switching but disable preview models.
Model-Level Failover Only
Enable preview switching but disable project switching.
Related Configuration
Quota handling works alongside:
- Request Retry (/v0/management/request-retry) - Number of retry attempts
- Max Retry Interval (/v0/management/max-retry-interval) - Max wait before retry
- Routing Strategy (/v0/management/routing/strategy) - How credentials are selected
Error Responses
Invalid Value
Returned when the request body has a missing or invalid value field.
Persistence Failure
Next Steps
- Configuration Endpoints - Configure retry and routing behavior
- OAuth Endpoints - Manage provider authentication
- Log Endpoints - Monitor quota exceeded events