The quota management endpoints control how CLI Proxy API responds when provider quotas are exceeded.

Quota Exceeded Behavior

When a provider returns a quota exceeded error, CLI Proxy API can automatically:
  1. Switch Project: Try another project/credential for the same provider
  2. Switch Preview Model: Fall back to preview/alternative model variants
These settings help keep service uninterrupted when a quota is hit.

Get Project Switching

GET /v0/management/quota-exceeded/switch-project
Returns whether automatic project switching is enabled.

Request

curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

Response

switch-project (boolean) - Whether to automatically switch to another project when quota is exceeded
{
  "switch-project": true
}

Update Project Switching

PUT /v0/management/quota-exceeded/switch-project
PATCH /v0/management/quota-exceeded/switch-project
Enable or disable automatic project switching.

Request

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

Request Body

{
  "value": true
}

Response

status (string) - Status of the update operation
{
  "status": "ok"
}
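The same update can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_toggle_request` and `send` are illustrative and not part of CLI Proxy API, and the base URL is simply the one used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8317"  # host and port copied from the curl examples


def build_toggle_request(setting: str, enabled: bool, key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a quota-exceeded toggle.

    `setting` is "switch-project" or "switch-preview-model".
    """
    url = f"{BASE_URL}/v0/management/quota-exceeded/{setting}"
    body = json.dumps({"value": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "X-Management-Key": key,
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_toggle_request("switch-project", True, "YOUR_SECRET"))` against a running instance should return the status object shown above. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so error bodies have to be read from the exception.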

Get Preview Model Switching

GET /v0/management/quota-exceeded/switch-preview-model
Returns whether automatic preview model switching is enabled.

Request

curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Response

switch-preview-model (boolean) - Whether to automatically switch to preview model when quota is exceeded
{
  "switch-preview-model": true
}

Update Preview Model Switching

PUT /v0/management/quota-exceeded/switch-preview-model
PATCH /v0/management/quota-exceeded/switch-preview-model
Enable or disable automatic preview model switching.

Request

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Request Body

{
  "value": false
}

Response

{
  "status": "ok"
}

Configuration File

These settings correspond to the quota-exceeded section in config.yaml:
quota-exceeded:
  # Automatically switch to another project when quota exceeded
  switch-project: true
  
  # Automatically switch to preview model when quota exceeded
  switch-preview-model: true

How Project Switching Works

When enabled and a quota exceeded error occurs:
  1. Request fails with quota error from provider
  2. CLI Proxy API identifies other credentials for the same provider
  3. Request is retried with the next available credential
  4. Process continues until success or all credentials exhausted

Example Scenario

Configuration:
gemini-api-key:
  - api-key: "AIzaSy...01"  # Project A
  - api-key: "AIzaSy...02"  # Project B
  - api-key: "AIzaSy...03"  # Project C

quota-exceeded:
  switch-project: true
Flow:
  1. Request uses Project A → Quota exceeded
  2. Automatically retry with Project B → Success
  3. Client receives response without error
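The failover loop described above can be sketched roughly as follows. This is an illustration of the idea, not the proxy's actual implementation: `QuotaExceeded`, `send_with_project_switching`, and `fake_provider` are hypothetical names, and credentials are tried in list order for simplicity.

```python
from typing import Callable, Optional, Sequence


class QuotaExceeded(Exception):
    """Stand-in for a provider's quota exceeded error (hypothetical)."""


def send_with_project_switching(
    credentials: Sequence[str],
    send: Callable[[str], str],
    switch_project: bool = True,
) -> str:
    """Try each credential in order until one succeeds or all are exhausted."""
    if not credentials:
        raise ValueError("no credentials configured")
    # With switching disabled, only the first credential is ever tried.
    candidates = credentials if switch_project else credentials[:1]
    last_error: Optional[QuotaExceeded] = None
    for credential in candidates:
        try:
            return send(credential)
        except QuotaExceeded as error:
            last_error = error  # remember the failure, move to the next credential
    raise QuotaExceeded("all credentials exhausted") from last_error


def fake_provider(credential: str) -> str:
    """Simulated upstream: Project A is out of quota, the others work."""
    if credential == "project-a":
        raise QuotaExceeded(credential)
    return f"response via {credential}"


result = send_with_project_switching(
    ["project-a", "project-b", "project-c"], fake_provider
)
print(result)
```

With `switch_project=False` the same call raises `QuotaExceeded` immediately, which mirrors the behavior of returning the quota error straight to the client.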

How Preview Model Switching Works

When enabled and a quota exceeded error occurs:
  1. Request fails with quota error for specific model
  2. CLI Proxy API checks for preview/alternative variants:
    • gemini-2.5-pro → gemini-2.5-pro-preview
    • gemini-3-pro → gemini-3-pro-preview
  3. Request is retried with preview model
  4. Original model name is restored in response

Example Scenario

Request:
{
  "model": "gemini-2.5-pro",
  "messages": [...]  
}
Flow:
  1. Request for gemini-2.5-pro → Quota exceeded
  2. Automatically retry with gemini-2.5-pro-preview → Success
  3. Response shows "model": "gemini-2.5-pro" (original)
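A rough sketch of this fallback, including restoring the original model name in the response, might look like the following. The function and variable names are hypothetical, the mapping contains only the two pairs listed above, and the response shape is simplified for illustration.

```python
from typing import Callable


class QuotaExceeded(Exception):
    """Stand-in for a provider's quota exceeded error (hypothetical)."""


# Only the variants named in this document; the real list may differ.
PREVIEW_VARIANTS = {
    "gemini-2.5-pro": "gemini-2.5-pro-preview",
    "gemini-3-pro": "gemini-3-pro-preview",
}


def send_with_preview_fallback(
    request: dict,
    send: Callable[[dict], dict],
    switch_preview_model: bool = True,
) -> dict:
    """Retry once with the preview variant, then restore the requested model name."""
    requested_model = request["model"]
    try:
        return send(request)
    except QuotaExceeded:
        preview = PREVIEW_VARIANTS.get(requested_model)
        if not switch_preview_model or preview is None:
            raise  # no fallback available: surface the quota error
        response = send({**request, "model": preview})
        # The client sees the model it asked for, not the preview variant.
        return {**response, "model": requested_model}


calls = []


def fake_send(request: dict) -> dict:
    """Simulated upstream: the stable model is out of quota."""
    calls.append(request["model"])
    if request["model"] == "gemini-2.5-pro":
        raise QuotaExceeded(request["model"])
    return {"model": request["model"], "content": "ok"}


response = send_with_preview_fallback(
    {"model": "gemini-2.5-pro", "messages": []}, fake_send
)
print(calls, response)
```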

Combined Behavior

Both settings can be enabled simultaneously for maximum availability:
# Enable both features
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model
Retry Order:
  1. Try Project A with model-name
  2. Try Project A with model-name-preview (if preview model switching is enabled)
  3. Try Project B with model-name (if project switching is enabled)
  4. Try Project B with model-name-preview (if both are enabled)
  5. Continue until success or all options exhausted
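The retry order above can be expressed as a small generator. This is only a sketch of the ordering as documented here: `retry_order` is a hypothetical name, and appending `-preview` follows the model-name-preview convention used in the list rather than any confirmed lookup rule.

```python
from typing import Iterator, Sequence, Tuple


def retry_order(
    projects: Sequence[str],
    model: str,
    switch_project: bool,
    switch_preview_model: bool,
) -> Iterator[Tuple[str, str]]:
    """Yield (project, model) attempts in the documented order."""
    # Without project switching, only the first project is attempted.
    candidates = projects if switch_project else projects[:1]
    for project in candidates:
        yield project, model
        if switch_preview_model:
            yield project, f"{model}-preview"


attempts = list(retry_order(["A", "B"], "model-name", True, True))
for project, model in attempts:
    print(f"Try Project {project} with {model}")
```

Setting both flags to `False` yields a single attempt, which corresponds to returning the quota error to the client right away.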

Disable All Quota Handling

To return quota errors immediately to clients:
# Disable project switching
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

# Disable preview model switching  
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Get Current Settings

Retrieve both settings in a single call using the main config endpoint:
curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/config | jq '."quota-exceeded"'
{
  "switch-project": true,
  "switch-preview-model": true
}

Use Cases

High Availability Setup

Enable both features for maximum uptime:
quota-exceeded:
  switch-project: true
  switch-preview-model: true
Best for production environments where uninterrupted service is critical.

Strict Quota Monitoring

Disable both features to track quota usage:
quota-exceeded:
  switch-project: false
  switch-preview-model: false
Best for development/testing when you need to know exactly when quotas are hit.

Project-Level Failover Only

Enable project switching but disable preview models:
quota-exceeded:
  switch-project: true
  switch-preview-model: false
Best when you want failover between accounts but prefer explicit model selection.

Model-Level Failover Only

Enable preview switching but disable project switching:
quota-exceeded:
  switch-project: false
  switch-preview-model: true
Best when you have a single account but want automatic fallback to preview models.

Quota handling works alongside:
  • Request Retry (/v0/management/request-retry) - Number of retry attempts
  • Max Retry Interval (/v0/management/max-retry-interval) - Max wait before retry
  • Routing Strategy (/v0/management/routing/strategy) - How credentials are selected
See Configuration Endpoints for details.

Error Responses

Invalid Value

{
  "error": "invalid body"
}
Returned when the request body is malformed or is missing the value field.

Persistence Failure

{
  "error": "failed to save config: permission denied"
}
Returned when the config file cannot be written.
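A client can tell these error bodies apart from a successful update by checking for the error field. The sketch below assumes only the two JSON shapes shown in this document; `ManagementError` and `check_update_response` are hypothetical names, and HTTP status codes are not inspected because they are not specified here.

```python
import json


class ManagementError(Exception):
    """Raised when the management API reports an error (hypothetical)."""


def check_update_response(body: str) -> None:
    """Raise ManagementError unless the body is the documented success object."""
    payload = json.loads(body)
    if "error" in payload:
        raise ManagementError(payload["error"])
    if payload.get("status") != "ok":
        raise ManagementError(f"unexpected response: {payload!r}")


check_update_response('{"status": "ok"}')  # success: returns without raising

try:
    check_update_response('{"error": "invalid body"}')
except ManagementError as error:
    message = str(error)
    print(f"update failed: {message}")
```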
