Quota Management Endpoints

The quota management endpoints control how CLI Proxy API responds when provider quotas are exceeded.

Quota Exceeded Behavior

When a provider returns a quota exceeded error, CLI Proxy API can automatically:

Switch Project: Try another project/credential for the same provider
Switch Preview Model: Fall back to preview/alternative model variants

These settings let requests keep succeeding when an individual project or model runs out of quota.

Get Project Switching

GET /v0/management/quota-exceeded/switch-project

Returns whether automatic project switching is enabled.

Request

curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

Response

switch-project (boolean): Whether to automatically switch to another project when the quota is exceeded

{
  "switch-project": true
}
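For scripted checks, the same GET request can be issued from Python. This is a minimal standard-library sketch, assuming the server listens on http://localhost:8317 as in the curl example and returns the JSON object shown above. The helper names build_get_request, parse_setting and get_setting are hypothetical and not part of CLI Proxy API. Passing "switch-preview-model" as the setting name reads the other flag the same way.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8317"  # assumption: default address from the curl example


def build_get_request(setting: str, key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /v0/management/quota-exceeded/<setting>."""
    url = f"{base_url}/v0/management/quota-exceeded/{setting}"
    return urllib.request.Request(url, headers={"X-Management-Key": key}, method="GET")


def parse_setting(setting: str, body: bytes) -> bool:
    """Extract the boolean flag from a body such as {"switch-project": true}."""
    value = json.loads(body)[setting]
    if not isinstance(value, bool):
        raise ValueError(f"expected a boolean for {setting!r}, got {value!r}")
    return value


def get_setting(setting: str, key: str, base_url: str = BASE_URL) -> bool:
    """Fetch one quota-exceeded flag from a running instance (needs network access)."""
    with urllib.request.urlopen(build_get_request(setting, key, base_url)) as resp:
        return parse_setting(setting, resp.read())
```

Usage against a live server: get_setting("switch-project", "YOUR_SECRET"). Splitting request construction from parsing keeps both halves testable without a running proxy.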

Update Project Switching

PUT /v0/management/quota-exceeded/switch-project
PATCH /v0/management/quota-exceeded/switch-project

Enable or disable automatic project switching.

Request

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

Request Body

{
  "value": true
}

Response

status (string): Status of the update operation

{
  "status": "ok"
}
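The update call can be scripted the same way. A minimal sketch, assuming the default address from the curl example and that a successful update returns {"status": "ok"} as shown above; build_update_request and update_setting are hypothetical helper names. The same function targets switch-preview-model by changing the setting argument.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8317"  # assumption: default address from the curl example


def build_update_request(setting: str, enabled: bool, key: str,
                         base_url: str = BASE_URL, method: str = "PUT") -> urllib.request.Request:
    """Build a PUT (or PATCH) request whose JSON body is {"value": <bool>}."""
    if method not in ("PUT", "PATCH"):
        raise ValueError("the endpoint is documented for PUT and PATCH only")
    body = json.dumps({"value": enabled}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v0/management/quota-exceeded/{setting}",
        data=body,
        headers={"X-Management-Key": key, "Content-Type": "application/json"},
        method=method,
    )


def update_setting(setting: str, enabled: bool, key: str, base_url: str = BASE_URL) -> bool:
    """Send the update; return True if the server answered {"status": "ok"}."""
    with urllib.request.urlopen(build_update_request(setting, enabled, key, base_url)) as resp:
        return json.loads(resp.read()).get("status") == "ok"
```

urlopen raises urllib.error.HTTPError for non-2xx responses, so update_setting only returns normally when the server accepted the request.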

Get Preview Model Switching

GET /v0/management/quota-exceeded/switch-preview-model

Returns whether automatic preview model switching is enabled.

Request

curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Response

switch-preview-model (boolean): Whether to automatically switch to a preview model when the quota is exceeded

{
  "switch-preview-model": true
}

Update Preview Model Switching

PUT /v0/management/quota-exceeded/switch-preview-model
PATCH /v0/management/quota-exceeded/switch-preview-model

Enable or disable automatic preview model switching.

Request

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Request Body

{
  "value": false
}

Response

{
  "status": "ok"
}

Configuration File

These settings correspond to the quota-exceeded section in config.yaml:

quota-exceeded:
  # Automatically switch to another project when the quota is exceeded
  switch-project: true

  # Automatically switch to a preview model when the quota is exceeded
  switch-preview-model: true

How Project Switching Works

When enabled and a quota exceeded error occurs:

1. The request fails with a quota error from the provider
2. CLI Proxy API identifies other credentials for the same provider
3. The request is retried with the next available credential
4. The process continues until a request succeeds or all credentials are exhausted
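The steps above amount to a failover loop. The Python sketch below models it under stated assumptions: QuotaExceeded and the send callable are hypothetical stand-ins for the provider's quota error and the upstream call, and credentials are tried in list order, whereas the real order depends on the configured routing strategy.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a provider's quota-exceeded error."""


def send_with_project_switching(credentials: Sequence[str],
                                send: Callable[[str], T],
                                switch_project: bool = True) -> T:
    """Try each credential in turn until one succeeds.

    With switch_project disabled, only the first credential is used and its
    quota error reaches the caller unchanged.
    """
    if not credentials:
        raise ValueError("no credentials configured")
    candidates = list(credentials) if switch_project else list(credentials[:1])
    for credential in candidates[:-1]:
        try:
            return send(credential)
        except QuotaExceeded:
            continue  # quota hit: move on to the next credential
    # Last candidate: if it also hits its quota, the error propagates to the client.
    return send(candidates[-1])
```

Letting the final attempt raise naturally means the client sees the provider's own quota error once every credential is exhausted.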

Example Scenario

Configuration:

gemini-api-key:
  - api-key: "AIzaSy...01"  # Project A
  - api-key: "AIzaSy...02"  # Project B
  - api-key: "AIzaSy...03"  # Project C

quota-exceeded:
  switch-project: true

Flow:

Request uses Project A → Quota exceeded
Automatically retry with Project B → Success
Client receives response without error

How Preview Model Switching Works

When enabled and a quota exceeded error occurs:

1. The request fails with a quota error for a specific model
2. CLI Proxy API checks for preview/alternative variants:
   - gemini-2.5-pro → gemini-2.5-pro-preview
   - gemini-3-pro → gemini-3-pro-preview
3. The request is retried with the preview model
4. The original model name is restored in the response
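Both listed examples follow a "-preview" suffix pattern. Assuming that pattern holds (the real mapping may be a per-model table, so preview_variant is hypothetical), the fallback and the restoration of the original model name can be sketched in Python as follows. QuotaExceeded and send are again hypothetical stand-ins.

```python
from typing import Callable, Dict


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a provider's quota-exceeded error."""


def preview_variant(model: str) -> str:
    """Assumed mapping: append "-preview" unless the name already ends with it."""
    return model if model.endswith("-preview") else f"{model}-preview"


def send_with_preview_switching(model: str,
                                send: Callable[[str], Dict],
                                switch_preview_model: bool = True) -> Dict:
    """Call send(model); on a quota error, retry once with the preview variant."""
    try:
        return send(model)
    except QuotaExceeded:
        fallback = preview_variant(model)
        if not switch_preview_model or fallback == model:
            raise  # no fallback available: return the quota error to the client
        response = dict(send(fallback))
        response["model"] = model  # restore the name the client originally asked for
        return response
```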

Example Scenario

Request:

{
  "model": "gemini-2.5-pro",
  "messages": [...]  
}

Flow:

Request for gemini-2.5-pro → Quota exceeded
Automatically retry with gemini-2.5-pro-preview → Success
Response shows "model": "gemini-2.5-pro" (original)

Combined Behavior

Both settings can be enabled simultaneously for maximum availability:

# Enable both features
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": true}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Retry Order:

1. Try Project A with model-name
2. Try Project A with model-name-preview (if switch-preview-model is enabled)
3. Try Project B with model-name (if switch-project is enabled)
4. Try Project B with model-name-preview (if both are enabled)
5. Continue until a request succeeds or all options are exhausted
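The retry order can be expressed as a nested loop over projects and model variants. This Python sketch only enumerates the candidates; it assumes the "<model>-preview" naming from the examples above and ignores any cap that the request-retry setting may place on the number of attempts.

```python
from typing import Iterator, List, Sequence, Tuple


def retry_order(credentials: Sequence[str], model: str,
                switch_project: bool, switch_preview_model: bool) -> Iterator[Tuple[str, str]]:
    """Yield (credential, model) pairs in the order listed above.

    Assumes the preview variant is named "<model>-preview".
    """
    projects = list(credentials) if switch_project else list(credentials[:1])
    models: List[str] = [model, f"{model}-preview"] if switch_preview_model else [model]
    for project in projects:          # outer loop: project switching
        for candidate in models:      # inner loop: preview model switching
            yield project, candidate
```

Because the generator is lazy, a caller can stop consuming it at the first successful attempt.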

Disable All Quota Handling

To return quota errors immediately to clients:

# Disable project switching
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-project

# Disable preview model switching  
curl -X PUT \
  -H "X-Management-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"value": false}' \
  http://localhost:8317/v0/management/quota-exceeded/switch-preview-model

Get Current Settings

Retrieve both settings in a single call using the main config endpoint:

curl -H "X-Management-Key: YOUR_SECRET" \
  http://localhost:8317/v0/management/config | jq '."quota-exceeded"'

{
  "switch-project": true,
  "switch-preview-model": true
}
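Without jq, the same selection can be done in Python. A minimal sketch, assuming /v0/management/config returns the configuration as a JSON object containing a "quota-exceeded" key, as the jq filter implies; extract_quota_settings and fetch_quota_settings are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8317"  # assumption: default address from the curl example


def extract_quota_settings(config: dict) -> Optional[dict]:
    """Select the "quota-exceeded" block; None if absent (jq would print null)."""
    return config.get("quota-exceeded")


def fetch_quota_settings(key: str, base_url: str = BASE_URL) -> Optional[dict]:
    """GET /v0/management/config and keep only the quota-exceeded section."""
    req = urllib.request.Request(f"{base_url}/v0/management/config",
                                 headers={"X-Management-Key": key})
    with urllib.request.urlopen(req) as resp:
        return extract_quota_settings(json.loads(resp.read()))
```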

Use Cases

High Availability Setup

Enable both features for maximum uptime:

quota-exceeded:
  switch-project: true
  switch-preview-model: true

Best for production environments where uninterrupted service is critical.

Strict Quota Monitoring

Disable both features so quota errors reach clients immediately and quota usage is easy to track:

quota-exceeded:
  switch-project: false
  switch-preview-model: false

Best for development/testing when you need to know exactly when quotas are hit.

Project-Level Failover Only

Enable project switching but disable preview models:

quota-exceeded:
  switch-project: true
  switch-preview-model: false

Best when you want failover between accounts but prefer explicit model selection.

Model-Level Failover Only

Enable preview switching but disable project switching:

quota-exceeded:
  switch-project: false
  switch-preview-model: true

Best when you have a single account but want automatic fallback to preview models.

Related Configuration

Quota handling works alongside:

Request Retry (/v0/management/request-retry) - Number of retry attempts
Max Retry Interval (/v0/management/max-retry-interval) - Max wait before retry
Routing Strategy (/v0/management/routing/strategy) - How credentials are selected

See Configuration Endpoints for details.

Error Responses

Invalid Value

{
  "error": "invalid body"
}

Returned when the request body is malformed or is missing the value field.

Persistence Failure

{
  "error": "failed to save config: permission denied"
}

Returned when the config file cannot be written.
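Scripts that toggle these settings may want to surface the error field instead of a bare HTTP failure. A minimal Python sketch, assuming error responses arrive with a non-2xx status (the exact codes are not documented here) and a body shaped like the examples above; ManagementError, error_message and send_checked are hypothetical names.

```python
import json
import urllib.error
import urllib.request


class ManagementError(RuntimeError):
    """Raised when the management API answers with an error response."""


def error_message(body: bytes) -> str:
    """Pull the "error" field out of a response body, falling back to the raw text."""
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (JSONDecodeError and UnicodeDecodeError are ValueErrors)
        return body.decode("utf-8", errors="replace")
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return body.decode("utf-8", errors="replace")


def send_checked(req: urllib.request.Request) -> dict:
    """Send a management request; turn HTTP error responses into ManagementError."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        raise ManagementError(f"HTTP {err.code}: {error_message(err.read())}") from err
```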

Next Steps

Configuration Endpoints - Configure retry and routing behavior
OAuth Endpoints - Manage provider authentication
Log Endpoints - Monitor quota exceeded events
