Introduction
Codex-LB provides a dual-interface API that supports both OpenAI-compatible and Codex-specific endpoints. It enables load-balanced access to ChatGPT models through multiple authenticated accounts, with built-in rate limiting, usage tracking, and model routing.

Base URLs
Codex-LB exposes endpoints under the following URL paths:

OpenAI-Compatible API (v1)
- /v1/chat/completions - Chat completions (streaming and non-streaming)
- /v1/responses - Responses endpoint with streaming support
- /v1/responses/compact - Compact responses format
- /v1/models - List available models
- /v1/audio/transcriptions - Audio transcription
Codex Backend API
- /backend-api/codex/responses - Streaming responses
- /backend-api/codex/responses/compact - Compact responses
- /backend-api/codex/models - List available models
- /backend-api/transcribe - Audio transcription
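As a sketch of how the v1 surface is called, the snippet below builds a chat-completions request against a Codex-LB instance. The host, port, API key, and model name are illustrative assumptions, not documented defaults.

```python
import json
import urllib.request

# Assumed local deployment; host, port, key, and model are illustrative.
BASE_URL = "http://localhost:8000"
API_KEY = "sk-example"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # True asks the server for an SSE stream
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("my-model", "Hello!")
# urllib.request.urlopen(req) would send the request; omitted here.
```

Because the endpoint is OpenAI-compatible, existing OpenAI SDK clients pointed at the base URL should work the same way.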
Usage API

- /api/codex/usage - Current usage and rate limit information
API Version
The API version is reflected in the URL path:

- v1 - OpenAI-compatible endpoints
- backend-api - Codex-specific endpoints (unversioned)

The application version is 0.1.0 (defined in app/main.py:59).
Upstream Configuration
Codex-LB proxies requests to OpenAI’s backend:

- Upstream Base URL: https://chatgpt.com/backend-api (configurable via CODEX_LB_upstream_base_url)
- Auth Base URL: https://auth.openai.com (configurable via CODEX_LB_auth_base_url)
- Connect Timeout: 30 seconds (configurable via CODEX_LB_upstream_connect_timeout_seconds)
- Stream Idle Timeout: 300 seconds (configurable via CODEX_LB_stream_idle_timeout_seconds)
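Since each setting above maps to an environment variable, overrides can be applied before the process starts. A minimal sketch (the values shown are the documented defaults):

```python
import os

# Override upstream settings via environment variables before starting Codex-LB.
# Values shown here are the documented defaults.
overrides = {
    "CODEX_LB_upstream_base_url": "https://chatgpt.com/backend-api",
    "CODEX_LB_auth_base_url": "https://auth.openai.com",
    "CODEX_LB_upstream_connect_timeout_seconds": "30",
    "CODEX_LB_stream_idle_timeout_seconds": "300",
}
os.environ.update(overrides)
```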
Response Format
All API error responses follow the OpenAI error envelope format, ensuring compatibility with existing OpenAI SDK clients.

Success Response
Successful responses return appropriate status codes (200, 201, etc.) with JSON or Server-Sent Events (SSE) content, depending on the endpoint.

Error Response
Errors follow the OpenAI error envelope format.

Rate Limiting
All API endpoints include rate limit information in response headers. Detailed usage can be queried via the /api/codex/usage endpoint, which returns:
- Current usage across all limits
- Reset timestamps for each limit window
- Per-model usage breakdown
Limits can be defined along several dimensions:

- Request-based limits - Maximum number of requests per time window
- Token-based limits - Input, output, or total token limits
- Cost-based limits - Maximum spend in USD per time window
- Time windows - Daily, weekly, or monthly limits
- Model-specific limits - Per-model or global limits
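As a sketch of consuming that usage data, the helpers below compute remaining quota and time until reset. The payload shape and field names are assumptions for illustration, not the documented schema:

```python
from datetime import datetime, timezone

# Hypothetical payload shape; keys are assumptions, not the documented schema.
sample_usage = {
    "limits": [
        {
            "type": "requests",   # request-, token-, or cost-based
            "window": "daily",    # daily, weekly, or monthly
            "used": 120,
            "max": 1000,
            "resets_at": "2025-01-02T00:00:00+00:00",
        },
    ],
    "per_model": {"gpt-4o-transcribe": {"requests": 3}},
}

def remaining(limit: dict) -> int:
    """Quota left in this limit's current window."""
    return max(limit["max"] - limit["used"], 0)

def seconds_until_reset(limit: dict, now: datetime) -> float:
    """Seconds until this limit's window resets."""
    reset = datetime.fromisoformat(limit["resets_at"])
    return (reset - now).total_seconds()

first = sample_usage["limits"][0]
now = datetime(2025, 1, 1, 23, 59, tzinfo=timezone.utc)
wait = seconds_until_reset(first, now)
```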
Model Support
The API automatically fetches and maintains an up-to-date model registry from the upstream service:

- Model Registry Enabled: Yes (configurable via CODEX_LB_model_registry_enabled)
- Refresh Interval: 300 seconds (5 minutes)
- Client Version: 0.101.0 (used for model compatibility checking)

Query the /v1/models or /backend-api/codex/models endpoint to retrieve the current list of available models.
Transcription Model
For audio transcription endpoints, only one model is supported:

- Model: gpt-4o-transcribe
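Transcription uploads are sent as multipart form data. The sketch below builds such a body by hand; the field names "model" and "file" follow the OpenAI transcription convention and are assumptions here, not confirmed by this document:

```python
import io
import uuid

def build_multipart(model: str, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body for a transcription upload.

    Field names follow the OpenAI convention (assumption).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(header: str, body: bytes) -> None:
        buf.write(f"--{boundary}\r\n{header}\r\n\r\n".encode())
        buf.write(body + b"\r\n")

    part('Content-Disposition: form-data; name="model"', model.encode())
    part(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream",
        audio,
    )
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart("gpt-4o-transcribe", "clip.wav", b"\x00\x01")
# POST body to /v1/audio/transcriptions with the returned Content-Type header.
```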
Request/Response Flow
- Authentication - API key validated (if required)
- Model Access Check - Verify API key has access to requested model
- Rate Limit Enforcement - Check and reserve usage quota
- Proxy Request - Forward request to upstream OpenAI service
- Stream Response - Return SSE stream or collect full response
- Usage Recording - Finalize usage reservation with actual token counts
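For streaming endpoints, step 5 above delivers Server-Sent Events. A minimal sketch of parsing such a stream line by line, assuming the conventional OpenAI "data:" framing and "[DONE]" sentinel (the sample chunks are illustrative, not real Codex-LB output):

```python
import json
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from the 'data:' lines of an SSE stream,
    stopping at the conventional [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event names, blank keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

# Illustrative stream fragment (not real Codex-LB output):
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
chunks = [e["choices"][0]["delta"].get("content", "") for e in iter_sse_data(sample)]
text = "".join(chunks)  # "Hello"
```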
Next Steps
- Authentication - Learn about API key authentication
- Error Handling - Understand error codes and formats
- Chat Completions - Start using the chat completions endpoint