
Introduction

The CLI Proxy API provides a unified interface for accessing multiple AI model providers through OpenAI-compatible, Claude-compatible, and Gemini-compatible endpoints. The API supports both streaming and non-streaming responses, request authentication, and dynamic model routing.

Base URL

The API server runs on a configurable host and port. The default configuration is:
http://localhost:8317
You can customize the host and port in your config.yaml:
config.yaml
host: ""  # Empty string binds to all interfaces (IPv4 + IPv6)
port: 8317
To restrict access to localhost only, set host: "127.0.0.1" or host: "localhost".

HTTPS/TLS Support

The API supports HTTPS with TLS certificates:
config.yaml
tls:
  enable: true
  cert: "/path/to/cert.pem"
  key: "/path/to/key.pem"
When TLS is enabled, the base URL becomes:
https://localhost:8317

API Versioning

The CLI Proxy API uses URL path versioning with multiple API versions:

OpenAI-Compatible API (v1)

All OpenAI-compatible endpoints are prefixed with /v1:
POST /v1/chat/completions
POST /v1/completions
GET  /v1/models
POST /v1/messages
POST /v1/messages/count_tokens
GET  /v1/responses
POST /v1/responses
POST /v1/responses/compact
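As a quick illustration of the OpenAI-compatible family, the sketch below builds a /v1/chat/completions request with the standard library. The API key is a placeholder, and the model name is taken from the response example later on this page; substitute your own values.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8317"  # default host/port from config.yaml

# OpenAI-style chat payload; model name and prompt are placeholders.
payload = {
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder; see Authentication
    },
    method="POST",
)

# With a running proxy, send it with:
# body = urllib.request.urlopen(req).read()
```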

Gemini-Compatible API (v1beta)

Gemini-compatible endpoints use the /v1beta prefix:
GET  /v1beta/models
POST /v1beta/models/{model}:generateContent
POST /v1beta/models/{model}:streamGenerateContent
POST /v1beta/models/{model}:countTokens
GET  /v1beta/models/{model}
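The Gemini-compatible endpoints embed the model name in the path and use the Gemini request shape (a list of contents, each with parts). A minimal sketch, with placeholder model name and key, and assuming the key is sent as the conventional x-goog-api-key header (see the Authentication page for the exact scheme):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8317"
model = "gemini-2.5-flash"  # placeholder model name

# Gemini-style body: contents -> parts, rather than OpenAI-style messages.
payload = {"contents": [{"role": "user", "parts": [{"text": "Hello!"}]}]}

req = urllib.request.Request(
    f"{BASE_URL}/v1beta/models/{model}:generateContent",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "x-goog-api-key": "YOUR_API_KEY",  # placeholder; see Authentication
    },
    method="POST",
)
```

Swapping :generateContent for :streamGenerateContent in the URL selects the streaming variant of the same request.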

Management API (v0)

Management and administrative endpoints use /v0/management:
GET  /v0/management/usage
GET  /v0/management/config
PUT  /v0/management/config.yaml
GET  /v0/management/api-keys
POST /v0/management/api-call
Management API endpoints require authentication with a secret key. See the Authentication page for details.
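For illustration, a management call might look like the sketch below. The Bearer-token header shown here is an assumption for the example; consult the Authentication page for the exact header the management API expects.

```python
import urllib.request

BASE_URL = "http://localhost:8317"

# Authenticated management request; the Authorization scheme is assumed
# for this sketch (check the Authentication page), and the secret key
# is a placeholder.
req = urllib.request.Request(
    f"{BASE_URL}/v0/management/usage",
    headers={"Authorization": "Bearer YOUR_SECRET_KEY"},
)

# With a running proxy:
# usage_json = urllib.request.urlopen(req).read()
```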

Endpoint Categories

The API is organized into the following functional categories:

1. Chat & Completions

OpenAI Format
  • POST /v1/chat/completions - OpenAI-compatible chat completions
  • POST /v1/completions - OpenAI-compatible text completions
  • POST /v1/responses - OpenAI Responses API format
  • POST /v1/responses/compact - Compact response format
  • GET /v1/responses - WebSocket for Responses API
Claude Format
  • POST /v1/messages - Claude-compatible message API
  • POST /v1/messages/count_tokens - Count tokens for Claude requests
Gemini Format
  • POST /v1beta/models/{model}:generateContent - Generate content (non-streaming)
  • POST /v1beta/models/{model}:streamGenerateContent - Generate content (streaming)
  • POST /v1beta/models/{model}:countTokens - Count tokens

2. Model Listing

  • GET /v1/models - List OpenAI-compatible models (routes based on User-Agent)
  • GET /v1beta/models - List all Gemini-compatible models
  • GET /v1beta/models/{model} - Get specific model information

3. OAuth Callbacks

OAuth provider callback endpoints for authentication flows:
  • GET /anthropic/callback - Claude/Anthropic OAuth callback
  • GET /codex/callback - Codex OAuth callback
  • GET /google/callback - Google/Gemini OAuth callback
  • GET /iflow/callback - iFlow OAuth callback
  • GET /antigravity/callback - Antigravity OAuth callback

4. Management & Configuration

Comprehensive management endpoints at /v0/management/*:
  • Usage & Statistics: Get usage data, export/import statistics
  • Configuration: View and update server configuration
  • API Keys: Manage API keys for various providers
  • Logs: Access request logs and error logs
  • Authentication: OAuth flows and credential management

5. Control Panel

  • GET /management.html - Web-based management control panel

Response Formats

The API supports multiple response formats:

JSON Responses

All non-streaming endpoints return JSON:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ]
}
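Parsing that body is straightforward with any JSON library; in Python, the assistant's reply lives at choices[0].message.content:

```python
import json

# The non-streaming response body shown above, parsed as JSON.
body = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you?"},
      "finish_reason": "stop"
    }
  ]
}
""")

reply = body["choices"][0]["message"]["content"]
print(reply)  # Hello! How can I help you?
```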

Server-Sent Events (SSE)

Streaming endpoints use Server-Sent Events:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}

data: [DONE]
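A client consumes this stream by reading line by line, decoding each data: payload as JSON, and stopping at the [DONE] sentinel. A minimal parser over a captured stream:

```python
import json

def iter_sse_data(lines):
    """Yield the decoded JSON payload of each `data:` line, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Example against a captured stream (chunks abbreviated):
stream = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk"}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk"}',
    "",
    "data: [DONE]",
]
chunks = list(iter_sse_data(stream))
```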

WebSocket

The Responses API also supports WebSocket connections:
GET /v1/responses
Upgrade: websocket
Connection: Upgrade

Error Responses

All error responses follow this format:
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
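On the client side, the error envelope can be unpacked uniformly regardless of status code. The retryable-status set below is an illustrative client-side policy, not something the API mandates:

```python
import json

# Statuses a client might reasonably retry; an assumption for this sketch.
RETRYABLE = {408, 429, 500, 502, 503, 504}

def classify_error(status, body):
    """Return (message, retryable) from an error response body."""
    err = json.loads(body).get("error", {})
    return err.get("message", "unknown error"), status in RETRYABLE

msg, retry = classify_error(
    401,
    '{"error": {"message": "Invalid API key provided",'
    ' "type": "invalid_request_error", "code": "invalid_api_key"}}',
)
# A 401 is not retryable: the key must be fixed first.
```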
Common HTTP status codes:
  • 400 Bad Request - Invalid request parameters
  • 401 Unauthorized - Missing or invalid API key
  • 404 Not Found - Endpoint or resource not found
  • 429 Too Many Requests - Rate limit exceeded
  • 500 Internal Server Error - Server error
  • 502 Bad Gateway - Upstream service error
  • 503 Service Unavailable - Service temporarily unavailable

CORS Support

The API includes CORS headers on all responses:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE, OPTIONS
Access-Control-Allow-Headers: *

Rate Limiting & Retries

The API includes built-in retry logic for failed requests:
config.yaml
request-retry: 3  # Number of retries
max-retry-interval: 30  # Max wait time in seconds
max-retry-credentials: 0  # Max credentials to try (0 = try all)
Retries occur for HTTP status codes 403, 408, 500, 502, 503, and 504.
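The same policy can be mirrored client-side. The sketch below retries on the statuses listed above with capped, jittered exponential backoff; the backoff curve itself is illustrative, not the proxy's exact algorithm:

```python
import random
import time

# Mirrors the retryable statuses listed above.
RETRY_STATUSES = {403, 408, 500, 502, 503, 504}

def call_with_retries(send, retries=3, max_interval=30, sleep=time.sleep):
    """Call `send()` (returning (status, body)) until a non-retryable
    status arrives or the retry budget is spent."""
    for attempt in range(retries):
        status, body = send()
        if status not in RETRY_STATUSES:
            return status, body
        # Capped exponential backoff with jitter (illustrative timings).
        sleep(min(2 ** attempt + random.random(), max_interval))
    return send()  # final attempt, returned as-is

# Simulate a server that fails twice, then succeeds:
responses = iter([(503, ""), (502, ""), (200, '{"ok": true}')])
status, body = call_with_retries(lambda: next(responses), sleep=lambda _s: None)
```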

Next Steps

Authentication

Learn how to authenticate API requests

Chat Completions

Make OpenAI-compatible chat requests

Gemini API

Use Gemini-compatible endpoints

Management API

Configure and manage the API server
