Overview

The proxy chat completions endpoint provides OpenAI-compatible chat completions with additional features like authentication, rate limiting, budgets, and centralized logging.

Endpoint

POST {PROXY_BASE_URL}/v1/chat/completions
Alternate routes:
  • POST /chat/completions
  • POST /engines/{model}/chat/completions
  • POST /openai/deployments/{model}/chat/completions

Authentication

Authorization
string
required
Bearer token for authentication.
Authorization: Bearer sk-litellm-xxx...

Request Headers

Content-Type
string
default:"application/json"
Content type of the request body.
x-litellm-team-id
string
Team ID for team-based access control.
x-litellm-metadata
string
JSON stringified metadata for request tracking.
x-litellm-user-id
string
End-user ID for tracking and analytics.
x-litellm-tags
string
Comma-separated tags for request categorization.
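As a sketch, the optional headers above can be assembled in Python before sending a request (all values here are illustrative):

```python
import json

# Optional LiteLLM proxy headers (values are illustrative)
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-litellm-xxx",
    "x-litellm-team-id": "team-abc",
    "x-litellm-user-id": "user-123",
    # x-litellm-metadata must be a JSON *string*, not a dict
    "x-litellm-metadata": json.dumps({"environment": "production"}),
    "x-litellm-tags": "tag1,tag2",
}
```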

Request Body

The request body follows the OpenAI chat completions format:
model
string
required
Model to use for completion.
{"model": "gpt-4"}
messages
array
required
Array of message objects.
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
}
temperature
number
Sampling temperature (0-2).
max_tokens
integer
Maximum tokens to generate.
stream
boolean
default:"false"
Whether to stream the response.
tools
array
Tools available for function calling.
See completion() API for all available parameters.
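Putting the fields above together, a minimal request body looks like this (only `model` and `messages` are required; the other values are illustrative):

```python
# Minimal OpenAI-format request body
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    # Optional parameters
    "temperature": 0.7,   # sampling temperature, 0-2
    "max_tokens": 256,    # cap on generated tokens
    "stream": False,      # set True for server-sent-event streaming
}
```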

Response

Success Response (200)

id
string
Unique identifier for the completion.
object
string
Object type ("chat.completion", or "chat.completion.chunk" for streaming).
created
integer
Unix timestamp of creation.
model
string
Model used for the completion.
choices
array
Array of completion choices.
usage
object
Token usage information.
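A successful non-streaming response has roughly the following shape; the sketch below parses an illustrative body (all values are examples, not real output):

```python
import json

# Illustrative success-response body
body = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
""")

print(body["choices"][0]["message"]["content"])  # the assistant reply
print(body["usage"]["total_tokens"])             # tokens billed for the call
```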

Error Responses

401 Unauthorized
Invalid or missing authentication token.
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
429 Too Many Requests
Rate limit exceeded.
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
400 Bad Request
Invalid request parameters.
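Since every error body carries an `error` object in the shape shown above, client code can branch on the status code and surface the message. A minimal sketch, using a hypothetical `classify_error` helper (not part of any SDK):

```python
import json

def classify_error(status_code, body_text):
    """Map a proxy error response to a short, actionable label.

    Status codes and body shape follow the error examples above.
    """
    err = json.loads(body_text).get("error", {})
    msg = err.get("message", "")
    if status_code == 401:
        return "auth: check your Bearer token (%s)" % msg
    if status_code == 429:
        return "rate-limited: back off and retry (%s)" % msg
    if status_code == 400:
        return "bad request: fix parameters (%s)" % msg
    return "unexpected %d: %s" % (status_code, msg)

print(classify_error(
    429,
    '{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}',
))
```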

Examples

Basic Request

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-xxx" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Python Request

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Streaming Request

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

With Metadata

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "x-litellm-user-id": "user-123",
        "x-litellm-metadata": '{"environment": "production"}',
        "x-litellm-tags": "tag1,tag2"
    }
)

Function Calling

import openai

client = openai.OpenAI(
    api_key="sk-litellm-xxx",
    base_url="http://localhost:4000"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools
)
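If the model decides to call the tool, the assistant message carries `tool_calls`, and each call's `arguments` field is a JSON string rather than a dict. A sketch of extracting them, using an illustrative fragment instead of a live response:

```python
import json

# Illustrative tool_calls entry, as it appears in an assistant message
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "NYC"}',  # JSON string, not a dict
    },
}

args = json.loads(tool_call["function"]["arguments"])
print(tool_call["function"]["name"], args["location"])  # get_weather NYC
```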

Proxy-Specific Features

Budget Tracking

The proxy automatically tracks spending against key/team budgets:
# Key will be rejected if budget exceeded
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# Response includes cost tracking

Rate Limiting

Keys can have TPM (tokens per minute) and RPM (requests per minute) limits:
# Requests are automatically throttled
# 429 error returned if limits exceeded
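One common client-side response to a 429 is exponential backoff. A minimal sketch, with a hypothetical stub standing in for the actual HTTP call (any real client would raise its own rate-limit exception instead):

```python
import time

def call_with_backoff(make_request, max_retries=3, base_delay=1.0):
    """Retry on 429 with exponential backoff.

    `make_request` is any callable that raises RuntimeError containing
    "429" when the proxy rate-limits (a stand-in for an SDK exception).
    """
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except RuntimeError as exc:
            if "429" not in str(exc) or attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stub that fails twice with 429, then succeeds
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(call_with_backoff(fake_request, base_delay=0.01))  # "ok" after two retries
```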

Model Aliases

Use proxy-defined model aliases:
response = client.chat.completions.create(
    model="gpt-4",  # alias can map to a specific deployment configured on the proxy
    messages=[{"role": "user", "content": "Hello"}]
)

Automatic Retries & Fallbacks

The proxy handles retries and fallbacks automatically:
# If primary deployment fails, proxy tries fallback
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Monitoring & Logging

All requests are logged with:
  • Request/response details
  • Token usage
  • Costs
  • Latency
  • Errors
  • User/team information
  • Custom metadata
Access logs through:
  • Admin UI at /ui
  • Spend tracking endpoints
  • Custom callback integrations
