POST /api/v1/chat
Chat Completions
curl --request POST \
  --url https://api.example.com/api/v1/chat \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <x-api-key>' \
  --data '
{
  "messages": [
    {
      "role": "<string>",
      "content": "<string>"
    }
  ],
  "model": "<string>",
  "model_hint": "<string>",
  "max_tokens": 123,
  "temperature": 123,
  "stream": true
}
'
The Chat endpoint is the primary interface for processing LLM chat completions. It accepts messages and model configurations, routes requests to the appropriate provider, and returns standardized responses.

Authentication

This endpoint requires authentication via the X-API-Key header.
X-API-Key
string
required
Your API key for authentication. Requests without a valid API key will receive a 401 error.

Rate Limiting

This endpoint is rate-limited using a token bucket algorithm backed by Redis. Rate limits are enforced per API key. When the rate limit is exceeded, the endpoint returns a 429 status code.
  • Capacity: Configurable via RATE_LIMITER_CAPACITY
  • Refill Rate: Configurable via RATE_LIMITER_REFILL_RATE
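
The token bucket logic can be sketched in a few lines. This is an in-memory illustration only (the gateway's implementation is Redis-backed, so the actual code differs); `capacity` and `refill_rate` correspond to `RATE_LIMITER_CAPACITY` and `RATE_LIMITER_REFILL_RATE`:

```python
import time

class TokenBucket:
    """In-memory token bucket: a request is allowed if a token is available."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429
```

With `capacity=2` and `refill_rate=1.0`, two back-to-back requests succeed and a third is rejected until roughly a second has passed.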

Request Body

messages
array
required
An array of message objects representing the conversation history.
model
string
The specific model identifier to use for the completion. If not provided, the gateway will select an appropriate model.
model_hint
string
A hint to help the gateway select an appropriate model or provider.
max_tokens
integer
default:"512"
The maximum number of tokens to generate in the completion.
temperature
number
default:"0.7"
Controls randomness in the output. Higher values (e.g., 1.0) make output more random, lower values (e.g., 0.2) make it more deterministic.
stream
boolean
default:"false"
Whether to stream the response.
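
For illustration, the documented request fields and their defaults map onto a structure like the following. This is a sketch, not the gateway's actual schema; only the field names and defaults listed above are taken from the docs:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str

@dataclass
class ChatRequest:
    messages: List[Message]          # required: conversation history
    model: Optional[str] = None      # optional explicit model identifier
    model_hint: Optional[str] = None # optional routing hint for the gateway
    max_tokens: int = 512            # documented default
    temperature: float = 0.7         # documented default
    stream: bool = False             # documented default
```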

Response

id
string
required
A unique identifier for this completion request.
provider
string
required
The LLM provider that handled this request (e.g., "gemini", "ollama").
content
string
required
The generated completion text from the model.
usage
object
required
Token usage statistics for this request.
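
The four required response fields can likewise be modeled as a simple structure; this is an illustrative sketch of the documented shape, with the `usage` sub-fields taken from the example response below:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class ChatResponse:
    id: str        # unique identifier for the completion request
    provider: str  # e.g. "gemini" or "ollama"
    content: str   # generated completion text
    usage: Usage   # token usage statistics
```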

Example Request

curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_here" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "model_hint": "online",
    "max_tokens": 512,
    "temperature": 0.7
  }'
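
The same request in Python using only the standard library (the base URL and API key are placeholders, as in the curl example):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/chat"

def build_payload(content: str, model_hint: str = "online",
                  max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Assemble the documented request body for a single user message."""
    return {
        "messages": [{"role": "user", "content": content}],
        "model_hint": model_hint,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(content: str, api_key: str) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(content)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```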

Example Response

{
  "id": "abc123-def456-789",
  "provider": "gemini",
  "content": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and iconic landmarks like the Eiffel Tower.",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Error Responses

401 Unauthorized (missing or invalid API key):
{
  "detail": "Invalid or missing API Key"
}
429 Too Many Requests (rate limit exceeded):
{
  "detail": "Too many requests. Please wait before trying again."
}
422 Unprocessable Entity (request body failed validation):
{
  "detail": [
    {
      "loc": ["body", "messages"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}
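
A sketch of client-side error handling based on the errors above: back off and retry on 429, but fail fast on 401 and 422, since retrying an invalid key or an invalid body cannot succeed. The `send` callable is a placeholder for whatever transport you use; it is assumed to return a `(status, body)` pair:

```python
import time

RETRYABLE = {429}   # rate-limited: wait, then retry
FATAL = {401, 422}  # bad key or invalid body: retrying will not help

def with_retries(send, max_attempts: int = 3, base_delay: float = 1.0):
    """Call send() until success, with exponential backoff on 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status in FATAL:
            raise RuntimeError(f"request failed permanently: {status} {body}")
        if status in RETRYABLE and attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise RuntimeError(f"request failed: {status} {body}")
```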
