POST /api/v1/chat
Chat Completions
curl --request POST \
  --url https://api.example.com/api/v1/chat \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <x-api-key>' \
  --data '
{
  "messages": [
    {
      "role": "<string>",
      "content": "<string>"
    }
  ],
  "model": "<string>",
  "model_hint": "<string>",
  "max_tokens": 123,
  "temperature": 123,
  "stream": true
}
'
The Chat endpoint is the primary interface for processing LLM chat completions. It accepts messages and model configurations, routes requests to the appropriate provider, and returns standardized responses.

Authentication

This endpoint requires authentication via the X-API-Key header.
X-API-Key
string
required
Your API key for authentication. Requests without a valid API key will receive a 401 error.

Rate Limiting

This endpoint is rate-limited using a token bucket algorithm backed by Redis. Rate limits are enforced per API key. When the rate limit is exceeded, the endpoint returns a 429 status code.
  • Capacity: Configurable via RATE_LIMITER_CAPACITY
  • Refill Rate: Configurable via RATE_LIMITER_REFILL_RATE
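
The token bucket logic can be sketched in a few lines. This is an in-memory illustration only (the gateway's implementation is Redis-backed, so the actual code differs); `capacity` and `refill_rate` correspond to `RATE_LIMITER_CAPACITY` and `RATE_LIMITER_REFILL_RATE`:

```python
import time

class TokenBucket:
    """In-memory token bucket: a request is allowed if a token is available."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429
```

With `capacity=2` and `refill_rate=1.0`, two back-to-back requests succeed and a third is rejected until roughly a second has passed.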

Request Body

messages
array
required
An array of message objects representing the conversation history.
model
string
The specific model identifier to use for the completion. If not provided, the gateway will select an appropriate model.
model_hint
string
A hint to help the gateway select an appropriate model or provider.
max_tokens
integer
default:"512"
The maximum number of tokens to generate in the completion.
temperature
number
default:"0.7"
Controls randomness in the output. Higher values (e.g., 1.0) make output more random, lower values (e.g., 0.2) make it more deterministic.
stream
boolean
default:"false"
Whether to stream the response.
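
For illustration, the documented request fields and their defaults map onto a structure like the following. This is a sketch, not the gateway's actual schema; only the field names and defaults listed above are taken from the docs:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str

@dataclass
class ChatRequest:
    messages: List[Message]          # required: conversation history
    model: Optional[str] = None      # optional explicit model identifier
    model_hint: Optional[str] = None # optional routing hint for the gateway
    max_tokens: int = 512            # documented default
    temperature: float = 0.7         # documented default
    stream: bool = False             # documented default
```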

Response

id
string
required
A unique identifier for this completion request.
provider
string
required
The LLM provider that handled this request (e.g., "gemini", "ollama").
content
string
required
The generated completion text from the model.
usage
object
required
Token usage statistics for this request.
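
The four required response fields can likewise be modeled as a simple structure; this is an illustrative sketch of the documented shape, with the `usage` sub-fields taken from the example response below:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class ChatResponse:
    id: str        # unique identifier for the completion request
    provider: str  # e.g. "gemini" or "ollama"
    content: str   # generated completion text
    usage: Usage   # token usage statistics
```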

Example Request

curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_here" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "model_hint": "online",
    "max_tokens": 512,
    "temperature": 0.7
  }'
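
The same request in Python using only the standard library (the base URL and API key are placeholders, as in the curl example):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/chat"

def build_payload(content: str, model_hint: str = "online",
                  max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Assemble the documented request body for a single user message."""
    return {
        "messages": [{"role": "user", "content": content}],
        "model_hint": model_hint,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(content: str, api_key: str) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(content)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```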

Example Response

{
  "id": "abc123-def456-789",
  "provider": "gemini",
  "content": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and iconic landmarks like the Eiffel Tower.",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Error Responses

401 Unauthorized (missing or invalid API key):
{
  "detail": "Invalid or missing API Key"
}
429 Too Many Requests (rate limit exceeded):
{
  "detail": "Too many requests. Please wait before trying again."
}
422 Unprocessable Entity (request body failed validation):
{
  "detail": [
    {
      "loc": ["body", "messages"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}
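
A sketch of client-side error handling based on the errors above: back off and retry on 429, but fail fast on 401 and 422, since retrying an invalid key or an invalid body cannot succeed. The `send` callable is a placeholder for whatever transport you use; it is assumed to return a `(status, body)` pair:

```python
import time

RETRYABLE = {429}   # rate-limited: wait, then retry
FATAL = {401, 422}  # bad key or invalid body: retrying will not help

def with_retries(send, max_attempts: int = 3, base_delay: float = 1.0):
    """Call send() until success, with exponential backoff on 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status in FATAL:
            raise RuntimeError(f"request failed permanently: {status} {body}")
        if status in RETRYABLE and attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise RuntimeError(f"request failed: {status} {body}")
```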
