Overview

The ChatRequest schema defines the structure for sending chat completion requests to the LLM Gateway. It includes message history, model configuration, and generation parameters.

Schema Definition

messages (array, required)
  Array of message objects representing the conversation history.

model (string, optional)
  Model identifier. If not specified, the gateway will select an appropriate model based on the request.

model_hint (string, optional)
  Hint to guide model selection. Can be used to specify preferences such as model family or capabilities.

max_tokens (integer, default: 512)
  Maximum number of tokens to generate in the completion. Controls the length of the response.

temperature (float, default: 0.7)
  Sampling temperature between 0 and 2. Higher values (e.g., 1.0) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic.

stream (boolean, default: false)
  Whether to stream the response. When set to true, tokens are sent as they are generated.
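
The fields above can be sketched as a typed structure. This is a minimal Python sketch, not an official client type: the field names mirror the schema, but the exact Message shape (and whether a "system" role is accepted) is inferred from the examples, not stated by the schema.

```python
from typing import List, Literal, TypedDict

class Message(TypedDict):
    role: Literal["user", "assistant"]  # other roles (e.g. "system") may exist; assumption
    content: str

class ChatRequest(TypedDict, total=False):
    messages: List[Message]  # required; all other fields are optional
    model: str
    model_hint: str
    max_tokens: int          # default 512
    temperature: float       # default 0.7
    stream: bool             # default false

# A request carrying only the required field, relying on gateway defaults.
req: ChatRequest = {
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
```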

Example Request

{
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "model_hint": "online",
  "max_tokens": 512,
  "temperature": 0.7,
  "stream": false
}
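
A sketch of sending this request from Python using only the standard library. The endpoint URL is a placeholder assumption; substitute the path your gateway deployment exposes.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your gateway's actual URL.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "model_hint": "online",
    "max_tokens": 512,
    "temperature": 0.7,
    "stream": False,
}

body = json.dumps(payload).encode("utf-8")
request = urllib.request.Request(
    GATEWAY_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would perform the call; omitted here
# so the sketch runs without a live gateway.
```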

Multi-Turn Conversation Example

{
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    {
      "role": "user",
      "content": "What is its population?"
    }
  ],
  "model_hint": "fast",
  "temperature": 0.7
}
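
Because the gateway is stateless with respect to conversation history, each request must carry the full messages array. A small helper like the following (a sketch; `append_turn` is not part of any gateway SDK) keeps the history in the alternating shape shown above:

```python
def append_turn(messages, assistant_reply, next_user_message):
    """Extend a conversation history with the assistant's last reply
    and the user's next message, returning a new list."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [{"role": "user", "content": "What is the capital of France?"}]
history = append_turn(
    history,
    "The capital of France is Paris.",
    "What is its population?",
)
# history now matches the three-message example above and can be sent
# as the "messages" field of the next request.
```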

Validation Rules

  • messages array must contain at least one message
  • Each message must have both role and content fields
  • temperature must be between 0 and 2 (if provided)
  • max_tokens must be a positive integer (if provided)
  • stream must be a boolean value
