
Overview

The ChatResponse schema defines the structure of successful chat completion responses from the LLM Gateway. It includes the generated content, provider information, and token usage statistics.

Schema Definition

id
string
required
Unique identifier for the chat completion response.
provider
string
required
The LLM provider that fulfilled the request (e.g., "gemini", "ollama").
content
string
required
The generated completion text from the model.
usage
object
required
Token usage statistics for the completion.
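
Taken together, the fields above can be sketched as a Pydantic model (a sketch only; the field names follow the schema above and the nested Usage model mirrors the Usage schema shown later on this page):

```python
from pydantic import BaseModel

class Usage(BaseModel):
    """Token usage statistics for the completion."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ChatResponse(BaseModel):
    """Successful chat completion response from the LLM Gateway."""
    id: str          # Unique identifier for the response
    provider: str    # e.g. "gemini", "ollama"
    content: str     # Generated completion text
    usage: Usage     # Nested token usage statistics
```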

Example Response

{
  "id": "8f7d9e2a-1b3c-4d5e-abc1-23456789",
  "provider": "gemini",
  "content": "The capital of France is Paris. It is located in the north-central part of the country.",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 21,
    "total_tokens": 33
  }
}
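
A response like the one above can be decoded with the standard json module (a minimal sketch; the payload and token counts are illustrative):

```python
import json

# Raw JSON body, shaped like the Example Response above (values illustrative).
raw = """{
  "id": "8f7d9e2a-1b3c-4d5e-abc1-23456789",
  "provider": "gemini",
  "content": "The capital of France is Paris.",
  "usage": {"prompt_tokens": 12, "completion_tokens": 21, "total_tokens": 33}
}"""

# Decode into a plain dict and read the generated text and usage stats.
response = json.loads(raw)
answer = response["content"]
tokens_used = response["usage"]["total_tokens"]
```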

Provider Examples

The provider field indicates which LLM backend processed your request:
{
  "id": "abc123-def456",
  "provider": "gemini",
  "content": "Response from Google Gemini model",
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 6,
    "total_tokens": 14
  }
}
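
For a request served by a local Ollama backend, the same schema applies with a different provider value (an illustrative example; the id, content, and token counts are made up):

```json
{
  "id": "123e4567-e89b-4d3c-a456-426614174000",
  "provider": "ollama",
  "content": "Response from a locally hosted Ollama model",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
```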

Understanding Token Usage

The usage object helps you track and optimize your API consumption:
  • prompt_tokens: Number of tokens in the input messages you sent
  • completion_tokens: Number of tokens in the model’s generated response
  • total_tokens: Sum of prompt and completion tokens, used for billing and rate limiting
Monitor total_tokens to estimate costs and optimize your prompts for efficiency.
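
Tracking consumption across responses can be sketched as follows (the response shape follows the Example Response above; the per-token price is a hypothetical illustration, not a real rate):

```python
# Hypothetical example rate, purely for illustration.
PRICE_PER_1K_TOKENS = 0.002

def summarize_usage(responses):
    """Sum total_tokens across responses and estimate cost at the example rate."""
    total = sum(r["usage"]["total_tokens"] for r in responses)
    return total, total / 1000 * PRICE_PER_1K_TOKENS

# Two responses with illustrative usage objects.
responses = [
    {"usage": {"prompt_tokens": 12, "completion_tokens": 21, "total_tokens": 33}},
    {"usage": {"prompt_tokens": 8, "completion_tokens": 6, "total_tokens": 14}},
]
tokens, cost = summarize_usage(responses)
```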

Nested Schemas

Usage Schema

The Usage object is a nested schema that provides detailed token consumption metrics:
Python
from pydantic import BaseModel

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
All fields are required integers representing token counts.
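
As a quick validation example (a sketch with illustrative values, assuming Pydantic is installed):

```python
from pydantic import BaseModel, ValidationError

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

# A valid payload validates cleanly (values illustrative).
usage = Usage(prompt_tokens=12, completion_tokens=21, total_tokens=33)

# All three fields are required; omitting one raises ValidationError.
missing_field_rejected = False
try:
    Usage(prompt_tokens=12, completion_tokens=21)
except ValidationError:
    missing_field_rejected = True
```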
