POST /v1/chat/completions
Create a model response for the given conversation history. This endpoint is fully compatible with the OpenAI Chat Completions API and supports both streaming and non-streaming responses.

Request Body
`model` (string, required)
The model to use for completion. Examples: `gemini-2.5-pro`, `claude-sonnet-4`, `gpt-4o`, or any model supported by your configured providers.

`messages` (array, required)
An array of message objects representing the conversation history. Each message object contains:
- `role` (string, required): One of `system`, `user`, `assistant`, or `tool`
- `content` (string or array, required): The message content. Can be a string or an array of content parts for multimodal inputs
- `name` (string, optional): The name of the message author
- `tool_calls` (array, optional): Tool calls made by the assistant
- `tool_call_id` (string, optional): The ID of the tool call this message is responding to (for tool messages)
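For instance, a conversation history that includes a tool round-trip could be built like this (a sketch; the `get_weather` function and its arguments are illustrative, not part of the API):

```python
import json

# Illustrative conversation history including a tool round-trip.
# The function name and arguments are placeholders.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Paris"}),
            },
        }],
    },
    # A tool message answers a specific call via tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]

print(json.dumps(messages, indent=2))
```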
`stream` (boolean, optional)
If set to `true`, the server sends partial message deltas as Server-Sent Events (SSE). If `false`, the server waits until generation is complete before sending the full response.

`temperature` (number, optional)
Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.
`top_p` (number, optional)
Nucleus sampling parameter. The model considers the results of the tokens with `top_p` probability mass.

`n` (integer, optional)
Number of chat completion choices to generate. Note: not all providers support values greater than 1.

`max_tokens` (integer, optional)
The maximum number of tokens to generate. If not specified, the model generates until it reaches a natural stopping point.

`stop` (string or array, optional)
Up to 4 sequences where the API will stop generating further tokens.

`presence_penalty` (number, optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.

`frequency_penalty` (number, optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text.
`tools` (array, optional)
A list of tools (functions) the model may call. Each tool object contains:
- `type` (string): Must be `function`
- `function` (object): Function definition with `name`, `description`, and `parameters`
`tool_choice` (string or object, optional)
Controls which (if any) function is called by the model. Options:
- `none`: The model will not call any function
- `auto`: The model can choose to call a function or generate a message
- `required`: The model must call one or more functions
- An object with a specific function name to force that function
`reasoning_effort` (string, optional)
Controls the level of reasoning for models that support extended thinking. Options:
- `none`: No extended reasoning
- `low`: Minimal reasoning effort
- `medium`: Moderate reasoning effort
- `high`: Maximum reasoning effort
- `auto`: Let the model decide
For providers with native thinking settings, this is translated accordingly (e.g. Gemini's `thinkingConfig`).

`modalities` (array, optional)
Response modalities for multimodal models. Supported values: `text`, `image`. Example: `["text", "image"]` for models that can generate both text and images.

Response Format
Non-Streaming Response
`id` (string)
A unique identifier for the chat completion.

`object` (string)
Always `chat.completion`.

`created` (integer)
Unix timestamp (in seconds) of when the completion was created.

`model` (string)
The model used for the completion.

`choices` (array)
Array of completion choices. Each choice contains:
- `index` (integer): The index of this choice in the array
- `finish_reason` (string): Why the model stopped generating tokens. Possible values:
  - `stop`: Natural completion
  - `length`: Maximum token limit reached
  - `tool_calls`: The model called a function
  - `content_filter`: Content filtered by safety systems
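A representative response body, parsed here in Python (all field values are illustrative, not output from a real call):

```python
import json

# Illustrative non-streaming response body; values are made up.
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help?"},
    "finish_reason": "stop"
  }]
}"""

resp = json.loads(raw)
print(resp["choices"][0]["message"]["content"])  # Hello! How can I help?
```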
Streaming Response
When `stream: true`, the server sends chunks as Server-Sent Events (SSE). Each choice in a chunk carries a `delta` object with the incremental changes to the message. The stream ends with a `data: [DONE]` message.
Examples
Basic Chat Completion
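A minimal request sketch in Python's standard library, assuming the server runs at `http://localhost:8080` and uses bearer-token auth (both assumptions; adjust for your deployment):

```python
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"  # assumed base URL
API_KEY = "sk-..."  # placeholder key

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.7,
    "max_tokens": 100,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# Uncomment to send; requires a running server:
# with urllib.request.urlopen(req) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```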
Streaming Response
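A sketch of consuming the SSE stream; here the chunks are hard-coded rather than read from a live connection, and their content is illustrative:

```python
import json

def delta_text(line: str):
    """Extract incremental content from one SSE data line, if any."""
    if not line.startswith("data: "):
        return None          # ignore blank keep-alive / comment lines
    data = line[len("data: "):].strip()
    if data == "[DONE]":
        return None          # end-of-stream sentinel
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# Chunks shaped like the ones this endpoint emits (values illustrative):
lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    'data: [DONE]',
]
text = "".join(t for t in map(delta_text, lines) if t)
print(text)  # Hello!
```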
Function Calling
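A request sketch that declares a tool; the `get_weather` function and its schema are hypothetical:

```python
import json

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",          # hypothetical function
            "description": "Get current weather for a city",
            "parameters": {                 # JSON Schema for arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # or "none", "required", or a specific function
}
print(json.dumps(payload, indent=2))
```

To force a specific function instead of `"auto"`, pass an object such as `{"type": "function", "function": {"name": "get_weather"}}` as `tool_choice`.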
With Reasoning Effort
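A request sketch that enables extended thinking via `reasoning_effort` (the model name is just an example from the list above):

```python
import json

payload = {
    "model": "gemini-2.5-pro",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
    "reasoning_effort": "high",  # none | low | medium | high | auto
}
print(json.dumps(payload))
```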
Implementation Details
The `/v1/chat/completions` endpoint is implemented in `sdk/api/handlers/openai/openai_handlers.go`. Key behaviors:
- Provider Translation: Requests are automatically translated to the target provider’s format (Gemini, Claude, etc.)
- Function Calling: Tool calls are preserved across provider translations
- Streaming: SSE streaming uses chunked transfer encoding with immediate flushing
- Auto-conversion: Some clients send OpenAI Responses-format payloads to this endpoint; these are automatically converted to Chat Completions format
If the request doesn't include a `messages` field but has `input` or `instructions`, it is automatically treated as an OpenAI Responses-format request and converted.

Error Responses
All errors follow the OpenAI error format:
- `400 Bad Request`: Invalid request parameters
- `401 Unauthorized`: Missing or invalid API key
- `429 Too Many Requests`: Rate limit exceeded
- `500 Internal Server Error`: Server-side error
- `503 Service Unavailable`: Provider temporarily unavailable
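For reference, an OpenAI-format error body has a top-level `error` object (field values here are illustrative):

```python
import json

# Illustrative error body in the OpenAI error format.
raw = '{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": 401}}'
err = json.loads(raw)
print(err["error"]["message"])  # Invalid API key
```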