POST /api/chat
Generate a response based on a conversation history. This endpoint supports multi-turn conversations, streaming responses, and vision capabilities (images).

Request Body
model
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).

messages
Array of message objects representing the conversation history. Each message should have a role and content.

role
The role of the message author. Either user or assistant.

content
The content of the message. Can be a string for text-only messages, or an array of content blocks for messages with images.
images
Optional array of images in base64 format or HTTP(S) URLs. Used for vision-capable models.
stream
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.

context
Optional context array for maintaining conversation state across requests.
options
Optional parameters to control generation behavior.
temperature
Controls randomness in generation (0.0 to 1.0). Higher values make output more creative.
top_p
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
Maximum number of tokens to generate.
Response
Non-Streaming Response
model
The name of the model used.

created_at
ISO 8601 timestamp of when the response was created.

done
Always true for non-streaming responses.

context
Context array that can be passed to subsequent requests.
Streaming Response
When stream: true, the response is sent as newline-delimited JSON chunks:
model
The name of the model used.

created_at
ISO 8601 timestamp for this chunk.

done
false for intermediate chunks, true for the final chunk.

context
Context array (only present in the final chunk).
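A stream in this format can be consumed by parsing one JSON object per line until a chunk arrives with done set to true. Below is a minimal sketch: the done and context fields follow the descriptions above, but the assumption that each chunk carries its generated text under message.content is ours — adjust the key to match your proxy's actual payload.

```python
import json

def read_stream(lines):
    """Accumulate text from newline-delimited JSON chunks.

    Assumes each chunk carries its generated text under
    message.content; the final chunk has done: true and may
    include a context array.
    """
    text, context = [], None
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines, if any
        chunk = json.loads(line)
        text.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            context = chunk.get("context")  # only present in final chunk
            break
    return "".join(text), context

# Simulated stream (illustrative chunk shapes, not real server output):
stream = [
    '{"model": "gpt-4o-mini", "message": {"content": "Hel"}, "done": false}',
    '{"model": "gpt-4o-mini", "message": {"content": "lo"}, "done": true, "context": [1, 2]}',
]
print(read_stream(stream))  # → ('Hello', [1, 2])
```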
Examples
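A minimal non-streaming request sketch using Python's standard library. The endpoint path is from this page; the host and port (localhost:8080) are placeholders for wherever your proxy is running, and the urlopen call is left commented out so the snippet stands alone:

```python
import json
import urllib.request

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,
}

# localhost:8080 is a placeholder; point this at your running proxy.
req = urllib.request.Request(
    "http://localhost:8080/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```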
Response Examples
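The sketch below shows the round trip described above: a non-streaming response (its shape is illustrative, inferred from the field descriptions on this page) whose context array is carried into a follow-up request to preserve conversation state:

```python
import json

# Illustrative non-streaming response; your proxy's actual payload
# may include additional fields.
response = {
    "model": "gpt-4o-mini",
    "created_at": "2025-01-01T12:00:00Z",
    "done": True,            # always true for non-streaming responses
    "context": [1, 2, 3],    # opaque state to pass to the next request
}

# Carry the context forward so the conversation state is maintained.
follow_up = {
    "model": response["model"],
    "messages": [{"role": "user", "content": "And a follow-up question?"}],
    "context": response["context"],
    "stream": False,
}

print(json.dumps(follow_up))
```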
Vision support requires JPEG images provided as base64 strings or HTTP(S) URLs. The proxy automatically converts them to data URLs for vision-capable models.
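Encoding a local JPEG for the images field might look like the sketch below. The bytes here are a placeholder standing in for a real file's contents, and the images array is shown at the top level of the request body, following the field list above — check whether your proxy instead expects it on individual messages.

```python
import base64
import json

def encode_image(jpeg_bytes: bytes) -> str:
    """Base64-encode raw JPEG bytes for the images array."""
    return base64.b64encode(jpeg_bytes).decode("ascii")

# Placeholder bytes standing in for a real JPEG file's contents.
image_b64 = encode_image(b"\xff\xd8\xff\xe0 fake jpeg bytes")

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is in this picture?"}],
    "images": [image_b64],  # base64 string; HTTP(S) URLs also accepted
    "stream": False,
}
print(json.dumps(payload))
```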
