
POST /api/chat

Generate a response based on a conversation history. This endpoint supports multi-turn conversations, streaming responses, and vision capabilities (images).

Request Body

model
string
required
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).
messages
array
required
Array of message objects representing the conversation history. Each message should have a role and content.
messages[].role
string
required
The role of the message author. Either user or assistant.
messages[].content
string | array
required
The content of the message. Can be a string for text-only messages, or an array of content blocks for messages with images.
messages[].images
array
Optional array of images, each a base64-encoded string or an HTTP(S) URL. Used with vision-capable models.
stream
boolean
default:"false"
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.
context
array
default:"[]"
Optional context array for maintaining conversation state across requests.
options
object
Optional parameters to control generation behavior.
options.temperature
number
Controls randomness in generation (0.0 to 1.0). Higher values produce more varied, creative output; lower values are more deterministic.
options.top_p
number
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
options.num_predict
number
Maximum number of tokens to generate.
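Putting the request fields above together, a body for a non-streaming request might look like the following. This is a minimal sketch; the option values are illustrative defaults, not required settings.

```python
import json

# Illustrative request body for POST /api/chat. The option values
# are examples only; all of `options` is optional.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,
    "options": {
        "temperature": 0.7,  # 0.0-1.0; higher = more varied output
        "top_p": 0.9,        # nucleus sampling cutoff
        "num_predict": 256,  # cap on generated tokens
    },
}

body = json.dumps(payload)
```

The serialized `body` is what gets sent as the JSON request payload.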

Response

Non-Streaming Response

model
string
The name of the model used.
created_at
string
ISO 8601 timestamp of when the response was created.
message
object
The generated message.
message.role
string
Always assistant.
message.content
string
The generated text content.
message.reasoning
string
Optional reasoning chain (for models like DeepSeek R1 that support reasoning).
done
boolean
Always true for non-streaming responses.
context
array
Context array that can be passed to subsequent requests.
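The context array returned here can be echoed back in the next request to maintain conversation state. A minimal sketch; the `previous_response` dict below is a stand-in for a parsed non-streaming response, not real API output.

```python
# Sketch of carrying `context` from one response into the next request.
# `previous_response` stands in for a parsed non-streaming response.
previous_response = {
    "model": "gpt-4o-mini",
    "message": {"role": "assistant", "content": "Rayleigh scattering..."},
    "done": True,
    "context": [1, 2, 3],  # opaque state; treat as a black box
}

next_request = {
    "model": previous_response["model"],
    "messages": [
        {"role": "user", "content": "Can you explain that more simply?"},
    ],
    # Pass the returned context back to continue the conversation.
    "context": previous_response["context"],
}
```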

Streaming Response

When stream: true, the response is sent as newline-delimited JSON chunks:
model
string
The name of the model used.
created_at
string
ISO 8601 timestamp for this chunk.
message
object
The message chunk.
message.role
string
Always assistant.
message.content
string
The text delta for this chunk (empty string in final chunk).
done
boolean
false for intermediate chunks, true for the final chunk.
context
array
Context array (only present in final chunk).
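A client can reassemble a streamed reply by concatenating each chunk's message.content delta until done is true, then reading context from the final chunk. A minimal parsing sketch; the sample chunks are illustrative, shaped like the fields described above.

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from newline-delimited JSON chunks.

    Returns (full_text, context); `context` comes from the final
    chunk, where `done` is true.
    """
    parts = []
    context = None
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk["message"]["content"])
        if chunk["done"]:
            context = chunk.get("context")
    return "".join(parts), context

# Illustrative chunk stream (timestamps omitted for brevity).
sample = [
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":"The sky "},"done":false}',
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":"is blue."},"done":false}',
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":""},"done":true,"context":[]}',
]
text, ctx = collect_stream(sample)
# text == "The sky is blue.", ctx == []
```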

Examples

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ]
  }'

Response Examples

{
  "model": "gpt-4o-mini",
  "created_at": "2026-03-11T10:30:00.000Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue because of a phenomenon called Rayleigh scattering..."
  },
  "done": true,
  "context": []
}
Vision support requires JPEG images provided as base64 strings or HTTP(S) URLs. The proxy automatically converts them to data URLs for vision-capable models.
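Following the note above, a vision request attaches images through the messages[].images field. A minimal sketch of building such a payload; the bytes below are a placeholder, not real image data.

```python
import base64
import json

# Placeholder bytes standing in for a real JPEG file's contents.
fake_jpeg_bytes = b"\xff\xd8\xff\xe0 not a real image"
encoded = base64.b64encode(fake_jpeg_bytes).decode("ascii")

payload = {
    "model": "gpt-4o-mini",  # must be a vision-capable model
    "messages": [
        {
            "role": "user",
            "content": "What is in this picture?",
            # Base64 strings or HTTP(S) URLs are both accepted here.
            "images": [encoded],
        }
    ],
}

body = json.dumps(payload)
```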
