
POST /api/chat

Generate a response based on a conversation history. This endpoint supports multi-turn conversations, streaming responses, and vision capabilities (images).

Request Body

model
string
required
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).
messages
array
required
Array of message objects representing the conversation history. Each message should have a role and content.
messages[].role
string
required
The role of the message author. Either user or assistant.
messages[].content
string | array
required
The content of the message. Can be a string for text-only messages, or an array of content blocks for messages with images.
messages[].images
array
Optional array of images, each a base64-encoded string or an HTTP(S) URL. Used with vision-capable models.
stream
boolean
default:"false"
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.
context
array
default:"[]"
Optional context array for maintaining conversation state across requests.
options
object
Optional parameters to control generation behavior.
options.temperature
number
Controls randomness in generation (0.0 to 1.0). Higher values produce more varied, creative output; lower values are more deterministic.
options.top_p
number
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
options.num_predict
number
Maximum number of tokens to generate.
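Putting the request fields above together, a body for a non-streaming request might look like the following. This is a minimal sketch; the option values are illustrative defaults, not required settings.

```python
import json

# Illustrative request body for POST /api/chat. The option values
# are examples only; all of `options` is optional.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,
    "options": {
        "temperature": 0.7,  # 0.0-1.0; higher = more varied output
        "top_p": 0.9,        # nucleus sampling cutoff
        "num_predict": 256,  # cap on generated tokens
    },
}

body = json.dumps(payload)
```

The serialized `body` is what gets sent as the JSON request payload.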

Response

Non-Streaming Response

model
string
The name of the model used.
created_at
string
ISO 8601 timestamp of when the response was created.
message
object
The generated message.
message.role
string
Always assistant.
message.content
string
The generated text content.
message.reasoning
string
Optional reasoning chain (for models like DeepSeek R1 that support reasoning).
done
boolean
Always true for non-streaming responses.
context
array
Context array that can be passed to subsequent requests.
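The context array returned here can be echoed back in the next request to maintain conversation state. A minimal sketch; the `previous_response` dict below is a stand-in for a parsed non-streaming response, not real API output.

```python
# Sketch of carrying `context` from one response into the next request.
# `previous_response` stands in for a parsed non-streaming response.
previous_response = {
    "model": "gpt-4o-mini",
    "message": {"role": "assistant", "content": "Rayleigh scattering..."},
    "done": True,
    "context": [1, 2, 3],  # opaque state; treat as a black box
}

next_request = {
    "model": previous_response["model"],
    "messages": [
        {"role": "user", "content": "Can you explain that more simply?"},
    ],
    # Pass the returned context back to continue the conversation.
    "context": previous_response["context"],
}
```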

Streaming Response

When stream: true, the response is sent as newline-delimited JSON chunks:
model
string
The name of the model used.
created_at
string
ISO 8601 timestamp for this chunk.
message
object
The message chunk.
message.role
string
Always assistant.
message.content
string
The text delta for this chunk (empty string in final chunk).
done
boolean
false for intermediate chunks, true for the final chunk.
context
array
Context array (only present in final chunk).
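A client can reassemble a streamed reply by concatenating each chunk's message.content delta until done is true, then reading context from the final chunk. A minimal parsing sketch; the sample chunks are illustrative, shaped like the fields described above.

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from newline-delimited JSON chunks.

    Returns (full_text, context); `context` comes from the final
    chunk, where `done` is true.
    """
    parts = []
    context = None
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk["message"]["content"])
        if chunk["done"]:
            context = chunk.get("context")
    return "".join(parts), context

# Illustrative chunk stream (timestamps omitted for brevity).
sample = [
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":"The sky "},"done":false}',
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":"is blue."},"done":false}',
    '{"model":"gpt-4o-mini","message":{"role":"assistant","content":""},"done":true,"context":[]}',
]
text, ctx = collect_stream(sample)
# text == "The sky is blue.", ctx == []
```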

Examples

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ]
  }'

Response Examples

{
  "model": "gpt-4o-mini",
  "created_at": "2026-03-11T10:30:00.000Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue because of a phenomenon called Rayleigh scattering..."
  },
  "done": true,
  "context": []
}
Vision support requires JPEG images provided as base64 strings or HTTP(S) URLs. The proxy automatically converts them to data URLs for vision-capable models.
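Following the note above, a vision request attaches images through the messages[].images field. A minimal sketch of building such a payload; the bytes below are a placeholder, not real image data.

```python
import base64
import json

# Placeholder bytes standing in for a real JPEG file's contents.
fake_jpeg_bytes = b"\xff\xd8\xff\xe0 not a real image"
encoded = base64.b64encode(fake_jpeg_bytes).decode("ascii")

payload = {
    "model": "gpt-4o-mini",  # must be a vision-capable model
    "messages": [
        {
            "role": "user",
            "content": "What is in this picture?",
            # Base64 strings or HTTP(S) URLs are both accepted here.
            "images": [encoded],
        }
    ],
}

body = json.dumps(payload)
```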
