POST /api/generate

Generate a text completion from a single prompt. This is a simpler alternative to the chat endpoint for single-turn text generation.

Request Body

model
string
required
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).
prompt
string
required
The text prompt to generate a completion for.
stream
boolean
default:"false"
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.
context
array
default:"[]"
Optional context array, as returned in the context field of a previous response, used to maintain state across requests.
images
array
Optional array of images in base64 format or HTTP(S) URLs for vision-capable models. Only JPEG format is supported.
options
object
Optional parameters to control generation behavior.
options.temperature
number
Controls randomness in generation (0.0 to 1.0). Higher values make output more creative.
options.top_p
number
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
options.num_predict
number
Maximum number of tokens to generate.
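
As an illustration, a request body that sets all three generation options might look like the following (the values are arbitrary examples, not recommended defaults):

```json
{
  "model": "gpt-4o-mini",
  "prompt": "Write a haiku about the sea",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 128
  }
}
```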

Response

Non-Streaming Response

model
string
The name of the model used.
created_at
string
ISO 8601 timestamp of when the response was created.
response
string
The generated text completion.
done
boolean
Always true for non-streaming responses.
context
array
Context array that can be passed to subsequent requests.
reasoning
string
Optional reasoning chain (for models like DeepSeek R1 that support reasoning).
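
To continue from a previous completion, the returned context array can be passed back in the next request. A sketch (the context values below are placeholders for whatever the previous response returned):

```json
{
  "model": "gpt-4o-mini",
  "prompt": "Now summarize that in one sentence",
  "context": [1, 2, 3]
}
```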

Streaming Response

When stream: true, the response is sent as newline-delimited JSON chunks:
model
string
The name of the model used.
created_at
string
ISO 8601 timestamp for this chunk.
response
string
The text delta for this chunk (empty string in final chunk).
done
boolean
false for intermediate chunks, true for the final chunk.
context
array
Context array (only present in final chunk).
reasoning
string
Optional reasoning chain (only present in final chunk for supported models).
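
A client consuming the streaming response reads one JSON object per line, concatenates the response deltas, and stops at the chunk where done is true. A minimal sketch in Python, run against illustrative chunks rather than a live server:

```python
import json

def assemble_stream(ndjson_text):
    """Reassemble a completion from newline-delimited JSON chunks.

    Each chunk carries a `response` text delta; the final chunk has
    `done: true` and may also include `context` and `reasoning`.
    """
    parts, final_chunk = [], None
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            final_chunk = chunk
    return "".join(parts), final_chunk

# Illustrative chunks in the shape documented above.
sample = "\n".join([
    '{"model": "gpt-4o-mini", "response": "Hello", "done": false}',
    '{"model": "gpt-4o-mini", "response": ", world", "done": false}',
    '{"model": "gpt-4o-mini", "response": "", "done": true, "context": []}',
])

text, final_chunk = assemble_stream(sample)
print(text)  # Hello, world
```

In a real client, the same per-line parsing applies to the HTTP response body as it arrives; buffering until a newline is seen ensures each json.loads call receives a complete object.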

Examples

curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "Explain quantum computing in simple terms"
  }'

Response Examples

{
  "model": "gpt-4o-mini",
  "created_at": "2026-03-11T10:30:00.000Z",
  "response": "Quantum computing is a revolutionary approach to computation that uses quantum mechanics...",
  "done": true,
  "context": []
}
The generate endpoint internally converts your prompt into a single user message and calls the same generation logic as the chat endpoint. For multi-turn conversations, use the /api/chat endpoint instead.
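
Conceptually, that conversion can be sketched as below. This is an illustration of the documented behavior, not the server's actual implementation, and the chat-side field names are assumptions:

```python
def generate_to_chat(body):
    """Hypothetical sketch: map a /api/generate body onto a /api/chat
    body by wrapping the prompt in a single user message."""
    chat = {
        "model": body["model"],
        "messages": [{"role": "user", "content": body["prompt"]}],
    }
    # Pass shared parameters through unchanged.
    for key in ("stream", "options"):
        if key in body:
            chat[key] = body[key]
    return chat

req = {"model": "gpt-4o-mini",
       "prompt": "Explain quantum computing in simple terms"}
chat_req = generate_to_chat(req)
print(chat_req["messages"][0]["role"])  # user
```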