POST /api/generate

Generate a text completion from a single prompt. This is a simpler alternative to the chat endpoint for single-turn text generation.

Request Body

model
string
required
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).
prompt
string
required
The text prompt to generate a completion for.
stream
boolean
default:"false"
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.
context
array
default:"[]"
Optional context array, as returned in the context field of a previous response, used to maintain state across requests.
images
array
Optional array of images in base64 format or HTTP(S) URLs for vision-capable models. Only JPEG format is supported.
options
object
Optional parameters to control generation behavior.
options.temperature
number
Controls randomness in generation (0.0 to 1.0). Higher values make output more creative.
options.top_p
number
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
options.num_predict
number
Maximum number of tokens to generate.
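
As an illustration, a request body that sets all three generation options might look like the following (the values are arbitrary examples, not recommended defaults):

```json
{
  "model": "gpt-4o-mini",
  "prompt": "Write a haiku about the sea",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 128
  }
}
```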

Response

Non-Streaming Response

model
string
The name of the model used.
created_at
string
ISO 8601 timestamp of when the response was created.
response
string
The generated text completion.
done
boolean
Always true for non-streaming responses.
context
array
Context array that can be passed to subsequent requests.
reasoning
string
Optional reasoning chain (for models like DeepSeek R1 that support reasoning).
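
To continue from a previous completion, the returned context array can be passed back in the next request. A sketch (the context values below are placeholders for whatever the previous response returned):

```json
{
  "model": "gpt-4o-mini",
  "prompt": "Now summarize that in one sentence",
  "context": [1, 2, 3]
}
```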

Streaming Response

When stream: true, the response is sent as newline-delimited JSON chunks:
model
string
The name of the model used.
created_at
string
ISO 8601 timestamp for this chunk.
response
string
The text delta for this chunk (empty string in final chunk).
done
boolean
false for intermediate chunks, true for the final chunk.
context
array
Context array (only present in final chunk).
reasoning
string
Optional reasoning chain (only present in final chunk for supported models).
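
A client consuming the streaming response reads one JSON object per line, concatenates the response deltas, and stops at the chunk where done is true. A minimal sketch in Python, run against illustrative chunks rather than a live server:

```python
import json

def assemble_stream(ndjson_text):
    """Reassemble a completion from newline-delimited JSON chunks.

    Each chunk carries a `response` text delta; the final chunk has
    `done: true` and may also include `context` and `reasoning`.
    """
    parts, final_chunk = [], None
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            final_chunk = chunk
    return "".join(parts), final_chunk

# Illustrative chunks in the shape documented above.
sample = "\n".join([
    '{"model": "gpt-4o-mini", "response": "Hello", "done": false}',
    '{"model": "gpt-4o-mini", "response": ", world", "done": false}',
    '{"model": "gpt-4o-mini", "response": "", "done": true, "context": []}',
])

text, final_chunk = assemble_stream(sample)
print(text)  # Hello, world
```

In a real client, the same per-line parsing applies to the HTTP response body as it arrives; buffering until a newline is seen ensures each json.loads call receives a complete object.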

Examples

curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "Explain quantum computing in simple terms"
  }'

Response Examples

{
  "model": "gpt-4o-mini",
  "created_at": "2026-03-11T10:30:00.000Z",
  "response": "Quantum computing is a revolutionary approach to computation that uses quantum mechanics...",
  "done": true,
  "context": []
}
The generate endpoint internally converts your prompt into a single user message and calls the same generation logic as the chat endpoint. For multi-turn conversations, use the /api/chat endpoint instead.
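
Conceptually, that conversion can be sketched as below. This is an illustration of the documented behavior, not the server's actual implementation, and the chat-side field names are assumptions:

```python
def generate_to_chat(body):
    """Hypothetical sketch: map a /api/generate body onto a /api/chat
    body by wrapping the prompt in a single user message."""
    chat = {
        "model": body["model"],
        "messages": [{"role": "user", "content": body["prompt"]}],
    }
    # Pass shared parameters through unchanged.
    for key in ("stream", "options"):
        if key in body:
            chat[key] = body[key]
    return chat

req = {"model": "gpt-4o-mini",
       "prompt": "Explain quantum computing in simple terms"}
chat_req = generate_to_chat(req)
print(chat_req["messages"][0]["role"])  # user
```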