Generate a response for a given prompt with a provided model. This is the core text generation endpoint that supports completions, image generation, and multimodal inputs.

Request

Endpoint

POST /api/generate

Request Body

model
string
required
Model name to use for generation (e.g., llama3.2, mistral, stable-diffusion)
prompt
string
required
The prompt to generate a response for
suffix
string
Text that comes after the generated completion (for fill-in-the-middle models)
system
string
System message to override the model’s default system prompt
template
string
Custom prompt template to override the model’s default
context
array
Context from a previous generation to maintain conversational memory (deprecated)
stream
boolean
default:"true"
Enable streaming of response chunks. Set to false to wait for complete response.
raw
boolean
default:"false"
If true, no formatting will be applied to the prompt
format
string | object
Format to return the response in. Use "json" for JSON mode, or provide a JSON schema object for structured outputs.
images
array
Array of base64-encoded images for multimodal models
options
object
Model-specific options for customizing generation, such as temperature or seed (see the example after this list)
keep_alive
string | number
default:"5m"
Duration to keep model in memory (e.g., "5m", "1h", -1 for indefinite)
think
boolean | string
Enable thinking mode for reasoning models (true/false or "high", "medium", "low")
truncate
boolean
default:"true"
Truncate prompt if it exceeds context length
shift
boolean
default:"true"
Shift context when hitting context length instead of erroring
logprobs
boolean
default:"false"
Return log probabilities for output tokens
top_logprobs
integer
default:"0"
Number of top tokens to return at each position (0-20, requires logprobs: true)
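
As a sketch of how options and keep_alive combine with the core parameters (temperature, seed, and num_predict are common generation options; logprobs is enabled here purely for illustration):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "keep_alive": "10m",
  "logprobs": true,
  "top_logprobs": 3,
  "options": {
    "temperature": 0.7,
    "seed": 42,
    "num_predict": 128
  }
}'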

Image Generation Parameters

The following parameters are only used with image generation models; a sample request follows the list:
width
integer
Width of generated image in pixels (max 4096)
height
integer
Height of generated image in pixels (max 4096)
steps
integer
Number of diffusion steps for image generation
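
A sketch of an image generation request, assuming an image generation model is installed (stable-diffusion is used here as a placeholder name):

curl http://localhost:11434/api/generate -d '{
  "model": "stable-diffusion",
  "prompt": "A lighthouse at sunset, oil painting",
  "width": 1024,
  "height": 768,
  "steps": 30,
  "stream": false
}'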

Response

Response Fields

model
string
The model used for generation
created_at
string
Timestamp in ISO 8601 format
response
string
The generated text response
thinking
string
Reasoning text when thinking mode is enabled
done
boolean
Whether generation is complete
done_reason
string
Reason for completion: stop, length, load, unload
context
array
Token encoding of the conversation for maintaining memory (deprecated)
total_duration
integer
Total duration in nanoseconds
load_duration
integer
Model loading time in nanoseconds
prompt_eval_count
integer
Number of tokens in the prompt
prompt_eval_duration
integer
Time evaluating the prompt in nanoseconds
eval_count
integer
Number of tokens generated
eval_duration
integer
Time generating response in nanoseconds
logprobs
array
Log probability information (when enabled)
image
string
Base64-encoded generated image (image generation models only)
completed
integer
Number of completed steps during image generation
total
integer
Total steps for image generation
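
When streaming is enabled (the default), each chunk is a single JSON object on its own line containing a fragment of the response; the final chunk has done set to true and also carries the duration and token-count statistics. An intermediate chunk looks like:

{
  "model": "llama3.2",
  "created_at": "2024-02-24T12:34:56.789Z",
  "response": "The",
  "done": false
}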

Examples

Basic Text Generation

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is the capital of France?",
  "stream": false
}'

Example Response

{
  "model": "llama3.2",
  "created_at": "2024-02-24T12:34:56.789Z",
  "response": "The capital of France is Paris.",
  "done": true,
  "done_reason": "stop",
  "context": [128000, 3923, 374, 279, ...],
  "total_duration": 2345678901,
  "load_duration": 123456789,
  "prompt_eval_count": 8,
  "prompt_eval_duration": 234567890,
  "eval_count": 8,
  "eval_duration": 1987654321
}

Streaming Generation

import requests
import json

# Each streamed line is a complete JSON chunk (NDJSON)
response = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'llama3.2', 'prompt': 'Write a haiku'},
    stream=True
)

for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        # Print each response fragment as it arrives; the final
        # chunk has done: true and an empty response field
        print(chunk.get('response', ''), end='', flush=True)

JSON Mode

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "List 3 colors",
  "format": "json",
  "stream": false
}'
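
When using "json", it is recommended to also instruct the model in the prompt to respond with JSON, as in the example above.

Structured Outputs

The format field also accepts a JSON schema object, which constrains the response to match the schema. A sketch (the schema shown is illustrative):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Tell me about Canada.",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "capital": {"type": "string"},
      "languages": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["name", "capital", "languages"]
  }
}'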

Multimodal Input (Vision Models)

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision",
  "prompt": "What is in this image?",
  "images": ["<base64-encoded-image>"],
  "stream": false
}'
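
The images array expects raw base64 strings without a data: URI prefix. A minimal Python sketch for encoding a local file (image.jpg is a placeholder path):

import base64
import requests

# Read the image and base64-encode it for the "images" array
with open('image.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2-vision',
        'prompt': 'What is in this image?',
        'images': [image_b64],
        'stream': False
    }
)
print(response.json()['response'])

Thinking Mode (Reasoning Models)

For models that support thinking, set think to true (or an effort level) and read the reasoning from the thinking field of the response. A sketch, assuming a reasoning model such as deepseek-r1 is installed:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "How many times does the letter r appear in strawberry?",
  "think": true,
  "stream": false
}'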

Error Responses

error
string
Description of the error
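
For example, requesting a model that has not been pulled returns a 404 with a body similar to:

{
  "error": "model 'nonexistent' not found, try pulling it first"
}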

Common Errors

  • 400 Bad Request: Invalid model name, parameters, or image dimensions
  • 404 Not Found: Model not found
  • 500 Internal Server Error: Generation error

To unload a model from memory, send an empty prompt with keep_alive: 0.
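
For example (the response will report done_reason as "unload"):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "",
  "keep_alive": 0
}'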
