Generate a response for a given prompt with a provided model. This is the core text generation endpoint that supports completions, image generation, and multimodal inputs.

Request

Endpoint

POST /api/generate

Request Body

model
string
required
Model name to use for generation (e.g., llama3.2, mistral, stable-diffusion)
prompt
string
required
The prompt to generate a response for
suffix
string
Text that comes after the generated completion (for fill-in-the-middle models)
system
string
System message to override the model’s default system prompt
template
string
Custom prompt template to override the model’s default
context
array
Context from a previous generation to maintain conversational memory (deprecated)
stream
boolean
default:"true"
Enable streaming of response chunks. Set to false to wait for complete response.
raw
boolean
default:"false"
If true, no formatting will be applied to the prompt
format
string | object
Format to return the response in. Use "json" for JSON mode, or provide a JSON schema object for structured outputs.
images
array
Array of base64-encoded images for multimodal models
options
object
Model-specific options for customizing generation, such as temperature or seed (see the example after this list)
keep_alive
string | number
default:"5m"
Duration to keep model in memory (e.g., "5m", "1h", -1 for indefinite)
think
boolean | string
Enable thinking mode for reasoning models (true/false or "high", "medium", "low")
truncate
boolean
default:"true"
Truncate prompt if it exceeds context length
shift
boolean
default:"true"
Shift context when hitting context length instead of erroring
logprobs
boolean
default:"false"
Return log probabilities for output tokens
top_logprobs
integer
default:"0"
Number of top tokens to return at each position (0-20, requires logprobs: true)
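
As a sketch of how options and keep_alive combine with the core parameters (temperature, seed, and num_predict are common generation options; logprobs is enabled here purely for illustration):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "keep_alive": "10m",
  "logprobs": true,
  "top_logprobs": 3,
  "options": {
    "temperature": 0.7,
    "seed": 42,
    "num_predict": 128
  }
}'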

Image Generation Parameters

The following parameters are only used with image generation models; a sample request follows the list:
width
integer
Width of generated image in pixels (max 4096)
height
integer
Height of generated image in pixels (max 4096)
steps
integer
Number of diffusion steps for image generation
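
A sketch of an image generation request, assuming an image generation model is installed (stable-diffusion is used here as a placeholder name):

curl http://localhost:11434/api/generate -d '{
  "model": "stable-diffusion",
  "prompt": "A lighthouse at sunset, oil painting",
  "width": 1024,
  "height": 768,
  "steps": 30,
  "stream": false
}'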

Response

Response Fields

model
string
The model used for generation
created_at
string
Timestamp in ISO 8601 format
response
string
The generated text response
thinking
string
Reasoning text when thinking mode is enabled
done
boolean
Whether generation is complete
done_reason
string
Reason for completion: stop, length, load, unload
context
array
Token encoding of the conversation for maintaining memory (deprecated)
total_duration
integer
Total duration in nanoseconds
load_duration
integer
Model loading time in nanoseconds
prompt_eval_count
integer
Number of tokens in the prompt
prompt_eval_duration
integer
Time evaluating the prompt in nanoseconds
eval_count
integer
Number of tokens generated
eval_duration
integer
Time generating response in nanoseconds
logprobs
array
Log probability information (when enabled)
image
string
Base64-encoded generated image (image generation models only)
completed
integer
Number of completed steps during image generation
total
integer
Total steps for image generation
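
When streaming is enabled (the default), each chunk is a single JSON object on its own line containing a fragment of the response; the final chunk has done set to true and also carries the duration and token-count statistics. An intermediate chunk looks like:

{
  "model": "llama3.2",
  "created_at": "2024-02-24T12:34:56.789Z",
  "response": "The",
  "done": false
}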

Examples

Basic Text Generation

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is the capital of France?",
  "stream": false
}'

Example Response

{
  "model": "llama3.2",
  "created_at": "2024-02-24T12:34:56.789Z",
  "response": "The capital of France is Paris.",
  "done": true,
  "done_reason": "stop",
  "context": [128000, 3923, 374, 279, ...],
  "total_duration": 2345678901,
  "load_duration": 123456789,
  "prompt_eval_count": 8,
  "prompt_eval_duration": 234567890,
  "eval_count": 8,
  "eval_duration": 1987654321
}

Streaming Generation

import requests
import json

# Each streamed line is a complete JSON chunk (NDJSON)
response = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'llama3.2', 'prompt': 'Write a haiku'},
    stream=True
)

for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        # Print each response fragment as it arrives; the final
        # chunk has done: true and an empty response field
        print(chunk.get('response', ''), end='', flush=True)

JSON Mode

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "List 3 colors",
  "format": "json",
  "stream": false
}'
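
When using "json", it is recommended to also instruct the model in the prompt to respond with JSON, as in the example above.

Structured Outputs

The format field also accepts a JSON schema object, which constrains the response to match the schema. A sketch (the schema shown is illustrative):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Tell me about Canada.",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "capital": {"type": "string"},
      "languages": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["name", "capital", "languages"]
  }
}'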

Multimodal Input (Vision Models)

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision",
  "prompt": "What is in this image?",
  "images": ["<base64-encoded-image>"],
  "stream": false
}'
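
The images array expects raw base64 strings without a data: URI prefix. A minimal Python sketch for encoding a local file (image.jpg is a placeholder path):

import base64
import requests

# Read the image and base64-encode it for the "images" array
with open('image.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2-vision',
        'prompt': 'What is in this image?',
        'images': [image_b64],
        'stream': False
    }
)
print(response.json()['response'])

Thinking Mode (Reasoning Models)

For models that support thinking, set think to true (or an effort level) and read the reasoning from the thinking field of the response. A sketch, assuming a reasoning model such as deepseek-r1 is installed:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "How many times does the letter r appear in strawberry?",
  "think": true,
  "stream": false
}'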

Error Responses

error
string
Description of the error
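
For example, requesting a model that has not been pulled returns a 404 with a body similar to:

{
  "error": "model 'nonexistent' not found, try pulling it first"
}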

Common Errors

  • 400 Bad Request: Invalid model name, parameters, or image dimensions
  • 404 Not Found: Model not found
  • 500 Internal Server Error: Generation error

To unload a model from memory, send an empty prompt with keep_alive: 0.
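
For example (the response will report done_reason as "unload"):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "",
  "keep_alive": 0
}'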
