Request
Endpoint
Request Body
- Model name to use for generation (e.g., llama3.2, mistral, stable-diffusion)
- The prompt to generate a response for
- Text that comes after the inserted text (for fill-in-the-middle models)
- System message to override the model’s default system prompt
- Custom prompt template to override the model’s default
- Context from a previous generation to maintain conversational memory (deprecated)
- Enable streaming of response chunks. Set to false to wait for the complete response.
- If true, no formatting will be applied to the prompt
- Format to return the response in. Use "json" for JSON mode or provide a JSON schema.
- Array of base64-encoded images for multimodal models
- Model-specific options for customizing generation
- Duration to keep the model in memory (e.g., "5m", "1h", or -1 for indefinite)
- Enable thinking mode for reasoning models (true/false or "high", "medium", "low")
- Truncate the prompt if it exceeds the context length
- Shift the context when hitting the context length instead of erroring
- Return log probabilities for output tokens
- Number of top tokens to return at each position (0-20, requires logprobs: true)

Image Generation Parameters
The following parameters are only used with image generation models:
- Width of generated image in pixels (max 4096)
- Height of generated image in pixels (max 4096)
- Number of diffusion steps for image generation
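A request body for an image generation model might look like the following sketch. The field names width, height, and steps are assumptions inferred from the descriptions above, and the model name and prompt are illustrative:

```json
{
  "model": "stable-diffusion",
  "prompt": "a watercolor lighthouse at dusk",
  "width": 1024,
  "height": 1024,
  "steps": 30
}
```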
Response
Response Fields
- The model used for generation
- Timestamp in ISO 8601 format
- The generated text response
- Reasoning text when thinking mode is enabled
- Whether generation is complete
- Reason for completion: stop, length, load, unload
- Token encoding of the conversation for maintaining memory (deprecated)
- Total duration in nanoseconds
- Model loading time in nanoseconds
- Number of tokens in the prompt
- Time evaluating the prompt in nanoseconds
- Number of tokens generated
- Time generating the response in nanoseconds
- Log probability information (when enabled)
- Base64-encoded generated image (image generation models only)
- Number of completed steps during image generation
- Total steps for image generation
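Because the timing fields are reported in nanoseconds, throughput can be derived directly from the token count and generation duration. A minimal sketch, assuming the field names eval_count and eval_duration for the two values described above:

```python
# Derive tokens/second from a generate response.
# Field names here (eval_count, eval_duration) are assumptions
# matching the descriptions above; durations are in nanoseconds.
resp = {
    "eval_count": 120,               # number of tokens generated
    "eval_duration": 1_500_000_000,  # time generating, in nanoseconds
}

# Convert nanoseconds to seconds, then divide tokens by seconds.
tokens_per_second = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(tokens_per_second)  # → 80.0
```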
Examples
Basic Text Generation
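A minimal non-streaming request body might look like this. The field names follow the conventions implied by the parameter descriptions above; treat any name not stated verbatim in this document as an assumption, and the model and prompt as illustrative:

```json
{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}
```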
Example Response
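A plausible complete response, populated with the fields listed under Response Fields. Field names and timing values are illustrative assumptions:

```json
{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00Z",
  "response": "The sky appears blue because shorter wavelengths of sunlight are scattered more strongly by air molecules.",
  "done": true,
  "done_reason": "stop",
  "total_duration": 5000000000,
  "load_duration": 3000000000,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 300000000,
  "eval_count": 120,
  "eval_duration": 1500000000
}
```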
Streaming Generation
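With streaming enabled, the response arrives as a sequence of JSON chunks, assumed here to be one JSON object per line. Each chunk carries a fragment of the generated text; the final chunk sets done to true and includes the completion details (all contents illustrative):

```json
{"model": "llama3.2", "response": "The", "done": false}
{"model": "llama3.2", "response": " sky", "done": false}
{"model": "llama3.2", "response": " is blue.", "done": false}
{"model": "llama3.2", "response": "", "done": true, "done_reason": "stop", "eval_count": 4, "eval_duration": 50000000}
```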
JSON Mode
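A sketch of a request using JSON mode via the format parameter described above (model and prompt illustrative). Instructing the model in the prompt to respond with JSON usually helps it produce the intended structure:

```json
{
  "model": "llama3.2",
  "prompt": "List three primary colors. Respond as a JSON object with a \"colors\" array.",
  "format": "json",
  "stream": false
}
```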
Multimodal Input (Vision Models)
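A sketch of a request passing a base64-encoded image through the images array described above. The model name is illustrative, and the image data is a truncated placeholder rather than a real encoded image:

```json
{
  "model": "llama3.2-vision",
  "prompt": "What is in this picture?",
  "images": ["iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB"]
}
```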
Error Responses
- Description of the error
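An error body might look like this (field name and message text illustrative):

```json
{
  "error": "model 'nonexistent-model' not found"
}
```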
Common Errors
- 400 Bad Request: Invalid model name, parameters, or image dimensions
- 404 Not Found: Model not found
- 500 Internal Server Error: Generation error
To unload a model from memory, send an empty prompt with
keep_alive: 0.
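For example, a request that unloads the model immediately might be (model name illustrative):

```json
{
  "model": "llama3.2",
  "prompt": "",
  "keep_alive": 0
}
```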