POST /api/generate
Generate a text completion from a single prompt. This is a simpler alternative to the chat endpoint for single-turn text generation.

Request Body
The name of the model to use. Must be one of the available models from your models.json configuration or built-in models (e.g., gpt-4o-mini, gemini-2.5-flash, deepseek-r1).

The text prompt to generate a completion for.
Enable streaming responses. When true, responses are returned as newline-delimited JSON chunks.

Optional context array for maintaining state across requests.
Optional array of images in base64 format or HTTP(S) URLs for vision-capable models. Only JPEG format is supported.
Optional parameters to control generation behavior.
Controls randomness in generation (0.0 to 1.0). Higher values make output more creative.
Nucleus sampling parameter (0.0 to 1.0). Controls diversity of generated text.
Maximum number of tokens to generate.
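Putting the fields above together, a request body might look like the sketch below. The field names (model, prompt, stream, options, temperature, top_p, max_tokens) are inferred from the descriptions above and are assumptions; check your server's actual schema for exact spellings.

```json
{
  "model": "gpt-4o-mini",
  "prompt": "Write a haiku about the sea.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256
  }
}
```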
Response
Non-Streaming Response
The name of the model used.
ISO 8601 timestamp of when the response was created.
The generated text completion.
Always true for non-streaming responses.

Context array that can be passed to subsequent requests.
Optional reasoning chain (for models like DeepSeek R1 that support reasoning).
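Under the same assumption about field names (model, created_at, response, done, context, reasoning — inferred from the descriptions above, not confirmed by the source), a non-streaming response might look like:

```json
{
  "model": "deepseek-r1",
  "created_at": "2025-01-15T10:30:00Z",
  "response": "The sea is vast and blue.",
  "done": true,
  "context": [1, 2, 3],
  "reasoning": "The prompt asks for a short description..."
}
```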
Streaming Response
When stream: true, the response is sent as newline-delimited JSON chunks:
The name of the model used.
ISO 8601 timestamp for this chunk.
The text delta for this chunk (empty string in final chunk).
false for intermediate chunks, true for the final chunk.

Context array (only present in final chunk).
Optional reasoning chain (only present in final chunk for supported models).
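With the same assumed field names, a streamed response would arrive as one JSON object per line: intermediate chunks carry a text delta, and the final chunk carries an empty delta plus the context array.

```
{"model":"gpt-4o-mini","created_at":"2025-01-15T10:30:00Z","response":"Hello","done":false}
{"model":"gpt-4o-mini","created_at":"2025-01-15T10:30:01Z","response":" world","done":false}
{"model":"gpt-4o-mini","created_at":"2025-01-15T10:30:02Z","response":"","done":true,"context":[1,2,3]}
```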
Examples
Response Examples
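A minimal client-side sketch for consuming a streamed response, assuming the chunk fields described above ("response" for the text delta, "done" for the final-chunk flag, "context" for the reusable context array). These field names are assumptions, not confirmed by the source.

```python
import json

def accumulate_stream(ndjson_lines):
    """Reassemble a completion from newline-delimited JSON chunks.

    Each chunk is assumed to carry a text delta in "response"; the final
    chunk has "done": true and may carry a "context" array that can be
    passed to the next request.
    """
    text_parts = []
    context = None
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        text_parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            context = chunk.get("context")
    return "".join(text_parts), context
```

In a real client you would iterate over the HTTP response body line by line (for example, `iter_lines()` on a streaming `requests` response) and feed each line into a helper like this.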
The generate endpoint internally converts your prompt into a single user message and calls the same generation logic as the chat endpoint. For multi-turn conversations, use the
/api/chat endpoint instead.