
Endpoint

POST /generate
The /generate endpoint provides a simplified text generation interface for quick testing and development. For production use, prefer the OpenAI-compatible /v1/chat/completions endpoint.

Request Body

prompt
string
required
The input text prompt to generate from.
max_tokens
integer
required
The maximum number of tokens to generate.
ignore_eos
boolean
default: false
Whether to ignore the end-of-sequence token and continue generation.

Response Format

The endpoint returns a streaming response using Server-Sent Events (SSE). Each event contains incremental text output:
data: Hello
data:  world
data: !
data: [DONE]
The stream ends with a data: [DONE] message.
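The framing above can be consumed with a few lines of Python. This is a minimal sketch assuming each event arrives as a plain-text `data:` line exactly as shown (per the SSE format, one space after the colon is field padding, not part of the generated token):

```python
def parse_sse_events(lines):
    """Yield the text of each `data:` event, stopping at [DONE].

    One space after `data:` is SSE field padding; any further
    whitespace belongs to the generated token itself.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # ignore blank separator lines
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # strip the single padding space
        if payload == "[DONE]":
            return
        yield payload

# Reassembling the sample stream from this page:
stream = ["data: Hello", "data:  world", "data: !", "data: [DONE]"]
print("".join(parse_sse_events(stream)))  # Hello world!
```

Note that `data:  world` (two spaces) decodes to the token ` world`, so concatenating payloads reproduces the text with its original spacing.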

Example

Basic Generation

curl -N http://localhost:1919/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "ignore_eos": false
  }'
Response:
data: , there
data:  was
data:  a
data:  young
data:  princess
data:  who
...
data: [DONE]
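The same Basic Generation request can be issued from Python using only the standard library. This is a sketch, not an official client: the port (1919) and field names come from this page, and the actual send is left commented out since it needs a running server.

```python
import json
import urllib.request

def build_generate_request(prompt, max_tokens, ignore_eos=False,
                           base_url="http://localhost:1919"):
    """Construct (but do not send) the POST request for /generate."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "ignore_eos": ignore_eos,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To stream the response from a running server:
# with urllib.request.urlopen(build_generate_request("Once upon a time", 100)) as resp:
#     for raw in resp:
#         print(raw.decode("utf-8"), end="")
```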

Notes

The /generate endpoint is intended primarily for testing and debugging. For production applications, use the /v1/chat/completions endpoint, which provides more features and OpenAI API compatibility.
This endpoint does not support chat-style messages or advanced sampling parameters like temperature or top_p. Use /v1/chat/completions for full control over generation.
