
Endpoint

POST /generate
The /generate endpoint provides a simplified text generation interface for quick testing and development. For production use, prefer the OpenAI-compatible /v1/chat/completions endpoint.

Request Body

prompt
string
required
The input text prompt to generate from.
max_tokens
integer
required
The maximum number of tokens to generate.
ignore_eos
boolean
default: false
Whether to ignore the end-of-sequence token and continue generation.

Response Format

The endpoint returns a streaming response using Server-Sent Events (SSE). Each event contains incremental text output:
data: Hello
data:  world
data: !
data: [DONE]
The stream ends with a data: [DONE] message.
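The framing above can be consumed with a few lines of Python. This is a minimal sketch assuming each event arrives as a plain-text `data:` line exactly as shown (per the SSE format, one space after the colon is field padding, not part of the generated token):

```python
def parse_sse_events(lines):
    """Yield the text of each `data:` event, stopping at [DONE].

    One space after `data:` is SSE field padding; any further
    whitespace belongs to the generated token itself.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # ignore blank separator lines
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # strip the single padding space
        if payload == "[DONE]":
            return
        yield payload

# Reassembling the sample stream from this page:
stream = ["data: Hello", "data:  world", "data: !", "data: [DONE]"]
print("".join(parse_sse_events(stream)))  # Hello world!
```

Note that `data:  world` (two spaces) decodes to the token ` world`, so concatenating payloads reproduces the text with its original spacing.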

Example

Basic Generation

curl -N http://localhost:1919/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "ignore_eos": false
  }'
Response:
data: , there
data:  was
data:  a
data:  young
data:  princess
data:  who
...
data: [DONE]
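The same Basic Generation request can be issued from Python using only the standard library. This is a sketch, not an official client: the port (1919) and field names come from this page, and the actual send is left commented out since it needs a running server.

```python
import json
import urllib.request

def build_generate_request(prompt, max_tokens, ignore_eos=False,
                           base_url="http://localhost:1919"):
    """Construct (but do not send) the POST request for /generate."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "ignore_eos": ignore_eos,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To stream the response from a running server:
# with urllib.request.urlopen(build_generate_request("Once upon a time", 100)) as resp:
#     for raw in resp:
#         print(raw.decode("utf-8"), end="")
```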

Notes

The /generate endpoint is intended primarily for testing and debugging. For production applications, use the /v1/chat/completions endpoint, which provides more features and OpenAI API compatibility.
This endpoint does not support chat-style messages or advanced sampling parameters like temperature or top_p. Use /v1/chat/completions for full control over generation.
