The streaming chat endpoint (`POST /v1/chat/stream`) returns the assistant’s response as newline-delimited JSON (NDJSON) events. This allows you to display the response progressively as it’s being generated, providing a better user experience.
Endpoint

`POST /v1/chat/stream`
Request Headers
Authenticate with the `x-api-key` header. Widget clients send `x-widget-api-key` instead of `x-api-key`.
Request Body
Parameters
- `sessionId`: Unique identifier for the user session. Must be 1-128 characters; alphanumeric with `.`, `_`, `:`, and `-` allowed.
- The user’s message: must be 1-4000 characters.
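The constraints above can be checked client-side before sending. A minimal sketch follows; the field names `sessionId` and `message` are assumptions inferred from the parameter descriptions, so confirm them against your backend:

```typescript
// Client-side validation mirroring the documented constraints.
// NOTE: field names are assumptions, not confirmed by the API source.
interface ChatRequest {
  sessionId: string; // 1-128 chars, alphanumeric plus . _ : -
  message: string;   // 1-4000 chars
}

const SESSION_ID_RE = /^[A-Za-z0-9._:-]{1,128}$/;

function validateChatRequest(body: ChatRequest): string[] {
  const errors: string[] = [];
  if (!SESSION_ID_RE.test(body.sessionId)) {
    errors.push("sessionId must be 1-128 characters: alphanumeric, '.', '_', ':', '-'");
  }
  if (body.message.length < 1 || body.message.length > 4000) {
    errors.push("message must be 1-4000 characters");
  }
  return errors;
}
```

Validating locally lets you surface a helpful message immediately instead of waiting for a `400 Bad Request`.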
Response Format
On success, returns a 200 OK status with `Content-Type: application/x-ndjson` and a stream of JSON events.
Stream Event Types
The stream consists of four event types, each on its own line.

Start Event

Sent first to indicate the stream has begun. Its event type is always `"start"`, and it includes the unique ID of the conversation.
Token Event
Sent for each token (word or word fragment) as it’s generated. Its event type is always `"token"`, and it carries a single token from the assistant’s response.
Done Event
Sent last when the response is complete. Its event type is always `"done"`, and it carries the complete assistant response (all tokens combined) along with the unique ID of the conversation.
Error Event
Sent if an error occurs during streaming. Its event type is always `"error"`, and it carries an error message describing what went wrong.
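Taken together, the four events can be modeled as a discriminated union. The field names below (`type`, `conversationId`, `token`, `message`, `error`) are assumptions inferred from the event descriptions above, not confirmed by the source; check them against your server’s actual payloads:

```typescript
// Assumed shapes for the four NDJSON stream events (field names are
// illustrative guesses based on the event descriptions).
type StreamEvent =
  | { type: "start"; conversationId: string }
  | { type: "token"; token: string }
  | { type: "done"; message: string; conversationId: string }
  | { type: "error"; error: string };

// Narrow an arbitrary parsed JSON value to a StreamEvent.
function isStreamEvent(value: unknown): value is StreamEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  switch (v.type) {
    case "start": return typeof v.conversationId === "string";
    case "token": return typeof v.token === "string";
    case "done":  return typeof v.message === "string" && typeof v.conversationId === "string";
    case "error": return typeof v.error === "string";
    default:      return false;
  }
}
```

A type guard like this lets the compiler enforce exhaustive handling of all four event types in your consumer code.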
Example Usage
Here’s a complete example showing how to consume the stream:
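A sketch of a consumer follows. The endpoint, header, and body fields come from this document; the event field names (`type`, `token`, `message`, `error`) are assumptions to verify against your deployment:

```typescript
// Split buffered text into complete NDJSON lines plus a partial remainder.
function appendChunk(buffer: string, chunk: string): { lines: string[]; rest: string } {
  const parts = (buffer + chunk).split("\n");
  const rest = parts.pop() ?? "";
  return { lines: parts.filter((l) => l.trim().length > 0), rest };
}

// Sketch of consuming the stream with fetch. Event field names are
// assumptions, not confirmed by the API source.
async function streamChat(
  baseUrl: string,
  apiKey: string,
  sessionId: string,
  message: string,
  onToken: (token: string) => void,
): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/stream`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify({ sessionId, message }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const split = appendChunk(buffer, decoder.decode(value, { stream: true }));
    buffer = split.rest;
    for (const line of split.lines) {
      const event = JSON.parse(line);
      if (event.type === "token") { full += event.token; onToken(event.token); } // render progressively
      else if (event.type === "done") full = event.message;
      else if (event.type === "error") throw new Error(event.error);
    }
  }
  return full;
}
```

Note the `appendChunk` buffering: network chunks are not guaranteed to align with line boundaries, so a partial trailing line is kept until the next chunk completes it.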
Error Responses

If an error occurs before streaming begins, you’ll receive a standard JSON error response with one of the following statuses:

400 Bad Request
401 Unauthorized
429 Too Many Requests
500 Internal Server Error
If an error occurs after streaming has started, you’ll receive an error event in the stream instead.

How It Works
When you send a message to `/v1/chat/stream`, the backend (`backend/src/server.ts:508-567`):
- Validates your API key and request payload
- Creates or retrieves the conversation based on `sessionId`
- Adds your message to the conversation history
- Opens a streaming connection to OpenAI
- Sends a `start` event with the conversation ID
- Forwards each `token` event as it arrives from OpenAI
- Saves the complete assistant message to the database
- Sends a `done` event with the complete message
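The event sequence above can be sketched as follows. This is a simplified illustration, not the actual `backend/src/server.ts` code, and the event field names are assumptions:

```typescript
// Minimal writable sink, standing in for an HTTP response object.
type Writable = { write(chunk: string): void };

// Serialize one event as an NDJSON line and write it to the stream.
function writeEvent(res: Writable, event: object): void {
  res.write(JSON.stringify(event) + "\n"); // one JSON object per line
}

// Hypothetical server-side flow: announce the stream, forward each
// token as it arrives, then finish with the combined message.
function streamResponse(res: Writable, conversationId: string, tokens: string[]): string {
  writeEvent(res, { type: "start", conversationId });
  let full = "";
  for (const token of tokens) {
    full += token;
    writeEvent(res, { type: "token", token });
  }
  writeEvent(res, { type: "done", message: full, conversationId });
  return full;
}
```

In the real backend, the `tokens` would arrive incrementally from the OpenAI stream, and the combined message would be persisted before the `done` event is sent.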
The stream uses NDJSON (Newline Delimited JSON) format where each line is a complete JSON object. This makes it easy to parse line-by-line without waiting for the entire response.