Send Message (Streaming)

curl -X POST "https://api.example.com/chat/prompt/stream" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "message": "Explain how JWT authentication works",
    "chat_id": "550e8400-e29b-41d4-a716-446655440000",
    "generate_title": true
  }'
Sends a message and receives the AI response as a real-time stream using Server-Sent Events (SSE). This provides a better user experience for long responses by displaying content as it’s generated.

Method & Path

POST /chat/prompt/stream

Authentication

Requires bearer token authentication via the Authorization header.

Request Body

message
string
required
The user’s message or question to send to the assistant
chat_id
string
ID of an existing conversation to continue. If omitted or null, a new conversation is created automatically.
generate_title
boolean
default:true
Whether to auto-generate a conversation title from the first message. Only applies to new conversations or those with default titles.
provider_id
string
Optional LLM provider ID to override the default provider for this message
model_id
string
Optional model ID to use with the specified provider. Must be provided if provider_id is specified.
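
For reference, a request body that pins a specific provider and model might look like the following (the IDs shown are placeholders; remember that provider_id and model_id must be supplied together):

```json
{
  "message": "Explain how JWT authentication works",
  "chat_id": null,
  "generate_title": true,
  "provider_id": "your-provider-id",
  "model_id": "your-model-id"
}
```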

Response Format

The endpoint returns a text/event-stream response with Server-Sent Events. Each event has a type and associated data.

Event Types

status
event
Status updates about the processing stage
title
event
Auto-generated conversation title (emitted once when title generation completes)
final_answer
event
Content chunks of the AI response (emitted multiple times as the response is generated)
complete
event
Final event indicating the stream has finished (emitted once at the end)
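
As a sketch of how a client might fold these four event types into UI state (the function name applyEvent and the state shape are illustrative, not part of the API):

```javascript
// Initial client-side state for one streamed response (illustrative shape).
function createState() {
  return { status: null, title: null, answer: '', done: false, chatId: null };
}

// Apply one SSE event to the state; data fields match the event types above.
function applyEvent(state, eventType, data) {
  switch (eventType) {
    case 'status':       // processing-stage updates
      return { ...state, status: data.message };
    case 'title':        // emitted once when title generation completes
      return { ...state, title: data.title };
    case 'final_answer': // content chunks; append in arrival order
      return { ...state, answer: state.answer + data.chunk };
    case 'complete':     // final event carries the full answer and chat_id
      return { ...state, answer: data.answer, chatId: data.chat_id, done: true };
    default:
      return state;      // ignore unknown event types for forward compatibility
  }
}
```

Treating events as a pure fold like this keeps rendering logic separate from stream parsing and makes the handler easy to unit-test.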

Stream Example

event: status
data: {"message": "Analyzing your question..."}

event: status
data: {"message": "Generating response..."}

event: final_answer
data: {"chunk": "JWT "}

event: final_answer
data: {"chunk": "(JSON"}

event: final_answer
data: {"chunk": " Web "}

event: final_answer
data: {"chunk": "Token"}

event: final_answer
data: {"chunk": "s) au"}

event: final_answer
data: {"chunk": "thent"}

event: title
data: {"title": "JWT Authentication Explanation"}

event: final_answer
data: {"chunk": "icati"}

event: final_answer
data: {"chunk": "on wo"}

... (more chunks) ...

event: complete
data: {"answer": "JWT (JSON Web Tokens) authentication works by...", "chat_id": "550e8400-e29b-41d4-a716-446655440000"}
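
A transcript like the one above can be parsed with a few lines of string handling; parseSSE below is an illustrative helper for complete transcripts, not part of any official client library:

```javascript
// Parse a complete SSE transcript into { event, data } records.
// Events are separated by blank lines; each has an "event:" and a "data:" line.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    const lines = block.split('\n').filter(Boolean);
    const eventLine = lines.find((l) => l.startsWith('event: '));
    const dataLine = lines.find((l) => l.startsWith('data: '));
    if (eventLine && dataLine) {
      events.push({
        event: eventLine.slice('event: '.length),
        data: JSON.parse(dataLine.slice('data: '.length)),
      });
    }
  }
  return events;
}
```

Note that concatenating the final_answer chunks in arrival order reproduces the answer field delivered by the complete event.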

Error Response

For validation or processing errors, a JSON response is returned instead of a stream:
{
  "success": false,
  "message": "Your message contains prohibited content. Please rephrase and try again."
}
For context-dependent queries without conversation history:
{
  "success": false,
  "message": "Could you please provide more context or specify what you're referring to?",
  "needs_clarification": true
}
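
A client can branch on the needs_clarification flag to treat these two cases differently; handleErrorBody is a hypothetical helper name, and the field names come from the responses above:

```javascript
// Classify a JSON error body from the endpoint (illustrative helper).
function handleErrorBody(body) {
  if (body.needs_clarification) {
    // Ask the user for more context rather than showing a hard error.
    return { kind: 'clarify', prompt: body.message };
  }
  return { kind: 'error', message: body.message };
}
```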

Error Codes

  • 401 Unauthorized: Missing or invalid authentication token
  • 404 Not Found: Specified chat_id does not exist or user does not have access
  • 422 Unprocessable Entity: Invalid request body format
  • 500 Internal Server Error: Processing or database error
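
One way to route responses is to inspect the Content-Type header first (a stream vs. a JSON error) and then fall back to the status code; classifyResponse and its return labels are illustrative, not part of the API:

```javascript
// Decide how to handle a response: SSE stream, or one of the error codes above.
// Illustrative sketch; adapt the labels to your client's error handling.
function classifyResponse(status, contentType) {
  if (contentType && contentType.startsWith('text/event-stream')) {
    return 'stream';
  }
  switch (status) {
    case 401: return 'auth_error';       // missing or invalid bearer token
    case 404: return 'chat_not_found';   // unknown or inaccessible chat_id
    case 422: return 'validation_error'; // malformed request body
    default:  return status >= 500 ? 'server_error' : 'json_error';
  }
}
```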

Implementation Guide

Client-Side Implementation

import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [chatId, setChatId] = useState(null);

  const sendMessage = async (message) => {
    setIsStreaming(true);
    setResponse('');

    const res = await fetch('https://api.example.com/chat/prompt/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        message,
        chat_id: chatId,
        generate_title: true
      })
    });

    if (!res.ok) {
      // Validation and processing errors come back as JSON, not a stream
      const err = await res.json();
      setIsStreaming(false);
      throw new Error(err.message);
    }

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let eventType = null; // persists across reads so event/data pairs split between chunks still match

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the incomplete trailing line in the buffer

      for (const line of lines) {
        if (line.startsWith('event: ')) {
          eventType = line.substring(7).trim();
        } else if (line.startsWith('data: ') && eventType) {
          const data = JSON.parse(line.substring(6));

          if (eventType === 'final_answer') {
            setResponse(prev => prev + data.chunk);
          } else if (eventType === 'complete') {
            setChatId(data.chat_id);
            setIsStreaming(false);
          }
          eventType = null; // reset until the next "event:" line
        }
      }
    }
  };

  return (
    <div>
      <div>{response}</div>
      <button onClick={() => sendMessage('Hello!')} disabled={isStreaming}>
        Send Message
      </button>
    </div>
  );
}

Best Practices

Implement retry logic for network interruptions. If the stream disconnects, you can use the non-streaming endpoint as a fallback to ensure message delivery.
try {
  await streamMessage(message);
} catch (error) {
  console.error('Stream failed, falling back to non-streaming');
  const response = await fetch('/chat/prompt', {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({ message, chat_id })
  });
  return await response.json();
}
SSE data may arrive in partial chunks. Always maintain a buffer for incomplete lines and only process complete event pairs (event + data lines).
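
That buffering advice can be captured in a small helper that accumulates partial chunks and emits only complete lines; makeLineBuffer is an illustrative name, assuming the network can split a line at any point:

```javascript
// Buffer partial SSE chunks; feed() returns only the complete lines received so far.
function makeLineBuffer() {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is the incomplete remainder (or empty)
    return lines;
  };
}
```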
The title event may arrive at any point during the stream (it’s generated in parallel). Don’t assume it will arrive before or after specific content chunks.
When a cached response is returned, you’ll receive a status event indicating “Found cached response, delivering instantly…” before the content stream begins.
Context-dependent queries (e.g., “What about that?”, “Tell me more”) without conversation history will trigger a clarification response. Ensure your UI guides users to provide complete questions for new conversations.

Response Headers

The streaming endpoint sets specific headers for SSE:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Streaming provides a better user experience but requires proper client-side implementation. For simpler integrations, use the non-streaming endpoint at /chat/prompt.
Ensure your client properly closes the stream connection when done to avoid resource leaks. Most modern HTTP clients handle this automatically when the stream ends.
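
If you need to stop a stream early (for example, when the user navigates away), passing an AbortController signal to fetch cancels the request and closes the connection. This is a minimal sketch; the handler name is illustrative:

```javascript
// Cancel an in-flight stream with AbortController (supported by fetch).
const controller = new AbortController();

async function streamWithCancel(message) {
  const res = await fetch('https://api.example.com/chat/prompt/stream', {
    method: 'POST',
    headers: { /* Authorization, Content-Type as shown above */ },
    body: JSON.stringify({ message }),
    signal: controller.signal, // aborting cancels the request and closes the stream
  });
  // ... read res.body as shown in the Client-Side Implementation ...
}

// Later, e.g. in a component cleanup or "Stop generating" button:
controller.abort();
```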