Send Message (Streaming)

curl -X POST "https://api.example.com/chat/prompt/stream" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "message": "Explain how JWT authentication works",
    "chat_id": "550e8400-e29b-41d4-a716-446655440000",
    "generate_title": true
  }'
Sends a message and receives the AI response as a real-time stream using Server-Sent Events (SSE). This provides a better user experience for long responses by displaying content as it’s generated.

Method & Path

POST /chat/prompt/stream

Authentication

Requires bearer token authentication via the Authorization header.

Request Body

message
string
required
The user’s message or question to send to the assistant
chat_id
string
ID of an existing conversation to continue. If omitted or null, a new conversation is created automatically.
generate_title
boolean
default:true
Whether to auto-generate a conversation title from the first message. Only applies to new conversations or those with default titles.
provider_id
string
Optional LLM provider ID to override the default provider for this message
model_id
string
Optional model ID to use with the specified provider. Must be provided if provider_id is specified.
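
For reference, a request body that pins a specific provider and model might look like the following (the IDs shown are placeholders; remember that provider_id and model_id must be supplied together):

```json
{
  "message": "Explain how JWT authentication works",
  "chat_id": null,
  "generate_title": true,
  "provider_id": "your-provider-id",
  "model_id": "your-model-id"
}
```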

Response Format

The endpoint returns a text/event-stream response with Server-Sent Events. Each event has a type and associated data.

Event Types

status
event
Status updates about the processing stage
title
event
Auto-generated conversation title (emitted once when title generation completes)
final_answer
event
Content chunks of the AI response (emitted multiple times as the response is generated)
complete
event
Final event indicating the stream has finished (emitted once at the end)
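
As a sketch of how a client might fold these four event types into UI state (the function name applyEvent and the state shape are illustrative, not part of the API):

```javascript
// Initial client-side state for one streamed response (illustrative shape).
function createState() {
  return { status: null, title: null, answer: '', done: false, chatId: null };
}

// Apply one SSE event to the state; data fields match the event types above.
function applyEvent(state, eventType, data) {
  switch (eventType) {
    case 'status':       // processing-stage updates
      return { ...state, status: data.message };
    case 'title':        // emitted once when title generation completes
      return { ...state, title: data.title };
    case 'final_answer': // content chunks; append in arrival order
      return { ...state, answer: state.answer + data.chunk };
    case 'complete':     // final event carries the full answer and chat_id
      return { ...state, answer: data.answer, chatId: data.chat_id, done: true };
    default:
      return state;      // ignore unknown event types for forward compatibility
  }
}
```

Treating events as a pure fold like this keeps rendering logic separate from stream parsing and makes the handler easy to unit-test.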

Stream Example

event: status
data: {"message": "Analyzing your question..."}

event: status
data: {"message": "Generating response..."}

event: final_answer
data: {"chunk": "JWT "}

event: final_answer
data: {"chunk": "(JSON"}

event: final_answer
data: {"chunk": " Web "}

event: final_answer
data: {"chunk": "Token"}

event: final_answer
data: {"chunk": "s) au"}

event: final_answer
data: {"chunk": "thent"}

event: title
data: {"title": "JWT Authentication Explanation"}

event: final_answer
data: {"chunk": "icati"}

event: final_answer
data: {"chunk": "on wo"}

... (more chunks) ...

event: complete
data: {"answer": "JWT (JSON Web Tokens) authentication works by...", "chat_id": "550e8400-e29b-41d4-a716-446655440000"}
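
A transcript like the one above can be parsed with a few lines of string handling; parseSSE below is an illustrative helper for complete transcripts, not part of any official client library:

```javascript
// Parse a complete SSE transcript into { event, data } records.
// Events are separated by blank lines; each has an "event:" and a "data:" line.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    const lines = block.split('\n').filter(Boolean);
    const eventLine = lines.find((l) => l.startsWith('event: '));
    const dataLine = lines.find((l) => l.startsWith('data: '));
    if (eventLine && dataLine) {
      events.push({
        event: eventLine.slice('event: '.length),
        data: JSON.parse(dataLine.slice('data: '.length)),
      });
    }
  }
  return events;
}
```

Note that concatenating the final_answer chunks in arrival order reproduces the answer field delivered by the complete event.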

Error Response

For validation or processing errors, a JSON response is returned instead of a stream:
{
  "success": false,
  "message": "Your message contains prohibited content. Please rephrase and try again."
}
For context-dependent queries without conversation history:
{
  "success": false,
  "message": "Could you please provide more context or specify what you're referring to?",
  "needs_clarification": true
}
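
A client can branch on the needs_clarification flag to treat these two cases differently; handleErrorBody is a hypothetical helper name, and the field names come from the responses above:

```javascript
// Classify a JSON error body from the endpoint (illustrative helper).
function handleErrorBody(body) {
  if (body.needs_clarification) {
    // Ask the user for more context rather than showing a hard error.
    return { kind: 'clarify', prompt: body.message };
  }
  return { kind: 'error', message: body.message };
}
```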

Error Codes

  • 401 Unauthorized: Missing or invalid authentication token
  • 404 Not Found: Specified chat_id does not exist or user does not have access
  • 422 Unprocessable Entity: Invalid request body format
  • 500 Internal Server Error: Processing or database error
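
One way to route responses is to inspect the Content-Type header first (a stream vs. a JSON error) and then fall back to the status code; classifyResponse and its return labels are illustrative, not part of the API:

```javascript
// Decide how to handle a response: SSE stream, or one of the error codes above.
// Illustrative sketch; adapt the labels to your client's error handling.
function classifyResponse(status, contentType) {
  if (contentType && contentType.startsWith('text/event-stream')) {
    return 'stream';
  }
  switch (status) {
    case 401: return 'auth_error';       // missing or invalid bearer token
    case 404: return 'chat_not_found';   // unknown or inaccessible chat_id
    case 422: return 'validation_error'; // malformed request body
    default:  return status >= 500 ? 'server_error' : 'json_error';
  }
}
```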

Implementation Guide

Client-Side Implementation

import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [chatId, setChatId] = useState(null);

  const sendMessage = async (message) => {
    setIsStreaming(true);
    setResponse('');

    const res = await fetch('https://api.example.com/chat/prompt/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        message,
        chat_id: chatId,
        generate_title: true
      })
    });

    if (!res.ok) {
      // Validation and processing errors come back as JSON, not a stream
      const err = await res.json();
      setIsStreaming(false);
      throw new Error(err.message);
    }

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let eventType = null; // persists across reads so event/data pairs split between chunks still match

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the incomplete trailing line in the buffer

      for (const line of lines) {
        if (line.startsWith('event: ')) {
          eventType = line.substring(7).trim();
        } else if (line.startsWith('data: ') && eventType) {
          const data = JSON.parse(line.substring(6));

          if (eventType === 'final_answer') {
            setResponse(prev => prev + data.chunk);
          } else if (eventType === 'complete') {
            setChatId(data.chat_id);
            setIsStreaming(false);
          }
          eventType = null; // reset until the next "event:" line
        }
      }
    }
  };

  return (
    <div>
      <div>{response}</div>
      <button onClick={() => sendMessage('Hello!')} disabled={isStreaming}>
        Send Message
      </button>
    </div>
  );
}

Best Practices

Implement retry logic for network interruptions. If the stream disconnects, you can use the non-streaming endpoint as a fallback to ensure message delivery.
try {
  await streamMessage(message);
} catch (error) {
  console.error('Stream failed, falling back to non-streaming');
  const response = await fetch('/chat/prompt', {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({ message, chat_id })
  });
  return await response.json();
}
SSE data may arrive in partial chunks. Always maintain a buffer for incomplete lines and only process complete event pairs (event + data lines).
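
That buffering advice can be captured in a small helper that accumulates partial chunks and emits only complete lines; makeLineBuffer is an illustrative name, assuming the network can split a line at any point:

```javascript
// Buffer partial SSE chunks; feed() returns only the complete lines received so far.
function makeLineBuffer() {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is the incomplete remainder (or empty)
    return lines;
  };
}
```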
The title event may arrive at any point during the stream (it’s generated in parallel). Don’t assume it will arrive before or after specific content chunks.
When a cached response is returned, you’ll receive a status event indicating “Found cached response, delivering instantly…” before the content stream begins.
Context-dependent queries (e.g., “What about that?”, “Tell me more”) without conversation history will trigger a clarification response. Ensure your UI guides users to provide complete questions for new conversations.

Response Headers

The streaming endpoint sets specific headers for SSE:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Streaming provides a better user experience but requires proper client-side implementation. For simpler integrations, use the non-streaming endpoint at /chat/prompt.
Ensure your client properly closes the stream connection when done to avoid resource leaks. Most modern HTTP clients handle this automatically when the stream ends.
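
If you need to stop a stream early (for example, when the user navigates away), passing an AbortController signal to fetch cancels the request and closes the connection. This is a minimal sketch; the handler name is illustrative:

```javascript
// Cancel an in-flight stream with AbortController (supported by fetch).
const controller = new AbortController();

async function streamWithCancel(message) {
  const res = await fetch('https://api.example.com/chat/prompt/stream', {
    method: 'POST',
    headers: { /* Authorization, Content-Type as shown above */ },
    body: JSON.stringify({ message }),
    signal: controller.signal, // aborting cancels the request and closes the stream
  });
  // ... read res.body as shown in the Client-Side Implementation ...
}

// Later, e.g. in a component cleanup or "Stop generating" button:
controller.abort();
```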