Send Message (Streaming)
curl -X POST "https://api.example.com/chat/prompt/stream" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "message": "Explain how JWT authentication works",
    "chat_id": "550e8400-e29b-41d4-a716-446655440000",
    "generate_title": true
  }'
Sends a message and receives the AI response as a real-time stream using Server-Sent Events (SSE). This provides a better user experience for long responses by displaying content as it’s generated.
Method & Path
POST /chat/prompt/stream
Authentication
Requires bearer token authentication via the Authorization header.
Request Body
message (string, required): The user's message or question to send to the assistant
chat_id (string, optional): ID of an existing conversation to continue. If omitted or null, a new conversation is created automatically.
generate_title (boolean, optional): Whether to auto-generate a conversation title from the first message. Only applies to new conversations or those with default titles.
provider_id (string, optional): LLM provider ID to override the default provider for this message
model_id (string, optional): Model ID to use with the specified provider. Must be provided if provider_id is specified.
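Pulling the fields together, a request body that exercises the optional provider override might look like the sketch below (the provider and model IDs are placeholder values, not real identifiers):

```python
import json

# Illustrative payload exercising every documented field; the provider
# and model IDs below are placeholders, not real identifiers.
payload = {
    "message": "Explain how JWT authentication works",
    "chat_id": None,                # null: a new conversation is created
    "generate_title": True,         # auto-title the new conversation
    "provider_id": "provider-123",  # optional override (placeholder)
    "model_id": "model-456",        # required once provider_id is set
}

body = json.dumps(payload)
```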
The endpoint returns a text/event-stream response with Server-Sent Events. Each event has a type and associated data.
Event Types
status: Status update about the processing stage. Data field message: a human-readable status message (e.g., “Analyzing your question…”, “Found cached response, delivering instantly…”)
title: Auto-generated conversation title (emitted once when title generation completes). Data field title: the generated conversation title
final_answer: Content chunk of the AI response (emitted multiple times as the response is generated). Data field chunk: a portion of the response text (typically 1-5 characters for smooth streaming)
complete: Final event indicating the stream has finished (emitted once at the end). Data field answer: the complete AI response (all chunks concatenated). Data field chat_id: the conversation ID (either provided or newly created)
Stream Example
event: status
data: {"message": "Analyzing your question..."}
event: status
data: {"message": "Generating response..."}
event: final_answer
data: {"chunk": "JWT "}
event: final_answer
data: {"chunk": "(JSON"}
event: final_answer
data: {"chunk": " Web "}
event: final_answer
data: {"chunk": "Token"}
event: final_answer
data: {"chunk": "s) au"}
event: final_answer
data: {"chunk": "thent"}
event: title
data: {"title": "JWT Authentication Explanation"}
event: final_answer
data: {"chunk": "icati"}
event: final_answer
data: {"chunk": "on wo"}
... (more chunks) ...
event: complete
data: {"answer": "JWT (JSON Web Tokens) authentication works by...", "chat_id": "550e8400-e29b-41d4-a716-446655440000"}
Error Response
For validation or processing errors, a JSON response is returned instead of a stream:
{
  "success": false,
  "message": "Your message contains prohibited content. Please rephrase and try again."
}
For context-dependent queries without conversation history:
{
  "success": false,
  "message": "Could you please provide more context or specify what you're referring to?",
  "needs_clarification": true
}
Error Codes
401 Unauthorized : Missing or invalid authentication token
404 Not Found : Specified chat_id does not exist or user does not have access
422 Unprocessable Entity : Invalid request body format
500 Internal Server Error : Processing or database error
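Because errors come back as a JSON body rather than an SSE stream, a client can branch on the response Content-Type before trying to read events. A minimal sketch of that decision logic (the helper name and return shape are illustrative, not part of the API):

```python
import json

def classify_response(status: int, content_type: str, body: bytes):
    """Decide how to treat a /chat/prompt/stream response.

    Returns ("stream", None) for an SSE body, ("clarify", payload) when
    the server asks for more context, or ("error", payload) otherwise.
    """
    if status == 200 and content_type.startswith("text/event-stream"):
        return ("stream", None)
    payload = json.loads(body)
    # Clarification requests are flagged separately so the UI can
    # re-prompt the user instead of showing a generic error.
    if payload.get("needs_clarification"):
        return ("clarify", payload)
    return ("error", payload)
```

For example, a 422 validation failure yields ("error", …) with the decoded message, while a context-dependent query yields ("clarify", …) so the UI can ask the user to rephrase.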
Implementation Guide
Client-Side Implementation
React Example
Python AsyncIO
import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [chatId, setChatId] = useState(null);

  const sendMessage = async (message) => {
    setIsStreaming(true);
    setResponse('');

    const res = await fetch('https://api.example.com/chat/prompt/stream', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        message,
        chat_id: chatId,
        generate_title: true
      })
    });

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let eventType = null;

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      const lines = buffer.split('\n');
      buffer = lines.pop(); // Keep incomplete line in buffer

      for (const line of lines) {
        if (line.startsWith('event: ')) {
          eventType = line.substring(7);
        } else if (line.startsWith('data: ') && eventType !== null) {
          const data = JSON.parse(line.substring(6));
          if (eventType === 'final_answer') {
            setResponse(prev => prev + data.chunk);
          } else if (eventType === 'complete') {
            setChatId(data.chat_id);
            setIsStreaming(false);
          }
          eventType = null;
        }
      }
    }
  };

  return (
    <div>
      <div>{response}</div>
      <button onClick={() => sendMessage('Hello!')} disabled={isStreaming}>
        Send Message
      </button>
    </div>
  );
}
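For the Python AsyncIO tab, here is a standard-library sketch of the same buffering logic. In a real client the byte stream would come from an async HTTP library such as aiohttp; the fake stream below only exercises the parser, and deliberately splits a line across chunks:

```python
import asyncio
import json

async def parse_sse(byte_stream):
    """Yield (event_type, data) pairs from an async iterator of byte chunks.

    SSE data may arrive split mid-line, so the incomplete trailing line
    is buffered until the rest of it arrives.
    """
    buffer = ""
    event_type = None
    async for chunk in byte_stream:
        buffer += chunk.decode("utf-8")
        lines = buffer.split("\n")
        buffer = lines.pop()  # keep the incomplete trailing line
        for line in lines:
            line = line.rstrip("\r")
            if line.startswith("event: "):
                event_type = line[len("event: "):]
            elif line.startswith("data: ") and event_type is not None:
                yield event_type, json.loads(line[len("data: "):])
                event_type = None

async def demo():
    # Fake byte stream standing in for the HTTP response body; the second
    # chunk starts mid-line to exercise the buffering.
    async def fake_stream():
        yield b'event: final_answer\ndata: {"chu'
        yield b'nk": "JWT "}\n\nevent: complete\n'
        yield b'data: {"answer": "JWT ...", "chat_id": "abc"}\n\n'

    text = ""
    async for event, data in parse_sse(fake_stream()):
        if event == "final_answer":
            text += data["chunk"]  # accumulate partial answer for display
        elif event == "complete":
            return text, data["chat_id"]

asyncio.run(demo())
```

The same parser handles the out-of-order title event naturally, since each event/data pair is dispatched independently of the others.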
Best Practices
Handle Connection Errors Gracefully
Implement retry logic for network interruptions. If the stream disconnects, you can use the non-streaming endpoint as a fallback to ensure message delivery.
try {
  await streamMessage(message);
} catch (error) {
  console.error('Stream failed, falling back to non-streaming');
  const response = await fetch('/chat/prompt', {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({ message, chat_id })
  });
  return await response.json();
}
SSE data may arrive in partial chunks. Always maintain a buffer for incomplete lines and only process complete event pairs (event + data lines).
The title event may arrive at any point during the stream (it’s generated in parallel). Don’t assume it will arrive before or after specific content chunks.
When a cached response is returned, you’ll receive a status event indicating “Found cached response, delivering instantly…” before the content stream begins.
Context-dependent queries (e.g., “What about that?”, “Tell me more”) without conversation history will trigger a clarification response. Ensure your UI guides users to provide complete questions for new conversations.
The streaming endpoint sets specific headers for SSE:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Streaming provides a better user experience but requires proper client-side implementation. For simpler integrations, use the non-streaming endpoint at /chat/prompt.
Ensure your client properly closes the stream connection when done to avoid resource leaks. Most modern HTTP clients handle this automatically when the stream ends.