Real-time Streaming
Streaming allows you to receive AI responses in real time as they’re generated, providing a better user experience for long-form content.

How Streaming Works
Cencori uses Server-Sent Events (SSE) to stream responses from AI providers. Each chunk is sent as a `data:` event with a JSON payload.
Stream Format
Each chunk follows this format:

Basic Streaming
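The format block itself was not captured in this page, so here is a sketch of what a streamed response might look like on the wire; the `content` field name and the `[DONE]` sentinel are assumptions, not Cencori's documented schema:

```
data: {"content":"Hello"}

data: {"content":" world"}

data: [DONE]
```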
Next.js App Router Example
Here’s how streaming is implemented in the Cencori platform:

app/api/ai/chat/route.ts
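The route handler source was not captured here, so the following is a minimal sketch of what such a handler could look like. `getCompletionStream` is a hypothetical stand-in for the real Cencori SDK call, and the `[DONE]` sentinel is an assumption:

```typescript
// Minimal sketch of app/api/ai/chat/route.ts — not the actual Cencori code.
// Placeholder for the real Cencori client call (hypothetical helper).
async function* getCompletionStream(
  _messages: { role: string; content: string }[],
): AsyncIterable<{ content: string }> {
  yield { content: "Hello" };
}

// Wrap an async iterable of chunks into an SSE-formatted ReadableStream.
export function toSSEStream(chunks: AsyncIterable<object>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n")); // assumed terminator
      controller.close();
    },
  });
}

export async function POST(req: Request): Promise<Response> {
  const { messages } = await req.json();
  return new Response(toSSEStream(getCompletionStream(messages)), {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```

The handler returns a `Response` whose body is a `ReadableStream`, which Next.js App Router route handlers support natively; the browser sees a standard SSE stream.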
Client-Side Streaming
React Hook Example
hooks/use-stream.ts
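The hook source was not captured, so below is a sketch of the core parsing loop such a hook would need; the chunk shape and `[DONE]` sentinel are assumptions:

```typescript
// Sketch of the SSE consumption loop behind a use-stream hook — illustrative,
// not the actual hooks/use-stream.ts.
// Parse an SSE body stream, invoking onToken for each content delta.
export async function consumeSSE(
  body: ReadableStream<Uint8Array>,
  onToken: (token: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // keep any incomplete event for the next read
    for (const event of events) {
      const data = event.replace(/^data: /, "").trim();
      if (!data || data === "[DONE]") continue; // assumed terminator
      const parsed = JSON.parse(data);
      if (parsed.content) onToken(parsed.content);
    }
  }
}
```

Inside a React hook, you would fetch the endpoint and append tokens to state, e.g. `consumeSSE(res.body, (t) => setText((prev) => prev + t))`, with loading and error flags around the call.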
Streaming with Fallback
Cencori automatically handles failover during streaming:

lib/providers/router.ts
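The router source was not captured here; a minimal failover sketch follows, assuming a provider interface with a `stream` method (the shape is illustrative, not Cencori's internal API):

```typescript
// Sketch of provider failover during streaming — not the actual
// lib/providers/router.ts.
type StreamProvider = {
  name: string;
  stream: (prompt: string) => AsyncIterable<string>;
};

// Try each provider in order. If one fails before yielding its first chunk,
// fall through to the next; once streaming has started, surface the error
// instead of replaying, to avoid emitting duplicated output.
export async function* streamWithFallback(
  providers: StreamProvider[],
  prompt: string,
): AsyncGenerator<string> {
  let lastError: unknown;
  for (const provider of providers) {
    let started = false;
    try {
      for await (const chunk of provider.stream(prompt)) {
        started = true;
        yield chunk;
      }
      return; // completed successfully
    } catch (err) {
      if (started) throw err; // mid-stream failure: don't restart elsewhere
      lastError = err; // failed before first chunk: try the next provider
    }
  }
  throw lastError ?? new Error("No providers available");
}
```

The `started` flag is the key design choice: failover is only safe before any tokens have reached the client.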
Error Handling
Detect Errors in Stream

Check each chunk for error fields:
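The snippet for this step is missing from the capture; a sketch, assuming an `error` field of this shape on the chunk payload:

```typescript
// Sketch: inspect each parsed chunk for an error field before rendering it.
// The chunk shape is an assumption about Cencori's payload.
type StreamChunk = { content?: string; error?: { message: string } };

export function handleChunk(chunk: StreamChunk, onToken: (t: string) => void): void {
  if (chunk.error) {
    // Surface provider-side failures (rate limits, content policy, etc.)
    throw new Error(`Stream error: ${chunk.error.message}`);
  }
  if (chunk.content) onToken(chunk.content);
}
```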
Handle Network Failures

Wrap stream processing in try-catch:
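The code for this step was also lost; a sketch of a wrapped read loop that converts network drops into a single handleable result and always releases the reader lock:

```typescript
// Sketch: consume a stream defensively so connection resets or aborted
// fetches surface as one error instead of crashing the UI.
export async function readSafely(
  body: ReadableStream<Uint8Array>,
  onData: (bytes: Uint8Array) => void,
): Promise<{ ok: boolean; error?: unknown }> {
  const reader = body.getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) return { ok: true };
      if (value) onData(value);
    }
  } catch (error) {
    // Network failures (connection reset, aborted fetch) land here.
    return { ok: false, error };
  } finally {
    reader.releaseLock();
  }
}
```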
Implement Timeout

Add a timeout to prevent hanging:
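The original snippet is missing; one way to sketch this is an idle timeout that fails if no chunk arrives within a window, which works over any async iterable of chunks:

```typescript
// Sketch: abort consumption if the stream goes quiet for idleMs, so a
// stalled provider cannot hang the client indefinitely.
export async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  for (;;) {
    let timer!: ReturnType<typeof setTimeout>;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`stream idle for ${idleMs}ms`)), idleMs);
    });
    try {
      // Race the next chunk against the idle timer.
      const res = await Promise.race([it.next(), timeout]);
      if (res.done) return;
      yield res.value;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

For aborting the underlying HTTP request itself, pass an `AbortController`'s signal to `fetch` and call `abort()` from the same timer.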
Best Practices
- Buffer Management: Process chunks immediately to avoid memory buildup
- Security Scanning: Implement real-time content filtering during streaming
- User Feedback: Show loading states and progress indicators
- Cancellation: Provide abort functionality for long-running streams
- Error Recovery: Gracefully handle stream interruptions
Performance Tips
- Use Smaller Models: gemini-2.0-flash streams faster than gpt-4
- Reduce max_tokens: Limit response length for faster completion
- Client-Side Buffering: Batch small chunks before rendering
- Connection Pooling: Reuse HTTP connections for multiple streams
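The client-side buffering tip above can be sketched as a small token batcher; the interval and API shape are illustrative, not part of Cencori:

```typescript
// Sketch: coalesce rapid small tokens and flush at most once per intervalMs,
// reducing re-renders when chunks arrive faster than the UI can paint.
export function createTokenBatcher(
  flush: (batch: string) => void,
  intervalMs = 50,
) {
  let pending = "";
  let timer: ReturnType<typeof setTimeout> | null = null;
  return {
    push(token: string) {
      pending += token;
      if (timer === null) {
        timer = setTimeout(() => {
          timer = null;
          const batch = pending;
          pending = "";
          flush(batch);
        }, intervalMs);
      }
    },
    // Flush whatever remains; call this when the stream ends.
    end() {
      if (timer !== null) clearTimeout(timer);
      timer = null;
      if (pending) flush(pending);
      pending = "";
    },
  };
}
```

In a React client, `flush` would be the state setter, so the component re-renders per batch rather than per token.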
Next Steps
- Error Handling - Handle streaming errors gracefully
- Multi-Provider - Switch providers during streaming
- Cost Optimization - Optimize streaming costs