Overview
Streaming allows you to receive chat completion responses incrementally as they are generated, rather than waiting for the complete response.
Endpoint
POST /v1/chat/completions
Set stream: true in the request body to enable streaming.
Request
Streaming Parameters
stream
Set to true to enable streaming.
stream_options
Additional streaming options:
{
  "include_usage": true
}
All other parameters are identical to the Chat Completions endpoint.
Response
Streamed responses are sent as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
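If you are consuming the raw SSE stream without an SDK, each event line must be stripped of its `data: ` prefix and JSON-decoded, and the `[DONE]` sentinel marks the end of the stream. A minimal parsing sketch (`parse_sse_line` is an illustrative helper, not part of any SDK):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line; return the decoded chunk dict, or None
    for blank lines, comments, and the terminal [DONE] sentinel."""
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)

# Extract the incremental text from one chunk line:
line = 'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}'
chunk = parse_sse_line(line)
print(chunk["choices"][0]["delta"].get("content", ""))  # Hello
```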
Chunk Object
id
Unique identifier for the completion.
object
Object type, always chat.completion.chunk.
created
Unix timestamp of creation.
choices
Array of streaming choices.
choices[].delta.content
Incremental message content.
choices[].delta.role
Role (only in the first chunk).
choices[].finish_reason
Reason for completion (only in the last chunk).
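Because each chunk carries only an incremental delta, clients typically accumulate the deltas into a complete message. A sketch over plain chunk dicts, using the field names described above (`accumulate` is an illustrative helper, not SDK code):

```python
def accumulate(chunks):
    """Merge a sequence of chat.completion.chunk dicts into one message."""
    message = {"role": None, "content": ""}
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("role"):
                message["role"] = delta["role"]       # arrives in first chunk
            if delta.get("content"):
                message["content"] += delta["content"]  # incremental text
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]  # arrives in last chunk
    return message, finish_reason

# Using the three example chunks shown above:
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
print(accumulate(chunks))  # ({'role': 'assistant', 'content': 'Hello!'}, 'stop')
```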
Examples
Basic Streaming Request
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: sk-..." \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a haiku about recursion"}],
    "stream": true
  }'
Python Streaming Example
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a story about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
JavaScript Streaming Example
import Portkey from 'portkey-ai';

const client = new Portkey({
  provider: 'openai',
  Authorization: 'sk-...'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Write a story about AI' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
OpenAI SDK Streaming
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8787/v1",
    default_headers={
        "x-portkey-provider": "openai",
        "x-portkey-api-key": "sk-..."
    }
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Streaming with Usage Tracking
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # The final usage chunk carries an empty choices array, so guard the index
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    # Usage info arrives in the final chunk
    if hasattr(chunk, 'usage') and chunk.usage:
        print(f"\n\nTokens used: {chunk.usage.total_tokens}")
Streaming with Function Calling
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Handle streamed tool calls
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            if tool_call.function.name:
                print(f"Calling: {tool_call.function.name}")
            if tool_call.function.arguments:
                print(f"Args: {tool_call.function.arguments}")
    # Handle content
    if delta.content:
        print(delta.content, end="")
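Note that tool-call arguments arrive as JSON fragments spread across many chunks: the name typically appears once, while the arguments string must be concatenated before it can be parsed. A sketch over plain fragment dicts, merged by `index` (`accumulate_tool_calls` is an illustrative helper, not SDK code):

```python
import json

def accumulate_tool_calls(fragments):
    """Merge streamed tool_call fragments (as plain dicts) by index."""
    calls = {}
    for fragment in fragments:
        call = calls.setdefault(fragment["index"], {"name": "", "arguments": ""})
        fn = fragment.get("function", {})
        if fn.get("name"):
            call["name"] = fn["name"]          # name arrives once
        if fn.get("arguments"):
            call["arguments"] += fn["arguments"]  # arguments arrive in pieces

    return calls

fragments = [
    {"index": 0, "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"loca'}},
    {"index": 0, "function": {"arguments": 'tion": "Boston"}'}},
]
calls = accumulate_tool_calls(fragments)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
# get_weather {'location': 'Boston'}
```

Only parse the accumulated arguments string after the stream reports a `finish_reason` of `tool_calls`; parsing earlier will fail on incomplete JSON.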
Error Handling
Handle errors during streaming:
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

try:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
except Exception as e:
    print(f"Streaming error: {e}")
Best Practices
- Buffer Management: Process chunks as they arrive to provide real-time feedback
- Error Recovery: Implement proper error handling for connection issues
- Token Counting: Use stream_options.include_usage to track token usage
- Connection Timeout: Set appropriate timeouts for long-running streams
- UI Updates: Update your UI incrementally for better user experience
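The error-recovery point above can be sketched as a small retry wrapper around stream creation. Here `make_stream` is a stand-in for any of the `create(..., stream=True)` calls shown earlier; the wrapper only retries failures that occur before the first chunk, since retrying mid-stream would duplicate output already shown to the user:

```python
import time

def stream_with_retry(make_stream, max_attempts=3, base_delay=1.0):
    """Yield chunks from make_stream(), retrying with exponential
    backoff if the stream fails before producing any chunk."""
    for attempt in range(max_attempts):
        try:
            yielded = False
            for chunk in make_stream():
                yielded = True
                yield chunk
            return
        except Exception:
            # Don't retry mid-stream: chunks may already have been emitted
            if yielded or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage with a fake stream that fails once before succeeding:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("transient")
    return iter(["Hel", "lo"])

print("".join(stream_with_retry(flaky, base_delay=0)))  # Hello
```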