
Overview

Streaming allows you to receive chat completion responses incrementally as they are generated, rather than waiting for the entire response. This provides a better user experience for long responses and enables real-time UI updates.

Basic Streaming

Enable streaming by setting stream: true in your request:
import Dedalus from 'dedalus-labs';

const client = new Dedalus({
  apiKey: process.env.DEDALUS_API_KEY,
});

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [
    { role: 'user', content: 'Write a story about a robot.' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Stream Chunk Format

Each chunk in the stream is a StreamChunk object with the following structure:
  • id (string, required) - Unique identifier for the completion (the same across all chunks).
  • object (string, required) - Object type, always 'chat.completion.chunk'.
  • created (number, required) - Unix timestamp of when the chunk was created.
  • model (string, required) - Model used for the completion.
  • choices (Array<ChunkChoice>, required) - Array of streaming choice chunks. Each contains:
      • index (number) - Choice index
      • delta (ChoiceDelta) - Incremental content update
      • finish_reason (string | null) - Reason for stopping (present only in the final chunk)
      • logprobs (ChoiceLogprobs | null) - Log probability information
  • usage (CompletionUsage | null) - Token usage statistics. Included only in the final chunk, and only when stream_options.include_usage is true.

Delta Object

The delta object contains incremental updates:
interface ChoiceDelta {
  role?: 'assistant' | 'user' | 'system' | 'developer' | 'tool';
  content?: string | null;
  tool_calls?: Array<ChoiceDeltaToolCall> | null;
  function_call?: ChoiceDeltaFunctionCall | null;
  refusal?: string | null;
}
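To reconstruct the final message, merge each delta into an accumulator: the role arrives once in the first delta, while content arrives incrementally. A minimal sketch using hand-written deltas of the shape above (no network call; a real stream yields them one chunk at a time):

```typescript
// Merge a sequence of deltas into the final assistant message.
// The deltas below are sample data for illustration only.
interface Delta {
  role?: string;
  content?: string | null;
}

function mergeDeltas(deltas: Delta[]): { role: string; content: string } {
  let role = '';
  let content = '';
  for (const d of deltas) {
    if (d.role) role = d.role;           // role appears once, in the first delta
    if (d.content) content += d.content; // content is appended incrementally
  }
  return { role, content };
}

const message = mergeDeltas([
  { role: 'assistant' },
  { content: 'Hello' },
  { content: ', world!' },
]);
// message is { role: 'assistant', content: 'Hello, world!' }
```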

Using Stream Utilities

Dedalus provides helper functions for working with streams:

streamAsync

Async iterator for streaming responses:
import { streamAsync } from 'dedalus-labs';

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of streamAsync(stream)) {
  console.log(chunk.choices[0]?.delta?.content || '');
}

streamSync

Synchronous stream processing:
import { streamSync } from 'dedalus-labs';

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

streamSync(stream, (chunk) => {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
});

Stream Options

Control streaming behavior with stream_options:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
  stream_options: {
    include_usage: true, // Include usage stats in final chunk
  },
});
stream_options.include_usage (boolean, default false)
When true, the final chunk includes token usage statistics in the usage field.
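Because only the final chunk carries a usage object, guard for null while scanning chunks. A sketch over hand-written chunk data (the field names follow the usage shape described above):

```typescript
// Only the final chunk carries usage; earlier chunks have usage: null.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

function totalTokens(chunks: { usage?: Usage | null }[]): number | null {
  for (const chunk of chunks) {
    if (chunk.usage) return chunk.usage.total_tokens; // found the final chunk
  }
  return null; // stream ended without usage (include_usage not set)
}

const total = totalTokens([
  { usage: null },
  { usage: { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 } },
]);
// total === 21
```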

Complete Examples

Building Complete Response

Accumulate the full response from chunks:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [
    { role: 'user', content: 'Explain quantum computing.' }
  ],
  stream: true,
});

let fullContent = '';

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    fullContent += delta;
    process.stdout.write(delta);
  }
}

console.log('\n\nComplete response:', fullContent);

Handling Tool Calls in Streams

When streaming with tool calls, accumulate the tool call deltas:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get the weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      }
    }
  ],
  stream: true,
});

const toolCalls: Record<number, any> = {};

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  
  if (delta?.tool_calls) {
    for (const toolCall of delta.tool_calls) {
      const index = toolCall.index;
      
      if (!toolCalls[index]) {
        toolCalls[index] = {
          id: toolCall.id || '',
          type: toolCall.type || 'function',
          function: {
            name: toolCall.function?.name || '',
            arguments: toolCall.function?.arguments || ''
          }
        };
      } else {
        if (toolCall.function?.arguments) {
          toolCalls[index].function.arguments += toolCall.function.arguments;
        }
      }
    }
  }
}

console.log('Tool calls:', Object.values(toolCalls));

React Component Example

Integrate streaming into a React application. Note that instantiating the client in the browser exposes your API key to users; for production, proxy requests through a server route instead:
import { useState } from 'react';
import Dedalus from 'dedalus-labs';

function ChatComponent() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  const sendMessage = async (message: string) => {
    setIsLoading(true);
    setResponse('');

    const client = new Dedalus({
      apiKey: process.env.NEXT_PUBLIC_DEDALUS_API_KEY,
    });

    const stream = await client.chat.completions.create({
      model: 'openai/gpt-4',
      messages: [{ role: 'user', content: message }],
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        setResponse((prev) => prev + content);
      }
    }

    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={() => sendMessage('Hello!')} disabled={isLoading}>
        Send Message
      </button>
      <div>{response}</div>
    </div>
  );
}

Node.js Server with Server-Sent Events

Implement SSE streaming in an Express server:
import express from 'express';
import Dedalus from 'dedalus-labs';

const app = express();
const client = new Dedalus({
  apiKey: process.env.DEDALUS_API_KEY,
});

app.get('/chat', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4',
    messages: [{ role: 'user', content: req.query.message as string }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000);
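On the client side, the browser must split the SSE body into data: lines and stop at the [DONE] sentinel. A minimal sketch of that parsing step, matching the `data: {...}\n\n` format the Express handler above emits (parseSSE is an illustrative helper, not part of the SDK):

```typescript
// Parse an SSE text buffer into content strings, stopping at [DONE].
function parseSSE(buffer: string): { contents: string[]; done: boolean } {
  const contents: string[] = [];
  let done = false;
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blank separator lines
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    contents.push(JSON.parse(payload).content);
  }
  return { contents, done };
}

const { contents, done } = parseSSE(
  'data: {"content":"Hel"}\n\ndata: {"content":"lo"}\n\ndata: [DONE]\n\n'
);
// contents is ['Hel', 'lo'] and done is true
```

In a real client you would feed chunks from `fetch(...).body.getReader()` into this parser as they arrive, buffering partial lines between reads.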

Finish Reasons

The finish_reason field appears in the final chunk and indicates why generation stopped:
  • 'stop' - Natural stop point or stop sequence reached
  • 'length' - Maximum token limit reached
  • 'tool_calls' - Model called a tool
  • 'content_filter' - Content filtered due to safety
  • 'function_call' - Model called a function (deprecated)
Check for a finish reason while iterating:
for await (const chunk of stream) {
  const choice = chunk.choices[0];
  
  if (choice?.finish_reason) {
    console.log('Generation finished:', choice.finish_reason);
  }
}

Error Handling

Handle errors during streaming:
import { APIError } from 'dedalus-labs';

try {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
} catch (error) {
  if (error instanceof APIError) {
    console.error('Streaming error:', error.message);
  }
}

TypeScript Types

Key types for streaming:
import type {
  StreamChunk,
  ChunkChoice,
  ChoiceDelta,
  ChoiceDeltaToolCall,
  CompletionUsage,
} from 'dedalus-labs';

// Stream response (Stream is the async-iterable wrapper returned when stream: true)
type StreamResponse = Stream<StreamChunk>;

// Processing a chunk
function processChunk(chunk: StreamChunk) {
  const choice: ChunkChoice = chunk.choices[0];
  const delta: ChoiceDelta = choice.delta;
  // ...
}

Best Practices

  1. Always handle errors - Wrap streaming in try/catch blocks
  2. Buffer carefully - Accumulate content efficiently to avoid memory issues
  3. Show loading states - Indicate when streaming is in progress
  4. Handle finish reasons - Check why generation stopped
  5. Include usage stats - Set stream_options.include_usage: true for token tracking
  6. Clean up resources - Ensure streams are properly closed on unmount/cleanup
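For practices 1 and 6, an AbortController is the standard cancellation mechanism. Whether create() accepts a signal in its request options depends on the SDK version, so treat that option below as an assumption; the controller mechanics themselves are standard:

```typescript
// Sketch of user-initiated cancellation. Passing { signal } to create()
// is an assumption about the SDK's request options, shown only in the
// commented-out call; the AbortController behavior itself is standard.
const controller = new AbortController();

// In a real app this would wrap the streaming request:
// const stream = await client.chat.completions.create(
//   { model: 'openai/gpt-4', messages, stream: true },
//   { signal: controller.signal },
// );

// Later, e.g. on component unmount or a Stop button:
controller.abort();
// controller.signal.aborted is now true; an in-flight request would reject
// with an AbortError, which the surrounding try/catch should tolerate.
```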

Performance Considerations

  • Streaming reduces time-to-first-token significantly
  • Users can see progress immediately
  • Better for long-form content generation
  • Network overhead is slightly higher due to multiple chunks
  • Consider implementing request cancellation for user-initiated stops

Next Steps

Chat Completions

Learn about non-streaming completions

Tool Calling

Implement function and tool calling
