
Overview

Streaming allows you to receive chat completion responses incrementally as they are generated, rather than waiting for the entire response. This provides a better user experience for long responses and enables real-time UI updates.

Basic Streaming

Enable streaming by setting stream: true in your request:
import Dedalus from 'dedalus-labs';

const client = new Dedalus({
  apiKey: process.env.DEDALUS_API_KEY,
});

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [
    { role: 'user', content: 'Write a story about a robot.' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Stream Chunk Format

Each chunk in the stream is a StreamChunk object with the following structure:
  • id (string, required) - Unique identifier for the completion (the same across all chunks).
  • object (string, required) - Object type, always 'chat.completion.chunk'.
  • created (number, required) - Unix timestamp of when the chunk was created.
  • model (string, required) - Model used for the completion.
  • choices (Array<ChunkChoice>, required) - Array of streaming choice chunks. Each contains:
      • index (number) - Choice index
      • delta (ChoiceDelta) - Incremental content update
      • finish_reason (string | null) - Reason for stopping (present only in the final chunk)
      • logprobs (ChoiceLogprobs | null) - Log probability information
  • usage (CompletionUsage | null) - Token usage statistics. Included only in the final chunk, and only when stream_options.include_usage is true.

Delta Object

The delta object contains incremental updates:
interface ChoiceDelta {
  role?: 'assistant' | 'user' | 'system' | 'developer' | 'tool';
  content?: string | null;
  tool_calls?: Array<ChoiceDeltaToolCall> | null;
  function_call?: ChoiceDeltaFunctionCall | null;
  refusal?: string | null;
}
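To reconstruct the final message, merge each delta into an accumulator: the role arrives once in the first delta, while content arrives incrementally. A minimal sketch using hand-written deltas of the shape above (no network call; a real stream yields them one chunk at a time):

```typescript
// Merge a sequence of deltas into the final assistant message.
// The deltas below are sample data for illustration only.
interface Delta {
  role?: string;
  content?: string | null;
}

function mergeDeltas(deltas: Delta[]): { role: string; content: string } {
  let role = '';
  let content = '';
  for (const d of deltas) {
    if (d.role) role = d.role;           // role appears once, in the first delta
    if (d.content) content += d.content; // content is appended incrementally
  }
  return { role, content };
}

const message = mergeDeltas([
  { role: 'assistant' },
  { content: 'Hello' },
  { content: ', world!' },
]);
// message is { role: 'assistant', content: 'Hello, world!' }
```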

Using Stream Utilities

Dedalus provides helper functions for working with streams:

streamAsync

Async iterator for streaming responses:
import { streamAsync } from 'dedalus-labs';

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of streamAsync(stream)) {
  console.log(chunk.choices[0]?.delta?.content || '');
}

streamSync

Synchronous stream processing:
import { streamSync } from 'dedalus-labs';

const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

streamSync(stream, (chunk) => {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
});

Stream Options

Control streaming behavior with stream_options:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
  stream_options: {
    include_usage: true, // Include usage stats in final chunk
  },
});
stream_options.include_usage (boolean, default false)
When true, the final chunk includes token usage statistics in the usage field.
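Because only the final chunk carries a usage object, guard for null while scanning chunks. A sketch over hand-written chunk data (the field names follow the usage shape described above):

```typescript
// Only the final chunk carries usage; earlier chunks have usage: null.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

function totalTokens(chunks: { usage?: Usage | null }[]): number | null {
  for (const chunk of chunks) {
    if (chunk.usage) return chunk.usage.total_tokens; // found the final chunk
  }
  return null; // stream ended without usage (include_usage not set)
}

const total = totalTokens([
  { usage: null },
  { usage: { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 } },
]);
// total === 21
```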

Complete Examples

Building Complete Response

Accumulate the full response from chunks:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [
    { role: 'user', content: 'Explain quantum computing.' }
  ],
  stream: true,
});

let fullContent = '';

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    fullContent += delta;
    process.stdout.write(delta);
  }
}

console.log('\n\nComplete response:', fullContent);

Handling Tool Calls in Streams

When streaming with tool calls, accumulate the tool call deltas:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-4',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get the weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      }
    }
  ],
  stream: true,
});

const toolCalls: Record<number, any> = {};

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  
  if (delta?.tool_calls) {
    for (const toolCall of delta.tool_calls) {
      const index = toolCall.index;
      
      if (!toolCalls[index]) {
        toolCalls[index] = {
          id: toolCall.id || '',
          type: toolCall.type || 'function',
          function: {
            name: toolCall.function?.name || '',
            arguments: toolCall.function?.arguments || ''
          }
        };
      } else {
        if (toolCall.function?.arguments) {
          toolCalls[index].function.arguments += toolCall.function.arguments;
        }
      }
    }
  }
}

console.log('Tool calls:', Object.values(toolCalls));

React Component Example

Integrate streaming into a React application. Note that instantiating the client in the browser exposes your API key to users; for production, proxy requests through a server route instead:
import { useState } from 'react';
import Dedalus from 'dedalus-labs';

function ChatComponent() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  const sendMessage = async (message: string) => {
    setIsLoading(true);
    setResponse('');

    const client = new Dedalus({
      apiKey: process.env.NEXT_PUBLIC_DEDALUS_API_KEY,
    });

    const stream = await client.chat.completions.create({
      model: 'openai/gpt-4',
      messages: [{ role: 'user', content: message }],
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        setResponse((prev) => prev + content);
      }
    }

    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={() => sendMessage('Hello!')} disabled={isLoading}>
        Send Message
      </button>
      <div>{response}</div>
    </div>
  );
}

Node.js Server with Server-Sent Events

Implement SSE streaming in an Express server:
import express from 'express';
import Dedalus from 'dedalus-labs';

const app = express();
const client = new Dedalus({
  apiKey: process.env.DEDALUS_API_KEY,
});

app.get('/chat', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4',
    messages: [{ role: 'user', content: req.query.message as string }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000);
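On the client side, the browser must split the SSE body into data: lines and stop at the [DONE] sentinel. A minimal sketch of that parsing step, matching the `data: {...}\n\n` format the Express handler above emits (parseSSE is an illustrative helper, not part of the SDK):

```typescript
// Parse an SSE text buffer into content strings, stopping at [DONE].
function parseSSE(buffer: string): { contents: string[]; done: boolean } {
  const contents: string[] = [];
  let done = false;
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blank separator lines
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    contents.push(JSON.parse(payload).content);
  }
  return { contents, done };
}

const { contents, done } = parseSSE(
  'data: {"content":"Hel"}\n\ndata: {"content":"lo"}\n\ndata: [DONE]\n\n'
);
// contents is ['Hel', 'lo'] and done is true
```

In a real client you would feed chunks from `fetch(...).body.getReader()` into this parser as they arrive, buffering partial lines between reads.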

Finish Reasons

The finish_reason field appears in the final chunk and indicates why generation stopped:
  • 'stop' - Natural stop point or stop sequence reached
  • 'length' - Maximum token limit reached
  • 'tool_calls' - Model called a tool
  • 'content_filter' - Content filtered due to safety
  • 'function_call' - Model called a function (deprecated)
Check for a finish reason while iterating:
for await (const chunk of stream) {
  const choice = chunk.choices[0];
  
  if (choice?.finish_reason) {
    console.log('Generation finished:', choice.finish_reason);
  }
}

Error Handling

Handle errors during streaming:
import { APIError } from 'dedalus-labs';

try {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
} catch (error) {
  if (error instanceof APIError) {
    console.error('Streaming error:', error.message);
  }
}

TypeScript Types

Key types for streaming:
import type {
  StreamChunk,
  ChunkChoice,
  ChoiceDelta,
  ChoiceDeltaToolCall,
  CompletionUsage,
} from 'dedalus-labs';

// Stream response (Stream is the async-iterable wrapper returned when stream: true)
type StreamResponse = Stream<StreamChunk>;

// Processing a chunk
function processChunk(chunk: StreamChunk) {
  const choice: ChunkChoice = chunk.choices[0];
  const delta: ChoiceDelta = choice.delta;
  // ...
}

Best Practices

  1. Always handle errors - Wrap streaming in try/catch blocks
  2. Buffer carefully - Accumulate content efficiently to avoid memory issues
  3. Show loading states - Indicate when streaming is in progress
  4. Handle finish reasons - Check why generation stopped
  5. Include usage stats - Set stream_options.include_usage: true for token tracking
  6. Clean up resources - Ensure streams are properly closed on unmount/cleanup
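For practices 1 and 6, an AbortController is the standard cancellation mechanism. Whether create() accepts a signal in its request options depends on the SDK version, so treat that option below as an assumption; the controller mechanics themselves are standard:

```typescript
// Sketch of user-initiated cancellation. Passing { signal } to create()
// is an assumption about the SDK's request options, shown only in the
// commented-out call; the AbortController behavior itself is standard.
const controller = new AbortController();

// In a real app this would wrap the streaming request:
// const stream = await client.chat.completions.create(
//   { model: 'openai/gpt-4', messages, stream: true },
//   { signal: controller.signal },
// );

// Later, e.g. on component unmount or a Stop button:
controller.abort();
// controller.signal.aborted is now true; an in-flight request would reject
// with an AbortError, which the surrounding try/catch should tolerate.
```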

Performance Considerations

  • Streaming reduces time-to-first-token significantly
  • Users can see progress immediately
  • Better for long-form content generation
  • Network overhead is slightly higher due to multiple chunks
  • Consider implementing request cancellation for user-initiated stops

Next Steps

Chat Completions

Learn about non-streaming completions

Tool Calling

Implement function and tool calling
