Overview

The Dedalus SDK provides robust support for streaming responses using Server-Sent Events (SSE). This allows you to receive incremental chunks of the completion in real-time, enabling better user experiences for long-running requests.

Basic Streaming

To enable streaming, set the stream parameter to true in your completion request:
import Dedalus from 'dedalus-labs';

const client = new Dedalus();

const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  messages: [
    { role: 'system', content: 'You are Stephen Dedalus. Respond in morose Joycean malaise.' },
    { role: 'user', content: 'What do you think of artificial intelligence?' },
  ],
});

for await (const chunk of stream) {
  // Each chunk carries a delta with the next piece of content
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta);
  }
}

Canceling Streams

You can cancel a stream in two ways:

Using break

const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  messages: [{ role: 'user', content: 'Write a long essay' }],
});

let chunkCount = 0;
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta);
    chunkCount++;
  }

  // Stop after 100 content chunks (each chunk may contain more than one token)
  if (chunkCount >= 100) {
    break;
  }
}

Using controller.abort()

const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  messages: [{ role: 'user', content: 'Write a long story' }],
});

// Cancel after 5 seconds
setTimeout(() => {
  stream.controller.abort();
}, 5000);

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta);
  }
}

Stream Chunk Structure

Each chunk in the stream has the following structure:
interface StreamChunk {
  id: string;                    // Unique identifier for the completion
  object: 'chat.completion.chunk';
  created: number;               // Unix timestamp
  model: string;                 // Model used
  choices: Array<{
    index: number;
    delta: {
      role?: 'assistant' | 'user' | 'system' | 'tool';
      content?: string | null;
      tool_calls?: Array<ToolCall>;
    };
    finish_reason?: 'stop' | 'length' | 'tool_calls' | 'content_filter' | null;
  }>;
  usage?: CompletionUsage | null; // Only in the final chunk, when stream_options.include_usage is set
}
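
Because each chunk carries only a delta, a common pattern is to accumulate the deltas into the full message text. A minimal sketch, using only the fields from the interface above (the sample chunks are illustrative, not real API output):

```typescript
// Minimal shape of the fields we need from each chunk (see the interface above).
interface DeltaChunk {
  choices: Array<{
    index: number;
    delta: { content?: string | null };
    finish_reason?: string | null;
  }>;
}

// Concatenate the delta content from a sequence of chunks into the full text.
function accumulateContent(chunks: Iterable<DeltaChunk>): string {
  let text = '';
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      text += delta;
    }
  }
  return text;
}

// Illustrative chunks mirroring what the stream yields:
const sample: DeltaChunk[] = [
  { choices: [{ index: 0, delta: { content: 'Hello' } }] },
  { choices: [{ index: 0, delta: { content: ', world' } }] },
  { choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] },
];

console.log(accumulateContent(sample)); // prints "Hello, world"
```

The final chunk typically carries a `finish_reason` and an empty delta, so the guard on `delta` skips it cleanly.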

Working with Usage Statistics

Usage statistics are included in the final chunk when you set stream_options.include_usage to true:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  stream_options: { include_usage: true },
  messages: [{ role: 'user', content: 'Hello!' }],
});

for await (const chunk of stream) {
  if (chunk.usage) {
    console.log('Total tokens:', chunk.usage.total_tokens);
    console.log('Prompt tokens:', chunk.usage.prompt_tokens);
    console.log('Completion tokens:', chunk.usage.completion_tokens);
  }
}

Stream Utilities

Converting to ReadableStream

You can convert a Stream to a standard ReadableStream:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  messages: [{ role: 'user', content: 'Hello' }],
});

const readableStream = stream.toReadableStream();

// Use with Response for streaming HTTP endpoints
return new Response(readableStream, {
  headers: { 'Content-Type': 'text/event-stream' },
});

Splitting Streams with tee()

You can split a stream into two independent streams:
const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-nano',
  stream: true,
  messages: [{ role: 'user', content: 'Hello' }],
});

const [stream1, stream2] = stream.tee();

// Process both streams independently
const process1 = async () => {
  for await (const chunk of stream1) {
    // Process first stream
  }
};

const process2 = async () => {
  for await (const chunk of stream2) {
    // Process second stream
  }
};

await Promise.all([process1(), process2()]);

Error Handling

Always handle errors when working with streams to prevent unhandled promise rejections.
try {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-5-nano',
    stream: true,
    messages: [{ role: 'user', content: 'Hello' }],
  });

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      process.stdout.write(delta);
    }
  }
} catch (error) {
  if (error instanceof Dedalus.APIError) {
    console.error('API Error:', error.status, error.message);
  } else {
    console.error('Unexpected error:', error);
  }
}

Server-Sent Events Format

The streaming endpoint returns Server-Sent Events (SSE) in this format:
data: {"id":"cmpl_123","choices":[{"index":0,"delta":{"content":"Hi"}}]}
data: {"id":"cmpl_123","choices":[{"index":0,"delta":{"content":" there!"}}]}
data: [DONE]
The SDK automatically parses these events into typed chunk objects. If you call stream.controller.abort() or break out of the iteration loop, the SDK closes the underlying connection gracefully without throwing.
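
Conceptually, parsing comes down to stripping the `data: ` prefix, stopping at the `[DONE]` sentinel, and JSON-parsing each payload. A simplified sketch of that idea (not the SDK's actual implementation, which also handles multi-line events and partial reads):

```typescript
// Simplified sketch of SSE data-line parsing.
function parseSSELines(lines: string[]): Array<Record<string, unknown>> {
  const chunks: Array<Record<string, unknown>> = [];
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue; // ignore non-data lines
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break; // sentinel: stream is finished
    chunks.push(JSON.parse(payload));
  }
  return chunks;
}

// The example events from the format shown above:
const events = [
  'data: {"id":"cmpl_123","choices":[{"index":0,"delta":{"content":"Hi"}}]}',
  'data: {"id":"cmpl_123","choices":[{"index":0,"delta":{"content":" there!"}}]}',
  'data: [DONE]',
];

console.log(parseSSELines(events).length); // prints 2
```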

Best Practices

1. Buffer small chunks: Consider buffering very small chunks before displaying them to avoid flickering in the UI.

2. Handle network errors: Implement retry logic for network failures during streaming.

3. Clean up resources: Always ensure streams are properly closed or aborted when no longer needed.

4. Monitor usage: Use stream_options to track token usage for billing and rate limiting.
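
For the retry practice, a simple wrapper with exponential backoff might look like the following. This is an illustrative helper, not part of the SDK; the retry count and delays are arbitrary, and note that retrying a failed stream re-sends the entire request:

```typescript
// Retry an async operation with exponential backoff.
// Illustrative helper, not part of the Dedalus SDK.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      // Exponential backoff: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

You would typically wrap the whole stream-consuming function, e.g. `await withRetry(() => streamAndPrint())`, since a broken stream cannot be resumed mid-response.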
