
Overview

Streaming allows you to receive chat completion responses incrementally as they’re generated, rather than waiting for the entire response. This is essential for building responsive user interfaces and real-time applications.

Basic Streaming

The simplest way to stream chat completions is using the stream method with event handling:
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative "../../lib/openai"

# Gets the API key from the environment variable `OPENAI_API_KEY`
client = OpenAI::Client.new

stream = client.chat.completions.stream(
  model: "gpt-4o-mini",
  messages: [
    {role: :user, content: "Write a creative haiku about the ocean."}
  ]
)

stream.each do |event|
  case event
  when OpenAI::Streaming::ChatContentDeltaEvent
    print(event.delta)
  when OpenAI::Streaming::ChatContentDoneEvent
    puts
  end
end

Streaming Methods

Streams can be consumed in two ways. The text helper, stream.text.each, yields only the generated text and keeps simple cases short. The event-based approach shown above gives you fine-grained control over the different event types, which you need for tool calls and structured outputs.
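The Best Practices section below recommends stream.text.each when only the text matters. The following is a minimal runnable sketch of that pattern; FakeTextStream is a hypothetical stand-in used so the sketch runs without an API call, while the real SDK stream exposes a text enumerable and a close method the same way:

```ruby
# FakeTextStream is a hypothetical stand-in for the SDK's stream object,
# used here so the sketch runs without an API key.
FakeTextStream = Struct.new(:text) do
  def close
    @closed = true
  end

  def closed?
    !!@closed
  end
end

stream = FakeTextStream.new(["A still ", "blue sea"])

buffer = +""
stream.text.each do |fragment|
  buffer << fragment # same shape as stream.text.each on a real stream
end
stream.close

puts buffer # => "A still blue sea"
```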

Key Concepts

Stream Events

When using event-based streaming, you’ll encounter different event types:
  • ChatContentDeltaEvent - Contains incremental content as it’s generated
  • ChatContentDoneEvent - Signals that content generation is complete
  • ChatFunctionToolCallArgumentsDeltaEvent - Contains incremental tool call arguments
  • ChatFunctionToolCallArgumentsDoneEvent - Signals tool call arguments are complete
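The same case dispatch extends to tool-call events, whose arguments arrive as JSON fragments that you accumulate until the done event fires. The sketch below uses plain Structs as stand-ins for the SDK event classes (the names mirror, but are not, the real classes) to show the accumulation logic:

```ruby
require "json"

# Struct stand-ins for the SDK's event classes, for illustration only.
ContentDeltaEvent  = Struct.new(:delta)
ToolArgsDeltaEvent = Struct.new(:delta)
ToolArgsDoneEvent  = Struct.new(:arguments)

events = [
  ContentDeltaEvent.new("Checking the weather."),
  ToolArgsDeltaEvent.new("{\"city\":"),
  ToolArgsDeltaEvent.new("\"Paris\"}"),
  ToolArgsDoneEvent.new("{\"city\":\"Paris\"}")
]

content   = +""
tool_args = +""

events.each do |event|
  case event
  when ContentDeltaEvent  then content << event.delta
  when ToolArgsDeltaEvent then tool_args << event.delta # arrives in fragments
  when ToolArgsDoneEvent  then puts "arguments complete: #{event.arguments}"
  end
end

# Only parse once the done event signals the JSON is complete.
pp JSON.parse(tool_args) # => {"city" => "Paris"}
```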

Stream Cleanup

Streams are cleaned up automatically when you iterate through them to completion with each. If you stop early or never consume the stream, call stream.close manually to free the underlying connection.
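When you might exit iteration early, wrapping the loop in begin/ensure guarantees the close call runs. A minimal sketch, using a hypothetical FakeStream standing in for the SDK stream (which responds to each and close the same way):

```ruby
# FakeStream is a hypothetical stand-in for the SDK's stream object.
class FakeStream
  attr_reader :closed

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each(&block)
    @chunks.each(&block)
  end

  def close
    @closed = true
  end
end

stream = FakeStream.new(%w[alpha beta gamma])

begin
  stream.each do |chunk|
    break if chunk == "beta" # early exit: iteration never completes
  end
ensure
  stream.close # runs regardless, freeing the underlying connection
end
```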

Advanced Example: Streaming with Structured Outputs

You can combine streaming with structured outputs to get real-time updates while ensuring type-safe responses:
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative "../../lib/openai"

class Step < OpenAI::BaseModel
  required :explanation, String
  required :output, String
end

class MathResponse < OpenAI::BaseModel
  required :steps, OpenAI::ArrayOf[Step]
  required :final_answer, String
end

client = OpenAI::Client.new

stream = client.chat.completions.stream(
  model: "gpt-4o-mini",
  response_format: MathResponse,
  messages: [
    {role: :user, content: "solve 8x + 31 = 2, show all steps"}
  ]
)

stream.each do |event|
  case event
  when OpenAI::Streaming::ChatContentDeltaEvent
    print(event.delta)
  when OpenAI::Streaming::ChatContentDoneEvent
    puts
    puts("--- parsed object ---")
    pp(event.parsed)
  end
end

response = stream.get_final_completion

puts
puts("----- parsed outputs from final response -----")
response
  .choices
  .each do |choice|
    # parsed is an instance of `MathResponse`
    pp(choice.message.parsed)
  end

Getting the Final Response

After streaming completes, you can access the full completion object:
response = stream.get_final_completion
This is useful when you need to access metadata or the complete parsed response after streaming.

Best Practices

1. Use the text helper for simple cases. If you only need the text content, use stream.text.each for cleaner code.

2. Handle events appropriately. Match on specific event types to handle different streaming scenarios such as tool calls or structured outputs.

3. Always consume or close streams. Either iterate through the entire stream or call stream.close to prevent resource leaks.

Next Steps

Structured Outputs: learn how to get type-safe, structured responses from chat completions.

Function Calling: stream tool calls and function arguments in real time.
