
Overview

Streaming allows you to receive chat completion responses incrementally as they’re generated, rather than waiting for the entire response. This is essential for building responsive user interfaces and real-time applications.

Basic Streaming

The simplest way to stream chat completions is using the stream method with event handling:
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative "../../lib/openai"

# Gets the API key from the environment variable `OPENAI_API_KEY`
client = OpenAI::Client.new

stream = client.chat.completions.stream(
  model: "gpt-4o-mini",
  messages: [
    {role: :user, content: "Write a creative haiku about the ocean."}
  ]
)

stream.each do |event|
  case event
  when OpenAI::Streaming::ChatContentDeltaEvent
    print(event.delta)
  when OpenAI::Streaming::ChatContentDoneEvent
    puts
  end
end

Streaming Methods

Streams can be consumed in two ways. The text helper, stream.text.each, yields only the generated text and keeps simple cases short. The event-based approach shown above gives you fine-grained control over the different event types, which you need for tool calls and structured outputs.
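The Best Practices section below recommends stream.text.each when only the text matters. The following is a minimal runnable sketch of that pattern; FakeTextStream is a hypothetical stand-in used so the sketch runs without an API call, while the real SDK stream exposes a text enumerable and a close method the same way:

```ruby
# FakeTextStream is a hypothetical stand-in for the SDK's stream object,
# used here so the sketch runs without an API key.
FakeTextStream = Struct.new(:text) do
  def close
    @closed = true
  end

  def closed?
    !!@closed
  end
end

stream = FakeTextStream.new(["A still ", "blue sea"])

buffer = +""
stream.text.each do |fragment|
  buffer << fragment # same shape as stream.text.each on a real stream
end
stream.close

puts buffer # => "A still blue sea"
```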

Key Concepts

Stream Events

When using event-based streaming, you’ll encounter different event types:
  • ChatContentDeltaEvent - Contains incremental content as it’s generated
  • ChatContentDoneEvent - Signals that content generation is complete
  • ChatFunctionToolCallArgumentsDeltaEvent - Contains incremental tool call arguments
  • ChatFunctionToolCallArgumentsDoneEvent - Signals tool call arguments are complete
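The same case dispatch extends to tool-call events, whose arguments arrive as JSON fragments that you accumulate until the done event fires. The sketch below uses plain Structs as stand-ins for the SDK event classes (the names mirror, but are not, the real classes) to show the accumulation logic:

```ruby
require "json"

# Struct stand-ins for the SDK's event classes, for illustration only.
ContentDeltaEvent  = Struct.new(:delta)
ToolArgsDeltaEvent = Struct.new(:delta)
ToolArgsDoneEvent  = Struct.new(:arguments)

events = [
  ContentDeltaEvent.new("Checking the weather."),
  ToolArgsDeltaEvent.new("{\"city\":"),
  ToolArgsDeltaEvent.new("\"Paris\"}"),
  ToolArgsDoneEvent.new("{\"city\":\"Paris\"}")
]

content   = +""
tool_args = +""

events.each do |event|
  case event
  when ContentDeltaEvent  then content << event.delta
  when ToolArgsDeltaEvent then tool_args << event.delta # arrives in fragments
  when ToolArgsDoneEvent  then puts "arguments complete: #{event.arguments}"
  end
end

# Only parse once the done event signals the JSON is complete.
pp JSON.parse(tool_args) # => {"city" => "Paris"}
```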

Stream Cleanup

Streams are cleaned up automatically when you iterate through them to completion with each. If you stop early or never consume the stream, call stream.close manually to free the underlying connection.
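When you might exit iteration early, wrapping the loop in begin/ensure guarantees the close call runs. A minimal sketch, using a hypothetical FakeStream standing in for the SDK stream (which responds to each and close the same way):

```ruby
# FakeStream is a hypothetical stand-in for the SDK's stream object.
class FakeStream
  attr_reader :closed

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each(&block)
    @chunks.each(&block)
  end

  def close
    @closed = true
  end
end

stream = FakeStream.new(%w[alpha beta gamma])

begin
  stream.each do |chunk|
    break if chunk == "beta" # early exit: iteration never completes
  end
ensure
  stream.close # runs regardless, freeing the underlying connection
end
```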

Advanced Example: Streaming with Structured Outputs

You can combine streaming with structured outputs to get real-time updates while ensuring type-safe responses:
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative "../../lib/openai"

class Step < OpenAI::BaseModel
  required :explanation, String
  required :output, String
end

class MathResponse < OpenAI::BaseModel
  required :steps, OpenAI::ArrayOf[Step]
  required :final_answer, String
end

client = OpenAI::Client.new

stream = client.chat.completions.stream(
  model: "gpt-4o-mini",
  response_format: MathResponse,
  messages: [
    {role: :user, content: "solve 8x + 31 = 2, show all steps"}
  ]
)

stream.each do |event|
  case event
  when OpenAI::Streaming::ChatContentDeltaEvent
    print(event.delta)
  when OpenAI::Streaming::ChatContentDoneEvent
    puts
    puts("--- parsed object ---")
    pp(event.parsed)
  end
end

response = stream.get_final_completion

puts
puts("----- parsed outputs from final response -----")
response
  .choices
  .each do |choice|
    # parsed is an instance of `MathResponse`
    pp(choice.message.parsed)
  end

Getting the Final Response

After streaming completes, you can access the full completion object:
response = stream.get_final_completion
This is useful when you need to access metadata or the complete parsed response after streaming.

Best Practices

1. Use the text helper for simple cases. If you only need the text content, use stream.text.each for cleaner code.

2. Handle events appropriately. Match on specific event types to handle different streaming scenarios such as tool calls or structured outputs.

3. Always consume or close streams. Either iterate through the entire stream or call stream.close to prevent resource leaks.

Next Steps

Structured Outputs: learn how to get type-safe, structured responses from chat completions.

Function Calling: stream tool calls and function arguments in real time.
