The OpenAI Ruby SDK provides support for streaming responses using Server-Sent Events (SSE). This allows you to receive and process responses incrementally as they’re generated, rather than waiting for the complete response.

Basic Streaming

Use the create_streaming method to create a streaming request:
require "openai"

client = OpenAI::Client.new

stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

stream.each do |data|
  pp(data)
end
Calling #each on a stream will automatically clean up the connection, even if an error is raised inside the block.
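This cleanup guarantee follows the usual Ruby ensure semantics. The sketch below illustrates it with a minimal stand-in class (FakeStream is hypothetical, not part of the SDK): whether the block raises or breaks, the connection-release code still runs.

```ruby
# Minimal stand-in for an SDK stream. FakeStream is illustrative only:
# its #each releases the "connection" in an `ensure` block, so cleanup
# runs even when the consuming block raises or breaks.
class FakeStream
  include Enumerable

  attr_reader :closed

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each
    @chunks.each { |chunk| yield(chunk) }
  ensure
    close
  end

  def close
    @closed = true
  end
end

stream = FakeStream.new([1, 2, 3])

begin
  stream.each { |chunk| raise "boom" if chunk == 2 }
rescue RuntimeError
  # The error propagates, but the connection was still released.
end

pp(stream.closed)  # => true
```

The same ensure block fires when the consuming block exits via break, which is why early exit (below) is also safe.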

Early Exit from Stream

You can exit the stream loop early, and the connection will be properly cleaned up:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

stream.each do |data|
  pp(data)

  # Exit early if needed - stream will be cleaned up automatically
  if data.choices.size > 2
    pp("too many choices")
    break
  end
end

Manual Stream Management

If you don’t consume the stream via #each, you must manually close it to release the connection:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# If stream is not consumed, manually close it
stream.close
Once a stream has been exhausted or closed, no more chunks will be produced. Calling #each on an exhausted stream will not produce any output.
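The exhausted-stream behavior can be sketched with a small stub (ExhaustibleStream below is illustrative, not an SDK class): once the stream has been consumed and closed, a second #each yields nothing.

```ruby
# Stub demonstrating one-shot stream semantics: after the chunks are
# consumed, the stream is closed and further iteration yields nothing.
class ExhaustibleStream
  include Enumerable

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each
    until @closed || @chunks.empty?
      yield(@chunks.shift)
    end
  ensure
    close
  end

  def close
    @closed = true
  end
end

stream = ExhaustibleStream.new(%w[a b c])

stream.each { |chunk| pp(chunk) }  # prints "a", "b", "c"
stream.each { |chunk| pp(chunk) }  # exhausted: prints nothing
```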

Stream as Enumerable

Streams implement the Enumerable interface, allowing you to use standard Ruby methods:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# Use Enumerable methods - this blocks until the stream is consumed
all_choices = stream
  .select { |completion| completion.object == :text_completion }
  .flat_map { |completion| completion.choices }

pp(all_choices)
Calling any Enumerable method will block until the entire stream is consumed and will automatically clean up the connection.

Lazy Stream Processing

For more efficient processing, use lazy evaluation to avoid blocking:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# Create a lazy enumerator - does not consume the stream yet
stream_of_choices = stream
  .lazy
  .select { |completion| completion.object == :text_completion }
  .flat_map { |completion| completion.choices }

# Stream is only consumed when you call a terminal operation
stream_of_choices.each do |choice|
  pp(choice)
end
If you create a lazy intermediary stream but don’t consume it, you must call stream.close to release the underlying connection.
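The deferred evaluation itself is standard Ruby Enumerator behavior, which can be seen without the SDK at all. In this sketch, a counter tracks how many elements the source has produced: nothing is pulled when the lazy chain is built, and a terminal operation pulls only as many elements as it needs.

```ruby
# Plain-Ruby sketch of why .lazy avoids consuming the source up front:
# the counter only advances when a terminal operation pulls elements.
pulled = 0

source = Enumerator.new do |yielder|
  5.times do |i|
    pulled += 1
    yielder << i
  end
end

chain = source.lazy.select(&:even?).map { |i| i * 10 }

pp(pulled)          # => 0  (building the chain consumed nothing)
pp(chain.first(2))  # => [0, 20]
pp(pulled)          # => 3  (only three elements were pulled to find two evens)
```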

Streaming with Responses API

You can also stream responses using the Responses API:
stream = client.responses.stream(
  input: "Write a haiku about OpenAI.",
  model: "gpt-5.2"
)

stream.each do |event|
  puts(event.type)
end
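Because each event carries a type, you can filter the stream for just the events you care about, such as accumulating text deltas. The sketch below uses stubbed events rather than a live API call, and the :"response.output_text.delta" type name and .delta accessor follow the Responses API event naming but should be checked against the SDK's actual event classes.

```ruby
require "ostruct"

# Stubbed events standing in for what the SDK would yield; a live stream
# would come from client.responses.stream(...) instead. The event type
# names here are assumptions based on Responses API event naming.
events = [
  OpenStruct.new(type: :"response.created"),
  OpenStruct.new(type: :"response.output_text.delta", delta: "Hello"),
  OpenStruct.new(type: :"response.output_text.delta", delta: ", world"),
  OpenStruct.new(type: :"response.completed")
]

# Accumulate only the text deltas, ignoring lifecycle events.
text = events
  .select { |event| event.type == :"response.output_text.delta" }
  .map(&:delta)
  .join

pp(text)  # => "Hello, world"
```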

Best Practices

1. Always consume or close streams: Either iterate through the stream with #each or explicitly call #close to prevent connection leaks.
2. Use lazy evaluation for large streams: When processing large streams with transformations, use .lazy to avoid loading everything into memory.
3. Handle errors gracefully: The stream will be cleaned up automatically even if an error occurs during iteration.
