The OpenAI Ruby SDK provides support for streaming responses using Server-Sent Events (SSE). This allows you to receive and process responses incrementally as they’re generated, rather than waiting for the complete response.

Basic Streaming

Use the create_streaming method to create a streaming request:
require "openai"

client = OpenAI::Client.new

stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

stream.each do |data|
  pp(data)
end
Calling #each on a stream will automatically clean up the connection, even if an error is raised inside the block.
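This cleanup guarantee follows the usual Ruby ensure semantics. The sketch below illustrates it with a minimal stand-in class (FakeStream is hypothetical, not part of the SDK): whether the block raises or breaks, the connection-release code still runs.

```ruby
# Minimal stand-in for an SDK stream. FakeStream is illustrative only:
# its #each releases the "connection" in an `ensure` block, so cleanup
# runs even when the consuming block raises or breaks.
class FakeStream
  include Enumerable

  attr_reader :closed

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each
    @chunks.each { |chunk| yield(chunk) }
  ensure
    close
  end

  def close
    @closed = true
  end
end

stream = FakeStream.new([1, 2, 3])

begin
  stream.each { |chunk| raise "boom" if chunk == 2 }
rescue RuntimeError
  # The error propagates, but the connection was still released.
end

pp(stream.closed)  # => true
```

The same ensure block fires when the consuming block exits via break, which is why early exit (below) is also safe.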

Early Exit from Stream

You can exit the stream loop early, and the connection will be properly cleaned up:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

stream.each do |data|
  pp(data)

  # Exit early if needed - stream will be cleaned up automatically
  if data.choices.size > 2
    pp("too many choices")
    break
  end
end

Manual Stream Management

If you don’t consume the stream via #each, you must manually close it to release the connection:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# If stream is not consumed, manually close it
stream.close
Once a stream has been exhausted or closed, no more chunks will be produced. Calling #each on an exhausted stream will not produce any output.
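The exhausted-stream behavior can be sketched with a small stub (ExhaustibleStream below is illustrative, not an SDK class): once the stream has been consumed and closed, a second #each yields nothing.

```ruby
# Stub demonstrating one-shot stream semantics: after the chunks are
# consumed, the stream is closed and further iteration yields nothing.
class ExhaustibleStream
  include Enumerable

  def initialize(chunks)
    @chunks = chunks
    @closed = false
  end

  def each
    until @closed || @chunks.empty?
      yield(@chunks.shift)
    end
  ensure
    close
  end

  def close
    @closed = true
  end
end

stream = ExhaustibleStream.new(%w[a b c])

stream.each { |chunk| pp(chunk) }  # prints "a", "b", "c"
stream.each { |chunk| pp(chunk) }  # exhausted: prints nothing
```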

Stream as Enumerable

Streams implement the Enumerable interface, allowing you to use standard Ruby methods:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# Use Enumerable methods - this blocks until the stream is consumed
all_choices = stream
  .select { |completion| completion.object == :text_completion }
  .flat_map { |completion| completion.choices }

pp(all_choices)
Calling any Enumerable method will block until the entire stream is consumed and will automatically clean up the connection.

Lazy Stream Processing

For more efficient processing, use lazy evaluation to avoid blocking:
stream = client.completions.create_streaming(
  model: :"gpt-3.5-turbo-instruct",
  prompt: "1,2,3,",
  max_tokens: 5,
  temperature: 0.0
)

# Create a lazy enumerator - does not consume the stream yet
stream_of_choices = stream
  .lazy
  .select { |completion| completion.object == :text_completion }
  .flat_map { |completion| completion.choices }

# Stream is only consumed when you call a terminal operation
stream_of_choices.each do |choice|
  pp(choice)
end
If you create a lazy intermediary stream but don’t consume it, you must call stream.close to release the underlying connection.
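The deferred evaluation itself is standard Ruby Enumerator behavior, which can be seen without the SDK at all. In this sketch, a counter tracks how many elements the source has produced: nothing is pulled when the lazy chain is built, and a terminal operation pulls only as many elements as it needs.

```ruby
# Plain-Ruby sketch of why .lazy avoids consuming the source up front:
# the counter only advances when a terminal operation pulls elements.
pulled = 0

source = Enumerator.new do |yielder|
  5.times do |i|
    pulled += 1
    yielder << i
  end
end

chain = source.lazy.select(&:even?).map { |i| i * 10 }

pp(pulled)          # => 0  (building the chain consumed nothing)
pp(chain.first(2))  # => [0, 20]
pp(pulled)          # => 3  (only three elements were pulled to find two evens)
```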

Streaming with Responses API

You can also stream responses using the Responses API:
stream = client.responses.stream(
  input: "Write a haiku about OpenAI.",
  model: "gpt-5.2"
)

stream.each do |event|
  puts(event.type)
end
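Because each event carries a type, you can filter the stream for just the events you care about, such as accumulating text deltas. The sketch below uses stubbed events rather than a live API call, and the :"response.output_text.delta" type name and .delta accessor follow the Responses API event naming but should be checked against the SDK's actual event classes.

```ruby
require "ostruct"

# Stubbed events standing in for what the SDK would yield; a live stream
# would come from client.responses.stream(...) instead. The event type
# names here are assumptions based on Responses API event naming.
events = [
  OpenStruct.new(type: :"response.created"),
  OpenStruct.new(type: :"response.output_text.delta", delta: "Hello"),
  OpenStruct.new(type: :"response.output_text.delta", delta: ", world"),
  OpenStruct.new(type: :"response.completed")
]

# Accumulate only the text deltas, ignoring lifecycle events.
text = events
  .select { |event| event.type == :"response.output_text.delta" }
  .map(&:delta)
  .join

pp(text)  # => "Hello, world"
```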

Best Practices

1. Always consume or close streams: Either iterate through the stream with #each or explicitly call #close to prevent connection leaks.
2. Use lazy evaluation for large streams: When processing large streams with transformations, use .lazy to avoid loading everything into memory.
3. Handle errors gracefully: The stream will be cleaned up automatically even if an error occurs during iteration.
