The OpenAI Ruby SDK is designed for concurrent usage with built-in connection pooling and thread safety guarantees.

Thread Safety

Client instances are thread-safe and can be shared across multiple threads:
require "openai"

# Create a single client instance
client = OpenAI::Client.new

# Safe to use from multiple threads
threads = 10.times.map do |i|
  Thread.new do
    response = client.chat.completions.create(
      model: "gpt-5.2",
      messages: [{role: "user", content: "Hello from thread #{i}"}]
    )
    puts response.choices.first.message.content
  end
end

threads.each(&:join)
OpenAI::Client instances are thread-safe but are only fork-safe when there are no in-flight HTTP requests.

Connection Pooling

Each client instance maintains its own HTTP connection pool:
client = OpenAI::Client.new

# This client maintains its own pool of at least 99 connections by default

Default Pool Size

The default connection pool size is calculated as:
DEFAULT_MAX_CONNECTIONS = [Etc.nprocessors, 99].max
This means:
  • The pool size is at least 99 connections
  • On systems with more than 99 processors, the pool matches the processor count
The connection pool size of 99 is sufficient for most applications, even those handling high concurrency.
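The sizing rule above can be checked directly with Ruby's standard Etc module, without the SDK installed:

```ruby
require "etc"

# Mirrors the SDK's default sizing rule: at least 99 connections,
# or one per processor on machines with more than 99 cores.
default_max_connections = [Etc.nprocessors, 99].max

puts default_max_connections
```

On a typical machine this prints 99; only on very large hosts does the processor count take over.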

Connection Pool Behavior

When all connections are in use:
# If all 99 connections are busy, requests wait for an available connection
client = OpenAI::Client.new

# These requests share the connection pool
100.times.map do |i|
  Thread.new do
    # Requests 1-99 execute immediately
    # Request 100 waits for a connection to become available
    client.chat.completions.create(
      model: "gpt-5.2",
      messages: [{role: "user", content: "Request #{i}"}]
    )
  end
end.each(&:join)
Queue time waiting for an available connection counts towards the request timeout.
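Because queue time counts against the timeout, it helps to budget the timeout for both expected queueing and the request itself. A minimal sketch; the budget numbers here are illustrative assumptions, not SDK defaults:

```ruby
# Queue time counts against the request timeout, so budget for both.
# These figures are illustrative assumptions, not SDK defaults.
queue_budget_seconds = 20    # worst-case wait for a pooled connection
request_budget_seconds = 30  # time allowed for the request itself

timeout_seconds = queue_budget_seconds + request_budget_seconds

# In a real app: client = OpenAI::Client.new(timeout: timeout_seconds)
puts timeout_seconds
```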

Single Client Instance Pattern

Create one client instance per application and share it:
# config/initializers/openai.rb (Rails example)
module MyApp
  # Single shared client instance
  OPENAI_CLIENT = OpenAI::Client.new(
    api_key: ENV['OPENAI_API_KEY']
  )
end

# app/services/chat_service.rb
class ChatService
  def self.complete(message)
    # Reuse the shared client
    MyApp::OPENAI_CLIENT.chat.completions.create(
      model: "gpt-5.2",
      messages: [{role: "user", content: message}]
    )
  end
end
Reusing a single client instance is recommended because each instance maintains its own connection pool of at least 99 connections.

Multi-threaded Examples

require "openai"
require "concurrent-ruby"

client = OpenAI::Client.new

# Process items with a thread pool
pool = Concurrent::FixedThreadPool.new(10)
items = (1..100).to_a

futures = items.map do |item|
  Concurrent::Future.execute(executor: pool) do
    client.chat.completions.create(
      model: "gpt-5.2",
      messages: [{role: "user", content: "Process item #{item}"}]
    )
  end
end

# Wait for all to complete
results = futures.map(&:value)
pool.shutdown
pool.wait_for_termination

Fork Safety

Client instances are fork-safe only when there are no active requests:
client = OpenAI::Client.new

# Safe: No active requests
pid = fork do
  # Child process can use client
  response = client.chat.completions.create(
    model: "gpt-5.2",
    messages: [{role: "user", content: "Hello from child"}]
  )
  puts response.choices.first.message.content
end

Process.wait(pid)
Do not fork while requests are in flight. Create new client instances in child processes if needed.
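Creating the client inside the fork block keeps the child's pool separate from the parent's. A minimal sketch of the pattern; the OpenAI::Client construction is shown as a comment so the fork mechanics stand alone, and the pipe just proves the child ran:

```ruby
# Build any OpenAI::Client inside the child so no pooled connections
# cross the fork boundary. The pipe only demonstrates the mechanics.
reader, writer = IO.pipe

pid = fork do
  reader.close
  # In a real app: client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
  writer.puts "child #{Process.pid} built its own client"
  writer.close
end

writer.close
message = reader.read
Process.wait(pid)
puts message
```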

Forking Web Servers

For forking web servers (Puma, Unicorn), create clients after forking:
# config/puma.rb
on_worker_boot do
  # Create fresh client instance in each worker
  MyApp::OPENAI_CLIENT = OpenAI::Client.new(
    api_key: ENV['OPENAI_API_KEY']
  )
end
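Unicorn's equivalent hook is after_fork; a sketch assuming the same MyApp namespace as above:

```ruby
# config/unicorn.rb
after_fork do |server, worker|
  # Create a fresh client instance in each worker
  MyApp::OPENAI_CLIENT = OpenAI::Client.new(
    api_key: ENV['OPENAI_API_KEY']
  )
end
```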

Request-Level Thread Safety

Most SDK classes are not thread-safe at the instance level:
# DON'T: Share request-level objects across threads
stream = client.completions.create_streaming(...)

# DON'T do this:
Thread.new { stream.each { |chunk| puts chunk } }
Thread.new { stream.each { |chunk| puts chunk } }
# DO: Use separate streams per thread
threads = 2.times.map do
  Thread.new do
    # Each thread gets its own stream
    stream = client.completions.create_streaming(...)
    stream.each { |chunk| puts chunk }
  end
end

threads.each(&:join)
Unless otherwise specified, classes other than OpenAI::Client do not have locks protecting their underlying data structures.
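If threads consuming their own streams must write into shared state, add your own synchronization. A minimal sketch using a Mutex; the chunk values here are simulated stand-ins, not real SDK stream output:

```ruby
# Each thread owns its own stream; only the shared results array needs
# a lock, because SDK objects below the client don't lock for you.
chunks = []
lock = Mutex.new

threads = 2.times.map do |i|
  Thread.new do
    # Stand-in for consuming a per-thread stream's chunks
    3.times do |n|
      lock.synchronize { chunks << "thread#{i}-chunk#{n}" }
    end
  end
end

threads.each(&:join)
puts chunks.size
```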

Connection Timeout Configuration

Configure timeouts to handle connection pool exhaustion:
# Default timeout is 600 seconds
client = OpenAI::Client.new(timeout: 30)

# Per-request timeout
response = client.chat.completions.create(
  model: "gpt-5.2",
  messages: [{role: "user", content: "Hello"}],
  request_options: {timeout: 10}
)

Best Practices

1. Create one client instance per application. Each client maintains its own pool of at least 99 connections, so creating multiple clients wastes resources.
2. Share client instances across threads. OpenAI::Client is thread-safe and designed for concurrent usage.
3. Don't share request-level objects. Objects like streams and responses should not be shared across threads.
4. Handle connection pool exhaustion. If you need more than 99 concurrent requests, consider creating multiple client instances or implementing request queuing.
5. Be cautious with forking. Only fork when there are no active requests, or create new client instances in child processes.
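The multiple-client approach for handling pool exhaustion can be sketched as a simple round-robin; ClientPool is a hypothetical helper, not part of the SDK, and Object.new stands in for OpenAI::Client.new so the selection logic runs on its own:

```ruby
# Hypothetical helper, not part of the SDK: rotates across several
# clients so total available connections exceed one pool's worth.
class ClientPool
  def initialize(size, &factory)
    @clients = Array.new(size) { factory.call }
    @index = 0
    @lock = Mutex.new
  end

  # Round-robin selection; each underlying client keeps its own HTTP pool.
  def next_client
    @lock.synchronize do
      client = @clients[@index % @clients.size]
      @index += 1
      client
    end
  end
end

# Object.new stands in for OpenAI::Client.new in this sketch.
pool = ClientPool.new(3) { Object.new }
first = pool.next_client
2.times { pool.next_client }
wrapped = pool.next_client
puts wrapped.equal?(first)
```

Each underlying client still holds its own pool, so three clients give roughly three pools' worth of concurrent connections.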

Performance Monitoring

Monitor concurrent request performance:
require "openai"
require "benchmark"

client = OpenAI::Client.new

# Measure concurrent request performance
time = Benchmark.measure do
  threads = 50.times.map do |i|
    Thread.new do
      client.chat.completions.create(
        model: "gpt-5.2",
        messages: [{role: "user", content: "Request #{i}"}]
      )
    end
  end
  threads.each(&:join)
end

puts "Completed 50 concurrent requests in #{time.real} seconds"
puts "Average: #{time.real / 50} seconds per request"

Debugging Connection Pool Issues

require "openai"
require "logger"

# Enable debugging
logger = Logger.new(STDOUT)
logger.level = Logger::DEBUG

client = OpenAI::Client.new

# Monitor for timeout errors that might indicate pool exhaustion
begin
  response = client.chat.completions.create(
    model: "gpt-5.2",
    messages: [{role: "user", content: "Hello"}],
    request_options: {timeout: 5}
  )
rescue OpenAI::Errors::APITimeoutError => e
  logger.error "Request timed out - possible pool exhaustion"
  logger.error "Queue time may have exceeded timeout"
rescue StandardError => e
  logger.error "Request failed: #{e.message}"
end
