
Overview

The Responses API is the primary interface for generating model outputs. It supports text and image inputs, structured outputs, function calling, built-in tools (web search, file search, code interpreter), and both streaming and non-streaming responses.

Create response

Creates a model response from text, image, or file inputs.
response = client.responses.create(
  model: "gpt-4o",
  input: "What is the capital of France?"
)

puts response.output.first.content.first.text

Parameters

model
string
Model ID like gpt-4o, gpt-4o-mini, o3-mini, or o1. See the model guide for available options.
input
string | array
Input to the model. Can be:
  • A simple string
  • An array of message objects with role and content
  • An array of input items (messages, tool calls, reasoning, etc.)
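The string and message-array forms can be sketched as plain Ruby values. The role/content keys follow the examples in this reference; the exact hash shape the SDK accepts is an assumption.

```ruby
# Form 1: a simple string
input_string = "What is the capital of France?"

# Form 2: an array of message objects with role and content
input_messages = [
  { role: "system", content: "Answer in one word." },
  { role: "user",   content: "What is the capital of France?" }
]

# Both are passed the same way (sketch):
# client.responses.create(model: "gpt-4o", input: input_string)
# client.responses.create(model: "gpt-4o", input: input_messages)
```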
instructions
string
System message prepended to the model’s context (max 256,000 characters)
temperature
float
Sampling temperature between 0 and 2; higher values produce more random output (default: 1)
max_output_tokens
integer
Maximum tokens to generate in the response
tools
array
Tools the model can use, such as custom functions and built-in tools (web search, file search, code interpreter)
tool_choice
string | object
How to select tools:
  • auto (default) - Model decides
  • none - No tools
  • required - Must use a tool
  • { type: "function", function: { name: "..." } } - Force specific function
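As Ruby values, the four forms look like this (`get_weather` reuses the function name from the function-calling example below):

```ruby
tool_choice_auto     = "auto"      # model decides (default)
tool_choice_none     = "none"      # no tools
tool_choice_required = "required"  # must use some tool
tool_choice_forced   = { type: "function", function: { name: "get_weather" } }

# Sketch: client.responses.create(..., tool_choice: tool_choice_forced)
```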
text
object
Text output configuration, such as a format object for structured output (see the structured output example below)
conversation
string | object
ID of a conversation to continue, or an object with an id referencing one
previous_response_id
string
ID of previous response to continue from
store
boolean
Whether to store the response for later retrieval (default: false)
metadata
hash
Optional metadata (up to 16 key-value pairs)
reasoning
object
Reasoning configuration for o-series and gpt-5 models
top_p
float
Nucleus sampling parameter (0-1, alternative to temperature)
parallel_tool_calls
boolean
Allow parallel tool execution (default: true)
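A sketch combining several of the parameters above into one request; the values are illustrative, not recommendations.

```ruby
params = {
  model: "gpt-4o-mini",
  input: "Summarize the Responses API in one sentence.",
  instructions: "You are a concise technical writer.",
  temperature: 0.3,       # 0..2; lower = more deterministic
  max_output_tokens: 200,
  store: true,            # keep the response retrievable later
  metadata: { "doc" => "overview" }  # up to 16 key-value pairs
}

# response = client.responses.create(**params)
```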

Response

id
string
Unique response identifier
object
string
Object type: response
created_at
integer
Unix timestamp of creation
model
string
Model used for generation
output
array
Array of output items (messages, tool calls, reasoning)
usage
object
Token usage information
status
string
Response status: completed, incomplete, failed, etc.

Stream response

Creates a streaming response for real-time output.
stream = client.responses.stream(
  model: "gpt-4o",
  input: "Write a haiku about coding"
)

stream.on_text_delta do |delta|
  print delta
end

stream.run

Stream helpers

# Handle different event types
stream.on_text_delta { |delta| print delta }
stream.on_reasoning_delta { |delta| puts "[thinking: #{delta}]" }
stream.on_function_call { |name, args| handle_function(name, args) }
stream.on_completed { |response| puts "\nDone: #{response.id}" }
stream.on_error { |error| puts "Error: #{error}" }

Retrieve response

Retrieves a stored response by ID.
response = client.responses.retrieve("resp_abc123")

Parameters

response_id
string
required
ID of the response to retrieve
include
array
Additional fields to include: reasoning, audio

Delete response

Deletes a stored response.
client.responses.delete("resp_abc123")

Parameters

response_id
string
required
ID of the response to delete

Cancel response

Cancels a background response.
client.responses.cancel("resp_abc123")

Parameters

response_id
string
required
ID of the response to cancel

Compact conversation

Compacts a long conversation to fit within context limits.
compacted = client.responses.compact(
  model: "gpt-4o",
  previous_response_id: "resp_abc123"
)

Examples

Basic text generation

response = client.responses.create(
  model: "gpt-4o",
  input: "Explain quantum computing in simple terms",
  max_output_tokens: 500
)

puts response.output.first.content.first.text

Streaming with tool use

stream = client.responses.stream(
  model: "gpt-4o",
  input: "What's the weather in San Francisco?",
  tools: [{ type: "web_search" }]
)

stream.on_web_search_call do |query|
  puts "Searching: #{query}"
end

stream.on_text_delta { |delta| print delta }
stream.run

Structured output

response = client.responses.create(
  model: "gpt-4o",
  input: "Generate a recipe for chocolate chip cookies",
  text: {
    format: {
      type: "json_schema",
      strict: true,
      name: "Recipe",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          ingredients: {
            type: "array",
            items: { type: "string" }
          },
          steps: {
            type: "array",
            items: { type: "string" }
          }
        },
        required: ["name", "ingredients", "steps"]
      }
    }
  }
)

require "json"

recipe = JSON.parse(response.output.first.content.first.text)
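With strict: true, the returned text should conform to the schema, so JSON.parse yields a hash with exactly those keys. A self-contained illustration with a sample payload (the payload itself is invented for demonstration):

```ruby
require "json"

# Sample text matching the Recipe schema above
payload = '{"name":"Chocolate Chip Cookies",' \
          '"ingredients":["flour","butter","chocolate chips"],' \
          '"steps":["mix","bake"]}'

parsed = JSON.parse(payload)
puts parsed["name"]
parsed["ingredients"].each { |item| puts "- #{item}" }
```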

Multi-turn conversation

# First turn
response1 = client.responses.create(
  model: "gpt-4o",
  input: "What is the capital of France?",
  store: true
)

# Second turn
response2 = client.responses.create(
  model: "gpt-4o",
  input: "What is its population?",
  previous_response_id: response1.id
)

Function calling

response = client.responses.create(
  model: "gpt-4o",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" }
          },
          required: ["location"]
        }
      }
    }
  ]
)

# Check for function calls
response.output.each do |item|
  if item.type == "function_call"
    puts "Function: #{item.name}"
    puts "Arguments: #{item.arguments}"
  end
end
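To complete the round trip, run the requested function locally and send its result back to the model. The function_call_output item shape and the call_id field are assumptions about the API, so treat the commented call as a sketch.

```ruby
require "json"

# Stand-in for a real weather lookup
def get_weather(location)
  { location: location, temp_c: 18 }
end

# From the loop above: item.arguments arrives as a JSON string
args = JSON.parse('{"location": "Tokyo"}') # example arguments payload
result = get_weather(args["location"])

# Sketch of returning the result so the model can finish its answer:
# client.responses.create(
#   model: "gpt-4o",
#   previous_response_id: response.id,
#   input: [{
#     type: "function_call_output",
#     call_id: item.call_id,           # from the function_call output item
#     output: JSON.generate(result)
#   }]
# )
```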
