Overview
The Responses API is the primary interface for generating model outputs. It supports text and image inputs, structured outputs, function calling, built-in tools (web search, file search, code interpreter), and both streaming and non-streaming responses.
Create response
Creates a model response from text, image, or file inputs.
response = client.responses.create(
  model: "gpt-4o",
  input: "What is the capital of France?"
)

puts response.output.first.content.first.text
Parameters
model
Model ID, such as gpt-4o, gpt-4o-mini, o3-mini, or o1. See the model guide for available options.

input
Input to the model. Can be:
A simple string
An array of message objects with role and content
An array of input items (messages, tool calls, reasoning, etc.)

instructions
System message prepended to the model’s context (max 256,000 characters)

temperature
Sampling temperature between 0 and 2. Higher = more random (default: 1)

max_output_tokens
Maximum number of tokens to generate in the response

tools
Tools the model can use:
{ type: "file_search" } - Search uploaded files
{ type: "code_interpreter" } - Run Python code
{ type: "web_search" } - Search the web
{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get current weather",
    parameters: { ... }
  }
}

tool_choice
How to select tools:
auto (default) - Model decides
none - No tools
required - Must use a tool
{ type: "function", function: { name: "..." } } - Force a specific function
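Combining the last option with a function tool looks like the sketch below. Only the parameter hash is executed here; the API call is commented out, and the tool_choice parameter name follows the options above:

```ruby
# Sketch: request parameters that force the model to call get_weather.
params = {
  model: "gpt-4o",
  input: "What's the weather in Tokyo?",
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"]
      }
    }
  }],
  # Force this specific function rather than letting the model decide.
  tool_choice: { type: "function", function: { name: "get_weather" } }
}

# response = client.responses.create(**params)
```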
text
Text output configuration:
text: {
  format: {
    type: "json_schema",
    strict: true,
    name: "Recipe",
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        ingredients: { type: "array", items: { type: "string" } }
      },
      required: ["name", "ingredients"]
    }
  }
}
conversation
Conversation ID to continue, or an object with an id to create or reference a conversation

previous_response_id
ID of a previous response to continue from

store
Whether to store the response for later retrieval (default: false)

metadata
Optional metadata (up to 16 key-value pairs)

reasoning
Reasoning configuration for o-series and gpt-5 models:
Reasoning effort: low, medium, or high
Summary mode: auto, concise, or detailed
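A sketch of a request using these settings follows; the model name and prompt are illustrative, and the call itself is commented out:

```ruby
# Sketch: reasoning settings for an o-series model (values are examples).
params = {
  model: "o3-mini",
  input: "How many primes are there below 100?",
  reasoning: { effort: "high", summary: "auto" }
}

# response = client.responses.create(**params)
```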
top_p
Nucleus sampling parameter (0-1, an alternative to temperature)

parallel_tool_calls
Whether to allow parallel tool execution (default: true)
Response
id
Unique response identifier

created_at
Unix timestamp of creation

model
Model used for generation

output
Array of output items (messages, tool calls, reasoning); each message item carries an array of content parts (text, images, etc.)

status
Response status: completed, incomplete, failed, etc.
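Because output can mix item types (reasoning items, messages, tool calls), extracting the text usually means filtering for message items first. A sketch with stand-in objects, since the real SDK's accessors may differ:

```ruby
require "ostruct"

# Stand-in for a response's output array: a reasoning item followed by a message.
output = [
  OpenStruct.new(type: "reasoning"),
  OpenStruct.new(type: "message",
                 content: [OpenStruct.new(type: "output_text", text: "Paris")])
]

# Collect text from message items only, skipping reasoning and tool calls.
text = output.select { |item| item.type == "message" }
             .flat_map(&:content)
             .select { |part| part.type == "output_text" }
             .map(&:text)
             .join

puts text
```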
Stream response
Creates a streaming response for real-time output.
stream = client.responses.stream(
  model: "gpt-4o",
  input: "Write a haiku about coding"
)

stream.on_text_delta do |delta|
  print delta
end

stream.run
Stream helpers
# Handle different event types
stream.on_text_delta { |delta| print delta }
stream.on_reasoning_delta { |delta| puts "[thinking: #{delta}]" }
stream.on_function_call { |name, args| handle_function(name, args) }
stream.on_completed { |response| puts "\nDone: #{response.id}" }
stream.on_error { |error| puts "Error: #{error}" }
Retrieve response
Retrieves a stored response by ID.
response = client.responses.retrieve("resp_abc123")
Parameters
ID of the response to retrieve
Additional fields to include: reasoning, audio
Delete response
Deletes a stored response.
client.responses.delete("resp_abc123")
Parameters
ID of the response to delete
Cancel response
Cancels a background response.
client.responses.cancel("resp_abc123")
Parameters
ID of the response to cancel
Compact conversation
Compacts a long conversation to fit within context limits.
compacted = client.responses.compact(
  model: "gpt-4o",
  previous_response_id: "resp_abc123"
)
Examples
Basic text generation
response = client.responses.create(
  model: "gpt-4o",
  input: "Explain quantum computing in simple terms",
  max_output_tokens: 500
)

puts response.output.first.content.first.text
Web search with streaming

stream = client.responses.stream(
  model: "gpt-4o",
  input: "What's the weather in San Francisco?",
  tools: [{ type: "web_search" }]
)

stream.on_web_search_call do |query|
  puts "Searching: #{query}"
end

stream.on_text_delta { |delta| print delta }
stream.run
Structured output
response = client.responses.create(
  model: "gpt-4o",
  input: "Generate a recipe for chocolate chip cookies",
  text: {
    format: {
      type: "json_schema",
      strict: true,
      name: "Recipe",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          ingredients: {
            type: "array",
            items: { type: "string" }
          },
          steps: {
            type: "array",
            items: { type: "string" }
          }
        },
        required: ["name", "ingredients", "steps"]
      }
    }
  }
)

recipe = JSON.parse(response.output.first.content.first.text)
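With strict: true, the parsed hash is guaranteed to contain every key listed in required. Given a sample payload shaped like the Recipe schema above (the payload itself is made up):

```ruby
require "json"

# Made-up payload matching the Recipe schema above.
sample = '{"name":"Chocolate Chip Cookies",' \
         '"ingredients":["flour","sugar","chocolate chips"],' \
         '"steps":["Mix the dough","Bake at 180C"]}'

recipe = JSON.parse(sample)
puts recipe["name"]                # guaranteed present by the schema
puts recipe["ingredients"].length
```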
Multi-turn conversation
# First turn
response1 = client.responses.create(
  model: "gpt-4o",
  input: "What is the capital of France?",
  store: true
)

# Second turn
response2 = client.responses.create(
  model: "gpt-4o",
  input: "What is its population?",
  previous_response_id: response1.id
)
Function calling
response = client.responses.create(
  model: "gpt-4o",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" }
          },
          required: ["location"]
        }
      }
    }
  ]
)

# Check for function calls
response.output.each do |item|
  if item.type == "function_call"
    puts "Function: #{item.name}"
    puts "Arguments: #{item.arguments}"
  end
end
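After running the function locally, its result typically goes back to the model in a follow-up turn. A sketch, assuming the result is returned as a function_call_output input item with a call_id field (that item shape is an assumption, as is the get_weather stub):

```ruby
require "json"

# Stand-in for a real weather lookup.
def get_weather(location)
  { location: location, temp_c: 21 }
end

# item.arguments from the loop above arrives as a JSON string.
args = JSON.parse('{"location":"Tokyo"}')
result = get_weather(args["location"])

# Assumed shape for returning the result; call_id would come from the item.
follow_up_input = [{
  type: "function_call_output",
  call_id: "call_abc123",
  output: JSON.generate(result)
}]

# response2 = client.responses.create(
#   model: "gpt-4o",
#   previous_response_id: response.id,
#   input: follow_up_input
# )
```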