Overview

The Codex responses endpoints provide access to AI models through Codex-LB’s load balancing and account pooling infrastructure. These endpoints support both streaming and compact (non-streaming) response formats.

Endpoints

POST /backend-api/codex/responses

Creates a streaming AI response using Server-Sent Events (SSE). Base URL: https://your-codex-lb-instance.com

Request Body

  • model (string, required): The model ID to use for the request (e.g., gpt-5.1, gpt-4o)
  • instructions (string, required): System-level instructions or prompt for the model
  • input (string | array, required): User input as a string or array of message objects
  • tools (array, default []): Array of tool definitions for function calling
  • tool_choice (string | object): Controls which tool the model should use (auto, none, or a specific tool)
  • parallel_tool_calls (boolean): Whether to enable parallel tool calls
  • reasoning (object): Reasoning configuration with effort and summary options
  • text (object): Text output controls
  • stream (boolean, default true): Whether to stream the response
  • include (array, default []): Additional fields to include in the response. Allowed values:
    • code_interpreter_call.outputs
    • computer_call_output.output.image_url
    • file_search_call.results
    • message.input_image.image_url
    • message.output_text.logprobs
    • reasoning.encrypted_content
    • web_search_call.action.sources
  • conversation (string): Conversation ID for multi-turn conversations
  • prompt_cache_key (string): Optional cache key for prompt caching

Response

Returns a Server-Sent Events (SSE) stream; each event is a data: line carrying a JSON payload. A typical sequence:
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}

data: {"type":"response.output_item.added","item":{"type":"message","status":"in_progress"}}

data: {"type":"response.output_item.done","item":{"type":"message","status":"completed","content":[{"type":"output_text","text":"Hello! How can I help you?"}]}}

data: {"type":"response.completed","response":{"id":"resp_abc123","status":"completed","usage":{"input_tokens":10,"output_tokens":8,"total_tokens":18}}}
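
Client-side, these data: lines can be decoded into event objects and the output text reassembled. A minimal sketch, assuming only the event shapes shown above; the helper names (parse_sse_events, collect_output_text) are ours, not part of the API:

```python
import json

def parse_sse_events(lines):
    """Parse `data: {...}` SSE lines into a list of event dicts."""
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

def collect_output_text(events):
    """Concatenate output_text blocks from completed output items."""
    parts = []
    for event in events:
        if event.get("type") == "response.output_item.done":
            for block in event["item"].get("content", []):
                if block.get("type") == "output_text":
                    parts.append(block["text"])
    return "".join(parts)
```

Fed the four example events above, collect_output_text returns "Hello! How can I help you?".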

Example Request

curl -X POST https://your-codex-lb-instance.com/backend-api/codex/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?",
    "stream": true,
    "reasoning": {
      "effort": "medium"
    }
  }'

POST /backend-api/codex/responses/compact

Creates a non-streaming AI response that returns a complete response object. Base URL: https://your-codex-lb-instance.com

Request Body

  • model (string, required): The model ID to use for the request
  • instructions (string, required): System-level instructions for the model
  • input (string | array, required): User input as a string or array of message objects

Response

Returns a complete response object:
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "content": [
        {
          "type": "output_text",
          "text": "Paris is the capital of France."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  }
}
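
The message text can be pulled out of this object with a small helper. A sketch based only on the response shape shown above; the function name extract_text is ours, not part of the API:

```python
def extract_text(response: dict) -> str:
    """Join all output_text blocks from a compact response object."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for block in item.get("content", []):
                if block.get("type") == "output_text":
                    parts.append(block["text"])
    return "".join(parts)
```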

Example Request

curl -X POST https://your-codex-lb-instance.com/backend-api/codex/responses/compact \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of France?"
  }'

Reasoning Effort Parameter

The reasoning.effort parameter controls the depth of reasoning for models that support it:
  • low - Fast, minimal reasoning
  • medium - Balanced reasoning and speed (default)
  • high - Maximum reasoning depth
Available reasoning levels vary by model. Check the models endpoint to see supported reasoning levels for each model.
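
As a sketch, a request body with a validated effort value might be assembled like this (the helper name build_request and the client-side validation are ours; the field names match the request body above):

```python
# The three effort levels documented above; individual models may support fewer.
VALID_EFFORTS = {"low", "medium", "high"}

def build_request(model: str, instructions: str, user_input, effort: str = "medium") -> dict:
    """Build a /backend-api/codex/responses request body with a reasoning effort."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": model,
        "instructions": instructions,
        "input": user_input,
        "stream": True,
        "reasoning": {"effort": effort},
    }
```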

Error Handling

  • error (object): Error object returned when the request fails

Common Error Codes

  • no_accounts - No available accounts in the pool
  • rate_limit_exceeded - Rate limit reached for your API key
  • model_not_found - Requested model is not available
  • invalid_request_error - Invalid request parameters
  • upstream_error - Error from upstream AI provider
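
A client might map these codes to retry behavior. A minimal, hypothetical sketch; which codes are actually transient is our assumption, and the error object's code field is assumed from the codes listed above:

```python
# Codes we treat as transient pool/upstream conditions worth retrying.
RETRYABLE = {"no_accounts", "rate_limit_exceeded", "upstream_error"}

def should_retry(error: dict) -> bool:
    """Return True if the error code suggests a retry may succeed."""
    return error.get("code") in RETRYABLE
```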

Notes

  • The /backend-api/codex/responses endpoint always returns streaming responses
  • Use /backend-api/codex/responses/compact for simple, non-streaming responses
  • Both endpoints support the same authentication mechanism
  • Streaming responses use the Server-Sent Events (SSE) protocol
  • The store parameter is not supported and must be false
  • The previous_response_id parameter is not supported
  • Unsupported tool types: file_search, code_interpreter, computer_use, image_generation
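
Given the unsupported tool types above, a client could reject a bad tools array before sending the request. A sketch; the helper name validate_tools is ours:

```python
# Tool types listed as unsupported in the notes above.
UNSUPPORTED_TOOL_TYPES = {"file_search", "code_interpreter", "computer_use", "image_generation"}

def validate_tools(tools: list) -> None:
    """Raise ValueError if any tool definition uses an unsupported type."""
    for tool in tools:
        if tool.get("type") in UNSUPPORTED_TOOL_TYPES:
            raise ValueError(f"tool type not supported: {tool['type']}")
```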
