Request Transformation Pipeline

Codex Multi-Auth transforms every OpenAI SDK request through a 7-step pipeline before sending it to the Codex API.

Pipeline Overview

OpenAI SDK Call (generateText/streamText)
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 1: URL Rewriting                                      │
│   /v1/responses → /v1/realtime/responses                   │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 2: Account Selection                                  │
│   - Filter cooldowns & rate limits                         │
│   - Apply session affinity                                 │
│   - Score by health + quota                                │
│   - Select best account                                    │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 3: Model Normalization                                │
│   gpt-5.3-codex → gpt-5-codex                              │
│   openai/gpt-5-codex → gpt-5-codex                         │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 4: Body Transformation                                │
│   - Inject model-family instructions                       │
│   - Set store: false, stream: true                         │
│   - Configure reasoning & text verbosity                   │
│   - Add reasoning.encrypted_content to include             │
│   - Filter orphaned tool outputs                           │
│   - Apply fast-session optimizations                       │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 5: Header Injection                                   │
│   - Authorization: Bearer <access_token>                   │
│   - openai-account-id: <account_id>                        │
│   - openai-beta: realtime-responses-2024-11-19             │
│   - openai-originator: codex_cli_rs                        │
│   - openai-conversation-id: <prompt_cache_key>             │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 6: Execute Request                                    │
│   - Set fetch timeout (default 2 minutes)                  │
│   - Enable stream stall detection (default 45s)            │
│   - Apply circuit breaker                                  │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 7: Response Handling                                  │
│   - SSE → JSON for generateText (non-streaming)            │
│   - Pass-through SSE for streamText                        │
│   - Extract rate limit info from headers                   │
│   - Update account state (cooldowns, rate limits)          │
└────────────────────────────────────────────────────────────┘
  |
  v
Response to SDK

Step 1: URL Rewriting

From lib/request/fetch-helpers.ts:381:
export function rewriteUrlForCodex(url: string): string {
  const parsedUrl = new URL(url);
  
  // Rewrite /v1/responses to /v1/realtime/responses
  const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
    ? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
    : parsedUrl.pathname;
  
  // Ensure base path prefix
  const normalizedPath = rewrittenPath.startsWith("/v1/realtime/")
    ? rewrittenPath
    : `/v1/realtime${rewrittenPath}`;
  
  // Update to Codex base URL
  parsedUrl.protocol = "https:";
  parsedUrl.host = "api.openai.com";
  parsedUrl.pathname = normalizedPath;
  
  return parsedUrl.toString();
}
Example:
Input:  https://api.openai.com/v1/responses
Output: https://api.openai.com/v1/realtime/responses
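A quick check of the edge cases, using a condensed copy of the function above: query strings survive the rewrite, and any host is redirected to api.openai.com.

```typescript
// Condensed copy of rewriteUrlForCodex, exercised on the documented cases.
function rewriteUrlForCodex(url: string): string {
  const parsedUrl = new URL(url);
  const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
    ? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
    : parsedUrl.pathname;
  parsedUrl.protocol = "https:";
  parsedUrl.host = "api.openai.com";
  parsedUrl.pathname = rewrittenPath.startsWith("/v1/realtime/")
    ? rewrittenPath
    : `/v1/realtime${rewrittenPath}`;
  return parsedUrl.toString();
}

// Query strings are preserved; the host is always rewritten.
console.log(rewriteUrlForCodex("https://my-proxy.local/v1/responses?stream=true"));
// → https://api.openai.com/v1/realtime/responses?stream=true
```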

Step 2: Account Selection

From lib/accounts.ts and index.ts:1372:
// 1. Filter accounts
const now = Date.now();
const available = accounts.filter((account, index) => {
  // Skip if in cooldown
  if (account.cooldownUntil && account.cooldownUntil > now) {
    return false;
  }
  
  // Skip if rate limited for this model family
  const resetTime = getRateLimitResetTimeForFamily(account, now, modelFamily);
  if (resetTime && resetTime > now) {
    return false;
  }
  
  // Skip if circuit breaker is open
  const breaker = getCircuitBreaker(`account:${index}`);
  if (breaker.getState() === "open") {
    return false;
  }
  
  return true;
});

// 2. Apply session affinity
const preferredIndex = sessionAffinityStore.getPreferredAccountIndex(threadId);
if (preferredIndex !== undefined && available.includes(accounts[preferredIndex])) {
  // Prefer same account for this conversation
  selectedIndex = preferredIndex;
} else {
  // 3. Score by health + quota + capability + preemptive quota
  const scores = available.map((account, index) => {
    let score = account.healthScore ?? 100;
    
    // Boost for capability policy
    score += capabilityPolicyStore.getAccountScore(index, model) * 0.1;
    
    // Reduce for preemptive quota deferral
    const deferralMs = preemptiveQuotaScheduler.shouldDeferRequest(index, modelFamily);
    if (deferralMs > 0) {
      score -= 50;
    }
    
    // PID offset for fair rotation
    if (pidOffsetEnabled) {
      score += (index * 0.001);
    }
    
    return { index, score };
  });
  
  // 4. Select highest score
  scores.sort((a, b) => b.score - a.score);
  selectedIndex = scores[0]?.index ?? 0;
}
Selection Factors:
  1. Cooldown status: Skip accounts with active cooldown
  2. Rate limit status: Skip accounts with rate limits for this model family
  3. Circuit breaker: Skip accounts with open circuit breaker
  4. Session affinity: Prefer same account for same conversation
  5. Health score: 0-100, decrements on failure, resets on success
  6. Capability score: Boost for accounts that support this model
  7. Quota deferral: Reduce score if quota is low
  8. PID offset: Tiny offset for deterministic fair rotation
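Session affinity (factor 4) can be pictured as a small TTL map from conversation thread to account index. This is a minimal sketch, not the plugin's implementation; the class shape and 30-minute TTL are assumptions for illustration.

```typescript
// Sketch of a session-affinity store: remember which account served a
// conversation so follow-up turns reuse the same prompt cache.
class SessionAffinityStore {
  private entries = new Map<string, { index: number; expiresAt: number }>();

  constructor(private ttlMs = 30 * 60 * 1000) {} // TTL is an assumption

  getPreferredAccountIndex(threadId: string): number | undefined {
    const entry = this.entries.get(threadId);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.entries.delete(threadId); // affinity expired
      return undefined;
    }
    return entry.index;
  }

  record(threadId: string, index: number): void {
    this.entries.set(threadId, { index, expiresAt: Date.now() + this.ttlMs });
  }
}

const store = new SessionAffinityStore();
store.record("thread-1", 2);
console.log(store.getPreferredAccountIndex("thread-1")); // → 2
```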

Step 3: Model Normalization

From lib/request/request-transformer.ts:40:
export function normalizeModel(model: string | undefined): string {
  if (!model) return "gpt-5.1";
  
  // Strip provider prefix (openai/gpt-5-codex → gpt-5-codex)
  const modelId = model.includes("/") ? model.split("/").pop() ?? model : model;
  
  // Explicit model map (handles known variants)
  const mappedModel = getNormalizedModel(modelId);
  if (mappedModel) return mappedModel;
  
  // Pattern-based fallback
  const normalized = modelId.toLowerCase();
  
  // Legacy aliases
  if (normalized.includes("gpt-5.3-codex-spark")) return "gpt-5-codex";
  if (normalized.includes("gpt-5.3-codex")) return "gpt-5-codex";
  if (normalized.includes("gpt-5.2-codex")) return "gpt-5-codex";
  
  // Canonical Codex models (most specific first, so gpt-5.1-codex-max is
  // not swallowed by the gpt-5.1-codex alias)
  if (normalized.includes("gpt-5.1-codex-max")) return "gpt-5.1-codex-max";
  if (normalized.includes("gpt-5.1-codex-mini")) return "gpt-5.1-codex-mini";
  if (normalized.includes("gpt-5.1-codex")) return "gpt-5-codex";
  if (normalized.includes("gpt-5-codex")) return "gpt-5-codex";
  
  // GPT-5 variants
  if (normalized.includes("gpt-5.2")) return "gpt-5.2";
  if (normalized.includes("gpt-5.1")) return "gpt-5.1";
  if (normalized.includes("gpt-5")) return "gpt-5.1";
  
  return "gpt-5.1"; // Default fallback
}
Normalization Examples:
openai/gpt-5-codex          → gpt-5-codex
gpt-5.3-codex-spark         → gpt-5-codex
gpt-5.2-codex               → gpt-5-codex
gpt-5.1-codex               → gpt-5-codex
gpt-5-codex-low             → gpt-5-codex (variant stripped for API)
gpt-5.1-codex-max           → gpt-5.1-codex-max
gpt-5.1-codex-mini          → gpt-5.1-codex-mini
gpt-5.2                     → gpt-5.2
gpt-5.1                     → gpt-5.1
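The fallback matching can be condensed into a self-contained sketch that reproduces the examples above (the explicit model map is omitted; specific patterns are checked before generic ones):

```typescript
// Condensed sketch of normalizeModel's pattern fallback.
function normalizeModelSketch(model?: string): string {
  if (!model) return "gpt-5.1";
  // Strip provider prefix: openai/gpt-5-codex → gpt-5-codex
  const id = (model.includes("/") ? model.split("/").pop() ?? model : model).toLowerCase();
  if (id.includes("gpt-5.1-codex-max")) return "gpt-5.1-codex-max";
  if (id.includes("gpt-5.1-codex-mini")) return "gpt-5.1-codex-mini";
  if (/gpt-5(\.\d+)?-codex/.test(id)) return "gpt-5-codex"; // all other codex variants
  if (id.includes("gpt-5.2")) return "gpt-5.2";
  return "gpt-5.1"; // gpt-5.1, gpt-5, and the default fallback
}

console.log(normalizeModelSketch("openai/gpt-5-codex"));  // → gpt-5-codex
console.log(normalizeModelSketch("gpt-5.3-codex-spark")); // → gpt-5-codex
console.log(normalizeModelSketch("gpt-5.1-codex-max"));   // → gpt-5.1-codex-max
```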

Step 4: Body Transformation

From lib/request/request-transformer.ts:821:
export async function transformRequestBody(
  body: RequestBody,
  codexInstructions: string,
  userConfig: UserConfig = { global: {}, models: {} },
  codexMode = true,
  fastSession = false,
  fastSessionStrategy: FastSessionStrategy = "hybrid",
  fastSessionMaxInputItems = 30,
): Promise<RequestBody> {
  const originalModel = body.model;
  const normalizedModel = normalizeModel(body.model);
  const modelConfig = getModelConfig(originalModel || normalizedModel, userConfig);
  
  // Set normalized model
  body.model = normalizedModel;
  
  // Codex required fields
  body.store = false;        // Stateless (required by ChatGPT backend)
  body.stream = true;        // Always stream (SSE)
  
  // Inject Codex instructions (shouldApplyFastSessionTuning and
  // isTrivialTurn are computed earlier in the function, elided here)
  body.instructions = shouldApplyFastSessionTuning
    ? compactInstructionsForFastSession(codexInstructions, isTrivialTurn)
    : codexInstructions;
  
  // Filter input array
  if (body.input) {
    // Apply fast-session input trimming
    if (fastSession) {
      body.input = trimInputForFastSession(
        body.input,
        fastSessionMaxInputItems,
        { preferLatestUserOnly: isTrivialTurn }
      );
    }
    
    // Remove item_reference (AI SDK construct, not supported by Codex)
    // Strip IDs from all items (stateless mode)
    body.input = filterInput(body.input);
    
    // Add bridge/tool-remap message
    if (codexMode) {
      body.input = await filterHostSystemPrompts(body.input);
      body.input = addCodexBridgeMessage(body.input, !!body.tools);
    } else {
      body.input = addToolRemapMessage(body.input, !!body.tools);
    }
    
    // Handle orphaned tool outputs
    body.input = normalizeOrphanedToolOutputs(body.input);
    body.input = injectMissingToolOutputs(body.input);
  }
  
  // Configure reasoning
  const reasoningConfig = resolveReasoningConfig(normalizedModel, modelConfig, body);
  body.reasoning = {
    ...body.reasoning,
    ...reasoningConfig,
  };
  
  // Configure text verbosity
  body.text = {
    ...body.text,
    verbosity: resolveTextVerbosity(modelConfig, body),
  };
  
  // Fast-session overrides (after the defaults above, so they win)
  if (fastSession && shouldApplyFastSessionTuning) {
    body.reasoning.effort = "none"; // or "low" for Codex models
    body.reasoning.summary = "auto";
    body.text.verbosity = "low";
  }
  
  // Add include for encrypted reasoning content
  body.include = resolveInclude(modelConfig, body);
  // Always includes "reasoning.encrypted_content" for stateless continuity
  
  // Remove unsupported parameters
  body.max_output_tokens = undefined;
  body.max_completion_tokens = undefined;
  
  return body;
}

Key Transformations

1. Stateless Mode (store: false)

From ARCHITECTURE.md:73:
The ChatGPT backend requires store: false and include: ["reasoning.encrypted_content"].
Why stateless?
  • Codex API doesn’t persist conversation state
  • Requires full context in each request
  • reasoning.encrypted_content maintains reasoning continuity
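Putting the required fields together, a transformed body looks roughly like this. The values are illustrative examples drawn from this page, not a verbatim capture of a real request:

```typescript
// Illustrative shape of a transformed request body after Step 4.
const transformedBody = {
  model: "gpt-5-codex",
  store: false,                             // stateless: no server-side state
  stream: true,                             // Codex always answers over SSE
  instructions: "<model-family instructions>",
  include: ["reasoning.encrypted_content"], // carries reasoning across turns
  reasoning: { effort: "high", summary: "auto" },
  text: { verbosity: "medium" },
  input: [{ type: "message", role: "user", content: "..." }],
};

console.log(transformedBody.store, transformedBody.stream); // → false true
```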

2. Input Filtering

From lib/request/request-transformer.ts:542:
export function filterInput(
  input: InputItem[] | undefined,
): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  
  return input
    .filter((item) => {
      // Remove AI SDK constructs not supported by Codex API
      if (item.type === "item_reference") {
        return false; // AI SDK only - references server state
      }
      return true; // Keep all other items
    })
    .map((item) => {
      // Strip IDs from all items (Codex API stateless mode)
      if (item.id) {
        const { id: _omit, ...itemWithoutId } = item;
        return itemWithoutId as InputItem;
      }
      return item;
    });
}
Why remove item_reference?
  • AI SDK uses this for server-side state lookup
  • Not supported by Codex API (stateless)
  • Would cause API errors
Why strip IDs?
  • Stateless mode doesn’t track item IDs
  • Reduces payload size
  • Prevents ID conflicts
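A self-contained demo of filterInput's two jobs, dropping item_reference entries and stripping ids (the InputItem type is simplified here for illustration):

```typescript
type InputItem = { type: string; id?: string; [key: string]: unknown };

function filterInput(input: InputItem[] | undefined): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  return input
    .filter((item) => item.type !== "item_reference") // drop AI SDK references
    .map((item) => {
      if (item.id) {
        const { id: _omit, ...itemWithoutId } = item; // strip the id
        return itemWithoutId as InputItem;
      }
      return item;
    });
}

const filtered = filterInput([
  { type: "item_reference", id: "ref_1" },        // dropped entirely
  { type: "message", id: "msg_1", role: "user" }, // kept, id stripped
]);
console.log(filtered); // → [ { type: "message", role: "user" } ]
```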

3. Orphaned Tool Outputs

From lib/request/helpers/input-utils.ts:180:
export function normalizeOrphanedToolOutputs(
  input: InputItem[],
): InputItem[] {
  // Problem: function_call_output references a function_call that was
  // an item_reference (now filtered out). API rejects orphaned outputs.
  
  // Solution: Convert orphaned outputs to assistant messages to preserve
  // context without API errors.
  
  const functionCallIds = new Set<string>();
  for (const item of input) {
    if (item.type === "function_call" && item.call_id) {
      functionCallIds.add(item.call_id);
    }
  }
  
  return input.map((item) => {
    if (item.type === "function_call_output") {
      const callId = item.call_id;
      if (callId && !functionCallIds.has(callId)) {
        // Orphaned output - convert to message
        return {
          type: "message",
          role: "assistant",
          content: [
            {
              type: "input_text",
              text: `[Previous tool result: ${JSON.stringify(item.output)}]`,
            },
          ],
        } as InputItem;
      }
    }
    return item;
  });
}
Why this matters:
  • Prevents infinite loops (LLM loses tool results)
  • Preserves conversation context
  • Avoids API validation errors
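The conversion can be demonstrated end-to-end with a self-contained copy of the function above (types simplified for illustration):

```typescript
type InputItem = { type: string; call_id?: string; output?: unknown; [k: string]: unknown };

function normalizeOrphanedToolOutputs(input: InputItem[]): InputItem[] {
  // Collect the call_ids of function_calls still present in the input
  const functionCallIds = new Set<string>();
  for (const item of input) {
    if (item.type === "function_call" && item.call_id) {
      functionCallIds.add(item.call_id);
    }
  }
  // Any output whose call is missing becomes an assistant message
  return input.map((item) => {
    if (item.type === "function_call_output" && item.call_id && !functionCallIds.has(item.call_id)) {
      return {
        type: "message",
        role: "assistant",
        content: [{ type: "input_text", text: `[Previous tool result: ${JSON.stringify(item.output)}]` }],
      };
    }
    return item;
  });
}

const result = normalizeOrphanedToolOutputs([
  { type: "function_call_output", call_id: "call_gone", output: "42" }, // orphan
]);
console.log(result[0].type); // → message
```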

4. Reasoning Configuration

From lib/request/request-transformer.ts:388:
export function getReasoningConfig(
  modelName: string | undefined,
  userConfig: ConfigOptions = {},
): ReasoningConfig {
  const normalizedName = modelName?.toLowerCase() ?? "";
  const isCodex = normalizedName.includes("codex");
  
  // Canonical GPT-5 Codex defaults to high and does not support "none"
  const isGpt5Codex = normalizedName.includes("gpt-5-codex");
  const isGpt52General = normalizedName.includes("gpt-5.2") && !isCodex;
  const isGpt51General = normalizedName.includes("gpt-5.1") && !isCodex;
  
  // GPT-5.2 general and GPT-5.1 general support "none" reasoning
  const supportsNone = isGpt52General || isGpt51General;
  
  // GPT-5.2 general supports xhigh
  const supportsXhigh = isGpt52General;
  
  // Default effort
  const defaultEffort = isGpt5Codex ? "high" : "medium";
  
  // Get user-requested effort
  let effort = userConfig.reasoningEffort || defaultEffort;
  
  // Downgrade unsupported values
  if (!supportsXhigh && effort === "xhigh") {
    effort = "high";
  }
  if (!supportsNone && effort === "none") {
    effort = "low";
  }
  
  const summary = userConfig.reasoningSummary ?? "auto";
  
  return { effort, summary };
}
Model-Specific Defaults:
gpt-5-codex:        effort: high,    supports: low, medium, high
gpt-5.1-codex-max:  effort: high,    supports: low, medium, high, xhigh
gpt-5.1-codex-mini: effort: medium,  supports: medium, high
gpt-5.2:            effort: high,    supports: none, low, medium, high, xhigh
gpt-5.1:            effort: medium,  supports: none, low, medium, high
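The downgrade rules can be seen in action with a condensed, self-contained copy of getReasoningConfig:

```typescript
type ReasoningConfig = { effort: string; summary: string };

function getReasoningConfig(
  modelName?: string,
  userConfig: { reasoningEffort?: string; reasoningSummary?: string } = {},
): ReasoningConfig {
  const name = modelName?.toLowerCase() ?? "";
  const isCodex = name.includes("codex");
  const isGpt5Codex = name.includes("gpt-5-codex");
  const supportsNone = (name.includes("gpt-5.2") || name.includes("gpt-5.1")) && !isCodex;
  const supportsXhigh = name.includes("gpt-5.2") && !isCodex;

  // Start from the model-specific default, then downgrade unsupported values
  let effort = userConfig.reasoningEffort || (isGpt5Codex ? "high" : "medium");
  if (!supportsXhigh && effort === "xhigh") effort = "high";
  if (!supportsNone && effort === "none") effort = "low";

  return { effort, summary: userConfig.reasoningSummary ?? "auto" };
}

// "none" is downgraded to "low" for Codex models...
console.log(getReasoningConfig("gpt-5-codex", { reasoningEffort: "none" }).effort); // → low
// ...but passes through for general GPT-5.1
console.log(getReasoningConfig("gpt-5.1", { reasoningEffort: "none" }).effort); // → none
```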

5. Fast-Session Optimizations

From lib/request/request-transformer.ts:569:
export function trimInputForFastSession(
  input: InputItem[] | undefined,
  maxItems: number,
  options?: { preferLatestUserOnly?: boolean },
): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  
  const safeMax = Math.max(8, Math.floor(maxItems)); // Default 30
  
  // Strategy 1: Trivial turns (short, simple questions).
  // latestUserText, firstSystemPrompt, and latestUserMessage are derived
  // earlier in the function (elided here).
  if (options?.preferLatestUserOnly && isTrivialLatestPrompt(latestUserText)) {
    // Keep only: minimal system prompt + latest user message
    return [firstSystemPrompt, latestUserMessage];
  }
  
  // Strategy 2: Complex requests (code blocks, lists, tables)
  // Keep: up to 2 leading system/developer messages + last N items
  const keepIndexes = new Set<number>();
  
  // Keep small leading system prompts (< 1200 chars)
  for (let i = 0; i < 2; i++) {
    const item = input[i];
    if (item?.role === "developer" || item?.role === "system") {
      const text = extractMessageText(item.content);
      if (text.length <= 1200) {
        keepIndexes.add(i);
      }
    }
  }
  
  // Keep last N items
  for (let i = Math.max(0, input.length - safeMax); i < input.length; i++) {
    keepIndexes.add(i);
  }
  
  return input.filter((_, index) => keepIndexes.has(index));
}
Fast-Session Benefits:
  • Lower latency: Smaller context = faster processing
  • Lower cost: Less tokens to process
  • Better UX: Instant responses for simple questions
When applied:
const shouldApplyFastSessionTuning =
  fastSession &&
  (fastSessionStrategy === "always" ||
   !isComplexFastSessionRequest(body, fastSessionMaxInputItems));
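The trivial-turn check referenced above (isTrivialLatestPrompt) is not shown in this page. As a purely hypothetical sketch of what such a heuristic might look like, with illustrative thresholds, not the plugin's actual logic:

```typescript
// Hypothetical trivial-turn heuristic: short prompts without code blocks
// or lists count as trivial. Thresholds are illustrative only.
function isTrivialLatestPromptSketch(text: string): boolean {
  if (text.length > 200) return false;        // long prompts are not trivial
  if (text.includes("```")) return false;     // code blocks imply complexity
  if (/^\s*[-*\d]/m.test(text)) return false; // lists imply complexity
  return true;
}

console.log(isTrivialLatestPromptSketch("What does HTTP 429 mean?")); // → true
console.log(isTrivialLatestPromptSketch("Refactor this:\n```ts\nconst x = 1;\n```")); // → false
```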

Step 5: Header Injection

From lib/request/fetch-helpers.ts:505:
export function createCodexHeaders(
  init: RequestInit | undefined,
  accountId: string,
  accessToken: string,
  opts?: { model?: string; promptCacheKey?: string },
): Headers {
  const headers = new Headers(init?.headers ?? {});
  
  // Remove any existing API key
  headers.delete("x-api-key");
  
  // OAuth authentication
  headers.set("Authorization", `Bearer ${accessToken}`);
  
  // Account ID (org-* or user-*)
  headers.set("openai-account-id", accountId);
  
  // Realtime responses beta flag
  headers.set("openai-beta", "realtime-responses-2024-11-19");
  
  // Originator tag (identifies Codex CLI)
  headers.set("openai-originator", "codex_cli_rs");
  
  // Prompt caching (session affinity)
  const cacheKey = opts?.promptCacheKey;
  if (cacheKey) {
    headers.set("openai-conversation-id", cacheKey);
    headers.set("openai-session-id", cacheKey);
  } else {
    headers.delete("openai-conversation-id");
    headers.delete("openai-session-id");
  }
  
  // Accept SSE
  headers.set("accept", "text/event-stream");
  
  return headers;
}
Key Headers:
  • Authorization: OAuth bearer token
  • openai-account-id: Account/org ID (affects quotas)
  • openai-beta: Enable realtime responses API
  • openai-originator: Identifies plugin as Codex CLI
  • openai-conversation-id: Prompt cache key (optional)
  • openai-session-id: Session identifier (optional)
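The resulting header set can be checked with a condensed copy of createCodexHeaders (the init and opts parameters are flattened here for brevity; requires a runtime with the global Headers class, e.g. Node 18+):

```typescript
function createCodexHeaders(
  accountId: string,
  accessToken: string,
  promptCacheKey?: string,
): Headers {
  const headers = new Headers();
  headers.set("Authorization", `Bearer ${accessToken}`); // OAuth, not x-api-key
  headers.set("openai-account-id", accountId);
  headers.set("openai-beta", "realtime-responses-2024-11-19");
  headers.set("openai-originator", "codex_cli_rs");
  if (promptCacheKey) {
    headers.set("openai-conversation-id", promptCacheKey);
    headers.set("openai-session-id", promptCacheKey);
  }
  headers.set("accept", "text/event-stream");
  return headers;
}

const h = createCodexHeaders("org-123", "tok_abc", "thread-1");
console.log(h.get("openai-conversation-id")); // → thread-1
```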

Step 6: Execute Request

From index.ts:1550:
const controller = new AbortController();
const fetchTimeoutMs = getFetchTimeoutMs(pluginConfig); // Default 120s

const timeoutId = setTimeout(() => {
  controller.abort();
}, fetchTimeoutMs);

// Resolve the breaker outside the try block so the catch clause can
// record failures on it
const breaker = getCircuitBreaker(`account:${selectedIndex}`);

try {
  if (!breaker.canExecute()) {
    throw new CircuitOpenError();
  }
  
  const response = await fetch(url, {
    ...requestInit,
    headers: codexHeaders,
    signal: controller.signal,
  });
  
  clearTimeout(timeoutId);
  
  // Record circuit breaker success
  breaker.recordSuccess();
  
  return response;
} catch (error) {
  clearTimeout(timeoutId);
  
  // Record circuit breaker failure
  breaker.recordFailure();
  
  throw error;
}
Timeout Behavior:
  • Default: 120 seconds (2 minutes)
  • Configurable via CODEX_AUTH_FETCH_TIMEOUT_MS
  • Aborts request on timeout
  • Triggers failover to next account
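The per-account circuit breaker used above can be sketched minimally as follows. The threshold, cool-off, and half-open behavior are assumptions for illustration; the plugin's real breaker may differ.

```typescript
// Minimal circuit-breaker sketch: open after N consecutive failures,
// allow a probe after the cool-off, close again on any success.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private coolOffMs = 30_000) {}

  getState(): "closed" | "open" | "half-open" {
    if (this.failures < this.threshold) return "closed";
    // After the cool-off, allow one probe request (half-open)
    return Date.now() - this.openedAt >= this.coolOffMs ? "half-open" : "open";
  }

  canExecute(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0; // any success fully closes the breaker
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures === this.threshold) this.openedAt = Date.now();
  }
}

const breaker = new CircuitBreaker(2, 1_000);
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.canExecute()); // → false (open for ~1s)
breaker.recordSuccess();
console.log(breaker.canExecute()); // → true
```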

Step 7: Response Handling

From lib/request/fetch-helpers.ts:589:
export async function handleSuccessResponse(
  response: Response,
  isStreaming: boolean,
  options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
  // Check for deprecation headers (RFC 8594)
  const deprecation = response.headers.get("Deprecation");
  const sunset = response.headers.get("Sunset");
  if (deprecation || sunset) {
    logWarn(`API deprecation notice`, { deprecation, sunset });
  }
  
  const responseHeaders = ensureContentType(response.headers);
  
  // For non-streaming requests (generateText), convert SSE to JSON
  if (!isStreaming) {
    return await convertSseToJson(response, responseHeaders, options);
  }
  
  // For streaming requests (streamText), return stream as-is
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers: responseHeaders,
  });
}

SSE to JSON Conversion

From lib/request/response-handler.ts:47:
export async function convertSseToJson(
  response: Response,
  headers: Headers,
  options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
  if (!response.body) {
    return new Response("{}", { status: 200, headers });
  }
  
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let finalData: unknown = null;
  
  try {
    while (true) {
      const { done, value } = await readWithTimeout(
        reader,
        options?.streamStallTimeoutMs ?? 45_000,
      );
      
      if (done) break;
      
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";
      
      for (const line of lines) {
        if (line.startsWith("data: ")) {
          const data = line.slice(6);
          if (data === "[DONE]") continue;
          
          try {
            const parsed = JSON.parse(data);
            if (parsed.type === "response.done") {
              finalData = parsed.response;
              break;
            }
          } catch {
            continue;
          }
        }
      }
      
      if (finalData) break;
    }
  } finally {
    reader.releaseLock();
  }
  
  const json = finalData ?? {};
  return new Response(JSON.stringify(json), {
    status: 200,
    headers,
  });
}
SSE Event Format:
data: {"type":"response.started","response":{"id":"resp_abc123"}}
data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.output_text.delta","delta":" world"}
data: {"type":"response.done","response":{"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}}
data: [DONE]
Conversion Result:
{
  "id": "resp_abc123",
  "output": [
    {
      "type": "text",
      "text": "Hello world"
    }
  ]
}
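The extraction loop at the core of convertSseToJson can be run standalone against the example event stream (a self-contained version operating on a string instead of a ReadableStream):

```typescript
// Extract the response.done payload from an SSE transcript.
function extractFinalResponse(sse: string): unknown {
  let finalData: unknown = null;
  for (const line of sse.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") continue;
    try {
      const parsed = JSON.parse(data);
      if (parsed.type === "response.done") {
        finalData = parsed.response;
        break;
      }
    } catch {
      continue; // ignore unparseable events
    }
  }
  return finalData;
}

const stream = [
  'data: {"type":"response.started","response":{"id":"resp_abc123"}}',
  'data: {"type":"response.output_text.delta","delta":"Hello"}',
  'data: {"type":"response.done","response":{"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}}',
  "data: [DONE]",
].join("\n");

console.log(JSON.stringify(extractFinalResponse(stream)));
// → {"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}
```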

Performance Optimizations

1. Model-Family Instruction Caching

From lib/prompts/codex.ts:180:
const instructionsCache = new Map<ModelFamily, string>();
const etagCache = new Map<ModelFamily, string>();

export async function getCodexInstructions(
  model: string,
): Promise<string> {
  const family = getModelFamily(model);
  
  // Return cached instructions if available
  if (instructionsCache.has(family)) {
    return instructionsCache.get(family)!;
  }
  
  // Fetch from GitHub with ETag caching
  const url = CODEX_INSTRUCTIONS_URLS[family];
  const etag = etagCache.get(family);
  
  const response = await fetch(url, {
    headers: etag ? { "If-None-Match": etag } : {},
  });
  
  if (response.status === 304) {
    // Not modified, use cached version
    return instructionsCache.get(family)!;
  }
  
  const instructions = await response.text();
  
  // Update cache
  instructionsCache.set(family, instructions);
  if (response.headers.has("ETag")) {
    etagCache.set(family, response.headers.get("ETag")!);
  }
  
  return instructions;
}
Benefits:
  • Reduces GitHub API calls (ETag caching)
  • Faster request transformation (in-memory cache)
  • Survives across multiple requests

2. Prewarming

From index.ts:1145:
if (!startupPrewarmTriggered && prewarmEnabled) {
  startupPrewarmTriggered = true;
  const configuredModels = Object.keys(userConfig.models ?? {});
  prewarmCodexInstructions(configuredModels);
  if (codexMode) {
    prewarmHostCodexPrompt();
  }
}
Prewarming triggers:
  • On plugin load (background fetch)
  • Fetches instructions for all configured models
  • No request-time latency for first use