Request Transformation Pipeline

Codex Multi-Auth transforms every OpenAI SDK request through a 7-step pipeline before sending it to the Codex API.

Pipeline Overview

OpenAI SDK Call (generateText/streamText)
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 1: URL Rewriting                                      │
│   /v1/responses → /v1/realtime/responses                   │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 2: Account Selection                                  │
│   - Filter cooldowns & rate limits                         │
│   - Apply session affinity                                 │
│   - Score by health + quota                                │
│   - Select best account                                    │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 3: Model Normalization                                │
│   gpt-5.3-codex → gpt-5-codex                              │
│   openai/gpt-5-codex → gpt-5-codex                         │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 4: Body Transformation                                │
│   - Inject model-family instructions                       │
│   - Set store: false, stream: true                         │
│   - Configure reasoning & text verbosity                   │
│   - Add reasoning.encrypted_content to include             │
│   - Filter orphaned tool outputs                           │
│   - Apply fast-session optimizations                       │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 5: Header Injection                                   │
│   - Authorization: Bearer <access_token>                   │
│   - openai-account-id: <account_id>                        │
│   - openai-beta: realtime-responses-2024-11-19             │
│   - openai-originator: codex_cli_rs                        │
│   - openai-conversation-id: <prompt_cache_key>             │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 6: Execute Request                                    │
│   - Set fetch timeout (default 2 minutes)                  │
│   - Enable stream stall detection (default 45s)            │
│   - Apply circuit breaker                                  │
└────────────────────────────────────────────────────────────┘
  |
  v
┌────────────────────────────────────────────────────────────┐
│ Step 7: Response Handling                                  │
│   - SSE → JSON for generateText (non-streaming)            │
│   - Pass-through SSE for streamText                        │
│   - Extract rate limit info from headers                   │
│   - Update account state (cooldowns, rate limits)          │
└────────────────────────────────────────────────────────────┘
  |
  v
Response to SDK

Step 1: URL Rewriting

From lib/request/fetch-helpers.ts:381:
export function rewriteUrlForCodex(url: string): string {
  const parsedUrl = new URL(url);
  
  // Rewrite /v1/responses to /v1/realtime/responses
  const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
    ? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
    : parsedUrl.pathname;
  
  // Ensure base path prefix
  const normalizedPath = rewrittenPath.startsWith("/v1/realtime/")
    ? rewrittenPath
    : `/v1/realtime${rewrittenPath}`;
  
  // Update to Codex base URL
  parsedUrl.protocol = "https:";
  parsedUrl.host = "api.openai.com";
  parsedUrl.pathname = normalizedPath;
  
  return parsedUrl.toString();
}
Example:
Input:  https://api.openai.com/v1/responses
Output: https://api.openai.com/v1/realtime/responses
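A quick check of the edge cases, using a condensed copy of the function above: query strings survive the rewrite, and any host is redirected to api.openai.com.

```typescript
// Condensed copy of rewriteUrlForCodex, exercised on the documented cases.
function rewriteUrlForCodex(url: string): string {
  const parsedUrl = new URL(url);
  const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
    ? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
    : parsedUrl.pathname;
  parsedUrl.protocol = "https:";
  parsedUrl.host = "api.openai.com";
  parsedUrl.pathname = rewrittenPath.startsWith("/v1/realtime/")
    ? rewrittenPath
    : `/v1/realtime${rewrittenPath}`;
  return parsedUrl.toString();
}

// Query strings are preserved; the host is always rewritten.
console.log(rewriteUrlForCodex("https://my-proxy.local/v1/responses?stream=true"));
// → https://api.openai.com/v1/realtime/responses?stream=true
```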

Step 2: Account Selection

From lib/accounts.ts and index.ts:1372:
// 1. Filter accounts
const now = Date.now();
const available = accounts.filter((account, index) => {
  // Skip if in cooldown
  if (account.cooldownUntil && account.cooldownUntil > now) {
    return false;
  }
  
  // Skip if rate limited for this model family
  const resetTime = getRateLimitResetTimeForFamily(account, now, modelFamily);
  if (resetTime && resetTime > now) {
    return false;
  }
  
  // Skip if circuit breaker is open
  const breaker = getCircuitBreaker(`account:${index}`);
  if (breaker.getState() === "open") {
    return false;
  }
  
  return true;
});

// 2. Apply session affinity
const preferredIndex = sessionAffinityStore.getPreferredAccountIndex(threadId);
if (preferredIndex !== undefined && available.includes(accounts[preferredIndex])) {
  // Prefer same account for this conversation
  selectedIndex = preferredIndex;
} else {
  // 3. Score by health + quota + capability + preemptive quota
  const scores = available.map((account, index) => {
    let score = account.healthScore ?? 100;
    
    // Boost for capability policy
    score += capabilityPolicyStore.getAccountScore(index, model) * 0.1;
    
    // Reduce for preemptive quota deferral
    const deferralMs = preemptiveQuotaScheduler.shouldDeferRequest(index, modelFamily);
    if (deferralMs > 0) {
      score -= 50;
    }
    
    // PID offset for fair rotation
    if (pidOffsetEnabled) {
      score += (index * 0.001);
    }
    
    return { index, score };
  });
  
  // 4. Select highest score
  scores.sort((a, b) => b.score - a.score);
  selectedIndex = scores[0]?.index ?? 0;
}
Selection Factors:
  1. Cooldown status: Skip accounts with active cooldown
  2. Rate limit status: Skip accounts with rate limits for this model family
  3. Circuit breaker: Skip accounts with open circuit breaker
  4. Session affinity: Prefer same account for same conversation
  5. Health score: 0-100, decrements on failure, resets on success
  6. Capability score: Boost for accounts that support this model
  7. Quota deferral: Reduce score if quota is low
  8. PID offset: Tiny offset for deterministic fair rotation
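Session affinity (factor 4) can be pictured as a small TTL map from conversation thread to account index. This is a minimal sketch, not the plugin's implementation; the class shape and 30-minute TTL are assumptions for illustration.

```typescript
// Sketch of a session-affinity store: remember which account served a
// conversation so follow-up turns reuse the same prompt cache.
class SessionAffinityStore {
  private entries = new Map<string, { index: number; expiresAt: number }>();

  constructor(private ttlMs = 30 * 60 * 1000) {} // TTL is an assumption

  getPreferredAccountIndex(threadId: string): number | undefined {
    const entry = this.entries.get(threadId);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.entries.delete(threadId); // affinity expired
      return undefined;
    }
    return entry.index;
  }

  record(threadId: string, index: number): void {
    this.entries.set(threadId, { index, expiresAt: Date.now() + this.ttlMs });
  }
}

const store = new SessionAffinityStore();
store.record("thread-1", 2);
console.log(store.getPreferredAccountIndex("thread-1")); // → 2
```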

Step 3: Model Normalization

From lib/request/request-transformer.ts:40:
export function normalizeModel(model: string | undefined): string {
  if (!model) return "gpt-5.1";
  
  // Strip provider prefix (openai/gpt-5-codex → gpt-5-codex)
  const modelId = model.includes("/") ? model.split("/").pop() ?? model : model;
  
  // Explicit model map (handles known variants)
  const mappedModel = getNormalizedModel(modelId);
  if (mappedModel) return mappedModel;
  
  // Pattern-based fallback
  const normalized = modelId.toLowerCase();
  
  // Legacy aliases
  if (normalized.includes("gpt-5.3-codex-spark")) return "gpt-5-codex";
  if (normalized.includes("gpt-5.3-codex")) return "gpt-5-codex";
  if (normalized.includes("gpt-5.2-codex")) return "gpt-5-codex";
  
  // Canonical Codex models (most specific first, so gpt-5.1-codex-max is
  // not swallowed by the gpt-5.1-codex alias)
  if (normalized.includes("gpt-5.1-codex-max")) return "gpt-5.1-codex-max";
  if (normalized.includes("gpt-5.1-codex-mini")) return "gpt-5.1-codex-mini";
  if (normalized.includes("gpt-5.1-codex")) return "gpt-5-codex";
  if (normalized.includes("gpt-5-codex")) return "gpt-5-codex";
  
  // GPT-5 variants
  if (normalized.includes("gpt-5.2")) return "gpt-5.2";
  if (normalized.includes("gpt-5.1")) return "gpt-5.1";
  if (normalized.includes("gpt-5")) return "gpt-5.1";
  
  return "gpt-5.1"; // Default fallback
}
Normalization Examples:
openai/gpt-5-codex          → gpt-5-codex
gpt-5.3-codex-spark         → gpt-5-codex
gpt-5.2-codex               → gpt-5-codex
gpt-5.1-codex               → gpt-5-codex
gpt-5-codex-low             → gpt-5-codex (variant stripped for API)
gpt-5.1-codex-max           → gpt-5.1-codex-max
gpt-5.1-codex-mini          → gpt-5.1-codex-mini
gpt-5.2                     → gpt-5.2
gpt-5.1                     → gpt-5.1
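The fallback matching can be condensed into a self-contained sketch that reproduces the examples above (the explicit model map is omitted; specific patterns are checked before generic ones):

```typescript
// Condensed sketch of normalizeModel's pattern fallback.
function normalizeModelSketch(model?: string): string {
  if (!model) return "gpt-5.1";
  // Strip provider prefix: openai/gpt-5-codex → gpt-5-codex
  const id = (model.includes("/") ? model.split("/").pop() ?? model : model).toLowerCase();
  if (id.includes("gpt-5.1-codex-max")) return "gpt-5.1-codex-max";
  if (id.includes("gpt-5.1-codex-mini")) return "gpt-5.1-codex-mini";
  if (/gpt-5(\.\d+)?-codex/.test(id)) return "gpt-5-codex"; // all other codex variants
  if (id.includes("gpt-5.2")) return "gpt-5.2";
  return "gpt-5.1"; // gpt-5.1, gpt-5, and the default fallback
}

console.log(normalizeModelSketch("openai/gpt-5-codex"));  // → gpt-5-codex
console.log(normalizeModelSketch("gpt-5.3-codex-spark")); // → gpt-5-codex
console.log(normalizeModelSketch("gpt-5.1-codex-max"));   // → gpt-5.1-codex-max
```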

Step 4: Body Transformation

From lib/request/request-transformer.ts:821:
export async function transformRequestBody(
  body: RequestBody,
  codexInstructions: string,
  userConfig: UserConfig = { global: {}, models: {} },
  codexMode = true,
  fastSession = false,
  fastSessionStrategy: FastSessionStrategy = "hybrid",
  fastSessionMaxInputItems = 30,
): Promise<RequestBody> {
  const originalModel = body.model;
  const normalizedModel = normalizeModel(body.model);
  const modelConfig = getModelConfig(originalModel || normalizedModel, userConfig);
  
  // Set normalized model
  body.model = normalizedModel;
  
  // Codex required fields
  body.store = false;        // Stateless (required by ChatGPT backend)
  body.stream = true;        // Always stream (SSE)
  
  // Inject Codex instructions (shouldApplyFastSessionTuning and
  // isTrivialTurn are computed earlier in the function, elided here)
  body.instructions = shouldApplyFastSessionTuning
    ? compactInstructionsForFastSession(codexInstructions, isTrivialTurn)
    : codexInstructions;
  
  // Filter input array
  if (body.input) {
    // Apply fast-session input trimming
    if (fastSession) {
      body.input = trimInputForFastSession(
        body.input,
        fastSessionMaxInputItems,
        { preferLatestUserOnly: isTrivialTurn }
      );
    }
    
    // Remove item_reference (AI SDK construct, not supported by Codex)
    // Strip IDs from all items (stateless mode)
    body.input = filterInput(body.input);
    
    // Add bridge/tool-remap message
    if (codexMode) {
      body.input = await filterHostSystemPrompts(body.input);
      body.input = addCodexBridgeMessage(body.input, !!body.tools);
    } else {
      body.input = addToolRemapMessage(body.input, !!body.tools);
    }
    
    // Handle orphaned tool outputs
    body.input = normalizeOrphanedToolOutputs(body.input);
    body.input = injectMissingToolOutputs(body.input);
  }
  
  // Configure reasoning
  const reasoningConfig = resolveReasoningConfig(normalizedModel, modelConfig, body);
  body.reasoning = {
    ...body.reasoning,
    ...reasoningConfig,
  };
  
  // Configure text verbosity
  body.text = {
    ...body.text,
    verbosity: resolveTextVerbosity(modelConfig, body),
  };
  
  // Fast-session overrides (after the defaults above, so they win)
  if (fastSession && shouldApplyFastSessionTuning) {
    body.reasoning.effort = "none"; // or "low" for Codex models
    body.reasoning.summary = "auto";
    body.text.verbosity = "low";
  }
  
  // Add include for encrypted reasoning content
  body.include = resolveInclude(modelConfig, body);
  // Always includes "reasoning.encrypted_content" for stateless continuity
  
  // Remove unsupported parameters
  body.max_output_tokens = undefined;
  body.max_completion_tokens = undefined;
  
  return body;
}

Key Transformations

1. Stateless Mode (store: false)

From ARCHITECTURE.md:73:
The ChatGPT backend requires store: false and include: ["reasoning.encrypted_content"].
Why stateless?
  • Codex API doesn’t persist conversation state
  • Requires full context in each request
  • reasoning.encrypted_content maintains reasoning continuity
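Putting the required fields together, a transformed body looks roughly like this. The values are illustrative examples drawn from this page, not a verbatim capture of a real request:

```typescript
// Illustrative shape of a transformed request body after Step 4.
const transformedBody = {
  model: "gpt-5-codex",
  store: false,                             // stateless: no server-side state
  stream: true,                             // Codex always answers over SSE
  instructions: "<model-family instructions>",
  include: ["reasoning.encrypted_content"], // carries reasoning across turns
  reasoning: { effort: "high", summary: "auto" },
  text: { verbosity: "medium" },
  input: [{ type: "message", role: "user", content: "..." }],
};

console.log(transformedBody.store, transformedBody.stream); // → false true
```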

2. Input Filtering

From lib/request/request-transformer.ts:542:
export function filterInput(
  input: InputItem[] | undefined,
): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  
  return input
    .filter((item) => {
      // Remove AI SDK constructs not supported by Codex API
      if (item.type === "item_reference") {
        return false; // AI SDK only - references server state
      }
      return true; // Keep all other items
    })
    .map((item) => {
      // Strip IDs from all items (Codex API stateless mode)
      if (item.id) {
        const { id: _omit, ...itemWithoutId } = item;
        return itemWithoutId as InputItem;
      }
      return item;
    });
}
Why remove item_reference?
  • AI SDK uses this for server-side state lookup
  • Not supported by Codex API (stateless)
  • Would cause API errors
Why strip IDs?
  • Stateless mode doesn’t track item IDs
  • Reduces payload size
  • Prevents ID conflicts
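A self-contained demo of filterInput's two jobs, dropping item_reference entries and stripping ids (the InputItem type is simplified here for illustration):

```typescript
type InputItem = { type: string; id?: string; [key: string]: unknown };

function filterInput(input: InputItem[] | undefined): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  return input
    .filter((item) => item.type !== "item_reference") // drop AI SDK references
    .map((item) => {
      if (item.id) {
        const { id: _omit, ...itemWithoutId } = item; // strip the id
        return itemWithoutId as InputItem;
      }
      return item;
    });
}

const filtered = filterInput([
  { type: "item_reference", id: "ref_1" },        // dropped entirely
  { type: "message", id: "msg_1", role: "user" }, // kept, id stripped
]);
console.log(filtered); // → [ { type: "message", role: "user" } ]
```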

3. Orphaned Tool Outputs

From lib/request/helpers/input-utils.ts:180:
export function normalizeOrphanedToolOutputs(
  input: InputItem[],
): InputItem[] {
  // Problem: function_call_output references a function_call that was
  // an item_reference (now filtered out). API rejects orphaned outputs.
  
  // Solution: Convert orphaned outputs to assistant messages to preserve
  // context without API errors.
  
  const functionCallIds = new Set<string>();
  for (const item of input) {
    if (item.type === "function_call" && item.call_id) {
      functionCallIds.add(item.call_id);
    }
  }
  
  return input.map((item) => {
    if (item.type === "function_call_output") {
      const callId = item.call_id;
      if (callId && !functionCallIds.has(callId)) {
        // Orphaned output - convert to message
        return {
          type: "message",
          role: "assistant",
          content: [
            {
              type: "input_text",
              text: `[Previous tool result: ${JSON.stringify(item.output)}]`,
            },
          ],
        } as InputItem;
      }
    }
    return item;
  });
}
Why this matters:
  • Prevents infinite loops (LLM loses tool results)
  • Preserves conversation context
  • Avoids API validation errors
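The conversion can be demonstrated end-to-end with a self-contained copy of the function above (types simplified for illustration):

```typescript
type InputItem = { type: string; call_id?: string; output?: unknown; [k: string]: unknown };

function normalizeOrphanedToolOutputs(input: InputItem[]): InputItem[] {
  // Collect the call_ids of function_calls still present in the input
  const functionCallIds = new Set<string>();
  for (const item of input) {
    if (item.type === "function_call" && item.call_id) {
      functionCallIds.add(item.call_id);
    }
  }
  // Any output whose call is missing becomes an assistant message
  return input.map((item) => {
    if (item.type === "function_call_output" && item.call_id && !functionCallIds.has(item.call_id)) {
      return {
        type: "message",
        role: "assistant",
        content: [{ type: "input_text", text: `[Previous tool result: ${JSON.stringify(item.output)}]` }],
      };
    }
    return item;
  });
}

const result = normalizeOrphanedToolOutputs([
  { type: "function_call_output", call_id: "call_gone", output: "42" }, // orphan
]);
console.log(result[0].type); // → message
```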

4. Reasoning Configuration

From lib/request/request-transformer.ts:388:
export function getReasoningConfig(
  modelName: string | undefined,
  userConfig: ConfigOptions = {},
): ReasoningConfig {
  const normalizedName = modelName?.toLowerCase() ?? "";
  const isCodex = normalizedName.includes("codex");
  
  // Canonical GPT-5 Codex defaults to high and does not support "none"
  const isGpt5Codex = normalizedName.includes("gpt-5-codex");
  const isGpt52General = normalizedName.includes("gpt-5.2") && !isCodex;
  const isGpt51General = normalizedName.includes("gpt-5.1") && !isCodex;
  
  // GPT-5.2 general and GPT-5.1 general support "none" reasoning
  const supportsNone = isGpt52General || isGpt51General;
  
  // GPT-5.2 general supports xhigh
  const supportsXhigh = isGpt52General;
  
  // Default effort
  const defaultEffort = isGpt5Codex ? "high" : "medium";
  
  // Get user-requested effort
  let effort = userConfig.reasoningEffort || defaultEffort;
  
  // Downgrade unsupported values
  if (!supportsXhigh && effort === "xhigh") {
    effort = "high";
  }
  if (!supportsNone && effort === "none") {
    effort = "low";
  }
  
  const summary = userConfig.reasoningSummary ?? "auto";
  
  return { effort, summary };
}
Model-Specific Defaults:
gpt-5-codex:        effort: high,    supports: low, medium, high
gpt-5.1-codex-max:  effort: high,    supports: low, medium, high, xhigh
gpt-5.1-codex-mini: effort: medium,  supports: medium, high
gpt-5.2:            effort: high,    supports: none, low, medium, high, xhigh
gpt-5.1:            effort: medium,  supports: none, low, medium, high
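The downgrade rules can be seen in action with a condensed, self-contained copy of getReasoningConfig:

```typescript
type ReasoningConfig = { effort: string; summary: string };

function getReasoningConfig(
  modelName?: string,
  userConfig: { reasoningEffort?: string; reasoningSummary?: string } = {},
): ReasoningConfig {
  const name = modelName?.toLowerCase() ?? "";
  const isCodex = name.includes("codex");
  const isGpt5Codex = name.includes("gpt-5-codex");
  const supportsNone = (name.includes("gpt-5.2") || name.includes("gpt-5.1")) && !isCodex;
  const supportsXhigh = name.includes("gpt-5.2") && !isCodex;

  // Start from the model-specific default, then downgrade unsupported values
  let effort = userConfig.reasoningEffort || (isGpt5Codex ? "high" : "medium");
  if (!supportsXhigh && effort === "xhigh") effort = "high";
  if (!supportsNone && effort === "none") effort = "low";

  return { effort, summary: userConfig.reasoningSummary ?? "auto" };
}

// "none" is downgraded to "low" for Codex models...
console.log(getReasoningConfig("gpt-5-codex", { reasoningEffort: "none" }).effort); // → low
// ...but passes through for general GPT-5.1
console.log(getReasoningConfig("gpt-5.1", { reasoningEffort: "none" }).effort); // → none
```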

5. Fast-Session Optimizations

From lib/request/request-transformer.ts:569:
export function trimInputForFastSession(
  input: InputItem[] | undefined,
  maxItems: number,
  options?: { preferLatestUserOnly?: boolean },
): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  
  const safeMax = Math.max(8, Math.floor(maxItems)); // Default 30
  
  // Strategy 1: Trivial turns (short, simple questions).
  // latestUserText, firstSystemPrompt, and latestUserMessage are derived
  // earlier in the function (elided here).
  if (options?.preferLatestUserOnly && isTrivialLatestPrompt(latestUserText)) {
    // Keep only: minimal system prompt + latest user message
    return [firstSystemPrompt, latestUserMessage];
  }
  
  // Strategy 2: Complex requests (code blocks, lists, tables)
  // Keep: up to 2 leading system/developer messages + last N items
  const keepIndexes = new Set<number>();
  
  // Keep small leading system prompts (< 1200 chars)
  for (let i = 0; i < 2; i++) {
    const item = input[i];
    if (item?.role === "developer" || item?.role === "system") {
      const text = extractMessageText(item.content);
      if (text.length <= 1200) {
        keepIndexes.add(i);
      }
    }
  }
  
  // Keep last N items
  for (let i = Math.max(0, input.length - safeMax); i < input.length; i++) {
    keepIndexes.add(i);
  }
  
  return input.filter((_, index) => keepIndexes.has(index));
}
Fast-Session Benefits:
  • Lower latency: Smaller context = faster processing
  • Lower cost: Less tokens to process
  • Better UX: Instant responses for simple questions
When applied:
const shouldApplyFastSessionTuning =
  fastSession &&
  (fastSessionStrategy === "always" ||
   !isComplexFastSessionRequest(body, fastSessionMaxInputItems));
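The trivial-turn check referenced above (isTrivialLatestPrompt) is not shown in this page. As a purely hypothetical sketch of what such a heuristic might look like, with illustrative thresholds, not the plugin's actual logic:

```typescript
// Hypothetical trivial-turn heuristic: short prompts without code blocks
// or lists count as trivial. Thresholds are illustrative only.
function isTrivialLatestPromptSketch(text: string): boolean {
  if (text.length > 200) return false;        // long prompts are not trivial
  if (text.includes("```")) return false;     // code blocks imply complexity
  if (/^\s*[-*\d]/m.test(text)) return false; // lists imply complexity
  return true;
}

console.log(isTrivialLatestPromptSketch("What does HTTP 429 mean?")); // → true
console.log(isTrivialLatestPromptSketch("Refactor this:\n```ts\nconst x = 1;\n```")); // → false
```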

Step 5: Header Injection

From lib/request/fetch-helpers.ts:505:
export function createCodexHeaders(
  init: RequestInit | undefined,
  accountId: string,
  accessToken: string,
  opts?: { model?: string; promptCacheKey?: string },
): Headers {
  const headers = new Headers(init?.headers ?? {});
  
  // Remove any existing API key
  headers.delete("x-api-key");
  
  // OAuth authentication
  headers.set("Authorization", `Bearer ${accessToken}`);
  
  // Account ID (org-* or user-*)
  headers.set("openai-account-id", accountId);
  
  // Realtime responses beta flag
  headers.set("openai-beta", "realtime-responses-2024-11-19");
  
  // Originator tag (identifies Codex CLI)
  headers.set("openai-originator", "codex_cli_rs");
  
  // Prompt caching (session affinity)
  const cacheKey = opts?.promptCacheKey;
  if (cacheKey) {
    headers.set("openai-conversation-id", cacheKey);
    headers.set("openai-session-id", cacheKey);
  } else {
    headers.delete("openai-conversation-id");
    headers.delete("openai-session-id");
  }
  
  // Accept SSE
  headers.set("accept", "text/event-stream");
  
  return headers;
}
Key Headers:
  • Authorization: OAuth bearer token
  • openai-account-id: Account/org ID (affects quotas)
  • openai-beta: Enable realtime responses API
  • openai-originator: Identifies plugin as Codex CLI
  • openai-conversation-id: Prompt cache key (optional)
  • openai-session-id: Session identifier (optional)
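The resulting header set can be checked with a condensed copy of createCodexHeaders (the init and opts parameters are flattened here for brevity; requires a runtime with the global Headers class, e.g. Node 18+):

```typescript
function createCodexHeaders(
  accountId: string,
  accessToken: string,
  promptCacheKey?: string,
): Headers {
  const headers = new Headers();
  headers.set("Authorization", `Bearer ${accessToken}`); // OAuth, not x-api-key
  headers.set("openai-account-id", accountId);
  headers.set("openai-beta", "realtime-responses-2024-11-19");
  headers.set("openai-originator", "codex_cli_rs");
  if (promptCacheKey) {
    headers.set("openai-conversation-id", promptCacheKey);
    headers.set("openai-session-id", promptCacheKey);
  }
  headers.set("accept", "text/event-stream");
  return headers;
}

const h = createCodexHeaders("org-123", "tok_abc", "thread-1");
console.log(h.get("openai-conversation-id")); // → thread-1
```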

Step 6: Execute Request

From index.ts:1550:
const controller = new AbortController();
const fetchTimeoutMs = getFetchTimeoutMs(pluginConfig); // Default 120s

const timeoutId = setTimeout(() => {
  controller.abort();
}, fetchTimeoutMs);

// Resolve the breaker outside the try block so the catch clause can
// record failures on it
const breaker = getCircuitBreaker(`account:${selectedIndex}`);

try {
  if (!breaker.canExecute()) {
    throw new CircuitOpenError();
  }
  
  const response = await fetch(url, {
    ...requestInit,
    headers: codexHeaders,
    signal: controller.signal,
  });
  
  clearTimeout(timeoutId);
  
  // Record circuit breaker success
  breaker.recordSuccess();
  
  return response;
} catch (error) {
  clearTimeout(timeoutId);
  
  // Record circuit breaker failure
  breaker.recordFailure();
  
  throw error;
}
Timeout Behavior:
  • Default: 120 seconds (2 minutes)
  • Configurable via CODEX_AUTH_FETCH_TIMEOUT_MS
  • Aborts request on timeout
  • Triggers failover to next account
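The per-account circuit breaker used above can be sketched minimally as follows. The threshold, cool-off, and half-open behavior are assumptions for illustration; the plugin's real breaker may differ.

```typescript
// Minimal circuit-breaker sketch: open after N consecutive failures,
// allow a probe after the cool-off, close again on any success.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private coolOffMs = 30_000) {}

  getState(): "closed" | "open" | "half-open" {
    if (this.failures < this.threshold) return "closed";
    // After the cool-off, allow one probe request (half-open)
    return Date.now() - this.openedAt >= this.coolOffMs ? "half-open" : "open";
  }

  canExecute(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0; // any success fully closes the breaker
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures === this.threshold) this.openedAt = Date.now();
  }
}

const breaker = new CircuitBreaker(2, 1_000);
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.canExecute()); // → false (open for ~1s)
breaker.recordSuccess();
console.log(breaker.canExecute()); // → true
```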

Step 7: Response Handling

From lib/request/fetch-helpers.ts:589:
export async function handleSuccessResponse(
  response: Response,
  isStreaming: boolean,
  options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
  // Check for deprecation headers (RFC 8594)
  const deprecation = response.headers.get("Deprecation");
  const sunset = response.headers.get("Sunset");
  if (deprecation || sunset) {
    logWarn(`API deprecation notice`, { deprecation, sunset });
  }
  
  const responseHeaders = ensureContentType(response.headers);
  
  // For non-streaming requests (generateText), convert SSE to JSON
  if (!isStreaming) {
    return await convertSseToJson(response, responseHeaders, options);
  }
  
  // For streaming requests (streamText), return stream as-is
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers: responseHeaders,
  });
}

SSE to JSON Conversion

From lib/request/response-handler.ts:47:
export async function convertSseToJson(
  response: Response,
  headers: Headers,
  options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
  if (!response.body) {
    return new Response("{}", { status: 200, headers });
  }
  
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let finalData: unknown = null;
  
  try {
    while (true) {
      const { done, value } = await readWithTimeout(
        reader,
        options?.streamStallTimeoutMs ?? 45_000,
      );
      
      if (done) break;
      
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";
      
      for (const line of lines) {
        if (line.startsWith("data: ")) {
          const data = line.slice(6);
          if (data === "[DONE]") continue;
          
          try {
            const parsed = JSON.parse(data);
            if (parsed.type === "response.done") {
              finalData = parsed.response;
              break;
            }
          } catch {
            continue;
          }
        }
      }
      
      if (finalData) break;
    }
  } finally {
    reader.releaseLock();
  }
  
  const json = finalData ?? {};
  return new Response(JSON.stringify(json), {
    status: 200,
    headers,
  });
}
SSE Event Format:
data: {"type":"response.started","response":{"id":"resp_abc123"}}
data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.output_text.delta","delta":" world"}
data: {"type":"response.done","response":{"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}}
data: [DONE]
Conversion Result:
{
  "id": "resp_abc123",
  "output": [
    {
      "type": "text",
      "text": "Hello world"
    }
  ]
}
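The extraction loop at the core of convertSseToJson can be run standalone against the example event stream (a self-contained version operating on a string instead of a ReadableStream):

```typescript
// Extract the response.done payload from an SSE transcript.
function extractFinalResponse(sse: string): unknown {
  let finalData: unknown = null;
  for (const line of sse.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") continue;
    try {
      const parsed = JSON.parse(data);
      if (parsed.type === "response.done") {
        finalData = parsed.response;
        break;
      }
    } catch {
      continue; // ignore unparseable events
    }
  }
  return finalData;
}

const stream = [
  'data: {"type":"response.started","response":{"id":"resp_abc123"}}',
  'data: {"type":"response.output_text.delta","delta":"Hello"}',
  'data: {"type":"response.done","response":{"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}}',
  "data: [DONE]",
].join("\n");

console.log(JSON.stringify(extractFinalResponse(stream)));
// → {"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}
```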

Performance Optimizations

1. Model-Family Instruction Caching

From lib/prompts/codex.ts:180:
const instructionsCache = new Map<ModelFamily, string>();
const etagCache = new Map<ModelFamily, string>();

export async function getCodexInstructions(
  model: string,
): Promise<string> {
  const family = getModelFamily(model);
  
  // Return cached instructions if available
  if (instructionsCache.has(family)) {
    return instructionsCache.get(family)!;
  }
  
  // Fetch from GitHub with ETag caching
  const url = CODEX_INSTRUCTIONS_URLS[family];
  const etag = etagCache.get(family);
  
  const response = await fetch(url, {
    headers: etag ? { "If-None-Match": etag } : {},
  });
  
  if (response.status === 304) {
    // Not modified, use cached version
    return instructionsCache.get(family)!;
  }
  
  const instructions = await response.text();
  
  // Update cache
  instructionsCache.set(family, instructions);
  if (response.headers.has("ETag")) {
    etagCache.set(family, response.headers.get("ETag")!);
  }
  
  return instructions;
}
Benefits:
  • Reduces GitHub API calls (ETag caching)
  • Faster request transformation (in-memory cache)
  • Survives across multiple requests

2. Prewarming

From index.ts:1145:
if (!startupPrewarmTriggered && prewarmEnabled) {
  startupPrewarmTriggered = true;
  const configuredModels = Object.keys(userConfig.models ?? {});
  prewarmCodexInstructions(configuredModels);
  if (codexMode) {
    prewarmHostCodexPrompt();
  }
}
Prewarming triggers:
  • On plugin load (background fetch)
  • Fetches instructions for all configured models
  • No request-time latency for first use