The Manifest proxy provides an OpenAI-compatible endpoint that automatically routes requests to the optimal model based on complexity scoring. Point your LLM client to Manifest instead of the provider’s API.
Authentication
Use Bearer token authentication with your agent API key (format: mnfst_*).
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "Content-Type: application/json" \
-d @request.json
OpenAI Compatibility
The proxy accepts standard OpenAI Chat Completions API requests and returns OpenAI-format responses. Compatible with:
OpenAI SDKs (Python, Node.js, Go, etc.)
LangChain, LlamaIndex
OpenClaw, Cursor, Windsurf, Continue
Any tool supporting OpenAI-compatible APIs
Request Parameters
messages: Array of conversation messages; roles are system, user, assistant, or developer
content: Message content within each message (text string or multi-modal content array)
stream: Enable streaming responses (Server-Sent Events)
max_tokens: Maximum tokens in the response (influences tier scoring)
temperature: Sampling temperature (forwarded to provider)
top_p: Nucleus sampling (forwarded to provider)
tools: Function calling tools (OpenAI format)
tool_choice: Tool choice strategy: auto, none, or a specific tool
response_format: Response format (e.g., JSON mode)
X-Session-Key header: Session identifier for momentum tracking (optional, defaults to "default")
traceparent header: W3C trace context for distributed tracing (optional)
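Putting the request pieces together, the body and headers might look like the following sketch (the tool definition, session key, and API key are placeholder values):

```python
import json

# Illustrative request body exercising the documented parameters.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "stream": False,               # set True for SSE streaming
    "max_tokens": 512,             # influences tier scoring
    "temperature": 0.7,            # forwarded to provider
    "top_p": 0.9,                  # forwarded to provider
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder tool
            "description": "Look up current weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
    "response_format": {"type": "json_object"},
}

headers = {
    "Authorization": "Bearer mnfst_xxx",
    "Content-Type": "application/json",
    "X-Session-Key": "user-123",   # optional momentum tracking
}

body = json.dumps(payload)
```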
Standard OpenAI Chat Completions response with additional Manifest headers.
X-Manifest-Tier: assigned tier (simple, standard, complex, or reasoning)
X-Manifest-Model: the model Manifest selected
Scoring reason (see the resolve endpoint for possible values)
Response Body
id: Completion ID (from provider)
object: Always "chat.completion" (or "chat.completion.chunk" for streaming)
choices: Array of completion choices
choices[].message.content: Response content (null when tool calls are made)
choices[].message.tool_calls: Tool calls requested by the model
choices[].finish_reason: stop, length, tool_calls, or content_filter
usage.total_tokens: Total tokens (prompt + completion)
Examples
Basic Request
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Explain quantum entanglement"}
]
}'
Non-Streaming Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon where two or more particles become correlated..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}
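The streaming variant emits chat.completion.chunk objects as Server-Sent Events. A minimal sketch of accumulating streamed text, assuming the standard OpenAI `data:` framing terminated by `data: [DONE]` (the chunk payloads below are illustrative):

```python
import json

def collect_stream(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)

# Illustrative stream
lines = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print(collect_stream(lines))  # Hello, world
```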
SDK Integration
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.manifest.build/v1",
    api_key="mnfst_xxx"
)

# Use with_raw_response to read the Manifest routing headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # Ignored - Manifest selects the model
    messages=[{"role": "user", "content": "Hello!"}]
)
response = raw.parse()

print(response.choices[0].message.content)
print(f"Tier: {raw.headers['X-Manifest-Tier']}")
print(f"Model: {raw.headers['X-Manifest-Model']}")
Node.js (OpenAI SDK)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.manifest.build/v1',
  apiKey: 'mnfst_xxx'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',  // Ignored - Manifest selects model
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
OpenClaw Configuration
# Set Manifest as the proxy endpoint
openclaw config set plugins.entries.manifest.config.mode prod
openclaw config set plugins.entries.manifest.config.endpoint https://api.manifest.build/otlp
openclaw config set plugins.entries.manifest.config.apiKey mnfst_xxx
# Restart gateway
openclaw gateway restart
Provider Format Translation
Manifest automatically translates between OpenAI format and provider-native formats:
Google Gemini
Translates OpenAI messages → Gemini contents format
Converts Gemini streaming chunks → OpenAI SSE format
Maps system role → Gemini system instructions
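As an illustration of the Gemini mapping described above (a sketch of the shape of the translation, not Manifest's actual code):

```python
def to_gemini(messages):
    """Map OpenAI-style messages to Gemini-style contents plus a system instruction."""
    system_parts, contents = [], []
    for m in messages:
        if m["role"] in ("system", "developer"):
            system_parts.append(m["content"])  # becomes the system instruction
        else:
            # Gemini uses "model" where OpenAI uses "assistant"
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    request = {"contents": contents}
    if system_parts:
        request["systemInstruction"] = {"parts": [{"text": "\n".join(system_parts)}]}
    return request
```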
Anthropic Claude
Translates OpenAI format → Anthropic Messages API
Extracts system messages into Anthropic’s system parameter
Converts streaming SSE events to OpenAI format
Tracks usage across message delta events
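A sketch of the system-message extraction (illustrative only; Anthropic's Messages API takes system text as a top-level parameter and requires max_tokens):

```python
def to_anthropic(messages, max_tokens=1024):
    """Pull OpenAI-style system messages into Anthropic's top-level system parameter."""
    system, turns = [], []
    for m in messages:
        if m["role"] == "system":
            system.append(m["content"])
        else:
            turns.append({"role": m["role"], "content": m["content"]})
    request = {"messages": turns, "max_tokens": max_tokens}
    if system:
        request["system"] = "\n".join(system)
    return request
```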
OpenRouter
Injects cache_control for Anthropic models
Passes OpenAI format directly for OpenAI models
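One plausible shape of the cache_control injection for Anthropic models routed via OpenRouter (illustrative; which message Manifest actually marks is not specified here):

```python
def inject_cache_control(messages):
    """Mark the last system or user message's text as cacheable using an
    Anthropic-style ephemeral cache_control block (as supported by OpenRouter)."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] in ("system", "user") and isinstance(m["content"], str):
            m["content"] = [{
                "type": "text",
                "text": m["content"],
                "cache_control": {"type": "ephemeral"},
            }]
            break
    return out
```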
Other Providers
DeepSeek, Mistral, xAI, MiniMax, Z.AI, and Ollama all use OpenAI-compatible APIs and pass through directly.
Rate Limiting
Per-user concurrent request limit: 10 requests
429 responses: Recorded once per minute per agent (prevents log spam)
Limit exceeded: Returns 429 with a message describing the threshold
Notification rules can alert on rate limit events.
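On the client side, 429s from the proxy can be absorbed with a simple exponential backoff. A sketch (the send callable is a placeholder for your HTTP client):

```python
import time

def post_with_backoff(send, payload, max_retries=3, base_delay=1.0):
    """Retry on HTTP 429 with exponential backoff.

    `send` is any callable returning (status_code, body), e.g. a thin
    wrapper around your HTTP client of choice.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = send(payload)
        if status != 429 or attempt == max_retries:
            return status, body
        time.sleep(delay)
        delay *= 2  # 1s, 2s, 4s by default
```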
Error Handling
No Provider Configured
Missing API Key
Rate Limit Exceeded
Provider Error (Passthrough)
{
"error" : {
"message" : "No model available. Connect a provider in the Manifest dashboard." ,
"type" : "proxy_error"
}
}
Provider errors (4xx/5xx) are passed through with original status codes and headers.
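A small sketch of telling the two error classes apart on the client (assumes the error JSON shape shown above, where proxy-originated errors use type "proxy_error"):

```python
def classify_error(status, body):
    """Tell Manifest-originated errors apart from provider passthrough.

    Proxy-originated errors carry type "proxy_error"; anything else kept
    its original provider status code and body.
    """
    err = (body or {}).get("error", {})
    if err.get("type") == "proxy_error":
        return ("proxy", err.get("message", ""))
    return ("provider", err.get("message", ""))
```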
Observability
OTLP Integration
Manifest automatically records all proxy requests as agent messages with:
Request/response tokens and costs
Model, tier, and provider metadata
Error messages and rate limit events
Trace IDs from traceparent header
View all data in the Manifest dashboard.
Session Momentum
Use X-Session-Key header to group related requests:
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "X-Session-Key: user-123-conversation-456" \
...
Manifest tracks recent tier assignments per session (in-memory, 10k session limit) and applies momentum to prevent tier oscillation during multi-turn tasks.
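As a toy illustration of the idea (not Manifest's actual implementation), a capped per-session store that biases borderline requests toward the session's previous tier might look like:

```python
from collections import OrderedDict

class SessionMomentum:
    """Toy per-session tier momentum: remember the last tier per session,
    cap the store, and stick with the previous tier on borderline scores."""

    def __init__(self, max_sessions=10_000):
        self.max_sessions = max_sessions
        self.last_tier = OrderedDict()  # session_key -> tier, in LRU order

    def record(self, session_key, tier):
        self.last_tier.pop(session_key, None)
        self.last_tier[session_key] = tier
        if len(self.last_tier) > self.max_sessions:
            self.last_tier.popitem(last=False)  # evict least-recently-used

    def resolve(self, session_key, scored_tier, borderline=False):
        # On a borderline score, reuse the session's previous tier
        prev = self.last_tier.get(session_key)
        tier = prev if (borderline and prev) else scored_tier
        self.record(session_key, tier)
        return tier
```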
Distributed Tracing
Pass W3C traceparent header to correlate proxy requests with your application traces:
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
...
The trace ID is extracted and stored with the agent message record.
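If your application does not already emit W3C trace context, a traceparent value can be generated and later matched like this (format per the W3C Trace Context spec: version-traceid-spanid-flags):

```python
import re
import secrets

def make_traceparent():
    """Build a W3C traceparent header value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex chars
    span_id = secrets.token_hex(8)    # 16 lowercase hex chars
    return f"00-{trace_id}-{span_id}-01"

def extract_trace_id(traceparent):
    """Pull the 32-hex-char trace ID out of a traceparent header value."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}", traceparent)
    return m.group(1) if m else None
```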
Timeout: 180 seconds (3 minutes)
Scoring optimization: System/developer messages are filtered before scoring
Heartbeat detection: OpenClaw heartbeats (HEARTBEAT_OK) bypass scoring → simple tier
Concurrent requests: Up to 10 per user
Provider failover: Not implemented (a single provider per request)
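The heartbeat bypass could be approximated as follows (illustrative; the exact matching rule beyond the HEARTBEAT_OK marker is not specified):

```python
def bypass_scoring(messages):
    """Route OpenClaw heartbeat requests straight to the simple tier."""
    last = next((m for m in reversed(messages) if m["role"] == "user"), None)
    return bool(last) and isinstance(last.get("content"), str) \
        and last["content"].strip() == "HEARTBEAT_OK"
```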