The Manifest proxy provides an OpenAI-compatible endpoint that automatically routes requests to the optimal model based on complexity scoring. Point your LLM client to Manifest instead of the provider’s API.
Authentication
Use Bearer token authentication with your agent API key (format: mnfst_*).
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "Content-Type: application/json" \
-d @request.json
OpenAI Compatibility
The proxy accepts standard OpenAI Chat Completions API requests and returns OpenAI-format responses. Compatible with:
OpenAI SDKs (Python, Node.js, Go, etc.)
LangChain, LlamaIndex
OpenClaw, Cursor, Windsurf, Continue
Any tool supporting OpenAI-compatible APIs
Request Parameters
messages: Array of conversation messages; roles are system, user, assistant, or developer
content: Message content within each message (text string or multi-modal content array)
stream: Enable streaming responses (Server-Sent Events)
max_tokens: Maximum tokens in the response (influences tier scoring)
temperature: Sampling temperature (forwarded to provider)
top_p: Nucleus sampling (forwarded to provider)
tools: Function calling tools (OpenAI format)
tool_choice: Tool choice strategy: auto, none, or a specific tool
response_format: Response format (e.g., JSON mode)
X-Session-Key header: Session identifier for momentum tracking (optional, defaults to "default")
traceparent header: W3C trace context for distributed tracing (optional)
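Putting the request pieces together, the body and headers might look like the following sketch (the tool definition, session key, and API key are placeholder values):

```python
import json

# Illustrative request body exercising the documented parameters.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "stream": False,               # set True for SSE streaming
    "max_tokens": 512,             # influences tier scoring
    "temperature": 0.7,            # forwarded to provider
    "top_p": 0.9,                  # forwarded to provider
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder tool
            "description": "Look up current weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
    "response_format": {"type": "json_object"},
}

headers = {
    "Authorization": "Bearer mnfst_xxx",
    "Content-Type": "application/json",
    "X-Session-Key": "user-123",   # optional momentum tracking
}

body = json.dumps(payload)
```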
Standard OpenAI Chat Completions response with additional Manifest headers.
X-Manifest-Tier: assigned tier (simple, standard, complex, or reasoning)
X-Manifest-Model: the model Manifest selected
Scoring reason (see the resolve endpoint for possible values)
Response Body
id: Completion ID (from provider)
object: Always "chat.completion" (or "chat.completion.chunk" for streaming)
choices: Array of completion choices
choices[].message.content: Response content (null when tool calls are made)
choices[].message.tool_calls: Tool calls requested by the model
choices[].finish_reason: stop, length, tool_calls, or content_filter
usage.total_tokens: Total tokens (prompt + completion)
Examples
Basic Request
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Explain quantum entanglement"}
]
}'
Non-Streaming Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon where two or more particles become correlated..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}
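The streaming variant emits chat.completion.chunk objects as Server-Sent Events. A minimal sketch of accumulating streamed text, assuming the standard OpenAI `data:` framing terminated by `data: [DONE]` (the chunk payloads below are illustrative):

```python
import json

def collect_stream(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)

# Illustrative stream
lines = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print(collect_stream(lines))  # Hello, world
```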
SDK Integration
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.manifest.build/v1",
    api_key="mnfst_xxx"
)

# Use with_raw_response to read the Manifest routing headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # Ignored - Manifest selects the model
    messages=[{"role": "user", "content": "Hello!"}]
)
response = raw.parse()

print(response.choices[0].message.content)
print(f"Tier: {raw.headers['X-Manifest-Tier']}")
print(f"Model: {raw.headers['X-Manifest-Model']}")
Node.js (OpenAI SDK)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.manifest.build/v1',
  apiKey: 'mnfst_xxx'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',  // Ignored - Manifest selects model
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
OpenClaw Configuration
# Set Manifest as the proxy endpoint
openclaw config set plugins.entries.manifest.config.mode prod
openclaw config set plugins.entries.manifest.config.endpoint https://api.manifest.build/otlp
openclaw config set plugins.entries.manifest.config.apiKey mnfst_xxx
# Restart gateway
openclaw gateway restart
Provider Format Translation
Manifest automatically translates between OpenAI format and provider-native formats:
Google Gemini
Translates OpenAI messages → Gemini contents format
Converts Gemini streaming chunks → OpenAI SSE format
Maps system role → Gemini system instructions
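As an illustration of the Gemini mapping described above (a sketch of the shape of the translation, not Manifest's actual code):

```python
def to_gemini(messages):
    """Map OpenAI-style messages to Gemini-style contents plus a system instruction."""
    system_parts, contents = [], []
    for m in messages:
        if m["role"] in ("system", "developer"):
            system_parts.append(m["content"])  # becomes the system instruction
        else:
            # Gemini uses "model" where OpenAI uses "assistant"
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    request = {"contents": contents}
    if system_parts:
        request["systemInstruction"] = {"parts": [{"text": "\n".join(system_parts)}]}
    return request
```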
Anthropic Claude
Translates OpenAI format → Anthropic Messages API
Extracts system messages into Anthropic’s system parameter
Converts streaming SSE events to OpenAI format
Tracks usage across message delta events
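A sketch of the system-message extraction (illustrative only; Anthropic's Messages API takes system text as a top-level parameter and requires max_tokens):

```python
def to_anthropic(messages, max_tokens=1024):
    """Pull OpenAI-style system messages into Anthropic's top-level system parameter."""
    system, turns = [], []
    for m in messages:
        if m["role"] == "system":
            system.append(m["content"])
        else:
            turns.append({"role": m["role"], "content": m["content"]})
    request = {"messages": turns, "max_tokens": max_tokens}
    if system:
        request["system"] = "\n".join(system)
    return request
```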
OpenRouter
Injects cache_control for Anthropic models
Passes OpenAI format directly for OpenAI models
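One plausible shape of the cache_control injection for Anthropic models routed via OpenRouter (illustrative; which message Manifest actually marks is not specified here):

```python
def inject_cache_control(messages):
    """Mark the last system or user message's text as cacheable using an
    Anthropic-style ephemeral cache_control block (as supported by OpenRouter)."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] in ("system", "user") and isinstance(m["content"], str):
            m["content"] = [{
                "type": "text",
                "text": m["content"],
                "cache_control": {"type": "ephemeral"},
            }]
            break
    return out
```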
Other Providers
DeepSeek, Mistral, xAI, MiniMax, Z.AI, and Ollama all use OpenAI-compatible APIs and pass through directly.
Rate Limiting
Per-user concurrent request limit: 10 requests
429 responses: Recorded once per minute per agent (prevents log spam)
Limit exceeded: Returns 429 with a message describing the threshold
Notification rules can alert on rate limit events.
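On the client side, 429s from the proxy can be absorbed with a simple exponential backoff. A sketch (the send callable is a placeholder for your HTTP client):

```python
import time

def post_with_backoff(send, payload, max_retries=3, base_delay=1.0):
    """Retry on HTTP 429 with exponential backoff.

    `send` is any callable returning (status_code, body), e.g. a thin
    wrapper around your HTTP client of choice.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = send(payload)
        if status != 429 or attempt == max_retries:
            return status, body
        time.sleep(delay)
        delay *= 2  # 1s, 2s, 4s by default
```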
Error Handling
No Provider Configured
Missing API Key
Rate Limit Exceeded
Provider Error (Passthrough)
{
"error" : {
"message" : "No model available. Connect a provider in the Manifest dashboard." ,
"type" : "proxy_error"
}
}
Provider errors (4xx/5xx) are passed through with original status codes and headers.
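A small sketch of telling the two error classes apart on the client (assumes the error JSON shape shown above, where proxy-originated errors use type "proxy_error"):

```python
def classify_error(status, body):
    """Tell Manifest-originated errors apart from provider passthrough.

    Proxy-originated errors carry type "proxy_error"; anything else kept
    its original provider status code and body.
    """
    err = (body or {}).get("error", {})
    if err.get("type") == "proxy_error":
        return ("proxy", err.get("message", ""))
    return ("provider", err.get("message", ""))
```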
Observability
OTLP Integration
Manifest automatically records all proxy requests as agent messages with:
Request/response tokens and costs
Model, tier, and provider metadata
Error messages and rate limit events
Trace IDs from traceparent header
View all data in the Manifest dashboard.
Session Momentum
Use X-Session-Key header to group related requests:
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "X-Session-Key: user-123-conversation-456" \
...
Manifest tracks recent tier assignments per session (in-memory, 10k session limit) and applies momentum to prevent tier oscillation during multi-turn tasks.
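As a toy illustration of the idea (not Manifest's actual implementation), a capped per-session store that biases borderline requests toward the session's previous tier might look like:

```python
from collections import OrderedDict

class SessionMomentum:
    """Toy per-session tier momentum: remember the last tier per session,
    cap the store, and stick with the previous tier on borderline scores."""

    def __init__(self, max_sessions=10_000):
        self.max_sessions = max_sessions
        self.last_tier = OrderedDict()  # session_key -> tier, in LRU order

    def record(self, session_key, tier):
        self.last_tier.pop(session_key, None)
        self.last_tier[session_key] = tier
        if len(self.last_tier) > self.max_sessions:
            self.last_tier.popitem(last=False)  # evict least-recently-used

    def resolve(self, session_key, scored_tier, borderline=False):
        # On a borderline score, reuse the session's previous tier
        prev = self.last_tier.get(session_key)
        tier = prev if (borderline and prev) else scored_tier
        self.record(session_key, tier)
        return tier
```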
Distributed Tracing
Pass W3C traceparent header to correlate proxy requests with your application traces:
curl -X POST https://api.manifest.build/v1/chat/completions \
-H "Authorization: Bearer mnfst_xxx" \
-H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
...
The trace ID is extracted and stored with the agent message record.
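If your application does not already emit W3C trace context, a traceparent value can be generated and later matched like this (format per the W3C Trace Context spec: version-traceid-spanid-flags):

```python
import re
import secrets

def make_traceparent():
    """Build a W3C traceparent header value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex chars
    span_id = secrets.token_hex(8)    # 16 lowercase hex chars
    return f"00-{trace_id}-{span_id}-01"

def extract_trace_id(traceparent):
    """Pull the 32-hex-char trace ID out of a traceparent header value."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}", traceparent)
    return m.group(1) if m else None
```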
Timeout: 180 seconds (3 minutes)
Scoring optimization: System/developer messages are filtered before scoring
Heartbeat detection: OpenClaw heartbeats (HEARTBEAT_OK) bypass scoring → simple tier
Concurrent requests: Up to 10 per user
Provider failover: Not implemented (a single provider per request)
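The heartbeat bypass could be approximated as follows (illustrative; the exact matching rule beyond the HEARTBEAT_OK marker is not specified):

```python
def bypass_scoring(messages):
    """Route OpenClaw heartbeat requests straight to the simple tier."""
    last = next((m for m in reversed(messages) if m["role"] == "user"), None)
    return bool(last) and isinstance(last.get("content"), str) \
        and last["content"].strip() == "HEARTBEAT_OK"
```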