POST /v1/chat/completions
The Manifest proxy provides an OpenAI-compatible endpoint that automatically routes requests to the optimal model based on complexity scoring. Point your LLM client to Manifest instead of the provider’s API.

Authentication

Use Bearer token authentication with your agent API key (format: mnfst_*).
curl -X POST https://api.manifest.build/v1/chat/completions \
  -H "Authorization: Bearer mnfst_xxx" \
  -H "Content-Type: application/json" \
  -d @request.json

OpenAI Compatibility

The proxy accepts standard OpenAI Chat Completions API requests and returns OpenAI-format responses. Compatible with:
  • OpenAI SDKs (Python, Node.js, Go, etc.)
  • LangChain, LlamaIndex
  • OpenClaw, Cursor, Windsurf, Continue
  • Any tool supporting OpenAI-compatible APIs

Request Format

messages (array, required): Array of conversation messages
stream (boolean, default false): Enable streaming responses (Server-Sent Events)
max_tokens (number): Maximum tokens in response (influences tier scoring)
temperature (number): Sampling temperature (forwarded to provider)
top_p (number): Nucleus sampling (forwarded to provider)
tools (array): Function calling tools (OpenAI format)
tool_choice (string | object): Tool choice strategy: auto, none, or a specific tool
response_format (object): Response format (e.g., JSON mode)

Request Headers

X-Session-Key (string, optional): Session identifier for momentum tracking (defaults to "default")
traceparent (string, optional): W3C trace context for distributed tracing
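A minimal request body using the parameters above might look like the following sketch; the field values are illustrative, not defaults:

```python
import json

# Illustrative request body for POST /v1/chat/completions.
# Only "messages" is required; everything else is optional.
payload = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain quantum entanglement"},
    ],
    "stream": False,       # set True for Server-Sent Events
    "max_tokens": 512,     # also influences tier scoring
    "temperature": 0.7,    # forwarded to the provider as-is
}

body = json.dumps(payload)
print(body)
```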

Response Format

Standard OpenAI Chat Completions response with additional Manifest headers.

Response Headers

X-Manifest-Tier (string): Assigned tier: simple, standard, complex, or reasoning
X-Manifest-Model (string): Selected model name
X-Manifest-Provider (string): Provider name
X-Manifest-Confidence (string): Scoring confidence (0-1)
X-Manifest-Reason (string): Scoring reason (see the resolve endpoint for values)
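A small helper for pulling these routing headers out of any response; the function name is illustrative, and the headers argument is whatever mapping your HTTP client exposes:

```python
def read_manifest_routing(headers: dict) -> dict:
    """Extract Manifest routing metadata from a response-headers mapping."""
    return {
        "tier": headers.get("X-Manifest-Tier"),
        "model": headers.get("X-Manifest-Model"),
        "provider": headers.get("X-Manifest-Provider"),
        # Confidence arrives as a string; convert it for comparisons.
        "confidence": float(headers.get("X-Manifest-Confidence", "0")),
        "reason": headers.get("X-Manifest-Reason"),
    }

info = read_manifest_routing({
    "X-Manifest-Tier": "standard",
    "X-Manifest-Model": "gpt-4o-mini",
    "X-Manifest-Provider": "openai",
    "X-Manifest-Confidence": "0.82",
    "X-Manifest-Reason": "length",
})
print(info)
```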

Response Body

id (string): Completion ID (from provider)
object (string): Always "chat.completion" (or "chat.completion.chunk" for streaming)
created (number): Unix timestamp
model (string): Model name used
choices (array): Array of completion choices
usage (object): Token counts (prompt_tokens, completion_tokens, total_tokens)
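When streaming is enabled, content arrives as chat.completion.chunk events over SSE. A sketch of assembling the final text from raw SSE lines (the input lines below are synthetic):

```python
import json

def assemble_stream(sse_lines):
    """Concatenate delta content from chat.completion.chunk SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # terminal sentinel in OpenAI-style streams
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Synthetic SSE stream for illustration
lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(assemble_stream(lines))  # Hello, world
```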

Examples

curl -X POST https://api.manifest.build/v1/chat/completions \
  -H "Authorization: Bearer mnfst_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain quantum entanglement"}
    ]
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon where two or more particles become correlated..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}

SDK Integration

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.manifest.build/v1",
    api_key="mnfst_xxx"
)

# with_raw_response exposes the HTTP headers, including X-Manifest-*
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # Ignored - Manifest selects the model
    messages=[{"role": "user", "content": "Hello!"}]
)
response = raw.parse()

print(response.choices[0].message.content)
print(f"Tier: {raw.headers['X-Manifest-Tier']}")
print(f"Model: {raw.headers['X-Manifest-Model']}")

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.manifest.build/v1',
  apiKey: 'mnfst_xxx'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',  // Ignored - Manifest selects model
  messages: [{role: 'user', content: 'Hello!'}]
});

console.log(response.choices[0].message.content);

OpenClaw Configuration

# Set Manifest as the proxy endpoint
openclaw config set plugins.entries.manifest.config.mode prod
openclaw config set plugins.entries.manifest.config.endpoint https://api.manifest.build/otlp
openclaw config set plugins.entries.manifest.config.apiKey mnfst_xxx

# Restart gateway
openclaw gateway restart

Provider Format Translation

Manifest automatically translates between OpenAI format and provider-native formats:

Google Gemini

  • Translates OpenAI messages → Gemini contents format
  • Converts OpenAI streaming chunks → SSE format
  • Maps system role → Gemini system instructions

Anthropic Claude

  • Translates OpenAI format → Anthropic Messages API
  • Extracts system messages into Anthropic’s system parameter
  • Converts streaming SSE events to OpenAI format
  • Tracks usage across message delta events
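The system-message extraction can be pictured roughly like this; a simplified sketch, not the proxy's actual implementation:

```python
def to_anthropic(openai_body: dict) -> dict:
    """Translate an OpenAI-style request toward Anthropic Messages API shape,
    pulling system messages out into the top-level `system` parameter."""
    system_parts = []
    messages = []
    for msg in openai_body["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body = {
        "messages": messages,
        # Anthropic requires max_tokens; 1024 here is an arbitrary fallback.
        "max_tokens": openai_body.get("max_tokens", 1024),
    }
    if system_parts:
        body["system"] = "\n".join(system_parts)
    return body

out = to_anthropic({
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    "max_tokens": 256,
})
print(out)
```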

OpenRouter

  • Injects cache_control for Anthropic models
  • Passes OpenAI format directly for OpenAI models

Other Providers

DeepSeek, Mistral, xAI, MiniMax, Z.AI, and Ollama all use OpenAI-compatible APIs and pass through directly.

Rate Limiting

  • Per-user concurrent request limit: 10 requests
  • 429 responses: Recorded once per minute per agent (prevents log spam)
  • Limit exceeded: Returns 429 with message describing threshold
Notification rules can alert on rate limit events.
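Because the proxy returns 429 once the concurrency limit is exceeded, clients may want simple retry with exponential backoff. A sketch; the RateLimited exception and flaky function are stand-ins, not part of Manifest:

```python
import time

class RateLimited(Exception):
    """Illustrative stand-in for an HTTP 429 response."""

def with_backoff(call, retries=3, base_delay=0.1):
    """Retry `call` on 429-style errors, doubling the delay each attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated call that is rate-limited twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

print(with_backoff(flaky))  # ok
```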

Error Handling

{
  "error": {
    "message": "No model available. Connect a provider in the Manifest dashboard.",
    "type": "proxy_error"
  }
}
Provider errors (4xx/5xx) are passed through with original status codes and headers.
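A client can branch on the error type field to tell proxy-originated errors from passed-through provider errors; a sketch with an illustrative helper name:

```python
import json

def classify_error(body: str) -> str:
    """Label an error body as proxy-originated or provider passthrough."""
    try:
        err = json.loads(body).get("error", {})
    except json.JSONDecodeError:
        return "unknown"
    if err.get("type") == "proxy_error":
        return "proxy"
    # Provider errors arrive with their original status codes and headers
    return "provider"

body = '{"error": {"message": "No model available. Connect a provider in the Manifest dashboard.", "type": "proxy_error"}}'
print(classify_error(body))  # proxy
```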

Observability

OTLP Integration

Manifest automatically records all proxy requests as agent messages with:
  • Request/response tokens and costs
  • Model, tier, and provider metadata
  • Error messages and rate limit events
  • Trace IDs from traceparent header
View all data in the Manifest dashboard.

Session Momentum

Use X-Session-Key header to group related requests:
curl -X POST https://api.manifest.build/v1/chat/completions \
  -H "Authorization: Bearer mnfst_xxx" \
  -H "X-Session-Key: user-123-conversation-456" \
  ...
Manifest tracks recent tier assignments per session (in-memory, 10k session limit) and applies momentum to prevent tier oscillation during multi-turn tasks.
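The per-session momentum can be pictured as a bounded map of recent tier assignments. A toy illustration of the idea, not the actual implementation; the class and window size are invented for this sketch:

```python
from collections import Counter, OrderedDict, deque

MAX_SESSIONS = 10_000  # mirrors the documented in-memory session limit

class Momentum:
    """Toy tracker: biases toward a session's recently dominant tier."""

    def __init__(self, window=5):
        self.window = window
        self.sessions = OrderedDict()  # session key -> deque of recent tiers

    def record(self, session_key: str, tier: str) -> None:
        if session_key not in self.sessions:
            if len(self.sessions) >= MAX_SESSIONS:
                self.sessions.popitem(last=False)  # evict the oldest session
            self.sessions[session_key] = deque(maxlen=self.window)
        self.sessions[session_key].append(tier)

    def dominant_tier(self, session_key: str):
        tiers = self.sessions.get(session_key)
        if not tiers:
            return None
        return Counter(tiers).most_common(1)[0][0]

m = Momentum()
for tier in ["complex", "complex", "simple", "complex"]:
    m.record("user-123-conversation-456", tier)
print(m.dominant_tier("user-123-conversation-456"))  # complex
```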

Distributed Tracing

Pass W3C traceparent header to correlate proxy requests with your application traces:
curl -X POST https://api.manifest.build/v1/chat/completions \
  -H "Authorization: Bearer mnfst_xxx" \
  -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
  ...
The trace ID is extracted and stored with the agent message record.
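The W3C traceparent format is version-traceid-parentid-flags, so recovering the trace ID on your side is a simple split (helper name is illustrative):

```python
def extract_trace_id(traceparent: str):
    """Pull the 32-hex-char trace ID out of a W3C traceparent header."""
    parts = traceparent.split("-")
    # Expected: version (2 hex), trace-id (32 hex), parent-id (16 hex), flags (2 hex)
    if len(parts) != 4 or len(parts[1]) != 32:
        return None
    return parts[1]

tp = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(extract_trace_id(tp))  # 4bf92f3577b34da6a3ce929d0e0e4736
```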

Performance Notes

  • Timeout: 180 seconds (3 minutes)
  • Scoring optimization: System/developer messages filtered before scoring
  • Heartbeat detection: OpenClaw heartbeats (HEARTBEAT_OK) bypass scoring → simple tier
  • Concurrent requests: Up to 10 per user
  • Provider failover: Not implemented (single provider per request)
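To stay under the 10-request concurrency ceiling, a client can gate its own calls with a semaphore. A sketch using a thread pool; fake_request stands in for a real proxy call:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # matches the documented per-user limit
gate = threading.Semaphore(MAX_CONCURRENT)

def guarded_call(make_request, *args):
    """Run make_request only while holding one of the 10 concurrency slots."""
    with gate:
        return make_request(*args)

def fake_request(i):
    return f"response-{i}"

# 20 workers compete, but at most 10 requests run at once.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda i: guarded_call(fake_request, i), range(20)))
print(results[:3])
```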
