OpenAI-Compatible Endpoints
OpenFang implements the OpenAI `/v1/chat/completions` API, allowing any OpenAI-compatible client library to talk directly to your agents. Your agents become LLM endpoints that any tool, script, or application can use.
All OpenAI-compatible endpoints are available at `http://127.0.0.1:4200/v1/` by default. Configure the port in `~/.openfang/config.toml`.

How It Works
When you send a request to `/v1/chat/completions`, OpenFang:
1. **Resolves the agent** — The `model` field maps to an agent by name, UUID, or `openfang:<name>` prefix
2. **Converts messages** — OpenAI message format is converted to OpenFang's internal format
3. **Executes the agent loop** — The agent processes your message with full tool access and multi-turn execution
4. **Returns a formatted response** — The response follows OpenAI's exact structure, including token usage
Authentication
Currently, OpenFang's OpenAI-compatible endpoints do not require authentication when running locally. For production deployments, configure network access controls or place OpenFang behind a reverse proxy with authentication.

Agent Resolution
The `model` field in your request resolves to an agent using the following priority:
1. `openfang:<name>` — Explicit agent name with prefix (e.g., `openfang:researcher`)
2. Valid UUID — Direct agent ID lookup (e.g., `550e8400-e29b-41d4-a716-446655440000`)
3. Plain string — Agent name without prefix (e.g., `researcher`)

If no agent matches, the request fails with a 404 error and the `model_not_found` code.
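The resolution order can be sketched as follows. This is an illustrative Python model only, not OpenFang's actual internals; the function and lookup-table names are assumptions:

```python
import uuid

def resolve_agent(model: str, agents_by_name: dict, agents_by_id: dict):
    """Sketch of the model-field resolution priority described above."""
    # 1. Explicit prefix: openfang:<name>
    if model.startswith("openfang:"):
        return agents_by_name.get(model[len("openfang:"):])
    # 2. Valid UUID: direct agent ID lookup
    try:
        return agents_by_id.get(str(uuid.UUID(model)))
    except ValueError:
        pass  # not a UUID, fall through
    # 3. Plain string: agent name without prefix
    return agents_by_name.get(model)

agents_by_name = {"researcher": "agent-A"}
agents_by_id = {"550e8400-e29b-41d4-a716-446655440000": "agent-B"}
assert resolve_agent("openfang:researcher", agents_by_name, agents_by_id) == "agent-A"
assert resolve_agent("550e8400-e29b-41d4-a716-446655440000", agents_by_name, agents_by_id) == "agent-B"
assert resolve_agent("researcher", agents_by_name, agents_by_id) == "agent-A"
```

A miss at any stage (no matching agent) yields `None` here, corresponding to the 404 `model_not_found` error.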
POST /v1/chat/completions
Send a message to an agent and receive a completion response. Supports both streaming and non-streaming modes.

Request Body
- `model` — Agent identifier: name, UUID, or `openfang:<name>` format
- `messages` — Array of message objects with `role` and `content` fields. Supports `user`, `assistant`, and `system` roles.
- `stream` — Enable Server-Sent Events (SSE) streaming for real-time token delivery
- `max_tokens` — Maximum tokens to generate (passed to the underlying LLM if supported)
- `temperature` — Sampling temperature, 0.0-2.0 (passed to the underlying LLM if supported)
Response
- `id` — Unique request identifier in the format `chatcmpl-<uuid>`
- `object` — Always `"chat.completion"` for non-streaming, `"chat.completion.chunk"` for streaming
- `created` — Unix timestamp of response creation
- `model` — Agent name that processed the request
- `choices` — Array containing the completion choice(s)
  - `index` — Choice index (always 0 for single completions)
  - `finish_reason` — Reason the completion stopped: `"stop"`, `"length"`, or `"tool_calls"`

Non-Streaming Example
Streaming Example
When `stream: true`, the response is delivered as Server-Sent Events (SSE). Each chunk contains incremental deltas.
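Consuming the stream can be sketched as below: each `data:` line carries a JSON chunk whose content delta is accumulated. The chunk shape follows OpenAI's streaming format; the sample lines are illustrative, not captured OpenFang output:

```python
import json

def accumulate_sse(lines):
    """Collect content deltas from OpenAI-style SSE lines into one string."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments/blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":  # stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"lo"},"index":0}]}',
    "data: [DONE]",
]
assert accumulate_sse(sample) == "Hello"
```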
Multimodal (Vision) Support
OpenFang supports image inputs via base64-encoded data URIs. Images are passed to the agent's underlying LLM if it supports vision.

Image Support Notes:
- Only base64-encoded data URIs are supported (format: `data:image/{type};base64,{data}`)
- URL-based images are not supported (they would require external fetching)
- The agent's configured LLM must support vision (e.g., Claude 3, GPT-4V, Gemini Pro Vision)
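A stdlib-only sketch of building a multimodal message, assuming OpenFang accepts OpenAI's `image_url` content-part shape. The image bytes here are a placeholder, not a valid image:

```python
import base64

def image_content(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part
    using a base64 data URI (the only image form OpenFang accepts)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Placeholder bytes; substitute real image data in practice.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_content(b"placeholder-image-bytes"),
    ],
}
assert message["content"][1]["image_url"]["url"].startswith("data:image/png;base64,")
```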
GET /v1/models
List all available agents as OpenAI-compatible model objects. Each agent appears as a model with the `openfang:<name>` identifier.
Response
- `object` — Always `"list"`

Example
Error Responses
All errors follow OpenAI's error format with a structured error object.

Model Not Found (404)
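An illustrative 404 body. The `code` value comes from the agent-resolution behavior documented above; the `message` and `type` values are assumptions following OpenAI's error envelope:

```python
import json

# Sample error body for an unknown agent (illustrative).
error_body = """{
  "error": {
    "message": "Model 'openfang:missing' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}"""

err = json.loads(error_body)["error"]
assert err["code"] == "model_not_found"
```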
Missing User Message (400)
Agent Processing Failed (500)
Integration Examples
Use with LangChain
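A sketch using the `langchain-openai` package (requires `pip install langchain-openai` and a running OpenFang server; the agent name is a placeholder):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:4200/v1",  # OpenFang's OpenAI-compatible endpoint
    api_key="not-needed",                 # no key required locally
    model="openfang:researcher",          # placeholder agent name
)

# Requires a running OpenFang server; uncomment to invoke:
# response = llm.invoke("Summarize the latest findings in my notes.")
# print(response.content)
```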
Use with Cursor AI
Configure Cursor to use OpenFang as an LLM provider:
1. Open Cursor Settings → Models
2. Add a custom OpenAI-compatible endpoint: `http://127.0.0.1:4200/v1`
3. Set the model name to `openfang:<your-agent-name>`
4. Leave the API key empty or set it to `not-needed`
Use with Continue.dev
Add to your `~/.continue/config.json`:
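A sketch of the relevant `models` entry. The field names follow Continue's `config.json` schema as commonly documented; the title and agent name are placeholders:

```json
{
  "models": [
    {
      "title": "OpenFang Researcher",
      "provider": "openai",
      "model": "openfang:researcher",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "not-needed"
    }
  ]
}
```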
Use with Anything That Supports OpenAI
Any tool, library, or platform that supports custom OpenAI endpoints can use OpenFang:
- LM Studio — Add custom endpoint in settings
- Jan.ai — Configure remote server with OpenFang URL
- Open WebUI — Add as OpenAI-compatible connection
- LibreChat — Add as custom endpoint
- BetterChatGPT — Configure API endpoint
- Chatbox — Add custom OpenAI endpoint

For any of these, use the following settings:
- Base URL: `http://127.0.0.1:4200/v1`
- API Key: `not-needed` (or leave empty)
- Model: `openfang:<agent-name>` or just the agent name
Tool Calls and Multi-Turn Execution
When an agent uses tools during execution, the OpenAI-compatible API surfaces this through the `tool_calls` field in streaming chunks. The agent may execute multiple tool-use iterations before responding.
OpenFang automatically handles the full agent loop — tool execution, memory access, security checks, and multi-turn reasoning. From the client’s perspective, you just see streaming text and optional tool call deltas.
Tool Call Streaming Format
When streaming is enabled and the agent invokes tools:
1. ToolUseStart → Chunk with a `tool_calls` array containing `id`, `type`, and `function.name`
2. ToolInputDelta → Incremental chunks with `function.arguments` text
3. ContentComplete (ToolUse) → Tool execution happens server-side; the client sees no explicit marker
4. The agent may perform additional tool calls or proceed to the final response
5. Final text deltas → The agent's final response text
6. Finish chunk → `finish_reason: "stop"`
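Client-side, reassembling tool-call deltas can be sketched as below. The chunk shapes follow OpenAI's streaming format; the sample deltas and tool name are illustrative:

```python
def assemble_tool_calls(chunks):
    """Merge OpenAI-style streaming tool_call deltas into complete calls."""
    calls = {}  # tool-call index -> accumulated call
    for chunk in chunks:
        for tc in chunk.get("tool_calls", []):
            call = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
            if "id" in tc:                 # ToolUseStart carries the id
                call["id"] = tc["id"]
            fn = tc.get("function", {})
            if "name" in fn:               # ...and the function name
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments", "")  # ToolInputDelta fragments
    return calls

deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "web_search"}}]},            # ToolUseStart
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"query":'}}]},  # ToolInputDelta
    {"tool_calls": [{"index": 0, "function": {"arguments": '"rust"}'}}]},    # ToolInputDelta
]
calls = assemble_tool_calls(deltas)
assert calls[0]["name"] == "web_search"
assert calls[0]["arguments"] == '{"query":"rust"}'
```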
Implementation Details
Message Conversion
OpenFang converts OpenAI message format to its internal representation:
- Text content → `MessageContent::Text`
- Multipart content with images → `MessageContent::Blocks` with `ContentBlock::Text` and `ContentBlock::Image`
- Role mapping → `user`/`assistant`/`system` → `Role::User`/`Role::Assistant`/`Role::System`
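The content mapping can be mirrored in Python as a rough sketch. The Rust type names in the comments are OpenFang's; the dict shapes here only mimic them for illustration:

```python
def convert_content(content):
    """Sketch of OpenFang's OpenAI→internal content conversion."""
    if isinstance(content, str):
        return {"kind": "Text", "text": content}  # MessageContent::Text
    blocks = []
    for part in content:  # multipart (list-of-parts) content
        if part["type"] == "text":
            blocks.append({"block": "Text", "text": part["text"]})  # ContentBlock::Text
        elif part["type"] == "image_url":
            blocks.append({"block": "Image", "url": part["image_url"]["url"]})  # ContentBlock::Image
    return {"kind": "Blocks", "blocks": blocks}  # MessageContent::Blocks

assert convert_content("hi")["kind"] == "Text"
assert convert_content([{"type": "text", "text": "hi"}])["kind"] == "Blocks"
```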
Response Formatting
- `<think>` tags are automatically stripped from agent responses
- Token usage includes the full agent loop (all LLM calls across tool-use iterations)
- The agent name is returned as the model identifier (without the `openfang:` prefix)
- Streaming delivers all iterations until the agent loop channel closes
Performance
- Cold start: ~180ms (agent loop initialization)
- Streaming latency: First token typically within 200-500ms depending on LLM provider
- Non-streaming: Full response after agent completes all tool use and reasoning
- Memory overhead: ~40MB base + agent working memory
Limitations
What’s Not Supported
- Function calling (explicit tool definitions in request) — Use agent’s built-in tools instead
- Fine-tuned models — Model field only resolves to agents, not external LLM models
- Logprobs — Not exposed in current version
- Multiple choices (n > 1) — Always returns single completion
- Stop sequences — Not configurable per request
- Presence/frequency penalties — Not exposed through API (agent’s LLM config applies)
What’s Different from OpenAI
- No API key required for local usage (production deployments should add auth)
- Model field is an agent identifier, not an LLM model (the agent's config determines which LLM to use)
- Full agent capabilities — Agents have memory, tools, security, persistent state
- Multi-turn automatic — Agent loop may execute multiple LLM calls transparently
- Tool execution is server-side — Client sees tool call deltas but doesn’t execute tools
Security Considerations
Local Development
By default, OpenFang's API server binds to `127.0.0.1:4200`, making it accessible only from localhost.
Production Deployment
If exposing OpenFang's API to a network or the internet:
- **Add authentication** — Place behind a reverse proxy (nginx, Caddy) with API key validation
- **Use HTTPS** — TLS-terminate at the proxy for encrypted transport
- **Rate limiting** — OpenFang has a built-in GCRA rate limiter; configure it in `config.toml`
- **Network isolation** — Restrict access to trusted IP ranges
- **Monitor usage** — Enable budget tracking and cost monitoring
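A sketch of an nginx reverse proxy with a static bearer-token check. The server name and key are placeholders, and certificate directives are omitted:

```nginx
server {
    listen 443 ssl;
    server_name openfang.example.com;  # placeholder
    # ssl_certificate / ssl_certificate_key directives go here

    location /v1/ {
        # Reject requests without the expected token (placeholder key)
        if ($http_authorization != "Bearer CHANGE-ME") {
            return 401;
        }
        proxy_pass http://127.0.0.1:4200;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
```

Disabling `proxy_buffering` matters here: buffered proxies hold back SSE chunks and break real-time token delivery.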
OpenFang’s 16-layer security architecture protects agent execution (WASM sandbox, SSRF protection, taint tracking, etc.) but the HTTP API itself requires external authentication for production use.
Next Steps
REST API Reference
Explore OpenFang’s full native REST API
Agent Configuration
Configure agents with tools, memory, and hands
Tool Development
Build custom tools for your agents
Security Architecture
Learn about OpenFang’s 16 security systems