OpenAI-Compatible Endpoints

OpenFang implements the OpenAI /v1/chat/completions API, allowing any OpenAI-compatible client library to talk directly to your agents. Your agents become LLM endpoints that any tool, script, or application can use.
All OpenAI-compatible endpoints are available at http://127.0.0.1:4200/v1/ by default. Configure the port in ~/.openfang/config.toml.

How It Works

When you send a request to /v1/chat/completions, OpenFang:
  1. Resolves the agent — The model field maps to an agent by name, UUID, or openfang:<name> prefix
  2. Converts messages — OpenAI message format is converted to OpenFang’s internal format
  3. Executes the agent loop — The agent processes your message with full tool access and multi-turn execution
  4. Returns formatted response — Response follows OpenAI’s exact structure, including token usage
This means your agents get all of OpenFang’s capabilities (tools, memory, security, hands) while being accessible through the standard OpenAI API.
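Because the endpoint speaks the standard protocol, plain HTTP is enough to call an agent. The sketch below uses only the Python standard library; the `build_chat_request` helper and the `researcher` agent name are illustrative, and the request itself assumes a running OpenFang server on the default port.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4200/v1"  # default OpenFang API address

def build_chat_request(agent: str, user_text: str) -> dict:
    """Build an OpenAI-style chat.completions payload targeting an agent."""
    return {
        "model": f"openfang:{agent}",  # resolved to an agent by name
        "messages": [{"role": "user", "content": user_text}],
    }

if __name__ == "__main__":
    # Requires a running OpenFang server on 127.0.0.1:4200.
    payload = build_chat_request("researcher", "Hello")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```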

Authentication

Currently, OpenFang’s OpenAI-compatible endpoints do not require authentication when running locally. For production deployments, configure network access controls or place OpenFang behind a reverse proxy with authentication.

Agent Resolution

The model field in your request resolves to an agent using the following priority:
  1. openfang:<name> — Explicit agent name with prefix (e.g., openfang:researcher)
  2. Valid UUID — Direct agent ID lookup (e.g., 550e8400-e29b-41d4-a716-446655440000)
  3. Plain string — Agent name without prefix (e.g., researcher)
If no agent matches, you’ll receive a 404 error with model_not_found code.
curl http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:researcher",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

POST /v1/chat/completions

Send a message to an agent and receive a completion response. Supports both streaming and non-streaming modes.

Request Body

model
string
required
Agent identifier — name, UUID, or openfang:<name> format
messages
array
required
Array of message objects with role and content fields. Supports user, assistant, and system roles.
stream
boolean
default:"false"
Enable Server-Sent Events (SSE) streaming for real-time token delivery
max_tokens
integer
Maximum tokens to generate (passed to underlying LLM if supported)
temperature
float
Sampling temperature 0.0-2.0 (passed to underlying LLM if supported)

Response

id
string
Unique request identifier in format chatcmpl-<uuid>
object
string
"chat.completion" for non-streaming responses; "chat.completion.chunk" for streaming chunks
created
integer
Unix timestamp of response creation
model
string
Agent name that processed the request
choices
array
Array containing the completion choice(s)
index
integer
Choice index (always 0 for single completions)
message
object
The assistant’s response message
role
string
Always "assistant"
content
string
The response text (with <think> tags stripped)
tool_calls
array
Tool calls made by the agent (if any)
finish_reason
string
Reason completion stopped: "stop", "length", "tool_calls"
usage
object
Token usage statistics
prompt_tokens
integer
Input tokens consumed
completion_tokens
integer
Output tokens generated
total_tokens
integer
Sum of prompt and completion tokens

Non-Streaming Example

curl http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:assistant",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Rust?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'
{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1709654400,
  "model": "assistant",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Rust is a systems programming language focused on safety, speed, and concurrency. It achieves memory safety without garbage collection through its unique ownership system. Rust is commonly used for operating systems, web servers, game engines, and high-performance applications where control and reliability are critical."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 52,
    "total_tokens": 80
  }
}

Streaming Example

When stream: true, the response is delivered as Server-Sent Events (SSE). Each chunk contains incremental deltas.
curl http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:assistant",
    "messages": [
      {"role": "user", "content": "Count to 5"}
    ],
    "stream": true
  }'
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"content":", 3"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"content":", 4"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{"content":", 5"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1709654400,"model":"assistant","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
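A client consuming this stream splits on `data: ` lines, stops at the `[DONE]` sentinel, and joins the `content` deltas. A minimal parser sketch (the `accumulate_sse` helper is illustrative, not part of any SDK):

```python
import json

def accumulate_sse(lines):
    """Join content deltas from chat.completion.chunk SSE lines into one string."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Abbreviated chunks in the shape shown above
stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
```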

Multimodal (Vision) Support

OpenFang supports image inputs via base64-encoded data URIs. Images are passed to the agent’s underlying LLM if it supports vision.
curl http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:vision-analyst",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
            }
          }
        ]
      }
    ]
  }'
Image Support Notes:
  • Only base64-encoded data URIs are supported (format: data:image/{type};base64,{data})
  • URL-based images are not supported (would require external fetching)
  • The agent’s configured LLM must support vision (e.g., Claude 3, GPT-4V, Gemini Pro Vision)
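Building the required data URI from raw image bytes is a one-liner with the standard library. A sketch of a helper that assembles the multipart message shown above (the `image_message` name is illustrative):

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a multipart user message with a base64 data URI image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Only data URIs are accepted; remote image URLs are rejected.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```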

GET /v1/models

List all available agents as OpenAI-compatible model objects. Each agent appears as a model with the openfang:<name> identifier.

Response

object
string
Always "list"
data
array
Array of model objects
id
string
Model identifier in format openfang:<agent_name>
object
string
Always "model"
created
integer
Unix timestamp when agent was registered
owned_by
string
Always "openfang"

Example

curl http://127.0.0.1:4200/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "openfang:assistant",
      "object": "model",
      "created": 1709654400,
      "owned_by": "openfang"
    },
    {
      "id": "openfang:researcher",
      "object": "model",
      "created": 1709654400,
      "owned_by": "openfang"
    },
    {
      "id": "openfang:vision-analyst",
      "object": "model",
      "created": 1709654400,
      "owned_by": "openfang"
    }
  ]
}
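To discover agents programmatically, fetch `/v1/models` and strip the `openfang:` prefix from each `id`. A small sketch (the `agent_names` helper is illustrative; the live request assumes a running server and Python 3.9+ for `str.removeprefix`):

```python
import json
import urllib.request

def agent_names(models_response: dict) -> list:
    """Extract bare agent names from a /v1/models response body."""
    return [m["id"].removeprefix("openfang:") for m in models_response["data"]]

if __name__ == "__main__":
    # Requires a running OpenFang server.
    with urllib.request.urlopen("http://127.0.0.1:4200/v1/models") as resp:
        body = json.load(resp)
    print(agent_names(body))
```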

Error Responses

All errors follow OpenAI’s error format with a structured error object.

Model Not Found (404)

{
  "error": {
    "message": "No agent found for model 'nonexistent-agent'",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

Missing User Message (400)

{
  "error": {
    "message": "No user message found in request",
    "type": "invalid_request_error",
    "code": "missing_message"
  }
}

Agent Processing Failed (500)

{
  "error": {
    "message": "Agent processing failed",
    "type": "server_error"
  }
}
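Since the error bodies follow one shape, a client can map them to exceptions generically. A sketch of one possible mapping (the function and the exception choices are illustrative, not prescribed by OpenFang):

```python
def raise_for_api_error(status: int, body: dict) -> None:
    """Map an OpenAI-style error body to a Python exception, if present."""
    err = body.get("error")
    if err is None:
        return  # successful response, nothing to raise
    if err.get("code") == "model_not_found":
        raise LookupError(err["message"])
    if status >= 500:
        raise RuntimeError(err["message"])
    raise ValueError(err["message"])
```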

Integration Examples

Use with LangChain

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="not-needed",
    model="openfang:assistant"
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What are autonomous agents?")
]

response = llm.invoke(messages)
print(response.content)

Use with Cursor AI

Configure Cursor to use OpenFang as an LLM provider:
  1. Open Cursor Settings → Models
  2. Add custom OpenAI-compatible endpoint: http://127.0.0.1:4200/v1
  3. Set model name to openfang:<your-agent-name>
  4. Leave API key empty or set to not-needed
Your OpenFang agents now power Cursor’s code completion and chat.

Use with Continue.dev

Add to your ~/.continue/config.json:
{
  "models": [
    {
      "title": "OpenFang Assistant",
      "provider": "openai",
      "model": "openfang:assistant",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "not-needed"
    }
  ]
}

Use with Anything That Supports OpenAI

Any tool, library, or platform that supports custom OpenAI endpoints can use OpenFang:
  • LM Studio — Add custom endpoint in settings
  • Jan.ai — Configure remote server with OpenFang URL
  • Open WebUI — Add as OpenAI-compatible connection
  • LibreChat — Add as custom endpoint
  • BetterChatGPT — Configure API endpoint
  • Chatbox — Add custom OpenAI endpoint
The pattern is always the same:
  • Base URL: http://127.0.0.1:4200/v1
  • API Key: not-needed (or leave empty)
  • Model: openfang:<agent-name> or just agent name

Tool Calls and Multi-Turn Execution

When an agent uses tools during execution, the OpenAI-compatible API surfaces this through the tool_calls field in streaming chunks. The agent may execute multiple tool-use iterations before responding.
OpenFang automatically handles the full agent loop — tool execution, memory access, security checks, and multi-turn reasoning. From the client’s perspective, you just see streaming text and optional tool call deltas.

Tool Call Streaming Format

When streaming is enabled and the agent invokes tools:
  1. ToolUseStart → Chunk with tool_calls array containing id, type, and function.name
  2. ToolInputDelta → Incremental chunks with function.arguments text
  3. ContentComplete (ToolUse) → Tool execution happens server-side, client sees no explicit marker
  4. Agent may perform additional tool calls or proceed to final response
  5. Final text deltas → Agent’s final response text
  6. Finish chunk → finish_reason: "stop"
data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"web_search","arguments":""}}]}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"query\""}}]}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"rust lang"}}]}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"uage\"}"}}]}}]}

// Tool executes server-side, then agent continues...

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Based on"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" the search"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
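Tool-call deltas arrive split across chunks, so clients must merge them by `index` before the arguments JSON is parseable. A sketch of that accumulation (the `accumulate_tool_calls` helper is illustrative):

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed tool_call deltas into complete call records, keyed by index."""
    calls = {}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tc in delta.get("tool_calls", []):
            call = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
            if "id" in tc:
                call["id"] = tc["id"]  # only the first delta carries the id
            fn = tc.get("function", {})
            if "name" in fn:
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments", "")  # concatenate argument text
    return calls
```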

Implementation Details

Message Conversion

OpenFang converts OpenAI message format to its internal representation:
  • Text content → MessageContent::Text
  • Multipart content with images → MessageContent::Blocks with ContentBlock::Text and ContentBlock::Image
  • Role mapping: user/assistant/system → Role::User/Role::Assistant/Role::System

Response Formatting

  • <think> tags are automatically stripped from agent responses
  • Token usage includes the full agent loop (all LLM calls across tool use iterations)
  • Agent name is returned as the model identifier (without openfang: prefix)
  • Streaming delivers all iterations until the agent loop channel closes

Performance

  • Cold start: ~180ms (agent loop initialization)
  • Streaming latency: First token typically within 200-500ms depending on LLM provider
  • Non-streaming: Full response after agent completes all tool use and reasoning
  • Memory overhead: ~40MB base + agent working memory

Limitations

What’s Not Supported

  • Function calling (explicit tool definitions in request) — Use agent’s built-in tools instead
  • Fine-tuned models — Model field only resolves to agents, not external LLM models
  • Logprobs — Not exposed in current version
  • Multiple choices (n > 1) — Always returns single completion
  • Stop sequences — Not configurable per request
  • Presence/frequency penalties — Not exposed through API (agent’s LLM config applies)

What’s Different from OpenAI

  • No API key required for local usage (production deployments should add auth)
  • The model field is an agent identifier, not an LLM model (the agent's config determines which LLM to use)
  • Full agent capabilities — Agents have memory, tools, security, persistent state
  • Multi-turn automatic — Agent loop may execute multiple LLM calls transparently
  • Tool execution is server-side — Client sees tool call deltas but doesn’t execute tools

Security Considerations

Local Development

By default, OpenFang’s API server binds to 127.0.0.1:4200, making it accessible only from localhost.

Production Deployment

If exposing OpenFang’s API to a network or the internet:
  1. Add authentication — Place behind reverse proxy (nginx, Caddy) with API key validation
  2. Use HTTPS — TLS-terminate at proxy for encrypted transport
  3. Rate limiting — OpenFang has a built-in GCRA rate limiter; configure it in config.toml
  4. Network isolation — Restrict access to trusted IP ranges
  5. Monitor usage — Enable budget tracking and cost monitoring
OpenFang’s 16-layer security architecture protects agent execution (WASM sandbox, SSRF protection, taint tracking, etc.) but the HTTP API itself requires external authentication for production use.

Next Steps

REST API Reference

Explore OpenFang’s full native REST API

Agent Configuration

Configure agents with tools, memory, and hands

Tool Development

Build custom tools for your agents

Security Architecture

Learn about OpenFang’s 16 security systems