OpenAI-Compatible Endpoints
OpenFang implements the OpenAI `/v1/chat/completions` API, allowing any OpenAI-compatible client library to talk directly to your agents. Your agents become LLM endpoints that any tool, script, or application can use.
All OpenAI-compatible endpoints are available at `http://127.0.0.1:4200/v1/` by default. Configure the port in `~/.openfang/config.toml`.

How It Works
When you send a request to `/v1/chat/completions`, OpenFang:
1. **Resolves the agent** — The `model` field maps to an agent by name, UUID, or `openfang:<name>` prefix
2. **Converts messages** — OpenAI message format is converted to OpenFang's internal format
3. **Executes the agent loop** — The agent processes your message with full tool access and multi-turn execution
4. **Returns a formatted response** — The response follows OpenAI's exact structure, including token usage
Authentication
Currently, OpenFang's OpenAI-compatible endpoints do not require authentication when running locally. For production deployments, configure network access controls or place OpenFang behind a reverse proxy with authentication.

Agent Resolution
The `model` field in your request resolves to an agent using the following priority:
1. `openfang:<name>` — Explicit agent name with prefix (e.g., `openfang:researcher`)
2. Valid UUID — Direct agent ID lookup (e.g., `550e8400-e29b-41d4-a716-446655440000`)
3. Plain string — Agent name without prefix (e.g., `researcher`)

If no agent matches, the request fails with a 404 error and the `model_not_found` code.
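The resolution order can be sketched as follows. This is an illustrative Python model only, not OpenFang's actual internals; the function and lookup-table names are assumptions:

```python
import uuid

def resolve_agent(model: str, agents_by_name: dict, agents_by_id: dict):
    """Sketch of the model-field resolution priority described above."""
    # 1. Explicit prefix: openfang:<name>
    if model.startswith("openfang:"):
        return agents_by_name.get(model[len("openfang:"):])
    # 2. Valid UUID: direct agent ID lookup
    try:
        return agents_by_id.get(str(uuid.UUID(model)))
    except ValueError:
        pass  # not a UUID, fall through
    # 3. Plain string: agent name without prefix
    return agents_by_name.get(model)

agents_by_name = {"researcher": "agent-A"}
agents_by_id = {"550e8400-e29b-41d4-a716-446655440000": "agent-B"}
assert resolve_agent("openfang:researcher", agents_by_name, agents_by_id) == "agent-A"
assert resolve_agent("550e8400-e29b-41d4-a716-446655440000", agents_by_name, agents_by_id) == "agent-B"
assert resolve_agent("researcher", agents_by_name, agents_by_id) == "agent-A"
```

A miss at any stage (no matching agent) yields `None` here, corresponding to the 404 `model_not_found` error.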
POST /v1/chat/completions
Send a message to an agent and receive a completion response. Supports both streaming and non-streaming modes.

Request Body
- `model` — Agent identifier: name, UUID, or `openfang:<name>` format
- `messages` — Array of message objects with `role` and `content` fields. Supports `user`, `assistant`, and `system` roles.
- `stream` — Enable Server-Sent Events (SSE) streaming for real-time token delivery
- `max_tokens` — Maximum tokens to generate (passed to the underlying LLM if supported)
- `temperature` — Sampling temperature, 0.0-2.0 (passed to the underlying LLM if supported)
Response
- `id` — Unique request identifier in the format `chatcmpl-<uuid>`
- `object` — Always `"chat.completion"` for non-streaming, `"chat.completion.chunk"` for streaming
- `created` — Unix timestamp of response creation
- `model` — Agent name that processed the request
- `choices` — Array containing the completion choice(s)
  - `index` — Choice index (always 0 for single completions)
  - `finish_reason` — Reason the completion stopped: `"stop"`, `"length"`, or `"tool_calls"`

Non-Streaming Example
Streaming Example
When `stream: true`, the response is delivered as Server-Sent Events (SSE). Each chunk contains incremental deltas.
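Consuming the stream can be sketched as below: each `data:` line carries a JSON chunk whose content delta is accumulated. The chunk shape follows OpenAI's streaming format; the sample lines are illustrative, not captured OpenFang output:

```python
import json

def accumulate_sse(lines):
    """Collect content deltas from OpenAI-style SSE lines into one string."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments/blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":  # stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"lo"},"index":0}]}',
    "data: [DONE]",
]
assert accumulate_sse(sample) == "Hello"
```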
Multimodal (Vision) Support
OpenFang supports image inputs via base64-encoded data URIs. Images are passed to the agent's underlying LLM if it supports vision.

Image Support Notes:
- Only base64-encoded data URIs are supported (format: `data:image/{type};base64,{data}`)
- URL-based images are not supported (they would require external fetching)
- The agent's configured LLM must support vision (e.g., Claude 3, GPT-4V, Gemini Pro Vision)
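A stdlib-only sketch of building a multimodal message, assuming OpenFang accepts OpenAI's `image_url` content-part shape. The image bytes here are a placeholder, not a valid image:

```python
import base64

def image_content(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part
    using a base64 data URI (the only image form OpenFang accepts)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Placeholder bytes; substitute real image data in practice.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_content(b"placeholder-image-bytes"),
    ],
}
assert message["content"][1]["image_url"]["url"].startswith("data:image/png;base64,")
```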
GET /v1/models
List all available agents as OpenAI-compatible model objects. Each agent appears as a model with the `openfang:<name>` identifier.
Response
- `object` — Always `"list"`

Example
Error Responses
All errors follow OpenAI's error format with a structured error object.

Model Not Found (404)
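An illustrative 404 body. The `code` value comes from the agent-resolution behavior documented above; the `message` and `type` values are assumptions following OpenAI's error envelope:

```python
import json

# Sample error body for an unknown agent (illustrative).
error_body = """{
  "error": {
    "message": "Model 'openfang:missing' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}"""

err = json.loads(error_body)["error"]
assert err["code"] == "model_not_found"
```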
Missing User Message (400)
Agent Processing Failed (500)
Integration Examples
Use with LangChain
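A sketch using the `langchain-openai` package (requires `pip install langchain-openai` and a running OpenFang server; the agent name is a placeholder):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:4200/v1",  # OpenFang's OpenAI-compatible endpoint
    api_key="not-needed",                 # no key required locally
    model="openfang:researcher",          # placeholder agent name
)

# Requires a running OpenFang server; uncomment to invoke:
# response = llm.invoke("Summarize the latest findings in my notes.")
# print(response.content)
```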
Use with Cursor AI
Configure Cursor to use OpenFang as an LLM provider:
1. Open Cursor Settings → Models
2. Add a custom OpenAI-compatible endpoint: `http://127.0.0.1:4200/v1`
3. Set the model name to `openfang:<your-agent-name>`
4. Leave the API key empty or set it to `not-needed`
Use with Continue.dev
Add to your `~/.continue/config.json`:
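A sketch of the relevant `models` entry. The field names follow Continue's `config.json` schema as commonly documented; the title and agent name are placeholders:

```json
{
  "models": [
    {
      "title": "OpenFang Researcher",
      "provider": "openai",
      "model": "openfang:researcher",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "not-needed"
    }
  ]
}
```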
Use with Anything That Supports OpenAI
Any tool, library, or platform that supports custom OpenAI endpoints can use OpenFang:
- LM Studio — Add custom endpoint in settings
- Jan.ai — Configure remote server with OpenFang URL
- Open WebUI — Add as OpenAI-compatible connection
- LibreChat — Add as custom endpoint
- BetterChatGPT — Configure API endpoint
- Chatbox — Add custom OpenAI endpoint

For any of these, use the following settings:
- Base URL: `http://127.0.0.1:4200/v1`
- API Key: `not-needed` (or leave empty)
- Model: `openfang:<agent-name>` or just the agent name
Tool Calls and Multi-Turn Execution
When an agent uses tools during execution, the OpenAI-compatible API surfaces this through the `tool_calls` field in streaming chunks. The agent may execute multiple tool-use iterations before responding.
OpenFang automatically handles the full agent loop — tool execution, memory access, security checks, and multi-turn reasoning. From the client’s perspective, you just see streaming text and optional tool call deltas.
Tool Call Streaming Format
When streaming is enabled and the agent invokes tools:
1. ToolUseStart → Chunk with a `tool_calls` array containing `id`, `type`, and `function.name`
2. ToolInputDelta → Incremental chunks with `function.arguments` text
3. ContentComplete (ToolUse) → Tool execution happens server-side; the client sees no explicit marker
4. The agent may perform additional tool calls or proceed to the final response
5. Final text deltas → The agent's final response text
6. Finish chunk → `finish_reason: "stop"`
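Client-side, reassembling tool-call deltas can be sketched as below. The chunk shapes follow OpenAI's streaming format; the sample deltas and tool name are illustrative:

```python
def assemble_tool_calls(chunks):
    """Merge OpenAI-style streaming tool_call deltas into complete calls."""
    calls = {}  # tool-call index -> accumulated call
    for chunk in chunks:
        for tc in chunk.get("tool_calls", []):
            call = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
            if "id" in tc:                 # ToolUseStart carries the id
                call["id"] = tc["id"]
            fn = tc.get("function", {})
            if "name" in fn:               # ...and the function name
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments", "")  # ToolInputDelta fragments
    return calls

deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "web_search"}}]},            # ToolUseStart
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"query":'}}]},  # ToolInputDelta
    {"tool_calls": [{"index": 0, "function": {"arguments": '"rust"}'}}]},    # ToolInputDelta
]
calls = assemble_tool_calls(deltas)
assert calls[0]["name"] == "web_search"
assert calls[0]["arguments"] == '{"query":"rust"}'
```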
Implementation Details
Message Conversion
OpenFang converts OpenAI message format to its internal representation:
- Text content → `MessageContent::Text`
- Multipart content with images → `MessageContent::Blocks` with `ContentBlock::Text` and `ContentBlock::Image`
- Role mapping → `user`/`assistant`/`system` → `Role::User`/`Role::Assistant`/`Role::System`
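The content mapping can be mirrored in Python as a rough sketch. The Rust type names in the comments are OpenFang's; the dict shapes here only mimic them for illustration:

```python
def convert_content(content):
    """Sketch of OpenFang's OpenAI→internal content conversion."""
    if isinstance(content, str):
        return {"kind": "Text", "text": content}  # MessageContent::Text
    blocks = []
    for part in content:  # multipart (list-of-parts) content
        if part["type"] == "text":
            blocks.append({"block": "Text", "text": part["text"]})  # ContentBlock::Text
        elif part["type"] == "image_url":
            blocks.append({"block": "Image", "url": part["image_url"]["url"]})  # ContentBlock::Image
    return {"kind": "Blocks", "blocks": blocks}  # MessageContent::Blocks

assert convert_content("hi")["kind"] == "Text"
assert convert_content([{"type": "text", "text": "hi"}])["kind"] == "Blocks"
```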
Response Formatting
- `<think>` tags are automatically stripped from agent responses
- Token usage includes the full agent loop (all LLM calls across tool-use iterations)
- The agent name is returned as the model identifier (without the `openfang:` prefix)
- Streaming delivers all iterations until the agent loop channel closes
Performance
- Cold start: ~180ms (agent loop initialization)
- Streaming latency: First token typically within 200-500ms depending on LLM provider
- Non-streaming: Full response after agent completes all tool use and reasoning
- Memory overhead: ~40MB base + agent working memory
Limitations
What’s Not Supported
- Function calling (explicit tool definitions in request) — Use agent’s built-in tools instead
- Fine-tuned models — Model field only resolves to agents, not external LLM models
- Logprobs — Not exposed in current version
- Multiple choices (n > 1) — Always returns single completion
- Stop sequences — Not configurable per request
- Presence/frequency penalties — Not exposed through API (agent’s LLM config applies)
What’s Different from OpenAI
- No API key required for local usage (production deployments should add auth)
- Model field is an agent identifier, not an LLM model (the agent's config determines which LLM to use)
- Full agent capabilities — Agents have memory, tools, security, persistent state
- Multi-turn automatic — Agent loop may execute multiple LLM calls transparently
- Tool execution is server-side — Client sees tool call deltas but doesn’t execute tools
Security Considerations
Local Development
By default, OpenFang's API server binds to `127.0.0.1:4200`, making it accessible only from localhost.
Production Deployment
If exposing OpenFang's API to a network or the internet:
- **Add authentication** — Place behind a reverse proxy (nginx, Caddy) with API key validation
- **Use HTTPS** — TLS-terminate at the proxy for encrypted transport
- **Rate limiting** — OpenFang has a built-in GCRA rate limiter; configure it in `config.toml`
- **Network isolation** — Restrict access to trusted IP ranges
- **Monitor usage** — Enable budget tracking and cost monitoring
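A sketch of an nginx reverse proxy with a static bearer-token check. The server name and key are placeholders, and certificate directives are omitted:

```nginx
server {
    listen 443 ssl;
    server_name openfang.example.com;  # placeholder
    # ssl_certificate / ssl_certificate_key directives go here

    location /v1/ {
        # Reject requests without the expected token (placeholder key)
        if ($http_authorization != "Bearer CHANGE-ME") {
            return 401;
        }
        proxy_pass http://127.0.0.1:4200;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
```

Disabling `proxy_buffering` matters here: buffered proxies hold back SSE chunks and break real-time token delivery.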
OpenFang’s 16-layer security architecture protects agent execution (WASM sandbox, SSRF protection, taint tracking, etc.) but the HTTP API itself requires external authentication for production use.
Next Steps
REST API Reference
Explore OpenFang’s full native REST API
Agent Configuration
Configure agents with tools, memory, and hands
Tool Development
Build custom tools for your agents
Security Architecture
Learn about OpenFang’s 16 security systems