OpenFang provides an OpenAI-compatible API endpoint that allows any OpenAI client library to communicate with OpenFang agents. This enables drop-in integration with tools like Cursor, Continue, Open WebUI, and custom applications.

Base URL

The OpenAI-compatible API is available at:
http://127.0.0.1:4200/v1
Configure your OpenAI client to use this base URL instead of https://api.openai.com/v1.

Chat Completions

POST /v1/chat/completions

Send a chat completion request using the OpenAI message format.
curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:coder",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Request body parameters:
  • model (string, required): Model identifier, mapped to an OpenFang agent:
    • openfang:<name>: find agent by name
    • UUID: find agent by ID
    • Plain string: try as agent name
    • Any other: falls back to first registered agent
  • messages (array, required): Chat messages in OpenAI format
  • stream (boolean, default: false): Enable streaming responses
  • temperature (number, default: 0.7): Sampling temperature (currently ignored; the agent's model default is used)
  • max_tokens (number): Maximum tokens to generate (currently ignored)
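For environments without an OpenAI client library, the same request can be assembled with Python's standard library. This is a minimal sketch: the helper name build_chat_request is hypothetical (not part of OpenFang), and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4200/v1"  # base URL documented on this page


def build_chat_request(model, messages, stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper for illustration only.
    """
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "openfang:coder",
    [{"role": "user", "content": "Hello!"}],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it to a running OpenFang instance. temperature and max_tokens are left out because they are currently ignored.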
Non-streaming response:
  • id (string): Completion ID (format: chatcmpl-{uuid})
  • object (string): Always "chat.completion"
  • created (integer): Unix timestamp
  • model (string): Agent name
  • choices (array)
  • usage (object):
    • prompt_tokens (integer): Input tokens
    • completion_tokens (integer): Output tokens
    • total_tokens (integer): Total tokens
{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1705329600,
  "model": "coder",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
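As a sketch of how a client might read the body above without the OpenAI SDK, the snippet below pulls out the assistant text, the finish reason, and the token total. The function name summarize_completion is an illustrative assumption; the sample body is copied from the example on this page.

```python
import json

# Sample body copied from the non-streaming example on this page.
RAW_BODY = """
{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1705329600,
  "model": "coder",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
"""


def summarize_completion(body):
    """Return (assistant_text, finish_reason, total_tokens) for the first choice."""
    choice = body["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        body["usage"]["total_tokens"],
    )


text, reason, total = summarize_completion(json.loads(RAW_BODY))
print(text, reason, total)
```

Note that content can be null when the message carries tool calls, so real code should handle a None value.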
Streaming response: When "stream": true, the response is a stream of SSE events:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
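The stream above can be consumed by hand when no SDK is available. The following is a minimal sketch, assuming each event arrives as a single "data:" line as shown; the function names are hypothetical and the sample chunks are abbreviated versions of the events above.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines that separate events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines):
    """Concatenate the delta.content fragments of a streamed completion."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)


SAMPLE_STREAM = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

streamed_text = collect_text(SAMPLE_STREAM)
print(streamed_text)
```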

List Models

GET /v1/models

List all available agents as OpenAI model objects.
curl http://127.0.0.1:4200/v1/models
  • object (string): Always "list"
  • data (array)
{
  "object": "list",
  "data": [
    {
      "id": "openfang:coder",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    },
    {
      "id": "openfang:assistant",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    }
  ]
}
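A client that wants plain agent names can strip the openfang: prefix from each model id. This is a small sketch over the sample response above; the helper name agent_names is an assumption made for illustration.

```python
PREFIX = "openfang:"

# Sample response copied from the example on this page.
MODELS_RESPONSE = {
    "object": "list",
    "data": [
        {"id": "openfang:coder", "object": "model", "created": 1705329600, "owned_by": "openfang"},
        {"id": "openfang:assistant", "object": "model", "created": 1705329600, "owned_by": "openfang"},
    ],
}


def agent_names(models_response):
    """Return agent names from a /v1/models response, without the openfang: prefix."""
    names = []
    for model in models_response["data"]:
        model_id = model["id"]
        if model_id.startswith(PREFIX):
            model_id = model_id[len(PREFIX):]
        names.append(model_id)
    return names


names = agent_names(MODELS_RESPONSE)
print(names)
```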

Model Resolution

The model field in chat completions maps to OpenFang agents:
| Format | Example | Behavior |
| --- | --- | --- |
| openfang:<name> | openfang:coder | Find agent by name |
| UUID | a1b2c3d4-... | Find agent by ID |
| Plain string | coder | Try as agent name |
| Any other | gpt-4o | Falls back to first registered agent |
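If no agent can be resolved, the API returns a 404 error. Because unmatched names fall back to the first registered agent, this presumably happens only when no agent is registered. The error body looks like: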
{"error": {"message": "No agent found for model 'gpt-4o'"}}
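The resolution order can be illustrated with a client-side sketch. This is not OpenFang's implementation: the function resolve_agent, the agent dictionaries, and the second agent's ID are hypothetical, and the sketch assumes that a miss in any format falls through to the first registered agent and that an empty registry corresponds to the 404 case.

```python
import uuid


def resolve_agent(model, agents):
    """Mimic the documented lookup order.

    agents: list of {"id": str, "name": str} dicts in registration order.
    Returns the matching agent, or None when no agent is registered.
    """
    by_name = {agent["name"]: agent for agent in agents}
    by_id = {agent["id"]: agent for agent in agents}

    if model.startswith("openfang:"):
        candidate = by_name.get(model[len("openfang:"):])
    else:
        try:
            # A string that parses as a UUID is treated as an agent ID.
            candidate = by_id.get(str(uuid.UUID(model)))
        except ValueError:
            # Otherwise try the plain string as an agent name.
            candidate = by_name.get(model)

    if candidate is not None:
        return candidate
    # Assumed fallback: first registered agent, or None (404) if there is none.
    return agents[0] if agents else None


AGENTS = [
    {"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "name": "coder"},
    {"id": "0f0e0d0c-0b0a-4908-8706-050403020100", "name": "assistant"},
]

print(resolve_agent("gpt-4o", AGENTS)["name"])
```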

Image Support

OpenFang supports image inputs via data URIs:
curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:analyst",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgo..."
            }
          }
        ]
      }
    ]
  }'
Only data URIs are supported. HTTP(S) URLs are not fetched automatically.
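Since remote URLs are not fetched, the client has to encode image bytes itself. Below is a minimal sketch of building the image_url part from raw bytes; the helper name image_part is hypothetical, and the eight-byte PNG signature stands in for real file contents (in practice you would read them with open(path, "rb").read()).

```python
import base64


def image_part(image_bytes, mime_type="image/png"):
    """Wrap raw image bytes in an OpenAI-style image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Stand-in for real image data: the 8-byte PNG file signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_part(PNG_SIGNATURE),
    ],
}

data_uri = message["content"][1]["image_url"]["url"]
print(data_uri)
```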

Tool Calls

When an agent invokes tools, they appear in the response as tool_calls:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "index": 0,
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "web_search",
              "arguments": "{\"query\":\"quantum computing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
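In the response above, function.arguments is a JSON-encoded string rather than an object, so it needs a second decode. A short sketch, with the hypothetical helper name extract_tool_calls:

```python
import json

# Sample response copied from the example on this page.
TOOL_RESPONSE = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "index": 0,
                        "id": "call_abc123",
                        "type": "function",
                        "function": {
                            "name": "web_search",
                            "arguments": "{\"query\":\"quantum computing\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}


def extract_tool_calls(body):
    """Return (name, decoded_arguments) pairs from the first choice."""
    message = body["choices"][0]["message"]
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]


tool_calls = extract_tool_calls(TOOL_RESPONSE)
print(tool_calls)
```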
In streaming mode, tool calls are incrementally streamed:
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"web_search","arguments":""}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"query\""}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"quantum computing\"}"}}]}}]}
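Because the arguments arrive in fragments, a client has to concatenate them per tool-call index before decoding. The sketch below does this for the three chunks above; merge_tool_call_deltas is a hypothetical name, and the chunks are written as Python dictionaries for readability.

```python
import json


def merge_tool_call_deltas(chunks):
    """Merge streamed tool_calls deltas into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for part in delta.get("tool_calls", []):
            call = calls.setdefault(
                part["index"], {"id": None, "name": None, "arguments": ""}
            )
            if "id" in part:
                call["id"] = part["id"]
            function = part.get("function", {})
            if "name" in function:
                call["name"] = function["name"]
            # Argument fragments are appended in arrival order.
            call["arguments"] += function.get("arguments", "")
    return [calls[index] for index in sorted(calls)]


CHUNKS = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_abc123", "type": "function",
                                            "function": {"name": "web_search", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": '{"query"'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": ':"quantum computing"}'}}]}}]},
]

merged = merge_tool_call_deltas(CHUNKS)
print(merged[0]["name"], json.loads(merged[0]["arguments"]))
```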

Client Configuration

Python (openai package)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy"  # Placeholder: the client needs some value even if OpenFang has no api_key configured
)

response = client.chat.completions.create(
    model="openfang:coder",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

JavaScript (openai package)

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:4200/v1',
  apiKey: 'dummy'
})

const response = await client.chat.completions.create({
  model: 'openfang:coder',
  messages: [{role: 'user', content: 'Hello!'}]
})

console.log(response.choices[0].message.content)

Cursor IDE

  1. Open Cursor Settings
  2. Navigate to AI → OpenAI API
  3. Set Base URL: http://127.0.0.1:4200/v1
  4. Set API Key: dummy (or leave blank if no auth)
  5. Set Model: openfang:coder

Continue (VS Code extension)

Edit ~/.continue/config.json:
{
  "models": [
    {
      "title": "OpenFang Coder",
      "provider": "openai",
      "model": "openfang:coder",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "dummy"
    }
  ]
}

Open WebUI

  1. Go to Settings → Connections
  2. Add OpenAI API:
    • Base URL: http://127.0.0.1:4200/v1
    • API Key: dummy
  3. Select model: openfang:coder

Compatibility Notes

Supported features:
  • ✅ Chat completions (streaming and non-streaming)
  • ✅ List models
  • ✅ System/user/assistant messages
  • ✅ Image inputs (data URIs)
  • ✅ Tool calls (function calling)
  • ✅ Multi-turn conversations
Parameters that are ignored or not supported:
  • temperature, max_tokens, top_p (ignored; agent defaults are used)
  • logprobs, top_logprobs (not supported)
  • seed, logit_bias (not supported)
Unsupported endpoints:
  • ❌ Embeddings API (/v1/embeddings)
  • ❌ Completions API (/v1/completions)
  • ❌ Fine-tuning API
Behavioral differences from the OpenAI API:
  • Model names: Use openfang:<agent_name> instead of gpt-4o
  • Tool execution: Tools are executed automatically (no tool response messages needed)
  • Agentic loops: Agents may perform multiple iterations internally
  • Context window: Determined by the agent's underlying LLM model
The OpenAI-compatible API shares its rate limits with the rest of OpenFang's API. When a limit is exceeded, the response body is:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
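A client can recognize this body and retry after a pause. The sketch below is one possible approach, not documented OpenFang behavior: the exponential backoff schedule, its base and cap values, and both function names are assumptions, since this page does not specify retry guidance.

```python
def is_rate_limited(body):
    """Return True if a decoded error body matches the rate-limit shape above."""
    error = body.get("error") or {}
    return (
        error.get("type") == "rate_limit_error"
        or error.get("code") == "rate_limit_exceeded"
    )


def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds (assumed policy, not from OpenFang)."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


RATE_LIMIT_BODY = {
    "error": {
        "message": "Rate limit exceeded",
        "type": "rate_limit_error",
        "code": "rate_limit_exceeded",
    }
}

print(is_rate_limited(RATE_LIMIT_BODY), backoff_delays(4))
```

A retry loop would sleep for each delay in turn and resend the request while is_rate_limited returns True.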

Drop-in Replacement Guide

Step 1: Start OpenFang

export GROQ_API_KEY="your-key"
openfang start

Step 2: Spawn an agent

openfang spawn coder

Step 3: Update client configuration

Replace:
client = OpenAI(
    api_key="sk-..."
)
With:
client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy"
)

Step 4: Update model names

Replace:
model="gpt-4o"
With:
model="openfang:coder"

Step 5: Test the integration

response = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Best Practices

Create agents with clear, descriptive names:
openfang spawn --name python-expert --profile coder
openfang spawn --name research-assistant --profile researcher
Then reference them:
model="openfang:python-expert"
model="openfang:research-assistant"
When an agent invokes tools, the response that carries the tool calls has finish_reason "tool_calls". OpenFang executes the tools automatically, so no tool response messages are needed, and the final response has finish_reason "stop":
response = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Read README.md"}]
)
# Tool execution happens automatically
print(response.choices[0].message.content)
Enable streaming for better UX:
stream = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Track token usage via the usage field:
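response = client.chat.completions.create(model="openfang:coder", messages=[{"role": "user", "content": "Hello!"}])
print(f"Used {response.usage.total_tokens} tokens")
# 0.00001 per token is a placeholder rate; substitute your provider's actual pricing
print(f"Cost estimate: ${response.usage.total_tokens * 0.00001:.4f}")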
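To track usage across several requests, the per-response counts can be summed. This is a minimal sketch with a hypothetical UsageTotals class; it takes the usage object as a dictionary (as decoded from the JSON body), and the second sample's numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals of the usage fields returned with each completion."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def add(self, usage):
        """Add one response's usage dict to the running totals."""
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]
        self.total_tokens += usage["total_tokens"]


totals = UsageTotals()
# First dict matches the example response on this page; the second is made up.
totals.add({"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29})
totals.add({"prompt_tokens": 35, "completion_tokens": 12, "total_tokens": 47})
print(totals)
```

When using the openai package, response.usage exposes the same three field names as attributes.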

Next Steps

  • Agents API: Manage OpenFang agents
  • Model Catalog: Configure LLM models
  • Usage Tracking: Monitor API usage and costs
  • Authentication: Secure your API
