OpenAI-Compatible API

OpenFang provides an OpenAI-compatible API endpoint that allows any OpenAI client library to communicate with OpenFang agents. This enables drop-in integration with tools like Cursor, Continue, Open WebUI, and custom applications.

Base URL

The OpenAI-compatible API is available at:

http://127.0.0.1:4200/v1

Configure your OpenAI client to use this base URL instead of https://api.openai.com/v1.

Chat Completions

POST /v1/chat/completions

Send a chat completion request using the OpenAI message format.

curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:coder",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 1024
  }'

model (string, required)

Model identifier (maps to an OpenFang agent):

openfang:<name> — Find agent by name
UUID — Find agent by ID
Plain string — Try as agent name
Any other — Falls back to first registered agent

messages (array, required)

Chat messages in OpenAI format. Each message object has:

- role (string, required): Message role: system, user, or assistant
- content (string | array, required): Message content (text string or array of content parts)

stream (boolean, default: false)

Enable streaming responses

temperature (number, default: 0.7)

Temperature (currently ignored; the agent’s model default is used)

max_tokens (number)

Max tokens to generate (currently ignored)

Non-streaming response:

id (string)

Completion ID (format: chatcmpl-{uuid})

object (string)

Always "chat.completion"

created (integer)

Unix timestamp

model (string)

Agent name

choices (array)

Each choice object has:

- index (integer): Choice index (always 0)
- message (object):
  - role (string): Always "assistant"
  - content (string): Generated text
  - tool_calls (array): Tool invocations (if any)
- finish_reason (string): Finish reason: stop, length, or tool_calls

usage (object)

- prompt_tokens (integer): Input tokens
- completion_tokens (integer): Output tokens
- total_tokens (integer): Total tokens

{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1705329600,
  "model": "coder",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
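
As a rough illustration of the fields above, the following sketch pulls the reply text, finish reason, and token count out of a response body that has already been decoded from JSON. The helper name summarize_completion is hypothetical (it is not part of OpenFang or the openai package), and it assumes the body has the shape documented here.

```python
import json
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Extract commonly used fields from a chat.completion body.

    Hypothetical helper for illustration; assumes the documented shape.
    """
    choice = response["choices"][0]  # documented as always index 0
    usage = response.get("usage", {})
    return {
        "agent": response.get("model"),
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


# The sample body shown above.
sample = json.loads("""
{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1705329600,
  "model": "coder",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
""")
print(summarize_completion(sample))
```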

Streaming response: When "stream": true, the response is a stream of server-sent events (SSE), terminated by a data: [DONE] line:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
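
For clients that read the stream without an SDK, a minimal sketch of reassembling the text might look like the following. It assumes each event arrives as one "data: ..." line, as in the sample above; the function name collect_stream_text is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from a stream of chat.completion.chunk events."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# The sample events shown above, one list entry per line.
sample_stream = [
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample_stream))
```

The role-only first chunk and the empty final delta carry no content, so they are skipped rather than treated as errors.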

List Models

GET /v1/models

List all available agents as OpenAI model objects.

curl http://127.0.0.1:4200/v1/models

object (string)

Always "list"

data (array)

Each model object has:

- id (string): Model ID (format: openfang:{agent_name})
- object (string): Always "model"
- created (integer): Unix timestamp
- owned_by (string): Always "openfang"

{
  "object": "list",
  "data": [
    {
      "id": "openfang:coder",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    },
    {
      "id": "openfang:assistant",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    }
  ]
}
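
If a client needs the bare agent names rather than the prefixed model IDs, one possible approach is to strip the openfang: prefix from each entry. The sketch below works on an already-decoded response body; agent_names is a hypothetical helper, not an OpenFang API.

```python
def agent_names(models_response: dict) -> list[str]:
    """Return agent names from a /v1/models response body.

    Hypothetical helper; assumes IDs use the documented
    openfang:{agent_name} format and leaves other IDs unchanged.
    """
    return [
        model["id"].removeprefix("openfang:")  # str.removeprefix needs Python 3.9+
        for model in models_response.get("data", [])
    ]


# The sample response shown above.
sample_models = {
    "object": "list",
    "data": [
        {"id": "openfang:coder", "object": "model",
         "created": 1705329600, "owned_by": "openfang"},
        {"id": "openfang:assistant", "object": "model",
         "created": 1705329600, "owned_by": "openfang"},
    ],
}
print(agent_names(sample_models))
```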

Model Resolution

The model field in chat completions maps to OpenFang agents:

Format	Example	Behavior
`openfang:<name>`	`openfang:coder`	Find agent by name
UUID	`a1b2c3d4-...`	Find agent by ID
Plain string	`coder`	Try as agent name
Any other	`gpt-4o`	Falls back to first registered agent

If no agent is found, which given the fallback above presumably means no agent is registered, the API returns a 404 error with:

{"error": {"message": "No agent found for model 'gpt-4o'"}}
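
The resolution order in the table can be pictured roughly as follows. This is only a sketch of the documented behavior, not OpenFang's actual implementation: the resolve_agent function and the agent dictionaries are hypothetical, and it assumes that a failed name lookup falls through to the first registered agent and that a None result corresponds to the 404 case.

```python
import uuid
from typing import Optional

PREFIX = "openfang:"


def resolve_agent(model: str, agents: list[dict]) -> Optional[dict]:
    """Pick an agent for a model string, following the documented order."""

    def by_name(name: str) -> Optional[dict]:
        return next((a for a in agents if a["name"] == name), None)

    if model.startswith(PREFIX):
        # 1. openfang:<name>: look up by agent name
        match = by_name(model[len(PREFIX):])
        if match is not None:
            return match
    else:
        # 2. UUID: look up by agent ID
        try:
            wanted = str(uuid.UUID(model))
        except ValueError:
            wanted = None
        if wanted is not None:
            for agent in agents:
                if agent["id"].lower() == wanted:
                    return agent
        # 3. Plain string: try as agent name
        match = by_name(model)
        if match is not None:
            return match
    # 4. Anything else: first registered agent (assumed None when there are none)
    return agents[0] if agents else None


# Hypothetical registry for illustration.
registry = [
    {"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "name": "coder"},
    {"id": "0f0e0d0c-0b0a-0908-0706-050403020100", "name": "assistant"},
]
print(resolve_agent("openfang:assistant", registry)["name"])
print(resolve_agent("gpt-4o", registry)["name"])
```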

Image Support

OpenFang supports image inputs via data URIs:

curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:analyst",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgo..."
            }
          }
        ]
      }
    ]
  }'

Only data URIs are supported. HTTP(S) URLs are not fetched automatically.
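
Because remote URLs are not fetched, the client has to encode the image itself. The following sketch builds a user message with a base64 data URI from raw bytes; the helper names image_part and image_question are hypothetical, and the default MIME type is an assumption you should set to match your file.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part with a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_question(text: str, data: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message containing a text part and one image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": text}, image_part(data, mime_type)],
    }


# Only the 8-byte PNG signature is used here to keep the example short;
# in practice, read the whole file, e.g. open(path, "rb").read().
png_signature = b"\x89PNG\r\n\x1a\n"
message = image_question("What is in this image?", png_signature)
print(message["content"][1]["image_url"]["url"])
```

The resulting dictionary can be passed as one entry of the messages array in the request shown above.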

Tool Calls

When an agent invokes tools, they appear in the response as tool_calls:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "index": 0,
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "web_search",
              "arguments": "{\"query\":\"quantum computing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

In streaming mode, tool calls are incrementally streamed:

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"web_search","arguments":""}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"query\""}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"quantum computing\"}"}}]}}]}
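
A client that wants the complete tool call has to stitch these fragments back together. A minimal sketch, assuming the chunks have already been decoded from JSON and that fragments for the same call share an index as in the sample above (accumulate_tool_calls is a hypothetical name):

```python
import json


def accumulate_tool_calls(chunks: list[dict]) -> list[dict]:
    """Merge streamed tool_calls deltas into complete calls, keyed by index."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                entry = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""}
                )
                if delta.get("id"):
                    entry["id"] = delta["id"]
                function = delta.get("function", {})
                if function.get("name"):
                    entry["name"] = function["name"]
                # Argument text arrives in pieces and is concatenated in order.
                entry["arguments"] += function.get("arguments", "")
    return [calls[i] for i in sorted(calls)]


# The three sample events shown above, decoded from their data: lines.
sample_chunks = [
    json.loads(line)
    for line in [
        '{"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"web_search","arguments":""}}]}}]}',
        '{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\\"query\\""}}]}}]}',
        '{"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\\"quantum computing\\"}"}}]}}]}',
    ]
]
call = accumulate_tool_calls(sample_chunks)[0]
print(call["name"], json.loads(call["arguments"]))
```

The arguments string is only valid JSON once every fragment has arrived, so it is parsed after accumulation rather than per chunk.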

Client Configuration

Python (openai package)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy",  # Placeholder: the openai client requires some value, even if OpenFang has no api_key configured
)

response = client.chat.completions.create(
    model="openfang:coder",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

JavaScript (openai package)

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:4200/v1',
  apiKey: 'dummy'
})

const response = await client.chat.completions.create({
  model: 'openfang:coder',
  messages: [{role: 'user', content: 'Hello!'}]
})

console.log(response.choices[0].message.content)

Cursor IDE

Open Cursor Settings
Navigate to AI → OpenAI API
Set Base URL: http://127.0.0.1:4200/v1
Set API Key: dummy (or leave blank if no auth)
Set Model: openfang:coder

Continue (VS Code extension)

Edit ~/.continue/config.json:

{
  "models": [
    {
      "title": "OpenFang Coder",
      "provider": "openai",
      "model": "openfang:coder",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "dummy"
    }
  ]
}

Open WebUI

Go to Settings → Connections
Add OpenAI API:
- Base URL: http://127.0.0.1:4200/v1
- API Key: dummy
Select model: openfang:coder

Compatibility Notes

Supported features

✅ Chat completions (streaming and non-streaming)
✅ List models
✅ System/user/assistant messages
✅ Image inputs (data URIs)
✅ Tool calls (function calling)
✅ Multi-turn conversations

Unsupported features

❌ temperature, max_tokens, top_p (ignored, uses agent defaults)
❌ logprobs, top_logprobs (not supported)
❌ seed, logit_bias (not supported)
❌ Embeddings API (/v1/embeddings)
❌ Completions API (/v1/completions)
❌ Fine-tuning API

Differences from OpenAI API

Model names: Use openfang:<agent_name> instead of gpt-4o
Tool execution: Tools are executed automatically (no tool response messages needed)
Agentic loops: Agents may perform multiple iterations internally
Context window: Determined by agent’s underlying LLM model

Rate limiting

The OpenAI-compatible API uses the same rate limiting as the rest of OpenFang’s API. If you hit rate limits, responses will return:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
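
One way a client might react to this error is to retry with exponential backoff. The sketch below detects the error from the decoded body shown above; the helper names, the retry count, and the delays are illustrative assumptions, and send stands in for whatever function performs your request and returns the decoded JSON.

```python
import time
from typing import Callable


def is_rate_limited(body: dict) -> bool:
    """Return True if a decoded response body is the rate limit error."""
    error = body.get("error") or {}
    return (
        error.get("type") == "rate_limit_error"
        or error.get("code") == "rate_limit_exceeded"
    )


def call_with_backoff(
    send: Callable[[], dict],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it is not rate limited, doubling the wait each time."""
    body: dict = {}
    for attempt in range(max_attempts):
        body = send()
        if not is_rate_limited(body):
            return body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return body  # still rate limited after the last attempt


# Stand-in for a real request: rate limited twice, then succeeds.
rate_limited = {"error": {"message": "Rate limit exceeded",
                          "type": "rate_limit_error",
                          "code": "rate_limit_exceeded"}}
responses = iter([rate_limited, rate_limited, {"object": "chat.completion"}])
waits: list[float] = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter keeps the retry logic testable without real delays.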

Drop-in Replacement Guide

Step 1: Start OpenFang

export GROQ_API_KEY="your-key"
openfang start

Step 2: Spawn an agent

openfang spawn coder

Step 3: Update client configuration

Replace:

client = OpenAI(
    api_key="sk-..."
)

With:

client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy"
)

Step 4: Update model names

Replace:

model="gpt-4o"

With:

model="openfang:coder"

Step 5: Test the integration

response = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Best Practices

Use descriptive agent names

Create agents with clear, descriptive names:

openfang spawn --name python-expert --profile coder
openfang spawn --name research-assistant --profile researcher

Then reference them:

model="openfang:python-expert"
model="openfang:research-assistant"

Handle tool calls gracefully

When an agent invokes tools, a response can report finish_reason "tool_calls". OpenFang executes the tools itself, so the client does not need to send tool response messages, and the final response has finish_reason "stop":

response = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Read README.md"}]
)
# Tool execution happens automatically
print(response.choices[0].message.content)

Use streaming for long responses

Enable streaming so that users see output as it is generated instead of waiting for the full response:

stream = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Monitor token usage

Track token usage via the usage field:

response = client.chat.completions.create(...)
print(f"Used {response.usage.total_tokens} tokens")
# 0.00001 is a placeholder per-token rate; substitute your provider's actual pricing
print(f"Cost estimate: ${response.usage.total_tokens * 0.00001:.4f}")

Next Steps

- Agents API: Manage OpenFang agents
- Model Catalog: Configure LLM models
- Usage Tracking: Monitor API usage and costs
- Authentication: Secure your API
