OpenFang provides an OpenAI-compatible API endpoint that allows any OpenAI client library to communicate with OpenFang agents. This enables drop-in integration with tools like Cursor, Continue, Open WebUI, and custom applications.
Base URL
The OpenAI-compatible API is available at:
http://127.0.0.1:4200/v1
Configure your OpenAI client to use this base URL instead of https://api.openai.com/v1.
Chat Completions
POST /v1/chat/completions
Send a chat completion request using the OpenAI message format.
curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:coder",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Request fields:
model: Model identifier (maps to an OpenFang agent):
  openfang:<name> - Find agent by name
  UUID - Find agent by ID
  Plain string - Try as agent name
  Any other - Falls back to the first registered agent
messages: Chat messages in OpenAI format
  role: Message role (system, user, or assistant)
  content: Message content (text string or array of content parts)
stream: Enable streaming responses
temperature: Temperature (currently ignored; the agent's model default is used)
max_tokens: Max tokens to generate (currently ignored)
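As a minimal sketch of how these fields fit together, the Python helper below assembles a request body. The name build_chat_request is hypothetical and not part of OpenFang. It omits temperature and max_tokens because the endpoint currently ignores them.

```python
import json
from typing import Optional


def build_chat_request(agent: str, user_text: str,
                       system: Optional[str] = None,
                       stream: bool = False) -> dict:
    """Assemble a /v1/chat/completions body (hypothetical helper)."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return {
        # The "openfang:" prefix asks the server to look the agent up by name.
        "model": f"openfang:{agent}",
        "messages": messages,
        "stream": stream,
    }


body = build_chat_request("coder", "Hello!", system="You are a helpful assistant.")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of the POST request shown above.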
Non-streaming response:
id: Completion ID (format: chatcmpl-{uuid})
choices[].message.tool_calls: Tool invocations (if any)
choices[].finish_reason: Finish reason (stop, length, or tool_calls)
{
  "id": "chatcmpl-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "object": "chat.completion",
  "created": 1705329600,
  "model": "coder",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
Streaming response:
When "stream": true, the response is a stream of SSE events:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1705329600,"model":"coder","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
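For clients that read the stream without the openai package, the following Python sketch shows one way to reassemble the text from the SSE lines. The function name collect_stream_text is hypothetical, and the sample events are shortened copies of the ones above (the id, object, created, and model fields are dropped for brevity).

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the delta.content pieces from a sequence of SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello!
```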
List Models
GET /v1/models
List all available agents as OpenAI model objects.
curl http://127.0.0.1:4200/v1/models
id: Model ID (format: openfang:{agent_name})
{
  "object": "list",
  "data": [
    {
      "id": "openfang:coder",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    },
    {
      "id": "openfang:assistant",
      "object": "model",
      "created": 1705329600,
      "owned_by": "openfang"
    }
  ]
}
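A client may want the bare agent names rather than the prefixed model IDs. The Python sketch below strips the openfang: prefix from a /v1/models response body. The helper agent_names is hypothetical, and the sample mirrors the response above.

```python
def agent_names(models_response: dict) -> list:
    """Return agent names from a /v1/models body, without the 'openfang:' prefix."""
    prefix = "openfang:"
    names = []
    for model in models_response.get("data", []):
        model_id = model.get("id", "")
        if model_id.startswith(prefix):
            model_id = model_id[len(prefix):]
        names.append(model_id)
    return names


models = {
    "object": "list",
    "data": [
        {"id": "openfang:coder", "object": "model", "created": 1705329600, "owned_by": "openfang"},
        {"id": "openfang:assistant", "object": "model", "created": 1705329600, "owned_by": "openfang"},
    ],
}
print(agent_names(models))  # ['coder', 'assistant']
```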
Model Resolution
The model field in chat completions maps to OpenFang agents:
Format            Example          Behavior
openfang:<name>   openfang:coder   Find agent by name
UUID              a1b2c3d4-...     Find agent by ID
Plain string      coder            Try as agent name
Any other         gpt-4o           Falls back to first registered agent
If no agent is found, the API returns a 404 error with:
{"error": {"message": "No agent found for model 'gpt-4o'"}}
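The resolution rules can be illustrated with a short Python sketch. This is not OpenFang's implementation: the function resolve_agent and the agent dictionaries with id and name keys are hypothetical. It also makes two assumptions that the text does not state: a failed lookup falls through to the first registered agent, and a None result (standing in for the 404) occurs only when no agents are registered.

```python
import uuid
from typing import Optional


def resolve_agent(model: str, agents: list) -> Optional[dict]:
    """Illustrative lookup order only; not OpenFang's actual implementation."""
    prefix = "openfang:"
    name = model[len(prefix):] if model.startswith(prefix) else model

    # A string that parses as a UUID is treated as an agent ID.
    try:
        wanted_id = str(uuid.UUID(model))
    except ValueError:
        wanted_id = None

    if wanted_id is not None:
        for agent in agents:
            if agent["id"] == wanted_id:
                return agent

    # Prefixed and plain strings are both tried as agent names.
    for agent in agents:
        if agent["name"] == name:
            return agent

    # Assumed fallback: first registered agent, or None when there is none.
    return agents[0] if agents else None


agents = [
    {"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "name": "coder"},
    {"id": "0f0e0d0c-0b0a-0908-0706-050403020100", "name": "assistant"},
]
print(resolve_agent("openfang:assistant", agents)["name"])  # assistant
print(resolve_agent("gpt-4o", agents)["name"])              # coder
```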
Image Support
OpenFang supports image inputs via data URIs:
curl -X POST http://127.0.0.1:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openfang:analyst",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgo..."
            }
          }
        ]
      }
    ]
  }'
Only data URIs are supported. HTTP(S) URLs are not fetched automatically.
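Because remote URLs are not fetched, the client has to embed the image bytes itself. The Python sketch below builds a data URI with the standard base64 module and wraps it in a user message. The helper names to_data_uri and image_message are hypothetical, and the eight-byte PNG signature stands in for a real image file.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(question: str, image_bytes: bytes,
                  mime_type: str = "image/png") -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": to_data_uri(image_bytes, mime_type)}},
        ],
    }


# In practice the bytes would come from open(path, "rb").read().
png_signature = b"\x89PNG\r\n\x1a\n"
message = image_message("What is in this image?", png_signature)
print(message["content"][1]["image_url"]["url"])
```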
Tool Calls
When an agent invokes tools, they appear in the response as tool_calls:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "index": 0,
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "web_search",
              "arguments": "{\"query\":\"quantum computing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
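Two details of this response are easy to miss: content can be null, and arguments is a JSON-encoded string rather than an object. The Python sketch below handles both. The helper summarize_message is hypothetical.

```python
import json


def summarize_message(message: dict) -> dict:
    """Return the text plus decoded tool calls from an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        calls.append({
            "id": call["id"],
            "name": function["name"],
            # "arguments" is a JSON string, so decode it before use.
            "arguments": json.loads(function["arguments"] or "{}"),
        })
    # "content" may be null when the message only carries tool calls.
    return {"text": message.get("content") or "", "tool_calls": calls}


tool_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "index": 0,
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "web_search",
                     "arguments": "{\"query\":\"quantum computing\"}"},
    }],
}
summary = summarize_message(tool_message)
print(summary["tool_calls"][0]["arguments"]["query"])  # quantum computing
```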
In streaming mode, tool calls are incrementally streamed:
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"web_search","arguments":""}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"query\""}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"quantum computing\"}"}}]}}]}
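Since the arguments string arrives in fragments, a client has to concatenate the pieces per tool-call index before parsing them. The Python sketch below does this for the three events above. The function merge_tool_call_deltas is hypothetical, and it assumes every fragment for a call carries the same index.

```python
import json


def merge_tool_call_deltas(chunks: list) -> list:
    """Combine streamed tool_call deltas into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""})
                if delta.get("id"):
                    call["id"] = delta["id"]
                function = delta.get("function", {})
                if function.get("name"):
                    call["name"] = function["name"]
                call["arguments"] += function.get("arguments") or ""
    merged = [calls[index] for index in sorted(calls)]
    for call in merged:
        # Only the fully assembled string is valid JSON.
        call["arguments"] = json.loads(call["arguments"])
    return merged


events = [
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"web_search","arguments":""}}]}}]}',
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"query\""}}]}}]}',
    r'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":":\"quantum computing\"}"}}]}}]}',
]
chunks = [json.loads(event[len("data: "):]) for event in events]
print(merge_tool_call_deltas(chunks))
```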
Client Configuration
Python (openai package)
from openai import OpenAI
client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy",  # Not required if OpenFang has no api_key configured
)
response = client.chat.completions.create(
    model="openfang:coder",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
JavaScript (openai package)
import OpenAI from 'openai'
const client = new OpenAI({
  baseURL: 'http://127.0.0.1:4200/v1',
  apiKey: 'dummy'
})
const response = await client.chat.completions.create({
  model: 'openfang:coder',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)
Cursor IDE
Open Cursor Settings
Navigate to AI → OpenAI API
Set Base URL: http://127.0.0.1:4200/v1
Set API Key: dummy (or leave blank if no auth)
Set Model: openfang:coder
Continue (VS Code extension)
Edit ~/.continue/config.json:
{
  "models": [
    {
      "title": "OpenFang Coder",
      "provider": "openai",
      "model": "openfang:coder",
      "apiBase": "http://127.0.0.1:4200/v1",
      "apiKey": "dummy"
    }
  ]
}
Open WebUI
Go to Settings → Connections
Add OpenAI API:
Base URL: http://127.0.0.1:4200/v1
API Key: dummy
Select model: openfang:coder
Compatibility Notes
✅ Chat completions (streaming and non-streaming)
✅ List models
✅ System/user/assistant messages
✅ Image inputs (data URIs)
✅ Tool calls (function calling)
✅ Multi-turn conversations
❌ temperature, max_tokens, top_p (ignored, uses agent defaults)
❌ logprobs, top_logprobs (not supported)
❌ seed, logit_bias (not supported)
❌ Embeddings API (/v1/embeddings)
❌ Completions API (/v1/completions)
❌ Fine-tuning API
Differences from OpenAI API
Model names: Use openfang:<agent_name> instead of gpt-4o
Tool execution: Tools are executed automatically (no tool response messages needed)
Agentic loops: Agents may perform multiple iterations internally before returning a response
Context window: Determined by the agent's underlying LLM
The OpenAI-compatible API uses the same rate limiting as the rest of OpenFang's API. If you hit the rate limit, the response body is:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
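One common way to handle this error on the client is to retry with exponential backoff. The Python sketch below recognizes the error body shown above and retries a caller-supplied function. The names is_rate_limited and call_with_backoff, the retry count, and the delays are all illustrative choices. The text does not say whether the server suggests a wait time, so none is assumed here.

```python
import time
from typing import Callable


def is_rate_limited(body: dict) -> bool:
    """True if a response body matches the rate-limit error shown above."""
    error = body.get("error") or {}
    return (error.get("code") == "rate_limit_exceeded"
            or error.get("type") == "rate_limit_error")


def call_with_backoff(send: Callable[[], dict], retries: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send() again after a growing delay while it is rate limited."""
    body = send()
    for attempt in range(retries):
        if not is_rate_limited(body):
            break
        sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s with the defaults
        body = send()
    return body


# Simulated server: two rate-limit errors, then a normal reply.
limited = {"error": {"message": "Rate limit exceeded",
                     "type": "rate_limit_error",
                     "code": "rate_limit_exceeded"}}
replies = iter([limited, limited, {"choices": []}])
waits = []
result = call_with_backoff(lambda: next(replies), sleep=waits.append)
print(waits)  # [0.5, 1.0]
```

Passing sleep as a parameter keeps the example testable without real delays. In production the default time.sleep applies.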
Drop-in Replacement Guide
Step 1: Start OpenFang
export GROQ_API_KEY="your-key"
openfang start
Step 2: Spawn an agent
Step 3: Update client configuration
Replace:
client = OpenAI(
    api_key="sk-..."
)
With:
client = OpenAI(
    base_url="http://127.0.0.1:4200/v1",
    api_key="dummy"
)
Step 4: Update model names
Replace:
model="gpt-4o"
With:
model="openfang:coder"
Step 5: Test the integration
response = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Best Practices
Use descriptive agent names
Create agents with clear, descriptive names:
openfang spawn --name python-expert --profile coder
openfang spawn --name research-assistant --profile researcher
Then reference them:
model="openfang:python-expert"
model="openfang:research-assistant"
Handle tool calls gracefully
Use streaming for long responses
Enable streaming so that output appears as it is generated:
stream = client.chat.completions.create(
    model="openfang:coder",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Track token usage via the usage field:
response = client.chat.completions.create(...)
print(f"Used {response.usage.total_tokens} tokens")
print(f"Cost estimate: ${response.usage.total_tokens * 0.00001:.4f}")
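The same calculation can be tried without a running server. The Python sketch below applies it to the usage object from the non-streaming response example. The function estimate_cost is hypothetical, and the rate of 0.00001 dollars per token is the placeholder from the snippet above, not a real price.

```python
def estimate_cost(usage: dict, price_per_token: float = 0.00001) -> float:
    """Estimate cost in dollars from a usage object (placeholder rate)."""
    return usage["total_tokens"] * price_per_token


def total_usage(usages: list) -> dict:
    """Sum the token counts of several usage objects."""
    keys = ("prompt_tokens", "completion_tokens", "total_tokens")
    return {key: sum(usage[key] for usage in usages) for key in keys}


usage = {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
print(f"Used {usage['total_tokens']} tokens")           # Used 29 tokens
print(f"Cost estimate: ${estimate_cost(usage):.4f}")    # Cost estimate: $0.0003

# Aggregate across two requests with the same usage.
combined = total_usage([usage, usage])
print(combined["total_tokens"])  # 58
```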
Next Steps
Agents API: Manage OpenFang agents
Model Catalog: Configure LLM models
Usage Tracking: Monitor API usage and costs
Authentication: Secure your API