Custom Provider Configuration

Forge supports any AI provider that implements OpenAI-compatible or Anthropic-compatible APIs. This includes commercial services, open-source models, and local inference engines.

Overview

Custom providers allow you to:

Use alternative AI services
Run models locally
Connect to self-hosted endpoints
Access specialized model providers
Integrate with enterprise AI platforms

OpenAI-Compatible Providers

Many services implement the OpenAI Chat Completions API format. Forge can work with any such service.

Setup Steps

Get Provider Details

Collect from your provider:

API base URL (e.g., https://api.example.com/v1)
API key or authentication token
Available model names

Configure in Forge

forge provider login

Select OpenAI-Compatible and provide:

OPENAI_URL: Your provider’s base URL
API Key: Your authentication key

Set Model

Configure your model in forge.yaml:

model: your-model-name

Test Connection

forge

Try a simple prompt to verify it works.

Supported Services

Popular OpenAI-compatible services:

Cloud Services

Groq - Ultra-fast inference
Together AI - Open model hosting
Fireworks AI - Production inference
Anyscale Endpoints - Ray-powered serving

Local Inference

Ollama - Easy local deployment
LM Studio - Desktop GUI for local models
llama.cpp - C++ inference engine
vLLM - Fast LLM serving
Jan AI - Privacy-focused desktop app

Example: Groq

Groq provides ultra-fast inference:

forge provider login
# Select: OpenAI-Compatible
# OPENAI_URL: https://api.groq.com/openai/v1
# API Key: gsk_your_groq_api_key

# forge.yaml
model: deepseek-r1-distill-llama-70b

Example: Ollama

Ollama runs models locally:

Install and start Ollama:
```
# Pull a model
ollama pull llama3.2
```

Configure Forge:

forge provider login
# Select: OpenAI-Compatible
# OPENAI_URL: http://localhost:11434/v1
# API Key: (leave empty or use "ollama")

Set model:
```
# forge.yaml
model: llama3.2
```

Example: LM Studio

LM Studio provides a desktop GUI:

Download and start LM Studio
Load a model in the GUI
Start the local server (default port: 1234)

Configure Forge:

forge provider login
# Select: OpenAI-Compatible
# OPENAI_URL: http://localhost:1234/v1
# API Key: lm-studio

Anthropic-Compatible Providers

Some services implement Anthropic’s Messages API format.

Setup Steps

Get Provider Details

Collect:

API base URL
API key
Model names

Configure in Forge

forge provider login

Select Anthropic-Compatible and provide:

ANTHROPIC_URL: Provider base URL
API Key: Authentication key

Set Model

# forge.yaml
model: provider-model-name

Advanced Configuration

Multiple Custom Providers

You can configure multiple custom providers by creating a provider.json file:

[
  {
    "id": "my_custom_provider",
    "api_key_vars": "MY_PROVIDER_API_KEY",
    "url_param_vars": ["MY_PROVIDER_URL"],
    "response_type": "OpenAI",
    "url": "{{MY_PROVIDER_URL}}/chat/completions",
    "models": "{{MY_PROVIDER_URL}}/models",
    "auth_methods": ["api_key"]
  }
]

Place this file at:

~/.config/forge/provider.json (user-wide)
./provider.json (project-specific)

Custom Model Definitions

Define model metadata for custom providers:

[
  {
    "id": "my_provider",
    "api_key_vars": "MY_API_KEY",
    "url": "https://api.example.com/v1/chat/completions",
    "response_type": "OpenAI",
    "models": [
      {
        "id": "my-model-1",
        "name": "My Custom Model",
        "description": "A custom model for specific tasks",
        "context_length": 8192,
        "tools_supported": true,
        "supports_parallel_tool_calls": true,
        "input_modalities": ["text"]
      }
    ],
    "auth_methods": ["api_key"]
  }
]

Local Model Configuration

Ollama (Detailed)

Complete Ollama setup:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3.2
ollama pull codellama
ollama pull mistral

# Verify running
ollama list

Configure in Forge:

forge provider login
# OPENAI_URL: http://localhost:11434/v1
# API Key: ollama

llama.cpp Server

Run llama.cpp server:

# Download llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download a model (GGUF format)
# Run server
./server -m path/to/model.gguf -c 2048 --host 0.0.0.0 --port 8080

Configure in Forge:

forge provider login
# OPENAI_URL: http://localhost:8080/v1
# API Key: (empty or any value)

vLLM

Deploy vLLM server:

# Install vLLM
pip install vllm

# Start server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-2-7b-chat-hf \
  --port 8000

Configure in Forge:

forge provider login  
# OPENAI_URL: http://localhost:8000/v1
# API Key: (empty or any value)

Troubleshooting

Connection Refused

If Forge can’t connect:

Verify the server is running
Check the URL and port are correct
Ensure no firewall blocking
Test with curl:
```
curl http://localhost:11434/v1/models
```

Invalid Model Name

If model not found:

List available models:

# Ollama
ollama list

# Generic OpenAI-compatible
curl http://localhost:8080/v1/models

Verify spelling in forge.yaml
Check model is loaded/running

Unsupported Features

Some providers may not support:

Tool calling / function calling
Parallel tool execution
Streaming responses
Vision/multimodal input

Check provider documentation for feature support.

Authentication Errors

If authentication fails:

Verify API key is correct
Check if key is required (some local servers don’t need keys)
Try with and without the key
Check provider-specific auth format

Performance Issues

For slow local inference:

Use GPU acceleration if available
Reduce context length
Use quantized models (e.g., GGUF Q4)
Increase server worker threads
Consider cloud providers for production

Deprecated: Environment Variable Setup

Using environment variables is deprecated. Please use forge provider login instead.

For backward compatibility:

# .env
OPENAI_API_KEY=your-api-key
OPENAI_URL=https://api.provider.com/v1

# .env  
ANTHROPIC_API_KEY=your-api-key
ANTHROPIC_URL=https://api.provider.com

Best Practices

Security

Never expose local inference servers to the internet without authentication.

Use authentication even for local servers
Keep API keys in secure storage
Use HTTPS in production
Implement rate limiting

Performance

Local Inference:

Use GPU when available (CUDA, Metal, ROCm)
Choose appropriate quantization (Q4, Q5, Q8)
Tune context length to your needs
Monitor memory usage

Cloud Services:

Choose providers near your location
Monitor response times
Implement caching when possible
Use streaming for long responses

Cost Management

Local Models:

Free inference (after hardware cost)
Pay only for electricity
No rate limits
Full privacy

Cloud Providers:

Compare per-token pricing
Monitor usage carefully
Set spending limits
Use cheaper models when appropriate

Next Steps

Explore Ollama models
Try Groq for fast inference
Set up monitoring for your custom provider
Configure retry logic for reliability

Getting Started

Core Concepts

Configuration

Providers

Features

Advanced Usage

Guides

Custom Provider Configuration

Overview

OpenAI-Compatible Providers

Setup Steps

Supported Services

Cloud Services

Local Inference

Example: Groq

Example: Ollama

Example: LM Studio

Anthropic-Compatible Providers

Setup Steps

Advanced Configuration

Multiple Custom Providers

Custom Model Definitions

Local Model Configuration

Ollama (Detailed)

llama.cpp Server

vLLM

Troubleshooting

Connection Refused

Invalid Model Name

Unsupported Features

Authentication Errors

Performance Issues

Deprecated: Environment Variable Setup

Best Practices

Security

Performance

Cost Management

Next Steps

Build docs developers (and LLMs) love

Getting Started

Core Concepts

Configuration

Providers

Features

Advanced Usage

Guides

​Overview

​OpenAI-Compatible Providers

​Setup Steps

​Supported Services

​Cloud Services

​Local Inference

​Example: Groq

​Example: Ollama

​Example: LM Studio

​Anthropic-Compatible Providers

​Setup Steps

​Advanced Configuration

​Multiple Custom Providers

​Custom Model Definitions

​Local Model Configuration

​Ollama (Detailed)

​llama.cpp Server

​vLLM

​Troubleshooting

​Connection Refused

​Invalid Model Name

​Unsupported Features

​Authentication Errors

​Performance Issues

​Deprecated: Environment Variable Setup

​Best Practices

​Security

​Performance

​Cost Management

​Next Steps

Build docs developers (and LLMs) love

Overview

OpenAI-Compatible Providers

Setup Steps

Supported Services

Cloud Services

Local Inference

Example: Groq

Example: Ollama

Example: LM Studio

Anthropic-Compatible Providers

Setup Steps

Advanced Configuration

Multiple Custom Providers

Custom Model Definitions

Local Model Configuration

Ollama (Detailed)

llama.cpp Server

vLLM

Troubleshooting

Connection Refused

Invalid Model Name

Unsupported Features

Authentication Errors

Performance Issues

Deprecated: Environment Variable Setup

Best Practices

Security

Performance

Cost Management

Next Steps