
Overview

Providers abstract LLM API interactions, enabling PicoClaw to work with multiple AI models and services through a unified interface. The provider system supports automatic fallback, load balancing, and zero-code model addition.

Provider Interface

All providers implement the LLMProvider interface (pkg/providers/types.go):
type LLMProvider interface {
    Chat(
        ctx context.Context,
        messages []Message,
        tools []ToolDefinition,
        model string,
        options map[string]any,
    ) (*LLMResponse, error)
    GetDefaultModel() string
}

type Message struct {
    Role             string      // "system", "user", "assistant", "tool"
    Content          string      // Message content
    ToolCalls        []ToolCall  // Tool calls (for assistant messages)
    ToolCallID       string      // Tool call ID (for tool messages)
    ReasoningContent string      // Reasoning/thoughts
}

type LLMResponse struct {
    Content          string      // Response text
    ToolCalls        []ToolCall  // Requested tool calls
    Reasoning        string      // Model's reasoning
    ReasoningContent string      // Extended reasoning
    Usage            *UsageInfo  // Token usage
}

Supported Providers

HTTP-Compatible Providers

Providers using OpenAI-compatible HTTP API (pkg/providers/http_provider.go):
| Provider | Prefix | Default API Base | Notes |
|---|---|---|---|
| OpenAI | openai/ | https://api.openai.com/v1 | GPT models |
| Anthropic | anthropic/ | https://api.anthropic.com/v1 | Claude (via OpenAI format) |
| Zhipu | zhipu/ | https://open.bigmodel.cn/api/paas/v4 | GLM models |
| DeepSeek | deepseek/ | https://api.deepseek.com/v1 | DeepSeek models |
| Gemini | gemini/ | https://generativelanguage.googleapis.com/v1beta | Google Gemini |
| Groq | groq/ | https://api.groq.com/openai/v1 | Fast inference |
| Moonshot | moonshot/ | https://api.moonshot.cn/v1 | Kimi models |
| Qwen | qwen/ | https://dashscope.aliyuncs.com/compatible-mode/v1 | Alibaba Qwen |
| NVIDIA | nvidia/ | https://integrate.api.nvidia.com/v1 | NVIDIA models |
| Ollama | ollama/ | http://localhost:11434/v1 | Local models |
| OpenRouter | openrouter/ | https://openrouter.ai/api/v1 | Multi-model proxy |
| LiteLLM | litellm/ | http://localhost:4000/v1 | LiteLLM proxy |
| VLLM | vllm/ | http://localhost:8000/v1 | vLLM inference |
| Cerebras | cerebras/ | https://api.cerebras.ai/v1 | Fast inference |

Native Providers

Claude Provider (pkg/providers/claude_provider.go)

Native Anthropic API implementation with:
  • Prompt caching
  • Extended thinking
  • Vision support

Codex Provider (pkg/providers/codex_provider.go)

OpenAI OAuth/token authentication.

GitHub Copilot (pkg/providers/github_copilot_provider.go)

gRPC connection to local GitHub Copilot agent.

Model List Configuration

The recommended way to configure providers: models are added through configuration alone, with no code changes.

Basic Configuration

{
  "model_list": [
    {
      "model_name": "gpt4",
      "model": "openai/gpt-5.2",
      "api_key": "sk-..."
    },
    {
      "model_name": "claude",
      "model": "anthropic/claude-sonnet-4.6",
      "api_key": "sk-ant-..."
    },
    {
      "model_name": "glm",
      "model": "zhipu/glm-4.7",
      "api_key": "..."
    }
  ],
  "agents": {
    "defaults": {
      "model": "gpt4"  // References model_name
    }
  }
}

Model Entry Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model_name | string | Yes | Unique identifier for this model |
| model | string | Yes | Full model ID with vendor prefix |
| api_key | string | No* | API key (*required for most providers) |
| api_base | string | No | Override default API base |
| request_timeout | int | No | Timeout in seconds (default: 120) |

Provider Auto-Detection

Providers are automatically selected based on model prefix:
openai/gpt-5.2       → OpenAI provider
anthropic/claude-*   → Anthropic provider
zhipu/glm-*          → Zhipu provider
ollama/llama3        → Ollama provider (no API key needed)
litellm/custom       → LiteLLM proxy

Custom API Base

Override default endpoints:
{
  "model_list": [
    {
      "model_name": "custom-gpt",
      "model": "openai/gpt-5.2",
      "api_base": "https://my-proxy.com/v1",
      "api_key": "sk-..."
    }
  ]
}

Request Timeout

Set per-model timeout:
{
  "model_list": [
    {
      "model_name": "slow-model",
      "model": "anthropic/claude-opus-4",
      "api_key": "sk-ant-...",
      "request_timeout": 300  // 5 minutes
    }
  ]
}

Load Balancing

Multiple entries with the same model_name enable round-robin load balancing:
{
  "model_list": [
    {
      "model_name": "gpt4",
      "model": "openai/gpt-5.2",
      "api_base": "https://api1.example.com/v1",
      "api_key": "key1"
    },
    {
      "model_name": "gpt4",
      "model": "openai/gpt-5.2",
      "api_base": "https://api2.example.com/v1",
      "api_key": "key2"
    }
  ]
}
Behavior:
  • Requests alternate between endpoints
  • Reduces single-endpoint rate limits
  • Improves availability
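Round-robin selection over entries sharing a model_name can be sketched as follows. This is an illustration of the behavior described above, not PicoClaw's actual code:

```go
package main

import "fmt"

// roundRobin cycles through the endpoints registered under one
// model_name, so consecutive requests alternate between them.
type roundRobin struct {
	endpoints []string
	next      int
}

func (rr *roundRobin) Pick() string {
	e := rr.endpoints[rr.next%len(rr.endpoints)]
	rr.next++
	return e
}

func main() {
	rr := &roundRobin{endpoints: []string{
		"https://api1.example.com/v1",
		"https://api2.example.com/v1",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.Pick()) // alternates api1, api2, api1, api2
	}
}
```

A production balancer would also need synchronization for concurrent requests and would skip endpoints that are in cooldown.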

Fallback Chain

Automatic failover when primary model fails.

Configuration

Method 1: Model-specific
{
  "agents": {
    "defaults": {
      "model": "gpt4",
      "model_fallbacks": ["claude", "glm"]
    }
  }
}
Method 2: Agent-specific
{
  "agents": {
    "agents": [
      {
        "id": "main",
        "model": {
          "primary": "gpt4",
          "fallbacks": ["claude", "glm"]
        }
      }
    ]
  }
}

Fallback Execution

Implemented in pkg/providers/fallback.go:
Try primary model
├─ Success → Return response
└─ Failure → Classify error
    ├─ Retriable (auth, rate_limit, timeout, billing, overloaded)
    │   └─ Try next fallback
    └─ Non-retriable (format error)
        └─ Return error immediately

Error Classification

Defined in pkg/providers/error_classifier.go:
| Reason | Retry? | Examples |
|---|---|---|
| auth | Yes | Invalid API key, expired token |
| rate_limit | Yes (with cooldown) | 429 Too Many Requests |
| billing | Yes | Insufficient credits, quota exceeded |
| timeout | Yes | Network timeout, deadline exceeded |
| overloaded | Yes | 503 Service Unavailable |
| format | No | Invalid image size, unsupported content |
| unknown | Yes | Other errors |
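As a rough sketch of how such a classifier might map error messages to the categories in the table (the real classifier in pkg/providers/error_classifier.go may inspect status codes or typed errors instead of message substrings):

```go
package main

import (
	"fmt"
	"strings"
)

// classify is an illustrative message-substring classifier; the
// keyword lists here are assumptions, not PicoClaw's actual rules.
func classify(msg string) (reason string, retriable bool) {
	m := strings.ToLower(msg)
	switch {
	case strings.Contains(m, "429") || strings.Contains(m, "rate limit"):
		return "rate_limit", true
	case strings.Contains(m, "api key") || strings.Contains(m, "token"):
		return "auth", true
	case strings.Contains(m, "quota") || strings.Contains(m, "credit"):
		return "billing", true
	case strings.Contains(m, "timeout") || strings.Contains(m, "deadline"):
		return "timeout", true
	case strings.Contains(m, "503") || strings.Contains(m, "overloaded"):
		return "overloaded", true
	case strings.Contains(m, "invalid image") || strings.Contains(m, "unsupported"):
		return "format", false // non-retriable: fallback would also fail
	default:
		return "unknown", true
	}
}

func main() {
	reason, retry := classify("429 Too Many Requests")
	fmt.Println(reason, retry) // rate_limit true
}
```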

Cooldown Tracking

Prevents rapid retries after rate limit:
type CooldownTracker struct {
    cooldowns map[string]time.Time  // provider:model → cooldown until
}

// Returns true if provider is in cooldown
func (ct *CooldownTracker) IsInCooldown(provider, model string) bool

// Sets cooldown period
func (ct *CooldownTracker) SetCooldown(provider, model string, duration time.Duration)
Default cooldown: 60 seconds for rate limits

Legacy Provider Configuration

Deprecated but still supported for backward compatibility.

Legacy Format

{
  "providers": {
    "zhipu": {
      "api_key": "your-key",
      "api_base": "https://open.bigmodel.cn/api/paas/v4"
    },
    "anthropic": {
      "api_key": "sk-ant-..."
    }
  },
  "agents": {
    "defaults": {
      "provider": "zhipu",
      "model": "glm-4.7"
    }
  }
}

Migration to model_list

Before:
{
  "providers": {
    "zhipu": {
      "api_key": "key",
      "api_base": "https://open.bigmodel.cn/api/paas/v4"
    }
  },
  "agents": {
    "defaults": {
      "provider": "zhipu",
      "model": "glm-4.7"
    }
  }
}
After:
{
  "model_list": [
    {
      "model_name": "glm-4.7",
      "model": "zhipu/glm-4.7",
      "api_key": "key"
    }
  ],
  "agents": {
    "defaults": {
      "model": "glm-4.7"
    }
  }
}

Provider Selection Logic

Implemented in pkg/providers/factory.go:
1. Check for explicit provider in config
   ├─ Found → Use configured provider
   └─ Not found → Infer from model name
       ├─ Model prefix (openai/, anthropic/, etc.)
       ├─ Model name contains keywords (gpt, claude, etc.)
       └─ Fall back to OpenRouter if configured

Provider Resolution Examples

Model: "gpt-5.2"
Config: providers.openai.api_key set
Result: OpenAI provider

Model: "anthropic/claude-sonnet-4.6"
Config: providers.anthropic.api_key set
Result: Anthropic provider

Model: "custom-model"
Config: providers.openrouter.api_key set
Result: OpenRouter provider

Model: "ollama/llama3"
Config: providers.ollama.api_base = "http://localhost:11434/v1"
Result: Ollama provider (no API key needed)

Special Providers

OpenRouter

Universal model router supporting all major providers:
{
  "providers": {
    "openrouter": {
      "api_key": "sk-or-v1-..."
    }
  }
}
Supports:
  • OpenAI (GPT-4, GPT-3.5, etc.)
  • Anthropic (Claude)
  • Google (Gemini)
  • Meta (Llama)
  • And 100+ other models

LiteLLM Proxy

Connect to LiteLLM proxy for unified model access:
{
  "model_list": [
    {
      "model_name": "proxy-gpt4",
      "model": "litellm/gpt-4",
      "api_base": "http://localhost:4000/v1",
      "api_key": "sk-..."
    }
  ]
}
Note: PicoClaw strips only the litellm/ prefix; the remaining model ID is passed to the proxy unchanged.

Ollama (Local)

Run models locally:
{
  "model_list": [
    {
      "model_name": "llama3",
      "model": "ollama/llama3",
      "api_base": "http://localhost:11434/v1"
    }
  ]
}
No API key required for local Ollama.

GitHub Copilot

Connect to local GitHub Copilot agent:
{
  "providers": {
    "github_copilot": {
      "api_base": "localhost:4321",
      "connect_mode": "grpc"
    }
  },
  "agents": {
    "defaults": {
      "provider": "github_copilot",
      "model": "gpt-4o"
    }
  }
}

Model Routing Patterns

Pattern 1: Single Provider

Simplest setup:
{
  "model_list": [
    {
      "model_name": "main",
      "model": "openai/gpt-5.2",
      "api_key": "sk-..."
    }
  ],
  "agents": {
    "defaults": {"model": "main"}
  }
}

Pattern 2: Multi-Provider Fallback

Automatic failover:
{
  "model_list": [
    {"model_name": "gpt4", "model": "openai/gpt-5.2", "api_key": "sk-1"},
    {"model_name": "claude", "model": "anthropic/claude-sonnet-4.6", "api_key": "sk-2"},
    {"model_name": "glm", "model": "zhipu/glm-4.7", "api_key": "sk-3"}
  ],
  "agents": {
    "defaults": {
      "model": "gpt4",
      "model_fallbacks": ["claude", "glm"]
    }
  }
}

Pattern 3: Load Balanced Primary

Distribute load across endpoints:
{
  "model_list": [
    {"model_name": "gpt4", "model": "openai/gpt-5.2", "api_base": "https://api1.com/v1", "api_key": "k1"},
    {"model_name": "gpt4", "model": "openai/gpt-5.2", "api_base": "https://api2.com/v1", "api_key": "k2"},
    {"model_name": "claude", "model": "anthropic/claude-sonnet-4.6", "api_key": "sk-ant"}
  ],
  "agents": {
    "defaults": {
      "model": "gpt4",
      "model_fallbacks": ["claude"]
    }
  }
}

Pattern 4: Per-Agent Models

Different models for different agents:
{
  "model_list": [
    {"model_name": "fast", "model": "groq/llama-3.1-70b", "api_key": "gsk-"},
    {"model_name": "smart", "model": "anthropic/claude-sonnet-4.6", "api_key": "sk-ant"},
    {"model_name": "cheap", "model": "gemini/gemini-2.0", "api_key": "AIza"}
  ],
  "agents": {
    "defaults": {"model": "fast"},
    "agents": [
      {"id": "main", "model": {"primary": "fast"}},
      {"id": "researcher", "model": {"primary": "smart"}},
      {"id": "cron", "model": {"primary": "cheap"}}
    ]
  }
}

Best Practices

1. Always Configure Fallbacks

Ensure reliability:
{
  "model_fallbacks": ["provider2", "provider3"]
}

2. Use OpenRouter for Flexibility

Single API key for all models:
{
  "providers": {
    "openrouter": {"api_key": "sk-or-..."}
  }
}

3. Set Appropriate Timeouts

Longer timeouts for complex tasks:
{
  "request_timeout": 300  // 5 minutes
}

4. Monitor Rate Limits

Use load balancing for high-volume workloads:
{
  "model_list": [
    {"model_name": "gpt4", "model": "openai/gpt-5.2", "api_key": "key1"},
    {"model_name": "gpt4", "model": "openai/gpt-5.2", "api_key": "key2"}
  ]
}

5. Local Development

Use Ollama for free testing:
{
  "model_list": [
    {"model_name": "local", "model": "ollama/llama3"}
  ]
}

6. Cost Optimization

Use cheap models for simple tasks and expensive models for complex ones:
{
  "agents": [
    {"id": "cron", "model": {"primary": "gemini-flash"}},
    {"id": "main", "model": {"primary": "gpt4"}}
  ]
}
