
Overview

The Hive framework uses LiteLLM to provide unified access to multiple LLM providers through a single interface. This allows you to switch between providers seamlessly without changing your agent code.

Supported Providers

  • Anthropic: Claude Opus, Sonnet, and Haiku models with extended context
  • OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5, and o1 reasoning models
  • Google: Gemini Pro and Gemini Flash with multimodal support
  • DeepSeek: DeepSeek Chat, Coder, and Reasoner models
  • Groq: Ultra-fast inference with Llama and Mixtral models
  • Cerebras: Fast inference with GLM and Qwen models

Quick Setup via Quickstart

The interactive quickstart script guides you through provider configuration:

```bash
bash quickstart.sh
```

You'll be prompted to choose from:

Subscription Modes (No API Key Purchase)

1. Claude Code Subscription
   Use your Claude Max/Pro plan for API access.
   Setup: Run the claude CLI to authenticate, then select option 1 in the quickstart.
   Models: claude-opus-4-6, claude-sonnet-4-5-20250929

2. ZAI Code Subscription
   Use your ZAI Code plan for API access.
   Setup: Provide your ZAI API key when prompted.
   Models: glm-5 (32K context)

3. OpenAI Codex Subscription
   Use your ChatGPT Plus plan for API access.
   Setup: Authenticate via OAuth when prompted.
   Models: gpt-5.3-codex

API Key Providers

1. Anthropic (Recommended)
   Get API key: https://console.anthropic.com/settings/keys
   Models: claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5

2. OpenAI
   Get API key: https://platform.openai.com/api-keys
   Models: gpt-5.2, gpt-5-mini, gpt-4o, gpt-4-turbo

3. Google Gemini (Free Tier)
   Get API key: https://aistudio.google.com/apikey
   Models: gemini-3-flash-preview, gemini-3.1-pro-preview

4. Groq (Fast, Free Tier)
   Get API key: https://console.groq.com/keys
   Models: moonshotai/kimi-k2-instruct-0905, openai/gpt-oss-120b

5. Cerebras (Fast, Free Tier)
   Get API key: https://cloud.cerebras.ai/
   Models: zai-glm-4.7, qwen3-235b-a22b-instruct-2507

Manual Configuration

Set Environment Variables

Add your API key to your shell configuration:

```bash
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Google Gemini
export GEMINI_API_KEY="AI..."

# Groq
export GROQ_API_KEY="gsk_..."

# Cerebras
export CEREBRAS_API_KEY="csk-..."

# DeepSeek
export DEEPSEEK_API_KEY="sk-..."
```

Add to ~/.bashrc or ~/.zshrc for persistence:

```bash
echo 'export ANTHROPIC_API_KEY="your-key"' >> ~/.bashrc
source ~/.bashrc
```
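Because each provider reads a different environment variable, it can be useful to check which keys are present before launching. A minimal sketch; the `detect_provider` helper is illustrative and not part of the framework:

```python
import os

# Environment variable expected by each provider, in preference order.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}

def detect_provider(env=os.environ):
    """Return the first provider whose API key is set, or None."""
    for provider, var in PROVIDER_KEYS.items():
        if env.get(var):
            return provider
    return None
```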

Create Configuration File

Create ~/.hive/configuration.json:

```json
{
  "llm": {
    "provider": "anthropic",
    "model": "claude-opus-4-6",
    "max_tokens": 32768,
    "api_key_env_var": "ANTHROPIC_API_KEY"
  },
  "created_at": "2026-03-03T00:00:00+00:00"
}
```
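Since the configuration is plain JSON, it can be generated and validated with the standard library. A sketch that writes to a temporary directory; the real file lives at ~/.hive/configuration.json:

```python
import json
import tempfile
from pathlib import Path

config = {
    "llm": {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "max_tokens": 32768,
        "api_key_env_var": "ANTHROPIC_API_KEY",
    },
}

# The real file lives at ~/.hive/configuration.json; a temp dir is used here.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "configuration.json"
    path.write_text(json.dumps(config, indent=2))
    loaded = json.loads(path.read_text())  # round-trips losslessly
```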

Provider-Specific Setup

Anthropic (Claude)

```json
{
  "llm": {
    "provider": "anthropic",
    "model": "claude-opus-4-6",
    "max_tokens": 32768,
    "api_key_env_var": "ANTHROPIC_API_KEY"
  }
}
```
Available Models:
  • claude-opus-4-6 - Most capable (recommended)
  • claude-sonnet-4-5-20250929 - Best balance
  • claude-sonnet-4-20250514 - Fast + capable
  • claude-haiku-4-5-20251001 - Fast + cheap

OpenAI

```json
{
  "llm": {
    "provider": "openai",
    "model": "gpt-5.2",
    "max_tokens": 16384,
    "api_key_env_var": "OPENAI_API_KEY"
  }
}
```
Available Models:
  • gpt-5.2 - Most capable (recommended)
  • gpt-5-mini - Fast + cheap
  • gpt-4o - Multimodal flagship
  • gpt-4-turbo - Fast GPT-4
  • o1 - Reasoning model

Google Gemini

```json
{
  "llm": {
    "provider": "gemini",
    "model": "gemini-3-flash-preview",
    "max_tokens": 8192,
    "api_key_env_var": "GEMINI_API_KEY"
  }
}
```
Available Models:
  • gemini-3-flash-preview - Fast (recommended)
  • gemini-3.1-pro-preview - Best quality
  • gemini-1.5-pro - Extended context (2M tokens)

DeepSeek

```json
{
  "llm": {
    "provider": "deepseek",
    "model": "deepseek-chat",
    "max_tokens": 8192,
    "api_key_env_var": "DEEPSEEK_API_KEY"
  }
}
```
Available Models:
  • deepseek-chat - General purpose
  • deepseek-coder - Code generation
  • deepseek-reasoner - Chain-of-thought reasoning

Groq

```json
{
  "llm": {
    "provider": "groq",
    "model": "moonshotai/kimi-k2-instruct-0905",
    "max_tokens": 8192,
    "api_key_env_var": "GROQ_API_KEY"
  }
}
```
Available Models:
  • moonshotai/kimi-k2-instruct-0905 - Best quality (recommended)
  • openai/gpt-oss-120b - Fast reasoning
  • llama3-70b - Llama 3 70B
  • mixtral-8x7b - Mixtral MoE

Cerebras

```json
{
  "llm": {
    "provider": "cerebras",
    "model": "zai-glm-4.7",
    "max_tokens": 8192,
    "api_key_env_var": "CEREBRAS_API_KEY"
  }
}
```
Available Models:
  • zai-glm-4.7 - Best quality (recommended)
  • qwen3-235b-a22b-instruct-2507 - Frontier reasoning

ZAI Code

```json
{
  "llm": {
    "provider": "openai",
    "model": "glm-5",
    "max_tokens": 32768,
    "api_key_env_var": "ZAI_API_KEY",
    "api_base": "https://api.z.ai/api/coding/paas/v4"
  }
}
```

Using in Code

Basic Usage

```python
from framework.llm.litellm import LiteLLMProvider

# Initialize the provider (reads the API key from the environment)
provider = LiteLLMProvider(model="claude-opus-4-6")

# Generate a completion
response = provider.complete(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=1024
)

print(response.content)
```

With Custom API Key

```python
provider = LiteLLMProvider(
    model="gpt-5.2",
    api_key="your-api-key-here"
)
```

With Custom API Base

```python
# For proxies or local deployments
provider = LiteLLMProvider(
    model="gpt-4o-mini",
    api_base="https://my-proxy.com/v1"
)
```

Async Completion

```python
import asyncio

async def main():
    provider = LiteLLMProvider(model="claude-opus-4-6")

    response = await provider.acomplete(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=1024
    )

    print(response.content)

asyncio.run(main())
```

Streaming

```python
import asyncio

async def main():
    provider = LiteLLMProvider(model="claude-opus-4-6")

    async for event in provider.stream(
        messages=[{"role": "user", "content": "Write a story"}],
        max_tokens=2048
    ):
        if event.type == "text_delta":
            print(event.content, end="", flush=True)

asyncio.run(main())
```

With Tools

```python
from framework.llm.provider import Tool

tools = [
    Tool(
        name="web_search",
        description="Search the web",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    )
]

response = provider.complete(
    messages=[{"role": "user", "content": "Search for quantum computing"}],
    tools=tools,
    max_tokens=1024
)
```

Model Selection Guide

By Use Case

Complex Reasoning
  Best:
    • claude-opus-4-6 (Anthropic)
    • gpt-5.2 (OpenAI)
    • o1 (OpenAI, specialized reasoning)
  Context: up to 200K tokens with Claude

Speed
  Best:
    • claude-haiku-4-5 (Anthropic)
    • gpt-5-mini (OpenAI)
    • gemini-3-flash (Google)
    • llama3-70b on Groq (ultra-fast)
  Latency: under 1s with Groq, ~2s with others

Coding
  Best:
    • deepseek-coder (DeepSeek)
    • claude-sonnet-4-5 (Anthropic)
    • gpt-4o (OpenAI)
  Tools: all support function calling

Budget
  Best:
    • gemini-3-flash (free tier)
    • llama3-70b on Groq (free tier)
    • gpt-5-mini (cheap)
  Free tiers: Gemini, Groq, Cerebras

Long Context
  Best:
    • claude-opus-4-6 (200K tokens)
    • gemini-1.5-pro (2M tokens)
    • gpt-4-turbo (128K tokens)
  Note: context costs scale linearly
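The recommendations above can be encoded as a simple lookup when choosing a default model programmatically. The `RECOMMENDED` table and `pick_model` helper are illustrative, not framework APIs:

```python
# Illustrative mapping from use case to the recommended models above.
RECOMMENDED = {
    "reasoning": "claude-opus-4-6",
    "speed": "claude-haiku-4-5",
    "coding": "deepseek-coder",
    "budget": "gemini-3-flash",
    "long_context": "gemini-1.5-pro",
}

def pick_model(use_case: str, default: str = "claude-sonnet-4-5") -> str:
    """Return the recommended model for a use case, or a balanced default."""
    return RECOMMENDED.get(use_case, default)
```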

By Budget

| Budget | Model             | Provider  | Notes               |
|--------|-------------------|-----------|---------------------|
| Free   | gemini-3-flash    | Google    | Free tier available |
| Free   | llama3-70b        | Groq      | Fast, free tier     |
| Low    | gpt-5-mini        | OpenAI    | $0.10/1M tokens     |
| Low    | claude-haiku-4-5  | Anthropic | $0.25/1M tokens     |
| Medium | claude-sonnet-4-5 | Anthropic | $3/1M tokens        |
| Medium | gpt-4o            | OpenAI    | $5/1M tokens        |
| High   | claude-opus-4-6   | Anthropic | $15/1M tokens       |
| High   | gpt-5.2           | OpenAI    | $20/1M tokens       |
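Since pricing is linear in token count, the budget table translates directly into a back-of-envelope cost estimate. `estimated_cost` is an illustrative helper, and the prices are the table's figures, not live rates:

```python
# Per-million-token prices from the budget table above (USD); check
# the provider's pricing page for current rates.
PRICE_PER_MTOK = {
    "gpt-5-mini": 0.10,
    "claude-haiku-4-5": 0.25,
    "claude-sonnet-4-5": 3.0,
    "gpt-4o": 5.0,
    "claude-opus-4-6": 15.0,
    "gpt-5.2": 20.0,
}

def estimated_cost(model: str, tokens: int) -> float:
    """Linear estimate: tokens / 1M * price per million tokens."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]
```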

Advanced Features

Rate Limit Handling

Automatic retry with exponential backoff:

```python
response = provider.complete(
    messages=messages,
    max_tokens=1024,
    max_retries=5  # Override the default (10)
)
```
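The retry behavior follows the usual exponential-backoff-with-jitter pattern. A standalone sketch of such a delay schedule, illustrative rather than the framework's exact implementation:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Yield exponentially growing delays (1s, 2s, 4s, ...) capped at `cap`."""
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Jitter spreads out retries so concurrent clients don't stampede.
        yield delay * random.uniform(0.5, 1.0)
```

A retry loop would sleep for each yielded delay between attempts, re-raising once the generator is exhausted.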

Token Estimation

```python
# Estimate tokens before sending
from framework.llm.litellm import _estimate_tokens

count, method = _estimate_tokens(
    model="claude-opus-4-6",
    messages=messages
)
print(f"Estimated tokens: {count} ({method})")
```

Failed Request Debugging

Failed requests are automatically dumped to ~/.hive/failed_requests/:

```
~/.hive/failed_requests/
├── empty_response_claude-opus-4-6_20260303_120000_123456.json
├── rate_limit_gpt-4o_20260303_120100_234567.json
└── ...
```

Each dump includes:
  • Full request payload
  • Error type and attempt number
  • Token count estimate
  • Timestamp
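The dumps are ordinary JSON files, so they can be inspected with the standard library. `latest_failed_requests` is an illustrative helper, not a framework API:

```python
import json
from pathlib import Path

def latest_failed_requests(limit: int = 5,
                           root: Path = Path.home() / ".hive" / "failed_requests"):
    """Return up to `limit` failure dumps, newest first (empty if none exist)."""
    if not root.is_dir():
        return []
    dumps = sorted(root.glob("*.json"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    return [json.loads(p.read_text()) for p in dumps[:limit]]
```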

Troubleshooting

Error: AuthenticationError: API key not found

Solution:

```bash
# Check if the env var is set
echo $ANTHROPIC_API_KEY

# Set it
export ANTHROPIC_API_KEY="your-key"

# Or point the config at it
# ~/.hive/configuration.json: "api_key_env_var": "ANTHROPIC_API_KEY"
```

Error: RateLimitError: 429 Rate limit exceeded

Solution:
  • The framework retries automatically with backoff
  • Check the server-provided retry-after header
  • Reduce concurrency
  • Upgrade to a higher-tier plan

Error: Empty content returned

Causes:
  • Rate limit (a stealth 200 instead of a 429)
  • Context window exceeded
  • finish_reason=length (max_tokens too low)

Solution:
  • Check ~/.hive/failed_requests/ for dumps
  • Increase max_tokens
  • Reduce context length

Error: BadRequestError: maximum context length exceeded

Solution:
  • Use a model with a larger context window (e.g., claude-opus-4-6)
  • Implement message compaction
  • Summarize earlier conversation turns
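Message compaction can be as simple as keeping the system prompt plus the most recent turns and replacing the middle with a short note. An illustrative sketch; `compact_messages` is not a framework API:

```python
def compact_messages(messages, keep_recent: int = 6):
    """Keep the system prompt (if any) and the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    dropped = len(rest) - keep_recent
    # Leave a marker so the model knows history was elided.
    note = {"role": "user", "content": f"[{dropped} earlier messages omitted]"}
    return system + [note] + rest[-keep_recent:]
```

In practice you would summarize the dropped turns rather than discard them outright, as suggested above.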

Next Steps

  • Credential Management: securely manage API keys
  • Self-Hosting: deploy your own Hive instance
