
Overview

The Hive framework uses LiteLLM to provide unified access to multiple LLM providers through a single interface. This allows you to switch between providers seamlessly without changing your agent code.

Supported Providers

  • Anthropic: Claude Opus, Sonnet, and Haiku models with extended context
  • OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5, and o1 reasoning models
  • Google: Gemini Pro and Gemini Flash with multimodal support
  • DeepSeek: DeepSeek Chat, Coder, and Reasoner models
  • Groq: Ultra-fast inference with Llama and Mixtral models
  • Cerebras: Fast inference with GLM and Qwen models

Quick Setup via Quickstart

The interactive quickstart script guides you through provider configuration:

```bash
bash quickstart.sh
```

You'll be prompted to choose from:

Subscription Modes (No API Key Purchase)

1. Claude Code Subscription
   Use your Claude Max/Pro plan for API access.
   Setup: Run the claude CLI to authenticate, then select option 1 in the quickstart.
   Models: claude-opus-4-6, claude-sonnet-4-5-20250929

2. ZAI Code Subscription
   Use your ZAI Code plan for API access.
   Setup: Provide your ZAI API key when prompted.
   Models: glm-5 (32K context)

3. OpenAI Codex Subscription
   Use your ChatGPT Plus plan for API access.
   Setup: Authenticate via OAuth when prompted.
   Models: gpt-5.3-codex

API Key Providers

1. Anthropic (Recommended)
   Get API key: https://console.anthropic.com/settings/keys
   Models: claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5

2. OpenAI
   Get API key: https://platform.openai.com/api-keys
   Models: gpt-5.2, gpt-5-mini, gpt-4o, gpt-4-turbo

3. Google Gemini (Free Tier)
   Get API key: https://aistudio.google.com/apikey
   Models: gemini-3-flash-preview, gemini-3.1-pro-preview

4. Groq (Fast, Free Tier)
   Get API key: https://console.groq.com/keys
   Models: moonshotai/kimi-k2-instruct-0905, openai/gpt-oss-120b

5. Cerebras (Fast, Free Tier)
   Get API key: https://cloud.cerebras.ai/
   Models: zai-glm-4.7, qwen3-235b-a22b-instruct-2507

Manual Configuration

Set Environment Variables

Add your API key to your shell configuration:

```bash
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Google Gemini
export GEMINI_API_KEY="AI..."

# Groq
export GROQ_API_KEY="gsk_..."

# Cerebras
export CEREBRAS_API_KEY="csk-..."

# DeepSeek
export DEEPSEEK_API_KEY="sk-..."
```

Add to ~/.bashrc or ~/.zshrc for persistence:

```bash
echo 'export ANTHROPIC_API_KEY="your-key"' >> ~/.bashrc
source ~/.bashrc
```
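Because each provider reads a different environment variable, it can be useful to check which keys are present before launching. A minimal sketch; the `detect_provider` helper is illustrative and not part of the framework:

```python
import os

# Environment variable expected by each provider, in preference order.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}

def detect_provider(env=os.environ):
    """Return the first provider whose API key is set, or None."""
    for provider, var in PROVIDER_KEYS.items():
        if env.get(var):
            return provider
    return None
```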

Create Configuration File

Create ~/.hive/configuration.json:

```json
{
  "llm": {
    "provider": "anthropic",
    "model": "claude-opus-4-6",
    "max_tokens": 32768,
    "api_key_env_var": "ANTHROPIC_API_KEY"
  },
  "created_at": "2026-03-03T00:00:00+00:00"
}
```
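Since the configuration is plain JSON, it can be generated and validated with the standard library. A sketch that writes to a temporary directory; the real file lives at ~/.hive/configuration.json:

```python
import json
import tempfile
from pathlib import Path

config = {
    "llm": {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "max_tokens": 32768,
        "api_key_env_var": "ANTHROPIC_API_KEY",
    },
}

# The real file lives at ~/.hive/configuration.json; a temp dir is used here.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "configuration.json"
    path.write_text(json.dumps(config, indent=2))
    loaded = json.loads(path.read_text())  # round-trips losslessly
```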

Provider-Specific Setup

Anthropic (Claude)

```json
{
  "llm": {
    "provider": "anthropic",
    "model": "claude-opus-4-6",
    "max_tokens": 32768,
    "api_key_env_var": "ANTHROPIC_API_KEY"
  }
}
```
Available Models:
  • claude-opus-4-6 - Most capable (recommended)
  • claude-sonnet-4-5-20250929 - Best balance
  • claude-sonnet-4-20250514 - Fast + capable
  • claude-haiku-4-5-20251001 - Fast + cheap

OpenAI

```json
{
  "llm": {
    "provider": "openai",
    "model": "gpt-5.2",
    "max_tokens": 16384,
    "api_key_env_var": "OPENAI_API_KEY"
  }
}
```
Available Models:
  • gpt-5.2 - Most capable (recommended)
  • gpt-5-mini - Fast + cheap
  • gpt-4o - Multimodal flagship
  • gpt-4-turbo - Fast GPT-4
  • o1 - Reasoning model

Google Gemini

```json
{
  "llm": {
    "provider": "gemini",
    "model": "gemini-3-flash-preview",
    "max_tokens": 8192,
    "api_key_env_var": "GEMINI_API_KEY"
  }
}
```
Available Models:
  • gemini-3-flash-preview - Fast (recommended)
  • gemini-3.1-pro-preview - Best quality
  • gemini-1.5-pro - Extended context (2M tokens)

DeepSeek

```json
{
  "llm": {
    "provider": "deepseek",
    "model": "deepseek-chat",
    "max_tokens": 8192,
    "api_key_env_var": "DEEPSEEK_API_KEY"
  }
}
```
Available Models:
  • deepseek-chat - General purpose
  • deepseek-coder - Code generation
  • deepseek-reasoner - Chain-of-thought reasoning

Groq

```json
{
  "llm": {
    "provider": "groq",
    "model": "moonshotai/kimi-k2-instruct-0905",
    "max_tokens": 8192,
    "api_key_env_var": "GROQ_API_KEY"
  }
}
```
Available Models:
  • moonshotai/kimi-k2-instruct-0905 - Best quality (recommended)
  • openai/gpt-oss-120b - Fast reasoning
  • llama3-70b - Llama 3 70B
  • mixtral-8x7b - Mixtral MoE

Cerebras

```json
{
  "llm": {
    "provider": "cerebras",
    "model": "zai-glm-4.7",
    "max_tokens": 8192,
    "api_key_env_var": "CEREBRAS_API_KEY"
  }
}
```
Available Models:
  • zai-glm-4.7 - Best quality (recommended)
  • qwen3-235b-a22b-instruct-2507 - Frontier reasoning

ZAI Code

```json
{
  "llm": {
    "provider": "openai",
    "model": "glm-5",
    "max_tokens": 32768,
    "api_key_env_var": "ZAI_API_KEY",
    "api_base": "https://api.z.ai/api/coding/paas/v4"
  }
}
```

Using in Code

Basic Usage

```python
from framework.llm.litellm import LiteLLMProvider

# Initialize the provider (reads the API key from the environment)
provider = LiteLLMProvider(model="claude-opus-4-6")

# Generate a completion
response = provider.complete(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=1024
)

print(response.content)
```

With Custom API Key

```python
provider = LiteLLMProvider(
    model="gpt-5.2",
    api_key="your-api-key-here"
)
```

With Custom API Base

```python
# For proxies or local deployments
provider = LiteLLMProvider(
    model="gpt-4o-mini",
    api_base="https://my-proxy.com/v1"
)
```

Async Completion

```python
import asyncio

async def main():
    provider = LiteLLMProvider(model="claude-opus-4-6")

    response = await provider.acomplete(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=1024
    )

    print(response.content)

asyncio.run(main())
```

Streaming

```python
import asyncio

async def main():
    provider = LiteLLMProvider(model="claude-opus-4-6")

    async for event in provider.stream(
        messages=[{"role": "user", "content": "Write a story"}],
        max_tokens=2048
    ):
        if event.type == "text_delta":
            print(event.content, end="", flush=True)

asyncio.run(main())
```

With Tools

```python
from framework.llm.provider import Tool

tools = [
    Tool(
        name="web_search",
        description="Search the web",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    )
]

response = provider.complete(
    messages=[{"role": "user", "content": "Search for quantum computing"}],
    tools=tools,
    max_tokens=1024
)
```

Model Selection Guide

By Use Case

Complex Reasoning
  Best:
    • claude-opus-4-6 (Anthropic)
    • gpt-5.2 (OpenAI)
    • o1 (OpenAI, specialized reasoning)
  Context: up to 200K tokens with Claude

Speed
  Best:
    • claude-haiku-4-5 (Anthropic)
    • gpt-5-mini (OpenAI)
    • gemini-3-flash (Google)
    • llama3-70b on Groq (ultra-fast)
  Latency: under 1s with Groq, ~2s with others

Coding
  Best:
    • deepseek-coder (DeepSeek)
    • claude-sonnet-4-5 (Anthropic)
    • gpt-4o (OpenAI)
  Tools: all support function calling

Budget
  Best:
    • gemini-3-flash (free tier)
    • llama3-70b on Groq (free tier)
    • gpt-5-mini (cheap)
  Free tiers: Gemini, Groq, Cerebras

Long Context
  Best:
    • claude-opus-4-6 (200K tokens)
    • gemini-1.5-pro (2M tokens)
    • gpt-4-turbo (128K tokens)
  Note: context costs scale linearly
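The recommendations above can be encoded as a simple lookup when choosing a default model programmatically. The `RECOMMENDED` table and `pick_model` helper are illustrative, not framework APIs:

```python
# Illustrative mapping from use case to the recommended models above.
RECOMMENDED = {
    "reasoning": "claude-opus-4-6",
    "speed": "claude-haiku-4-5",
    "coding": "deepseek-coder",
    "budget": "gemini-3-flash",
    "long_context": "gemini-1.5-pro",
}

def pick_model(use_case: str, default: str = "claude-sonnet-4-5") -> str:
    """Return the recommended model for a use case, or a balanced default."""
    return RECOMMENDED.get(use_case, default)
```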

By Budget

| Budget | Model             | Provider  | Notes               |
|--------|-------------------|-----------|---------------------|
| Free   | gemini-3-flash    | Google    | Free tier available |
| Free   | llama3-70b        | Groq      | Fast, free tier     |
| Low    | gpt-5-mini        | OpenAI    | $0.10/1M tokens     |
| Low    | claude-haiku-4-5  | Anthropic | $0.25/1M tokens     |
| Medium | claude-sonnet-4-5 | Anthropic | $3/1M tokens        |
| Medium | gpt-4o            | OpenAI    | $5/1M tokens        |
| High   | claude-opus-4-6   | Anthropic | $15/1M tokens       |
| High   | gpt-5.2           | OpenAI    | $20/1M tokens       |
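Since pricing is linear in token count, the budget table translates directly into a back-of-envelope cost estimate. `estimated_cost` is an illustrative helper, and the prices are the table's figures, not live rates:

```python
# Per-million-token prices from the budget table above (USD); check
# the provider's pricing page for current rates.
PRICE_PER_MTOK = {
    "gpt-5-mini": 0.10,
    "claude-haiku-4-5": 0.25,
    "claude-sonnet-4-5": 3.0,
    "gpt-4o": 5.0,
    "claude-opus-4-6": 15.0,
    "gpt-5.2": 20.0,
}

def estimated_cost(model: str, tokens: int) -> float:
    """Linear estimate: tokens / 1M * price per million tokens."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]
```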

Advanced Features

Rate Limit Handling

Automatic retry with exponential backoff:

```python
response = provider.complete(
    messages=messages,
    max_tokens=1024,
    max_retries=5  # Override the default (10)
)
```
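The retry behavior follows the usual exponential-backoff-with-jitter pattern. A standalone sketch of such a delay schedule, illustrative rather than the framework's exact implementation:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Yield exponentially growing delays (1s, 2s, 4s, ...) capped at `cap`."""
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Jitter spreads out retries so concurrent clients don't stampede.
        yield delay * random.uniform(0.5, 1.0)
```

A retry loop would sleep for each yielded delay between attempts, re-raising once the generator is exhausted.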

Token Estimation

```python
# Estimate tokens before sending
from framework.llm.litellm import _estimate_tokens

count, method = _estimate_tokens(
    model="claude-opus-4-6",
    messages=messages
)
print(f"Estimated tokens: {count} ({method})")
```

Failed Request Debugging

Failed requests are automatically dumped to ~/.hive/failed_requests/:

```
~/.hive/failed_requests/
├── empty_response_claude-opus-4-6_20260303_120000_123456.json
├── rate_limit_gpt-4o_20260303_120100_234567.json
└── ...
```

Each dump includes:
  • Full request payload
  • Error type and attempt number
  • Token count estimate
  • Timestamp
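The dumps are ordinary JSON files, so they can be inspected with the standard library. `latest_failed_requests` is an illustrative helper, not a framework API:

```python
import json
from pathlib import Path

def latest_failed_requests(limit: int = 5,
                           root: Path = Path.home() / ".hive" / "failed_requests"):
    """Return up to `limit` failure dumps, newest first (empty if none exist)."""
    if not root.is_dir():
        return []
    dumps = sorted(root.glob("*.json"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    return [json.loads(p.read_text()) for p in dumps[:limit]]
```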

Troubleshooting

Error: AuthenticationError: API key not found

Solution:

```bash
# Check if the env var is set
echo $ANTHROPIC_API_KEY

# Set it
export ANTHROPIC_API_KEY="your-key"

# Or point the config at it
# ~/.hive/configuration.json: "api_key_env_var": "ANTHROPIC_API_KEY"
```

Error: RateLimitError: 429 Rate limit exceeded

Solution:
  • The framework retries automatically with backoff
  • Check the server-provided retry-after header
  • Reduce concurrency
  • Upgrade to a higher-tier plan

Error: Empty content returned

Causes:
  • Rate limit (a stealth 200 instead of a 429)
  • Context window exceeded
  • finish_reason=length (max_tokens too low)

Solution:
  • Check ~/.hive/failed_requests/ for dumps
  • Increase max_tokens
  • Reduce context length

Error: BadRequestError: maximum context length exceeded

Solution:
  • Use a model with a larger context window (e.g., claude-opus-4-6)
  • Implement message compaction
  • Summarize earlier conversation turns
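Message compaction can be as simple as keeping the system prompt plus the most recent turns and replacing the middle with a short note. An illustrative sketch; `compact_messages` is not a framework API:

```python
def compact_messages(messages, keep_recent: int = 6):
    """Keep the system prompt (if any) and the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    dropped = len(rest) - keep_recent
    # Leave a marker so the model knows history was elided.
    note = {"role": "user", "content": f"[{dropped} earlier messages omitted]"}
    return system + [note] + rest[-keep_recent:]
```

In practice you would summarize the dropped turns rather than discard them outright, as suggested above.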

Next Steps

  • Credential Management: securely manage API keys
  • Self-Hosting: deploy your own Hive instance
