
Overview

RAPTOR uses LiteLLM to provide unified access to multiple LLM providers. Configuration supports automatic model selection, fallback chains, and custom model routing for different task types.

Multi-Provider Support

Anthropic, OpenAI, Google Gemini, Mistral, and Ollama (local)

Automatic Model Selection

Best thinking model auto-selected from LiteLLM config

Fallback Chains

Automatic fallback to alternative models on failure

Task-Specific Routing

Route exploit generation to Opus, classification to Gemini

Quick Start

1. Install LiteLLM

pip install litellm

2. Set API Keys

# Anthropic Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI GPT
export OPENAI_API_KEY="sk-proj-..."

# Google Gemini
export GEMINI_API_KEY="AIza..."

# Mistral
export MISTRAL_API_KEY="..."

3. Run RAPTOR

# Auto-selects best available model
python raptor_agentic.py --repo /path/to/code
That’s it! RAPTOR automatically:
  • Detects available API keys
  • Selects the best thinking model
  • Configures fallback chains
  • Tracks costs
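The key-detection step above can be sketched as follows. This is a hypothetical helper illustrating the idea, not RAPTOR's actual implementation:

```python
import os

# Hypothetical sketch: map each provider to the env var RAPTOR looks for.
PROVIDER_ENV_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def detect_available_providers(env=os.environ):
    """Return providers whose API keys are set and non-empty."""
    return [p for p, var in PROVIDER_ENV_KEYS.items() if env.get(var)]
```

The real selection logic also scores models from the LiteLLM config (see Automatic Model Selection below); this only shows the environment-variable side.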

Provider Configuration

Anthropic Claude

Models:
  • claude-opus-4.5 - Most capable, best for exploit generation ($15/M tokens)
  • claude-sonnet-4.5 - Balanced performance ($3/M tokens)
Setup:
export ANTHROPIC_API_KEY="sk-ant-..."
Manual configuration:
import os

from packages.llm_analysis.llm.config import ModelConfig, LLMConfig

claude_config = ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    max_tokens=64000,
    temperature=0.7,
    cost_per_1k_tokens=0.015,
)

config = LLMConfig(primary_model=claude_config)
Use Opus for deep security analysis and exploit generation. Use Sonnet for faster analysis at lower cost.

OpenAI GPT

Models:
  • gpt-5.2 - Latest GPT, strong reasoning ($5/M tokens)
  • gpt-5.2-thinking - Extended thinking mode ($6/M tokens)
Setup:
export OPENAI_API_KEY="sk-proj-..."
Manual configuration:
gpt_config = ModelConfig(
    provider="openai",
    model_name="gpt-5.2",
    api_key=os.getenv("OPENAI_API_KEY"),
    max_tokens=128000,
    temperature=0.7,
    cost_per_1k_tokens=0.005,
)

Google Gemini

Models:
  • gemini-3-pro - High capability ($0.10/M tokens)
  • gemini-3-deep-think - Reasoning mode ($0.20/M tokens)
Setup:
export GEMINI_API_KEY="AIza..."
Manual configuration:
gemini_config = ModelConfig(
    provider="gemini",
    model_name="gemini-3-pro",
    api_key=os.getenv("GEMINI_API_KEY"),
    max_tokens=8192,
    temperature=0.7,
    cost_per_1k_tokens=0.0001,
)
Important: Use LiteLLM aliases like gemini-3-pro, NOT underlying model IDs like gemini-3.0-pro-latest. LiteLLM handles version mapping.

Mistral

Models:
  • mistral-large-latest - Largest model ($2/M tokens)
Setup:
export MISTRAL_API_KEY="..."
Manual configuration:
mistral_config = ModelConfig(
    provider="mistral",
    model_name="mistral-large-latest",
    api_key=os.getenv("MISTRAL_API_KEY"),
    max_tokens=128000,
    temperature=0.7,
    cost_per_1k_tokens=0.002,
)

Ollama (Local Models)

Models: Any model you’ve pulled locally
  • llama3:70b - Meta’s Llama 3
  • mistral:latest - Mistral 7B
  • qwen2.5:72b - Alibaba’s Qwen
  • deepseek-coder:33b - DeepSeek Coder
Setup:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3:70b

# Start server (default: http://localhost:11434)
ollama serve
Manual configuration:
ollama_config = ModelConfig(
    provider="ollama",
    model_name="llama3:70b",
    api_base="http://localhost:11434",
    max_tokens=4096,
    temperature=0.7,
    cost_per_1k_tokens=0.0,  # FREE!
)
Local models are less reliable for exploit generation. RAPTOR warns:
⚠️  Local model - exploit PoCs may be unreliable
For production security research, consider cloud models.

LiteLLM Configuration File

Config File Location

RAPTOR searches for LiteLLM config in order:
  1. $LITELLM_CONFIG_PATH environment variable
  2. ~/.config/litellm/config.yaml (XDG standard)
  3. ~/Documents/ClaudeCode/litellm/config.yaml (dev default)
  4. /etc/litellm/config.yaml (system-wide, Linux/macOS only)
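The search order above can be expressed as a small resolver. A minimal sketch, assuming the four locations listed (the `env` and `exists` parameters are injected here only to make the sketch testable; RAPTOR's real resolver may differ):

```python
import os
from pathlib import Path

def find_litellm_config(env=os.environ, exists=lambda p: Path(p).exists()):
    """Return the first existing LiteLLM config path, or None."""
    candidates = [
        env.get("LITELLM_CONFIG_PATH"),                             # 1. explicit override
        str(Path.home() / ".config/litellm/config.yaml"),           # 2. XDG standard
        str(Path.home() / "Documents/ClaudeCode/litellm/config.yaml"),  # 3. dev default
        "/etc/litellm/config.yaml",                                 # 4. system-wide
    ]
    for candidate in candidates:
        if candidate and exists(candidate):
            return candidate
    return None
```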

Config File Format

~/.config/litellm/config.yaml:
model_list:
  # Anthropic Claude
  - model_name: claude-opus-4.5
    litellm_params:
      model: anthropic/claude-opus-4.5
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      max_output_tokens: 64000
      supports_reasoning: true
  
  - model_name: claude-sonnet-4.5
    litellm_params:
      model: anthropic/claude-sonnet-4.5
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      max_output_tokens: 64000
  
  # OpenAI GPT
  - model_name: gpt-5.2
    litellm_params:
      model: openai/gpt-5.2
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      max_output_tokens: 128000
  
  - model_name: gpt-5.2-thinking
    litellm_params:
      model: openai/gpt-5.2-thinking
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      supports_reasoning: true
  
  # Google Gemini
  - model_name: gemini-3-pro
    litellm_params:
      model: gemini/gemini-3-pro
      api_key: os.environ/GEMINI_API_KEY
    model_info:
      max_output_tokens: 8192
  
  - model_name: gemini-3-deep-think
    litellm_params:
      model: gemini/gemini-3-deep-think
      api_key: os.environ/GEMINI_API_KEY
    model_info:
      supports_reasoning: true
  
  # Mistral
  - model_name: mistral-large
    litellm_params:
      model: mistral/mistral-large-latest
      api_key: os.environ/MISTRAL_API_KEY
    model_info:
      max_output_tokens: 128000

Automatic Model Selection

RAPTOR reads your LiteLLM config and auto-selects the best thinking model.
Selection priority:
  1. Models with supports_reasoning: true get +10 score boost
  2. Opus models preferred over others
  3. Latest versions preferred
  4. Exact model matches preferred over aliases
Scoring system:
thinking_model_patterns = [
    # (underlying_model, alias, base_score)
    ("anthropic/claude-opus-4.5", "claude-opus-4.5", 110),  # Highest
    ("openai/gpt-5.2-thinking", "gpt-5.2-thinking", 95),   # +10 = 105 with reasoning flag
    ("gemini/gemini-3-deep-think", "gemini-3-deep-think", 90),  # +10 = 100 with reasoning
    ("anthropic/claude-opus-4", "claude-opus-4", 85),
    ("openai/gpt-5.2", "gpt-5.2", 80),
    ("anthropic/claude-sonnet-4.5", "claude-sonnet-4.5", 70),
]
Example:
from packages.llm_analysis.llm.config import LLMConfig

# Auto-selects best model from LiteLLM config
config = LLMConfig()
print(config.primary_model.provider)    # "anthropic"
print(config.primary_model.model_name)  # "claude-opus-4.5"
If you have multiple API keys configured, RAPTOR will automatically select the most capable model you have access to.

Fallback Configuration

Automatic Fallback Chains

RAPTOR builds fallback chains automatically based on available API keys:
config = LLMConfig(enable_fallback=True)

# If you have Claude, GPT, and Gemini keys:
# Primary: claude-opus-4.5
# Fallback 1: gpt-5.2
# Fallback 2: gemini-3-pro

response = client.generate(prompt)
# If Claude fails -> tries GPT
# If GPT fails -> tries Gemini
# If Gemini fails -> raises error

Same-Tier Fallback Rule

Fallback stays within same tier:
  • Cloud → Cloud: Anthropic fails → try OpenAI → try Gemini
  • Local → Local: llama3:70b fails → try mistral:latest → try qwen2.5:72b
NEVER cross tiers:
  • Cloud ❌ Local: Claude fails → does NOT fall back to Ollama
  • Local ❌ Cloud: Ollama fails → does NOT fall back to Claude
Reasoning: If a cloud provider is down, fix the infrastructure or wait; don't silently switch to a lower-quality local model.
# If primary is Claude (cloud)
models_to_try = [
    claude_opus,      # Try first
    gpt_5,           # Cloud fallback
    gemini_pro,      # Cloud fallback
    # Ollama NOT included (different tier)
]

# If primary is Ollama (local)
models_to_try = [
    llama3,          # Try first
    mistral,         # Local fallback
    qwen,            # Local fallback
    # Claude NOT included (different tier)
]
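The same-tier chains above reduce to a simple try-in-order loop. A simplified sketch; `generate_with` and the model values are hypothetical stand-ins, not RAPTOR's real API:

```python
def generate_with_fallback(models_to_try, prompt, generate_with):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in models_to_try:
        try:
            return generate_with(model, prompt)
        except Exception as e:  # any provider failure triggers fallback
            last_error = e
    raise RuntimeError(f"All models failed; last error: {last_error}")
```

Because `models_to_try` is built from a single tier, the loop itself never needs to know about the cloud/local distinction.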

Custom Fallback Chain

from packages.llm_analysis.llm.config import LLMConfig, ModelConfig

config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-opus-4.5",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        cost_per_1k_tokens=0.015,
    ),
    fallback_models=[
        ModelConfig(
            provider="anthropic",
            model_name="claude-sonnet-4.5",  # Same provider, cheaper
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            cost_per_1k_tokens=0.003,
        ),
        ModelConfig(
            provider="openai",
            model_name="gpt-5.2",  # Different provider
            api_key=os.getenv("OPENAI_API_KEY"),
            cost_per_1k_tokens=0.005,
        ),
    ],
    enable_fallback=True,
)

Disable Fallback

config = LLMConfig(enable_fallback=False)

# Will NOT fall back to other models on failure
response = client.generate(prompt)
# If primary fails -> immediately raises error

Task-Specific Model Routing

Specialized Models

Route different tasks to different models based on capability and cost:
config = LLMConfig(
    # Default: Balanced model for most tasks
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-sonnet-4.5",
        cost_per_1k_tokens=0.003,
    ),
    
    # Task-specific overrides
    specialized_models={
        # Deep analysis: Use most capable model
        "exploit_generation": ModelConfig(
            provider="anthropic",
            model_name="claude-opus-4.5",
            cost_per_1k_tokens=0.015,
        ),
        
        # Simple classification: Use cheapest model
        "vulnerability_classification": ModelConfig(
            provider="gemini",
            model_name="gemini-3-pro",
            cost_per_1k_tokens=0.0001,
        ),
        
        # Code generation: Use specialized model
        "patch_generation": ModelConfig(
            provider="openai",
            model_name="gpt-5.2",
            cost_per_1k_tokens=0.005,
        ),
    },
)

client = LLMClient(config)

# Uses claude-opus-4.5 (specialized)
response = client.generate(
    prompt="Generate exploit for buffer overflow...",
    task_type="exploit_generation",
)

# Uses gemini-3-pro (specialized)
response = client.generate(
    prompt="Is this exploitable? Yes/No",
    task_type="vulnerability_classification",
)

# Uses claude-sonnet-4.5 (primary - no override)
response = client.generate(
    prompt="Analyze this code for bugs...",
)

Task Type Reference

Task Type                        Recommended Model   Reasoning
exploit_generation               Claude Opus         Needs deep reasoning and security expertise
patch_generation                 GPT-5.2             Strong at code generation
vulnerability_analysis           Claude Sonnet       Balanced capability and cost
code_review                      Claude Sonnet       Good at code understanding
vulnerability_classification     Gemini Pro          Simple yes/no, use cheapest
ioc_extraction                   Gemini Pro          Pattern matching, use cheapest
Cost optimization strategy:
  • Use Opus for <20% of requests (critical reasoning)
  • Use Sonnet for ~60% of requests (balanced analysis)
  • Use Gemini for ~20% of requests (simple classification)
With this split, a typical scan costs roughly 70% less than running Opus for every request.
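Working through the arithmetic with the per-million-token prices quoted earlier on this page (Opus $15/M, Sonnet $3/M, Gemini Pro $0.10/M) and the 20/60/20 split above:

```python
# Blended cost per million tokens for the 20/60/20 split,
# using the prices listed earlier on this page.
opus, sonnet, gemini = 15.0, 3.0, 0.10   # $/M tokens
blended = 0.20 * opus + 0.60 * sonnet + 0.20 * gemini
savings = 1 - blended / opus
print(f"blended: ${blended:.2f}/M, savings vs Opus-only: {savings:.0%}")
# blended: $4.82/M, savings vs Opus-only: 68%
```

Actual savings depend on the token volume each task type consumes, so treat the ~70% figure as an estimate for a typical scan mix.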

Retry and Rate Limiting

Retry Configuration

config = LLMConfig(
    max_retries=3,
    retry_delay=2.0,  # Local models
    retry_delay_remote=5.0,  # Cloud APIs
)
Exponential backoff:
for attempt in range(max_retries):
    try:
        return provider.generate(prompt)
    except Exception as e:
        if attempt < max_retries - 1:
            delay = retry_delay * (2 ** attempt)
            logger.debug(f"Attempt {attempt + 1} failed: {e}; retrying in {delay}s...")
            time.sleep(delay)
        else:
            raise  # out of retries - surface the last error
Retry delays:
Attempt   Local Delay   Remote Delay
1         2s            5s
2         4s            10s
3         8s            20s

Quota Detection

Automatic detection of quota/rate limit errors:
def _is_quota_error(error: Exception) -> bool:
    # Type-based detection (robust)
    if isinstance(error, litellm.RateLimitError):
        return True
    
    # String-based detection (fallback)
    error_str = str(error).lower()
    return any([
        "429" in error_str,
        "quota exceeded" in error_str,
        "rate limit" in error_str,
    ])
Behavior on quota error:
⚠️  Quota error for anthropic/claude-opus-4.5:
→ Anthropic rate limit exceeded
Provider message: Request rate limit exceeded. Please retry after 60 seconds.

► Falling back to: openai/gpt-5.2

Advanced Configuration

Temperature and Sampling

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    temperature=0.7,  # Default: balanced creativity/consistency
)

# More deterministic (exploit generation); provider/model fields omitted for brevity
ModelConfig(temperature=0.3)

# More creative (vulnerability hypothesis); provider/model fields omitted for brevity
ModelConfig(temperature=0.9)

Max Tokens

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    max_tokens=64000,  # Maximum output length
)
Model limits:
Provider    Model                Max Output Tokens
Anthropic   Claude Opus/Sonnet   64,000
OpenAI      GPT-5.2              128,000
Google      Gemini Pro           8,192
Ollama      Varies               2,048-8,192
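When mixing providers, it can help to clamp a requested `max_tokens` to these limits before sending a request. A hypothetical helper; the caps come from the table above, not from RAPTOR's code:

```python
# Hypothetical helper: clamp a requested output length to the provider
# caps listed in the table above. Not part of RAPTOR's public API.
MAX_OUTPUT_TOKENS = {
    "anthropic": 64_000,
    "openai": 128_000,
    "gemini": 8_192,
}

def clamp_max_tokens(provider, requested):
    """Return requested tokens, capped at the provider's known limit."""
    cap = MAX_OUTPUT_TOKENS.get(provider)
    return min(requested, cap) if cap else requested
```

Ollama limits vary by model, so unknown providers pass through unclamped here.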

Timeout

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    timeout=120,  # Seconds (default: 2 minutes)
)
Long-running requests (exploit generation, deep analysis) may need higher timeout (300-600s).

Custom API Base

# For Ollama or self-hosted LLMs
ModelConfig(
    provider="ollama",
    model_name="llama3:70b",
    api_base="http://192.168.1.100:11434",  # Remote Ollama server
)

# For OpenAI-compatible APIs
ModelConfig(
    provider="openai",
    model_name="custom-model",
    api_base="https://api.example.com/v1",
)

Example Configurations

High-Capability, High-Cost

config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-opus-4.5",
        cost_per_1k_tokens=0.015,
    ),
    enable_fallback=False,  # Never fall back
    max_cost_per_scan=50.0,  # High budget
)
Use case: High-value targets, production exploits needed
config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-sonnet-4.5",
        cost_per_1k_tokens=0.003,
    ),
    specialized_models={
        "exploit_generation": ModelConfig(
            provider="anthropic",
            model_name="claude-opus-4.5",
            cost_per_1k_tokens=0.015,
        ),
    },
    enable_fallback=True,
    max_cost_per_scan=10.0,
)
Use case: Most security research, good quality at reasonable cost

Low-Cost

config = LLMConfig(
    primary_model=ModelConfig(
        provider="gemini",
        model_name="gemini-3-pro",
        cost_per_1k_tokens=0.0001,
    ),
    enable_fallback=True,
    max_cost_per_scan=1.0,
)
Use case: Exploratory scans, learning, testing

Zero-Cost (Local)

config = LLMConfig(
    primary_model=ModelConfig(
        provider="ollama",
        model_name="llama3:70b",
        api_base="http://localhost:11434",
        cost_per_1k_tokens=0.0,
    ),
    enable_fallback=True,  # Fall back to other local models
    enable_cost_tracking=False,  # No cost to track
)
Use case: Offline use, privacy-sensitive, unlimited requests

Troubleshooting

No API keys found

Error:
No cloud LLM API keys found (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY).
RAPTOR will use Ollama if available, or fail.
Fix: Set at least one API key:
export ANTHROPIC_API_KEY="sk-ant-..."
LiteLLM not installed

Error:
LiteLLM library not installed. Install with: pip install litellm
Fix:
pip install litellm
Model not found in LiteLLM config

Symptom: Auto-selection falls back to manual API key detection
Cause: LiteLLM config file not found or empty
Fix: Create ~/.config/litellm/config.yaml (see Config File Format above)
Ollama connection failed

Error:
Could not connect to Ollama at http://localhost:11434
Fix:
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
Wrong model ID used

Symptom: LiteLLM error about unknown model
Cause: Using the underlying model ID instead of the LiteLLM alias
Wrong:
model_name="gemini-3.0-pro-latest"  # Underlying ID
Correct:
model_name="gemini-3-pro"  # LiteLLM alias

Further Reading

LiteLLM Documentation

Official LiteLLM documentation and model support

Provider Setup Guides

Detailed setup for each LLM provider

Cost Tracking

Budget enforcement and cost optimization strategies

Configuration Reference

Complete configuration options reference
