
Overview

RAPTOR uses LiteLLM to provide unified access to multiple LLM providers. Configuration supports automatic model selection, fallback chains, and custom model routing for different task types.

Multi-Provider Support

Anthropic, OpenAI, Google Gemini, Mistral, and Ollama (local)

Automatic Model Selection

Best thinking model auto-selected from LiteLLM config

Fallback Chains

Automatic fallback to alternative models on failure

Task-Specific Routing

Route exploit generation to Opus, classification to Gemini

Quick Start

1. Install LiteLLM

pip install litellm

2. Set API Keys

# Anthropic Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI GPT
export OPENAI_API_KEY="sk-proj-..."

# Google Gemini
export GEMINI_API_KEY="AIza..."

# Mistral
export MISTRAL_API_KEY="..."

3. Run RAPTOR

# Auto-selects best available model
python raptor_agentic.py --repo /path/to/code
That’s it! RAPTOR automatically:
  • Detects available API keys
  • Selects the best thinking model
  • Configures fallback chains
  • Tracks costs
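The key-detection step above can be sketched as follows. This is a hypothetical helper illustrating the idea, not RAPTOR's actual implementation:

```python
import os

# Hypothetical sketch: map each provider to the env var RAPTOR looks for.
PROVIDER_ENV_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def detect_available_providers(env=os.environ):
    """Return providers whose API keys are set and non-empty."""
    return [p for p, var in PROVIDER_ENV_KEYS.items() if env.get(var)]
```

The real selection logic also scores models from the LiteLLM config (see Automatic Model Selection below); this only shows the environment-variable side.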

Provider Configuration

Anthropic Claude

Models:
  • claude-opus-4.5 - Most capable, best for exploit generation ($15/M tokens)
  • claude-sonnet-4.5 - Balanced performance ($3/M tokens)
Setup:
export ANTHROPIC_API_KEY="sk-ant-..."
Manual configuration:
import os

from packages.llm_analysis.llm.config import ModelConfig, LLMConfig

claude_config = ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    max_tokens=64000,
    temperature=0.7,
    cost_per_1k_tokens=0.015,
)

config = LLMConfig(primary_model=claude_config)
Use Opus for deep security analysis and exploit generation. Use Sonnet for faster analysis at lower cost.

OpenAI GPT

Models:
  • gpt-5.2 - Latest GPT, strong reasoning ($5/M tokens)
  • gpt-5.2-thinking - Extended thinking mode ($6/M tokens)
Setup:
export OPENAI_API_KEY="sk-proj-..."
Manual configuration:
gpt_config = ModelConfig(
    provider="openai",
    model_name="gpt-5.2",
    api_key=os.getenv("OPENAI_API_KEY"),
    max_tokens=128000,
    temperature=0.7,
    cost_per_1k_tokens=0.005,
)

Google Gemini

Models:
  • gemini-3-pro - High capability ($0.10/M tokens)
  • gemini-3-deep-think - Reasoning mode ($0.20/M tokens)
Setup:
export GEMINI_API_KEY="AIza..."
Manual configuration:
gemini_config = ModelConfig(
    provider="gemini",
    model_name="gemini-3-pro",
    api_key=os.getenv("GEMINI_API_KEY"),
    max_tokens=8192,
    temperature=0.7,
    cost_per_1k_tokens=0.0001,
)
Important: Use LiteLLM aliases like gemini-3-pro, NOT underlying model IDs like gemini-3.0-pro-latest. LiteLLM handles version mapping.

Mistral

Models:
  • mistral-large-latest - Largest model ($2/M tokens)
Setup:
export MISTRAL_API_KEY="..."
Manual configuration:
mistral_config = ModelConfig(
    provider="mistral",
    model_name="mistral-large-latest",
    api_key=os.getenv("MISTRAL_API_KEY"),
    max_tokens=128000,
    temperature=0.7,
    cost_per_1k_tokens=0.002,
)

Ollama (Local Models)

Models: Any model you’ve pulled locally
  • llama3:70b - Meta’s Llama 3
  • mistral:latest - Mistral 7B
  • qwen2.5:72b - Alibaba’s Qwen
  • deepseek-coder:33b - DeepSeek Coder
Setup:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3:70b

# Start server (default: http://localhost:11434)
ollama serve
Manual configuration:
ollama_config = ModelConfig(
    provider="ollama",
    model_name="llama3:70b",
    api_base="http://localhost:11434",
    max_tokens=4096,
    temperature=0.7,
    cost_per_1k_tokens=0.0,  # FREE!
)
Local models are less reliable for exploit generation. RAPTOR warns:
⚠️  Local model - exploit PoCs may be unreliable
For production security research, consider cloud models.

LiteLLM Configuration File

Config File Location

RAPTOR searches for LiteLLM config in order:
  1. $LITELLM_CONFIG_PATH environment variable
  2. ~/.config/litellm/config.yaml (XDG standard)
  3. ~/Documents/ClaudeCode/litellm/config.yaml (dev default)
  4. /etc/litellm/config.yaml (system-wide, Linux/macOS only)
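The search order above can be expressed as a small resolver. A minimal sketch, assuming the four locations listed (the `env` and `exists` parameters are injected here only to make the sketch testable; RAPTOR's real resolver may differ):

```python
import os
from pathlib import Path

def find_litellm_config(env=os.environ, exists=lambda p: Path(p).exists()):
    """Return the first existing LiteLLM config path, or None."""
    candidates = [
        env.get("LITELLM_CONFIG_PATH"),                             # 1. explicit override
        str(Path.home() / ".config/litellm/config.yaml"),           # 2. XDG standard
        str(Path.home() / "Documents/ClaudeCode/litellm/config.yaml"),  # 3. dev default
        "/etc/litellm/config.yaml",                                 # 4. system-wide
    ]
    for candidate in candidates:
        if candidate and exists(candidate):
            return candidate
    return None
```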

Config File Format

~/.config/litellm/config.yaml:
model_list:
  # Anthropic Claude
  - model_name: claude-opus-4.5
    litellm_params:
      model: anthropic/claude-opus-4.5
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      max_output_tokens: 64000
      supports_reasoning: true
  
  - model_name: claude-sonnet-4.5
    litellm_params:
      model: anthropic/claude-sonnet-4.5
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      max_output_tokens: 64000
  
  # OpenAI GPT
  - model_name: gpt-5.2
    litellm_params:
      model: openai/gpt-5.2
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      max_output_tokens: 128000
  
  - model_name: gpt-5.2-thinking
    litellm_params:
      model: openai/gpt-5.2-thinking
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      supports_reasoning: true
  
  # Google Gemini
  - model_name: gemini-3-pro
    litellm_params:
      model: gemini/gemini-3-pro
      api_key: os.environ/GEMINI_API_KEY
    model_info:
      max_output_tokens: 8192
  
  - model_name: gemini-3-deep-think
    litellm_params:
      model: gemini/gemini-3-deep-think
      api_key: os.environ/GEMINI_API_KEY
    model_info:
      supports_reasoning: true
  
  # Mistral
  - model_name: mistral-large
    litellm_params:
      model: mistral/mistral-large-latest
      api_key: os.environ/MISTRAL_API_KEY
    model_info:
      max_output_tokens: 128000

Automatic Model Selection

RAPTOR reads your LiteLLM config and auto-selects the best thinking model.
Selection priority:
  1. Models with supports_reasoning: true get +10 score boost
  2. Opus models preferred over others
  3. Latest versions preferred
  4. Exact model matches preferred over aliases
Scoring system:
thinking_model_patterns = [
    # (underlying_model, alias, base_score)
    ("anthropic/claude-opus-4.5", "claude-opus-4.5", 110),  # Highest
    ("openai/gpt-5.2-thinking", "gpt-5.2-thinking", 95),   # +10 = 105 with reasoning flag
    ("gemini/gemini-3-deep-think", "gemini-3-deep-think", 90),  # +10 = 100 with reasoning
    ("anthropic/claude-opus-4", "claude-opus-4", 85),
    ("openai/gpt-5.2", "gpt-5.2", 80),
    ("anthropic/claude-sonnet-4.5", "claude-sonnet-4.5", 70),
]
Example:
from packages.llm_analysis.llm.config import LLMConfig

# Auto-selects best model from LiteLLM config
config = LLMConfig()
print(config.primary_model.provider)    # "anthropic"
print(config.primary_model.model_name)  # "claude-opus-4.5"
If you have multiple API keys configured, RAPTOR will automatically select the most capable model you have access to.

Fallback Configuration

Automatic Fallback Chains

RAPTOR builds fallback chains automatically based on available API keys:
config = LLMConfig(enable_fallback=True)

# If you have Claude, GPT, and Gemini keys:
# Primary: claude-opus-4.5
# Fallback 1: gpt-5.2
# Fallback 2: gemini-3-pro

response = client.generate(prompt)
# If Claude fails -> tries GPT
# If GPT fails -> tries Gemini
# If Gemini fails -> raises error

Same-Tier Fallback Rule

Fallback stays within same tier:
  • Cloud → Cloud: Anthropic fails → try OpenAI → try Gemini
  • Local → Local: llama3:70b fails → try mistral:latest → try qwen2.5:72b
NEVER cross tiers:
  • Cloud ❌ Local: Claude fails → does NOT fall back to Ollama
  • Local ❌ Cloud: Ollama fails → does NOT fall back to Claude
Reasoning: If a cloud provider is down, fix the infrastructure or wait; don't silently switch to a lower-quality local model.
# If primary is Claude (cloud)
models_to_try = [
    claude_opus,      # Try first
    gpt_5,           # Cloud fallback
    gemini_pro,      # Cloud fallback
    # Ollama NOT included (different tier)
]

# If primary is Ollama (local)
models_to_try = [
    llama3,          # Try first
    mistral,         # Local fallback
    qwen,            # Local fallback
    # Claude NOT included (different tier)
]
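The same-tier chains above reduce to a simple try-in-order loop. A simplified sketch; `generate_with` and the model values are hypothetical stand-ins, not RAPTOR's real API:

```python
def generate_with_fallback(models_to_try, prompt, generate_with):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in models_to_try:
        try:
            return generate_with(model, prompt)
        except Exception as e:  # any provider failure triggers fallback
            last_error = e
    raise RuntimeError(f"All models failed; last error: {last_error}")
```

Because `models_to_try` is built from a single tier, the loop itself never needs to know about the cloud/local distinction.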

Custom Fallback Chain

from packages.llm_analysis.llm.config import LLMConfig, ModelConfig

config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-opus-4.5",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        cost_per_1k_tokens=0.015,
    ),
    fallback_models=[
        ModelConfig(
            provider="anthropic",
            model_name="claude-sonnet-4.5",  # Same provider, cheaper
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            cost_per_1k_tokens=0.003,
        ),
        ModelConfig(
            provider="openai",
            model_name="gpt-5.2",  # Different provider
            api_key=os.getenv("OPENAI_API_KEY"),
            cost_per_1k_tokens=0.005,
        ),
    ],
    enable_fallback=True,
)

Disable Fallback

config = LLMConfig(enable_fallback=False)

# Will NOT fall back to other models on failure
response = client.generate(prompt)
# If primary fails -> immediately raises error

Task-Specific Model Routing

Specialized Models

Route different tasks to different models based on capability and cost:
config = LLMConfig(
    # Default: Balanced model for most tasks
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-sonnet-4.5",
        cost_per_1k_tokens=0.003,
    ),
    
    # Task-specific overrides
    specialized_models={
        # Deep analysis: Use most capable model
        "exploit_generation": ModelConfig(
            provider="anthropic",
            model_name="claude-opus-4.5",
            cost_per_1k_tokens=0.015,
        ),
        
        # Simple classification: Use cheapest model
        "vulnerability_classification": ModelConfig(
            provider="gemini",
            model_name="gemini-3-pro",
            cost_per_1k_tokens=0.0001,
        ),
        
        # Code generation: Use specialized model
        "patch_generation": ModelConfig(
            provider="openai",
            model_name="gpt-5.2",
            cost_per_1k_tokens=0.005,
        ),
    },
)

client = LLMClient(config)

# Uses claude-opus-4.5 (specialized)
response = client.generate(
    prompt="Generate exploit for buffer overflow...",
    task_type="exploit_generation",
)

# Uses gemini-3-pro (specialized)
response = client.generate(
    prompt="Is this exploitable? Yes/No",
    task_type="vulnerability_classification",
)

# Uses claude-sonnet-4.5 (primary - no override)
response = client.generate(
    prompt="Analyze this code for bugs...",
)

Task Type Reference

Task Type                        Recommended Model   Reasoning
exploit_generation               Claude Opus         Needs deep reasoning and security expertise
patch_generation                 GPT-5.2             Strong at code generation
vulnerability_analysis           Claude Sonnet       Balanced capability and cost
code_review                      Claude Sonnet       Good at code understanding
vulnerability_classification     Gemini Pro          Simple yes/no, use cheapest
ioc_extraction                   Gemini Pro          Pattern matching, use cheapest
Cost optimization strategy:
  • Use Opus for <20% of requests (critical reasoning)
  • Use Sonnet for ~60% of requests (balanced analysis)
  • Use Gemini for ~20% of requests (simple classification)
With this split, a typical scan costs roughly 70% less than running Opus for every request.
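Working through the arithmetic with the per-million-token prices quoted earlier on this page (Opus $15/M, Sonnet $3/M, Gemini Pro $0.10/M) and the 20/60/20 split above:

```python
# Blended cost per million tokens for the 20/60/20 split,
# using the prices listed earlier on this page.
opus, sonnet, gemini = 15.0, 3.0, 0.10   # $/M tokens
blended = 0.20 * opus + 0.60 * sonnet + 0.20 * gemini
savings = 1 - blended / opus
print(f"blended: ${blended:.2f}/M, savings vs Opus-only: {savings:.0%}")
# blended: $4.82/M, savings vs Opus-only: 68%
```

Actual savings depend on the token volume each task type consumes, so treat the ~70% figure as an estimate for a typical scan mix.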

Retry and Rate Limiting

Retry Configuration

config = LLMConfig(
    max_retries=3,
    retry_delay=2.0,  # Local models
    retry_delay_remote=5.0,  # Cloud APIs
)
Exponential backoff:
for attempt in range(max_retries):
    try:
        return provider.generate(prompt)
    except Exception as e:
        if attempt < max_retries - 1:
            delay = retry_delay * (2 ** attempt)
            logger.debug(f"Attempt {attempt + 1} failed: {e}; retrying in {delay}s...")
            time.sleep(delay)
        else:
            raise  # out of retries - surface the last error
Retry delays:
Attempt   Local Delay   Remote Delay
1         2s            5s
2         4s            10s
3         8s            20s

Quota Detection

Automatic detection of quota/rate limit errors:
def _is_quota_error(error: Exception) -> bool:
    # Type-based detection (robust)
    if isinstance(error, litellm.RateLimitError):
        return True
    
    # String-based detection (fallback)
    error_str = str(error).lower()
    return any([
        "429" in error_str,
        "quota exceeded" in error_str,
        "rate limit" in error_str,
    ])
Behavior on quota error:
⚠️  Quota error for anthropic/claude-opus-4.5:
→ Anthropic rate limit exceeded
Provider message: Request rate limit exceeded. Please retry after 60 seconds.

► Falling back to: openai/gpt-5.2

Advanced Configuration

Temperature and Sampling

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    temperature=0.7,  # Default: balanced creativity/consistency
)

# More deterministic (exploit generation); provider/model fields omitted for brevity
ModelConfig(temperature=0.3)

# More creative (vulnerability hypothesis); provider/model fields omitted for brevity
ModelConfig(temperature=0.9)

Max Tokens

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    max_tokens=64000,  # Maximum output length
)
Model limits:
Provider    Model                Max Output Tokens
Anthropic   Claude Opus/Sonnet   64,000
OpenAI      GPT-5.2              128,000
Google      Gemini Pro           8,192
Ollama      Varies               2,048-8,192
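When mixing providers, it can help to clamp a requested `max_tokens` to these limits before sending a request. A hypothetical helper; the caps come from the table above, not from RAPTOR's code:

```python
# Hypothetical helper: clamp a requested output length to the provider
# caps listed in the table above. Not part of RAPTOR's public API.
MAX_OUTPUT_TOKENS = {
    "anthropic": 64_000,
    "openai": 128_000,
    "gemini": 8_192,
}

def clamp_max_tokens(provider, requested):
    """Return requested tokens, capped at the provider's known limit."""
    cap = MAX_OUTPUT_TOKENS.get(provider)
    return min(requested, cap) if cap else requested
```

Ollama limits vary by model, so unknown providers pass through unclamped here.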

Timeout

ModelConfig(
    provider="anthropic",
    model_name="claude-opus-4.5",
    timeout=120,  # Seconds (default: 2 minutes)
)
Long-running requests (exploit generation, deep analysis) may need higher timeout (300-600s).

Custom API Base

# For Ollama or self-hosted LLMs
ModelConfig(
    provider="ollama",
    model_name="llama3:70b",
    api_base="http://192.168.1.100:11434",  # Remote Ollama server
)

# For OpenAI-compatible APIs
ModelConfig(
    provider="openai",
    model_name="custom-model",
    api_base="https://api.example.com/v1",
)

Example Configurations

High-Capability, High-Cost

config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-opus-4.5",
        cost_per_1k_tokens=0.015,
    ),
    enable_fallback=False,  # Never fall back
    max_cost_per_scan=50.0,  # High budget
)
Use case: High-value targets, production exploits needed
config = LLMConfig(
    primary_model=ModelConfig(
        provider="anthropic",
        model_name="claude-sonnet-4.5",
        cost_per_1k_tokens=0.003,
    ),
    specialized_models={
        "exploit_generation": ModelConfig(
            provider="anthropic",
            model_name="claude-opus-4.5",
            cost_per_1k_tokens=0.015,
        ),
    },
    enable_fallback=True,
    max_cost_per_scan=10.0,
)
Use case: Most security research, good quality at reasonable cost

Low-Cost

config = LLMConfig(
    primary_model=ModelConfig(
        provider="gemini",
        model_name="gemini-3-pro",
        cost_per_1k_tokens=0.0001,
    ),
    enable_fallback=True,
    max_cost_per_scan=1.0,
)
Use case: Exploratory scans, learning, testing

Zero-Cost (Local)

config = LLMConfig(
    primary_model=ModelConfig(
        provider="ollama",
        model_name="llama3:70b",
        api_base="http://localhost:11434",
        cost_per_1k_tokens=0.0,
    ),
    enable_fallback=True,  # Fall back to other local models
    enable_cost_tracking=False,  # No cost to track
)
Use case: Offline use, privacy-sensitive, unlimited requests

Troubleshooting

No API keys found

Error:
No cloud LLM API keys found (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY).
RAPTOR will use Ollama if available, or fail.
Fix: Set at least one API key:
export ANTHROPIC_API_KEY="sk-ant-..."
LiteLLM not installed

Error:
LiteLLM library not installed. Install with: pip install litellm
Fix:
pip install litellm
Model not found in LiteLLM config

Symptom: Auto-selection falls back to manual API key detection
Cause: LiteLLM config file not found or empty
Fix: Create ~/.config/litellm/config.yaml (see Config File Format above)
Ollama connection failed

Error:
Could not connect to Ollama at http://localhost:11434
Fix:
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
Wrong model ID used

Symptom: LiteLLM error about unknown model
Cause: Using the underlying model ID instead of the LiteLLM alias
Wrong:
model_name="gemini-3.0-pro-latest"  # Underlying ID
Correct:
model_name="gemini-3-pro"  # LiteLLM alias

Further Reading

LiteLLM Documentation

Official LiteLLM documentation and model support

Provider Setup Guides

Detailed setup for each LLM provider

Cost Tracking

Budget enforcement and cost optimization strategies

Configuration Reference

Complete configuration options reference
