
Overview

PentAGI supports multiple LLM providers so you can choose the best models for your penetration testing workflows. At least one provider is required: configure OpenAI, Anthropic, Gemini, AWS Bedrock, Ollama, or a custom provider before starting PentAGI.

OpenAI Configuration

OpenAI provides cutting-edge language models, including the GPT-4.1 series and the o-series reasoning models.
OPEN_AI_KEY
string
required
Your OpenAI API key from platform.openai.com
OPEN_AI_KEY=sk-proj-...
OPEN_AI_SERVER_URL
string
default:"https://api.openai.com/v1"
OpenAI API endpoint URL
OPEN_AI_SERVER_URL=https://api.openai.com/v1

Supported Models

  • o-series: Advanced reasoning models (o1, o3, o4-mini) with step-by-step analytical thinking
  • GPT-4.1: Latest flagship models optimized for complex security research
  • GPT-4: Powerful models for deep analysis and exploit development
  • GPT-3.5: Fast, cost-effective models for high-volume scanning

Example Configuration

.env
# OpenAI Configuration
OPEN_AI_KEY=sk-proj-abc123...
OPEN_AI_SERVER_URL=https://api.openai.com/v1

# Optional: Use proxy
PROXY_URL=http://your-proxy:8080
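Before starting PentAGI, you can smoke-test the key with a quick curl against the models endpoint (a sketch; assumes OPEN_AI_KEY is exported from your .env, and skips cleanly when it is not):

```shell
# Smoke-test the configured OpenAI endpoint; HTTP 200 means the key is accepted.
: "${OPEN_AI_SERVER_URL:=https://api.openai.com/v1}"
if [ -n "${OPEN_AI_KEY:-}" ]; then
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $OPEN_AI_KEY" \
    "$OPEN_AI_SERVER_URL/models")
else
  status="skipped (OPEN_AI_KEY not set)"
fi
echo "GET $OPEN_AI_SERVER_URL/models -> $status"
```

A 401 here means the key is wrong or expired; fix it before debugging anything inside PentAGI.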

Anthropic Configuration

Anthropic’s Claude models are known for exceptional reasoning capabilities and safety.
ANTHROPIC_API_KEY
string
required
Your Anthropic API key from console.anthropic.com
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_SERVER_URL
string
default:"https://api.anthropic.com/v1"
Anthropic API endpoint URL
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1

Supported Models

  • Claude 4: Advanced reasoning for sophisticated penetration testing
  • Claude 3.7: Extended thinking capabilities for methodical security research
  • Claude 3.5 Haiku: High-speed performance for real-time monitoring
  • Claude Sonnet: Comprehensive analysis and threat hunting

Example Configuration

.env
# Anthropic Configuration
ANTHROPIC_API_KEY=sk-ant-api03-...
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1
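Anthropic's auth style differs from OpenAI's: the key goes in an x-api-key header, and a versioning header (anthropic-version) is required on every request. A quick connectivity check (a sketch; assumes ANTHROPIC_API_KEY is exported from your .env, and skips cleanly otherwise):

```shell
# Smoke-test the Anthropic endpoint; HTTP 200 means the key is accepted.
: "${ANTHROPIC_SERVER_URL:=https://api.anthropic.com/v1}"
if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    "$ANTHROPIC_SERVER_URL/models")
else
  status="skipped (ANTHROPIC_API_KEY not set)"
fi
echo "GET $ANTHROPIC_SERVER_URL/models -> $status"
```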

Google AI (Gemini) Configuration

Google’s Gemini models offer multimodal capabilities and large context windows.
GEMINI_API_KEY
string
required
Your Google AI API key from ai.google.dev
GEMINI_API_KEY=AIza...
GEMINI_SERVER_URL
string
default:"https://generativelanguage.googleapis.com"
Google AI API endpoint URL
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com

Supported Models

  • Gemini 2.5: Advanced reasoning with step-by-step analysis
  • Gemini Pro: High-performance models for complex tasks
  • Gemini Flash: Cost-effective models for high-throughput operations
  • Extended Context: Up to 2M tokens for analyzing extensive codebases

Example Configuration

.env
# Google AI (Gemini) Configuration
GEMINI_API_KEY=AIzaSyD...
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com
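Gemini authenticates with a key query parameter rather than an Authorization header. A quick connectivity check (a sketch; assumes GEMINI_API_KEY is exported from your .env, and skips cleanly otherwise):

```shell
# Smoke-test the Gemini endpoint; HTTP 200 means the key is accepted.
: "${GEMINI_SERVER_URL:=https://generativelanguage.googleapis.com}"
if [ -n "${GEMINI_API_KEY:-}" ]; then
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    "$GEMINI_SERVER_URL/v1beta/models?key=$GEMINI_API_KEY")
else
  status="skipped (GEMINI_API_KEY not set)"
fi
echo "GET $GEMINI_SERVER_URL/v1beta/models -> $status"
```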

AWS Bedrock Configuration

Amazon Bedrock provides enterprise-grade access to foundation models from multiple providers.
BEDROCK_REGION
string
default:"us-east-1"
AWS region for Bedrock service
BEDROCK_REGION=us-east-1
BEDROCK_ACCESS_KEY_ID
string
required
AWS access key ID for authentication
BEDROCK_ACCESS_KEY_ID=AKIA...
BEDROCK_SECRET_ACCESS_KEY
string
required
AWS secret access key for authentication
BEDROCK_SECRET_ACCESS_KEY=wJalrXUtn...
BEDROCK_SESSION_TOKEN
string
AWS session token (for temporary credentials)
BEDROCK_SESSION_TOKEN=IQoJb3JpZ2lu...
BEDROCK_SERVER_URL
string
Optional custom Bedrock endpoint (for VPC endpoints)
BEDROCK_SERVER_URL=https://bedrock-runtime.us-east-1.amazonaws.com

Supported Models

  • Anthropic Claude: Claude 4 and Claude 3.7 with advanced reasoning
  • Amazon Nova: Multimodal models supporting text, image, and video
  • Meta Llama: Open-source models with various sizes
  • AI21 Jamba: High-performance enterprise models
  • Cohere Command: Optimized for conversational tasks
  • DeepSeek R1: Advanced reasoning capabilities
Rate Limits: AWS Bedrock has strict default rate limits:
  • us.anthropic.claude-sonnet-4: 2 requests/minute for new accounts
  • us.anthropic.claude-3-5-haiku: 20 requests/minute for new accounts
Request quota increases through the AWS Service Quotas console for production use.
Converse API Required: PentAGI uses the Bedrock Converse API, which requires models to support:
  • ✅ Converse and ConverseStream
  • ✅ Tool use and streaming tool use
Verify model support in the AWS Bedrock documentation.

Example Configuration

.env
# AWS Bedrock Configuration
BEDROCK_REGION=us-east-1
BEDROCK_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
BEDROCK_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Optional: Session token for temporary credentials
BEDROCK_SESSION_TOKEN=IQoJb3JpZ2luX2VjEA...

# Optional: Custom endpoint
BEDROCK_SERVER_URL=https://bedrock-runtime.us-east-1.amazonaws.com
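Credentials and model access can be sanity-checked with the AWS CLI before starting PentAGI (a sketch; requires the aws binary, and skips cleanly when it or the keys are missing):

```shell
# List the foundation models this account can reach in the chosen region.
if command -v aws >/dev/null 2>&1 && [ -n "${BEDROCK_ACCESS_KEY_ID:-}" ]; then
  AWS_ACCESS_KEY_ID="$BEDROCK_ACCESS_KEY_ID" \
  AWS_SECRET_ACCESS_KEY="$BEDROCK_SECRET_ACCESS_KEY" \
  AWS_SESSION_TOKEN="${BEDROCK_SESSION_TOKEN:-}" \
  aws bedrock list-foundation-models \
    --region "${BEDROCK_REGION:-us-east-1}" \
    --query 'modelSummaries[].modelId' --output table \
    && msg="listed models" || msg="Bedrock call failed; check credentials and region"
else
  msg="aws CLI or BEDROCK_ACCESS_KEY_ID missing; skipping Bedrock check"
fi
echo "$msg"
```

If a model you plan to use is missing from the list, request access to it in the Bedrock console for that region first.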

Ollama Configuration

Ollama provides local LLM inference for zero-cost operation and enhanced privacy.
OLLAMA_SERVER_URL
string
required
URL of your Ollama server
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL
string
default:"llama3.1:8b-instruct-q8_0"
Default model for inference
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0
OLLAMA_SERVER_CONFIG_PATH
string
Path to YAML configuration file for agent-specific models
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama.provider.yml
OLLAMA_SERVER_PULL_MODELS_TIMEOUT
number
default:"600"
Timeout in seconds for downloading models
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900
OLLAMA_SERVER_PULL_MODELS_ENABLED
boolean
default:"false"
Automatically download models on startup
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_LOAD_MODELS_ENABLED
boolean
default:"false"
Query Ollama server for available models on startup
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true

Creating Custom Models with Extended Context

PentAGI requires models with a 110K context size. Create custom models using Modelfiles.

Example: Qwen3 32B with Extended Context
Modelfile_qwen3_32b_fp16_tc
FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
Create the model from the Modelfile:
ollama create qwen3:32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc
The num_ctx parameter can only be set at model creation time via a Modelfile; it cannot be changed after creation or overridden at runtime.
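After creating the model, you can confirm the parameters took effect (a sketch; requires a local Ollama install with the model created above, and skips cleanly otherwise):

```shell
# Show the model's context length to verify num_ctx was applied.
if command -v ollama >/dev/null 2>&1; then
  msg=$(ollama show qwen3:32b-fp16-tc 2>&1 | grep -i "context" || echo "model not found")
else
  msg="ollama not installed; skipping"
fi
echo "$msg"
```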

Example Configuration

.env
# Basic Ollama setup
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0

# Production setup with auto-pull and discovery
OLLAMA_SERVER_URL=http://ollama-server:11434
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true

# Custom configuration with agent-specific models
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama.provider.yml

Performance Considerations

  • Model Discovery (LOAD_MODELS_ENABLED=true): Adds 1-2s startup latency
  • Auto-pull (PULL_MODELS_ENABLED=true): First startup may take several minutes
  • Static Config: Disable both flags and specify models in the config file for the fastest startup

Custom LLM Provider

Configure custom LLM providers including OpenRouter, DeepSeek, Moonshot, and Deep Infra.
LLM_SERVER_URL
string
required
Base URL for the custom LLM API endpoint
LLM_SERVER_URL=https://openrouter.ai/api/v1
LLM_SERVER_KEY
string
required
API key for the custom LLM provider
LLM_SERVER_KEY=sk-or-v1-...
LLM_SERVER_MODEL
string
Default model to use (can be overridden in config)
LLM_SERVER_MODEL=anthropic/claude-3-opus
LLM_SERVER_CONFIG_PATH
string
Path to YAML configuration file for agent-specific models
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/custom.provider.yml
LLM_SERVER_PROVIDER
string
Provider name prefix for model names (useful for LiteLLM proxy)
LLM_SERVER_PROVIDER=openrouter
When using a LiteLLM proxy, model names get a provider prefix. Set this variable so the same config file works for both direct API access and proxy access. Example: moonshot/kimi-2.5 with LLM_SERVER_PROVIDER=moonshot.
LLM_SERVER_LEGACY_REASONING
boolean
default:"false"
Use legacy string-based reasoning_effort parameter instead of structured reasoning object
LLM_SERVER_LEGACY_REASONING=true
LLM_SERVER_PRESERVE_REASONING
boolean
default:"false"
Preserve reasoning content in multi-turn conversations
LLM_SERVER_PRESERVE_REASONING=true
Required by some providers (e.g., Moonshot) that return errors when reasoning content is missing in multi-turn conversations.

Example Configurations

OpenRouter:
.env
LLM_SERVER_URL=https://openrouter.ai/api/v1
LLM_SERVER_KEY=sk-or-v1-...
LLM_SERVER_MODEL=anthropic/claude-3-opus
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/custom.provider.yml
DeepSeek:
.env
LLM_SERVER_URL=https://api.deepseek.com/v1
LLM_SERVER_KEY=sk-...
LLM_SERVER_MODEL=deepseek-chat
LLM_SERVER_LEGACY_REASONING=true
Moonshot (via LiteLLM):
.env
LLM_SERVER_URL=http://litellm-proxy:8000
LLM_SERVER_KEY=sk-...
LLM_SERVER_MODEL=kimi-2.5
LLM_SERVER_PROVIDER=moonshot
LLM_SERVER_PRESERVE_REASONING=true
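Most OpenAI-compatible providers expose GET /models, which doubles as a connectivity and key check for whichever custom provider you configured (a sketch; assumes LLM_SERVER_URL and LLM_SERVER_KEY are exported from your .env, and skips cleanly otherwise):

```shell
# Smoke-test the custom provider endpoint; HTTP 200 means URL and key work.
if [ -n "${LLM_SERVER_URL:-}" ]; then
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer ${LLM_SERVER_KEY:-}" \
    "$LLM_SERVER_URL/models")
else
  status="skipped (LLM_SERVER_URL not set)"
fi
echo "GET \$LLM_SERVER_URL/models -> $status"
```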

Provider Configuration Files

Both Ollama and custom providers support YAML configuration files to specify different models for different agent types:
example.custom.provider.yml
models:
  default: "gpt-4o-mini"
  orchestrator: "gpt-4o"
  researcher: "gpt-4o"
  developer: "gpt-4o"
  infrastructure: "claude-3-5-sonnet-20241022"
  executor: "gpt-4o-mini"
example.ollama.provider.yml
models:
  default: "llama3.1:8b-instruct-q8_0"
  orchestrator: "llama3.1:70b-instruct-q8_0"
  researcher: "llama3.1:70b-instruct-q8_0"
  developer: "qwen3:32b-fp16-tc"
  infrastructure: "llama3.1:70b-instruct-q8_0"
  executor: "llama3.1:8b-instruct-q8_0"
Mount these files as volumes in docker-compose.yml:
volumes:
  - ./example.custom.provider.yml:/opt/pentagi/conf/custom.provider.yml
  - ./example.ollama.provider.yml:/opt/pentagi/conf/ollama.provider.yml
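The default entry acts as the fallback for any agent type not listed explicitly. The lookup semantics can be sketched in shell (illustrative only, not PentAGI's actual loader; a minimal config is written to a temp file here):

```shell
# Write a minimal provider config to a temp file for the demonstration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
models:
  default: "gpt-4o-mini"
  orchestrator: "gpt-4o"
EOF

# Look up an agent's model, falling back to the "default" entry.
model_for() {
  m=$(sed -n "s/^  $1: \"\(.*\)\"\$/\1/p" "$cfg")
  echo "${m:-$(sed -n "s/^  default: \"\(.*\)\"\$/\1/p" "$cfg")}"
}

model_for orchestrator   # -> gpt-4o
model_for executor       # -> gpt-4o-mini (falls back to default)
```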

Global Proxy Configuration

All LLM providers support routing through a proxy for network isolation:
PROXY_URL
string
Global HTTP proxy URL for all LLM providers and external systems
PROXY_URL=http://proxy.example.com:8080
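Proxy reachability can be checked by sending a request through it to an LLM endpoint (a sketch; assumes PROXY_URL is exported from your .env, and skips cleanly otherwise):

```shell
# Verify the proxy forwards HTTPS traffic; any HTTP status beats a timeout.
if [ -n "${PROXY_URL:-}" ]; then
  status=$(curl -s -o /dev/null -w "%{http_code}" -x "$PROXY_URL" \
    https://api.openai.com/v1/models)
else
  status="skipped (PROXY_URL not set)"
fi
echo "proxied GET https://api.openai.com/v1/models -> $status"
```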

Next Steps

Configure Search Engines

Enable DuckDuckGo, Google, Tavily, and more

Security Settings

Configure SSL, authentication, and secrets
