PentAGI supports custom LLM providers through flexible YAML configuration files, enabling you to use any OpenAI-compatible API or customize model selection for different agent types.

Environment Variables

LLM_SERVER_URL
string
required
Base URL for the custom LLM API endpoint (e.g., https://openrouter.ai/api/v1).
LLM_SERVER_KEY
string
required
API key for the custom LLM provider.
LLM_SERVER_MODEL
string
Default model to use (can be overridden in provider config).
LLM_SERVER_CONFIG_PATH
string
Path to the YAML configuration file for agent-specific models.
LLM_SERVER_PROVIDER
string
Provider name prefix for model names (e.g., openrouter, deepseek for LiteLLM proxy).
LLM_SERVER_LEGACY_REASONING
boolean
default:"false"
Controls reasoning format in API requests. Set to true for legacy string-based reasoning_effort parameter.
LLM_SERVER_PRESERVE_REASONING
boolean
default:"false"
Preserve reasoning content in multi-turn conversations. Required by some providers (e.g., Moonshot).
PROXY_URL
string
Optional HTTP proxy URL for network isolation.

Configuration Examples

OpenRouter

# OpenRouter setup
LLM_SERVER_URL=https://openrouter.ai/api/v1
LLM_SERVER_KEY=sk-or-v1-...
LLM_SERVER_CONFIG_PATH=/path/to/openrouter.provider.yml

DeepSeek

# DeepSeek setup
LLM_SERVER_URL=https://api.deepseek.com
LLM_SERVER_KEY=sk-...
LLM_SERVER_CONFIG_PATH=/path/to/deepseek.provider.yml

DeepInfra

# DeepInfra setup
LLM_SERVER_URL=https://api.deepinfra.com/v1/openai
LLM_SERVER_KEY=...
LLM_SERVER_CONFIG_PATH=/path/to/deepinfra.provider.yml

Moonshot (with LiteLLM)

# Moonshot setup with reasoning preservation
LLM_SERVER_URL=https://api.moonshot.cn/v1
LLM_SERVER_KEY=sk-...
LLM_SERVER_CONFIG_PATH=/path/to/moonshot.provider.yml
LLM_SERVER_PRESERVE_REASONING=true

# When using through LiteLLM proxy
LLM_SERVER_PROVIDER=moonshot

Custom vLLM Server

# vLLM server setup
LLM_SERVER_URL=http://vllm-server:8000/v1
LLM_SERVER_KEY=token-abc123
LLM_SERVER_CONFIG_PATH=/path/to/vllm-qwen332b-fp16.provider.yml

YAML Configuration Structure

The provider configuration file uses YAML format with the following structure:
# Agent type: Configuration
agent_type:
  model: "model-name"
  temperature: 0.7
  top_p: 0.9
  n: 1
  max_tokens: 4000
  json: false
  reasoning:
    effort: medium
  price:
    input: 1.0
    output: 2.0

Supported Agent Types

Agent Type      Purpose
simple          Simple queries and basic analysis
simple_json     Structured data extraction (JSON output)
primary_agent   Core penetration testing workflows
assistant       Multi-step security workflows
generator       Report and exploit generation
refiner         Result refinement and analysis
adviser         Strategic recommendations
reflector       Analysis review and critique
searcher        Information gathering
enricher        Data enrichment
coder           Exploit development
installer       Tool installation and setup
pentester       Dedicated penetration testing
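As an illustrative sketch (not part of PentAGI itself), a short Python function can check a parsed provider config against the agent types listed above and the required fields described in the parameter reference. It operates on a plain dict, e.g. the result of `yaml.safe_load` on your provider file:

```python
# Hypothetical validator sketch: checks a parsed provider config (a dict)
# against the agent types PentAGI supports. Names here mirror the docs,
# not PentAGI's actual validation code.
KNOWN_AGENT_TYPES = {
    "simple", "simple_json", "primary_agent", "assistant", "generator",
    "refiner", "adviser", "reflector", "searcher", "enricher",
    "coder", "installer", "pentester",
}
REQUIRED_FIELDS = {"model", "max_tokens"}  # required per the parameter reference

def check_provider_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for agent, settings in config.items():
        if agent not in KNOWN_AGENT_TYPES:
            problems.append(f"unknown agent type: {agent} (will be ignored)")
            continue
        missing = REQUIRED_FIELDS - settings.keys()
        if missing:
            problems.append(f"{agent}: missing required field(s): {sorted(missing)}")
    return problems

# Example: one valid agent, one missing a required field, one unknown key.
cfg = {
    "simple": {"model": "deepseek-chat", "max_tokens": 4000},
    "coder": {"model": "deepseek-coder"},
    "mystery": {"model": "x", "max_tokens": 100},
}
print(check_provider_config(cfg))
```

Running a check like this before mounting a new config catches the most common mistakes (typoed agent names, forgotten `model` fields) before startup.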

Configuration Parameters

model
string
required
Model identifier for the LLM provider.
temperature
float
default:"0.7"
Controls randomness (0.0-2.0). Lower values are more deterministic.
top_p
float
default:"0.9"
Nucleus sampling parameter (0.0-1.0). Controls diversity of output.
top_k
integer
Top-k sampling parameter. Limits vocabulary for each step.
n
integer
default:"1"
Number of completions to generate.
max_tokens
integer
required
Maximum number of tokens to generate.
json
boolean
default:"false"
Enable JSON output mode (for simple_json agent type).
reasoning
object
Reasoning configuration for models that support extended thinking.
  • effort: Reasoning effort level (low, medium, high)
  • max_tokens: Maximum tokens for reasoning (some providers)
price
object
Pricing information per million tokens.
  • input: Cost per million input tokens (USD)
  • output: Cost per million output tokens (USD)

Example Configurations

OpenAI-Compatible (Custom)

simple:
  model: "gpt-4.1-mini"
  temperature: 0.5
  top_p: 0.5
  n: 1
  max_tokens: 3000
  price:
    input: 0.4
    output: 1.6

primary_agent:
  model: "o3-mini"
  n: 1
  max_tokens: 4000
  reasoning:
    effort: low
  price:
    input: 1.1
    output: 4.4

coder:
  model: "gpt-4.1"
  temperature: 0.2
  top_p: 0.1
  n: 1
  max_tokens: 6000
  price:
    input: 2.0
    output: 8.0

DeepSeek

simple:
  model: "deepseek-chat"
  temperature: 0.6
  top_p: 0.95
  n: 1
  max_tokens: 4000
  price:
    input: 0.27
    output: 1.1

coder:
  model: "deepseek-coder"
  temperature: 0.7
  top_p: 1.0
  n: 1
  max_tokens: 8000
  price:
    input: 0.27
    output: 1.1

pentester:
  model: "deepseek-chat"
  temperature: 0.8
  top_p: 0.9
  n: 1
  max_tokens: 4000
  price:
    input: 0.27
    output: 1.1

Moonshot (with Reasoning)

simple:
  model: "kimi-k2-turbo-preview"
  temperature: 0.5
  n: 1
  max_tokens: 4096
  price:
    input: 1.15
    output: 8.0

primary_agent:
  model: "kimi-k2.5"
  temperature: 1.0
  n: 1
  max_tokens: 8192
  reasoning:
    effort: high
  price:
    input: 0.6
    output: 3.0

coder:
  model: "kimi-k2.5"
  temperature: 1.0
  n: 1
  max_tokens: 16384
  reasoning:
    effort: high
  price:
    input: 0.6
    output: 3.0

Ollama (Local)

simple:
  model: "llama3.1:8b"
  temperature: 0.2
  top_p: 0.3
  n: 1
  max_tokens: 4000

primary_agent:
  model: "llama3.1:8b"
  temperature: 0.2
  top_p: 0.3
  n: 1
  max_tokens: 4000

coder:
  model: "llama3.1:8b"
  temperature: 0.1
  top_p: 0.2
  n: 1
  max_tokens: 6000

pentester:
  model: "llama3.1:8b"
  temperature: 0.3
  top_p: 0.4
  n: 1
  max_tokens: 8000

LiteLLM Proxy Integration

The LLM_SERVER_PROVIDER setting is particularly useful when using LiteLLM proxy:
# LiteLLM adds provider prefix to model names
# Example: kimi-k2.5 becomes moonshot/kimi-k2.5

# Set provider prefix to use same config for both direct and proxy access
LLM_SERVER_PROVIDER=moonshot

# Now you can use the same YAML config file for:
# - Direct API: model: "kimi-k2.5"
# - LiteLLM Proxy: model: "moonshot/kimi-k2.5" (auto-prefixed)
This allows seamless switching between direct API access and LiteLLM proxy without modifying configuration files.

Reasoning Format Settings

Legacy Reasoning Format

Some providers use string-based reasoning effort:
LLM_SERVER_LEGACY_REASONING=true
Format:
{
  "reasoning_effort": "medium"
}

Modern Reasoning Format (Default)

LLM_SERVER_LEGACY_REASONING=false
Format:
{
  "reasoning": {
    "max_tokens": 4000
  }
}

Preserving Reasoning Content

Required by providers like Moonshot that return errors when reasoning content is missing:
LLM_SERVER_PRESERVE_REASONING=true
This ensures reasoning content is preserved and sent in subsequent API calls during multi-turn conversations.
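To illustrate what preservation means in practice (the `reasoning_content` field name is an assumption based on common reasoning-model APIs; check your provider's documentation), compare a history with and without the reasoning payload on assistant turns:

```python
# Illustrative multi-turn history; "reasoning_content" is an assumed field
# name used by some reasoning models -- consult your provider's docs.
history = [
    {"role": "user", "content": "Scan the target for open ports."},
    {
        "role": "assistant",
        "content": "Starting with a TCP SYN scan...",
        "reasoning_content": "The user wants a port scan; a SYN scan fits.",
    },
    {"role": "user", "content": "Now enumerate the services."},
]

def strip_reasoning(messages):
    """What happens when reasoning is NOT preserved: the field is dropped."""
    return [{k: v for k, v in m.items() if k != "reasoning_content"}
            for m in messages]

# With LLM_SERVER_PRESERVE_REASONING=true the full history is resent;
# without it, providers like Moonshot may reject the stripped version.
print("reasoning_content" in strip_reasoning(history)[1])
```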

Built-in Provider Configurations

PentAGI includes pre-built configurations in the Docker image at /opt/pentagi/conf/:
  • custom-openai.provider.yml - OpenAI-compatible providers
  • deepseek.provider.yml - DeepSeek API
  • deepinfra.provider.yml - DeepInfra platform
  • moonshot.provider.yml - Moonshot (Kimi) API
  • openrouter.provider.yml - OpenRouter aggregator
  • ollama-llama318b.provider.yml - Ollama with Llama 3.1 8B
  • ollama-llama318b-instruct.provider.yml - Ollama Llama 3.1 8B Instruct
  • ollama-qwen332b-fp16-tc.provider.yml - Ollama Qwen3 32B FP16
  • ollama-qwq32b-fp16-tc.provider.yml - Ollama QwQ 32B FP16
  • vllm-qwen332b-fp16.provider.yml - vLLM server with Qwen3
You can reference these directly:
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/deepseek.provider.yml

Creating Custom Configurations

  1. Create YAML file with agent configurations:
simple:
  model: "your-model-name"
  temperature: 0.5
  max_tokens: 3000

primary_agent:
  model: "your-advanced-model"
  temperature: 0.7
  max_tokens: 4000
  2. Mount configuration in docker-compose.yml:
volumes:
  - ./my-provider.yml:/opt/pentagi/conf/my-provider.yml:ro
  3. Configure environment variable:
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/my-provider.yml

Validation

PentAGI validates configuration files on startup:
  • Missing required fields generate warnings
  • Invalid values use safe defaults
  • Unknown agent types are ignored
  • Malformed YAML prevents startup
Check logs for validation messages:
docker compose logs pentagi | grep -i "provider config"

Troubleshooting

Configuration Not Loading

  1. Verify file path is correct and accessible
  2. Check YAML syntax: yamllint your-config.yml
  3. Ensure file is mounted in Docker container
  4. Review startup logs for parsing errors

Model Not Found

  1. Verify model name matches provider’s model ID
  2. Check API endpoint supports the model
  3. Ensure API key has access to the model
  4. Test with direct API call:
    curl -X POST https://api.provider.com/v1/chat/completions \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "model-name", "messages": [...]}'

Reasoning Errors

If you see reasoning-related errors:
  1. Try toggling LLM_SERVER_LEGACY_REASONING
  2. Enable LLM_SERVER_PRESERVE_REASONING for providers like Moonshot
  3. Check provider documentation for reasoning format
  4. Remove reasoning config if provider doesn’t support it

Pricing Issues

Pricing in config is informational only:
  • Used for cost estimation in PentAGI UI
  • Does not affect actual API billing
  • Update values to match current provider rates
  • Omit price section if not tracking costs
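Since prices are expressed in USD per million tokens, the estimate follows the standard formula (this is the conventional calculation, not taken from PentAGI's source):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Prices are USD per million tokens, matching the YAML `price` section."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# deepseek-chat rates from the example config: input 0.27, output 1.1 USD/M
print(round(estimate_cost_usd(120_000, 30_000, 0.27, 1.1), 4))  # 0.0654
```

Keeping the `price` values in sync with your provider's current rate card keeps these estimates meaningful.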

Best Practices

  1. Start with built-in configs - Use pre-built configurations as templates
  2. Test incrementally - Verify each agent type works before adding more
  3. Document changes - Add comments to YAML explaining customizations
  4. Version control - Track configuration changes in git
  5. Monitor costs - Keep pricing information updated for accurate estimates
  6. Use appropriate models - Match model capabilities to agent requirements
  7. Validate regularly - Test configurations after provider API updates
