Pensar Apex supports multiple AI models from different providers. Model selection impacts testing quality, speed, and cost.

Default Model

The default model is claude-sonnet-4-5 (Anthropic Claude 4.5 Sonnet), which provides:
  • Excellent reasoning for security analysis
  • Tool use capability for pentest actions
  • Large context window (200k tokens)
  • Good balance of performance and cost
For most pentesting scenarios, the default model provides excellent results. Change it only if you have specific requirements.

Selecting a Model

Via Command Line

Specify the model with the --model flag:
pensar pentest --target https://example.com --model claude-opus-4

Via TUI

  1. Launch the TUI: pensar
  2. Navigate to Models screen
  3. Select from available models or enter a custom model name
  4. The selection persists in ~/.pensar/config.json

Programmatic API

Set the model in your code:
import { runPentestAgent } from '@pensar/apex';

const result = await runPentestAgent({
  target: 'https://example.com',
  model: 'claude-sonnet-4-5',
  // ... other options
});

Model Selection Guide

By Use Case

| Use Case | Recommended Model | Rationale |
| --- | --- | --- |
| General pentesting | claude-sonnet-4-5 | Best balance of quality and cost |
| Complex exploitation | claude-opus-4 | Maximum reasoning for multi-step chains |
| Budget testing | claude-sonnet-3-5 | Good quality at lower cost |
| Enterprise AWS | anthropic.claude-sonnet-4-5 (Bedrock) | AWS compliance and security |
| Offline/air-gapped | meta-llama/Llama-3.1-70B-Instruct (vLLM) | No external API calls |
| Whitebox code analysis | deepseek-ai/deepseek-coder-33b-instruct (vLLM) | Code-specialized model |
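The use-case recommendations above can be encoded as a small lookup. This is an illustrative sketch only; the `UseCase` keys and `recommendModel` helper are not part of the Pensar API.

```typescript
// Illustrative mapping from use case to the recommended model ID.
type UseCase = "general" | "complex" | "budget" | "offline";

const RECOMMENDED_MODEL: Record<UseCase, string> = {
  general: "claude-sonnet-4-5", // balanced quality and cost
  complex: "claude-opus-4", // maximum reasoning for exploit chains
  budget: "claude-sonnet-3-5", // good quality at lower cost
  offline: "meta-llama/Llama-3.1-70B-Instruct", // local vLLM, no external API
};

function recommendModel(useCase: UseCase): string {
  return RECOMMENDED_MODEL[useCase];
}
```

The returned string can then be passed straight to the `--model` flag or the `model` option of `runPentestAgent`.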

By Performance Requirements

Best Quality

claude-opus-4: Maximum reasoning capability. Use for complex targets or when quality is critical.

Best Balance

claude-sonnet-4-5: Excellent quality with reasonable cost and speed. Recommended default.

Best Cost

claude-sonnet-3-5: Lower cost while maintaining good quality. Suitable for large-scale testing.

Model Capabilities

All recommended models support:
  • Tool Use: Execute pentest tools (curl, nmap, etc.)
  • Long Context: Handle large attack surface reports (100k+ tokens)
  • Structured Output: Generate JSON findings and reports
  • Multi-turn Reasoning: Adapt based on target responses
Pensar Apex automatically handles tool calling and structured output for all supported models.

Context Windows

Different models have different context limits:
| Model | Context Window | Suitable For |
| --- | --- | --- |
| Claude 4.5 Sonnet | 200k tokens | Large applications, extensive attack surfaces |
| Claude 4 Opus | 200k tokens | Complex multi-step exploitation |
| GPT-4 | 128k tokens | Medium-sized applications |
| Llama 3.1 70B | 128k tokens | Standard pentesting scenarios |
Larger context windows allow testing more endpoints in a single session without summarization.
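A quick way to sanity-check whether a report fits is to estimate tokens from character count. The sketch below uses the common ~4 characters per token rule of thumb (real tokenizer counts vary by model) and a hypothetical `fitsInContext` helper; neither is part of the Pensar API.

```typescript
// Approximate context windows (tokens) from the table above.
const CONTEXT_WINDOWS: Record<string, number> = {
  "claude-sonnet-4-5": 200_000,
  "claude-opus-4": 200_000,
  "gpt-4": 128_000,
};

// Rough check: does a report of `reportChars` characters fit, leaving
// some headroom for the model's own output?
function fitsInContext(
  model: string,
  reportChars: number,
  reservedForOutput = 8_000,
): boolean {
  const approxTokens = Math.ceil(reportChars / 4); // ~4 chars per token
  const window = CONTEXT_WINDOWS[model] ?? 128_000;
  return approxTokens + reservedForOutput <= window;
}
```

If the estimate does not fit, either pick a larger-window model or split the target scope across runs.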

Cost Considerations

Model costs vary significantly:

Anthropic Pricing (approximate)

  • Claude 4.5 Sonnet: ~$3 per 1M input tokens, ~$15 per 1M output tokens
  • Claude 4 Opus: ~$15 per 1M input tokens, ~$75 per 1M output tokens
  • Claude 3.5 Sonnet: ~$3 per 1M input tokens, ~$15 per 1M output tokens

Typical Pentest Costs

  • Simple target (5-10 endpoints): $0.50 - $2.00
  • Medium target (20-50 endpoints): $2.00 - $10.00
  • Large target (100+ endpoints): $10.00 - $50.00
Actual costs depend on target complexity, number of endpoints, and exploitation depth.
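The per-token prices above translate into a back-of-envelope estimate as follows. This is a sketch using the approximate USD prices listed; actual billing depends on your provider, and `estimateCostUSD` is an illustrative helper, not part of the Pensar API.

```typescript
// Approximate USD prices per 1M tokens, from the pricing list above.
const PRICING: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-5": { input: 3, output: 15 },
  "claude-opus-4": { input: 15, output: 75 },
  "claude-sonnet-3-5": { input: 3, output: 15 },
};

// Cost = (input tokens / 1M) * input price + (output tokens / 1M) * output price.
function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICING[model];
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}
```

For example, a run consuming 1M input tokens and 100k output tokens on claude-sonnet-4-5 comes to roughly $4.50, while the same run on claude-opus-4 would be about five times that.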

Custom Models

You can use any model compatible with your provider:

OpenRouter Custom Models

Any model on openrouter.ai/models:
pensar pentest --target https://example.com --model mistralai/mixtral-8x22b

vLLM Custom Models

Any model supported by vLLM:
# Start vLLM with your model
vllm serve WizardLM/WizardCoder-Python-34B-V1.0 --port 8000

# Use in Pensar Apex
export LOCAL_MODEL_URL="http://localhost:8000/v1"
pensar pentest --target https://example.com --model WizardLM/WizardCoder-Python-34B-V1.0
Custom models may not perform well for pentesting. Test thoroughly before production use.

Model Configuration Storage

Your selected model is saved in ~/.pensar/config.json:
{
  "selectedModelId": "claude-sonnet-4-5",
  "localModelName": null,
  "localModelUrl": null
}
The command-line --model flag always overrides this setting.
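The TUI persists your choice by rewriting this file, and the same edit can be scripted. Below is a minimal sketch; `setSelectedModel` is an illustrative helper (not part of the Pensar API), and only the field names shown in the example above are assumed.

```typescript
// Illustrative: rewrite the selectedModelId field of a config.json string.
function setSelectedModel(configJson: string, modelId: string): string {
  const config = JSON.parse(configJson);
  config.selectedModelId = modelId;
  return JSON.stringify(config, null, 2);
}

// Example: switch the persisted model to claude-opus-4.
const updated = setSelectedModel(
  '{"selectedModelId":"claude-sonnet-4-5","localModelName":null,"localModelUrl":null}',
  "claude-opus-4",
);
```

In practice you would read and write ~/.pensar/config.json with this transform, or simply change the model through the TUI.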

Troubleshooting

Model Not Found

Ensure the model is supported by your provider:
# Check provider configuration
pensar doctor
Also verify that the API key for the correct provider is set.

Low-Quality Results

Try upgrading to a more capable model:
  • Switch from GPT-4 to Claude: --model claude-sonnet-4-5
  • Switch from Sonnet to Opus: --model claude-opus-4
Claude models generally perform better for security testing.

Context Limit Errors

Use a model with a larger context window:
  • Claude models: 200k tokens
  • GPT-4: 128k tokens
Or reduce the target scope to test fewer endpoints per run.

Slow Local Inference

For vLLM:
  • Use GPU acceleration
  • Enable quantization (8-bit or 4-bit)
  • Use smaller models (7B or 13B instead of 70B)
See the vLLM Setup Guide for optimization tips.

Next Steps

AI Providers

Configure your AI provider API keys

Environment Variables

Complete configuration reference

vLLM Setup

Run models locally with vLLM

Run Pentest

Start testing with your selected model
