
Overview

PentAGI requires models with larger context windows than Ollama's default configuration provides. You can create custom models with an increased context size through Modelfiles to handle complex penetration testing scenarios.
The num_ctx parameter can only be set during model creation via a Modelfile. It cannot be changed after model creation or overridden at runtime.

Why Extended Context?

While typical agent workflows consume around 64K tokens, PentAGI uses 110K context size for:
  • Safety margin: Handle unexpected context growth during long sessions
  • Complex scenarios: Support multi-step penetration testing workflows
  • Tool call preservation: Maintain full tool execution history
  • Reasoning chains: Preserve extended thinking content from providers

Modelfile Basics

A Modelfile defines model configuration using a simple syntax:
FROM base-model-name
PARAMETER parameter_name value

Key Parameters

| Parameter | Description | Recommended Value |
|---|---|---|
| num_ctx | Context window size | 110000 (110K tokens) |
| temperature | Randomness in output | 0.2-0.3 for pentesting |
| top_p | Nucleus sampling | 0.7-0.8 |
| top_k | Top-k sampling | 20-40 |
| repeat_penalty | Penalize repetition | 1.1-1.2 |
| min_p | Minimum probability | 0.0 |

Example: Qwen3 32B with Extended Context

Qwen3 is a powerful model for security analysis and code generation.

Create the Modelfile

Create a file named Modelfile_qwen3_32b_fp16_tc:
FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
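If you prefer to script the setup, the same Modelfile can be generated from the shell with a heredoc and sanity-checked before building (the file name matches the step above):

```shell
# Write the Modelfile non-interactively (equivalent to editing it by hand)
cat > Modelfile_qwen3_32b_fp16_tc <<'EOF'
FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Sanity-check the two lines PentAGI cares about most before running ollama create
grep -E '^(FROM|PARAMETER num_ctx)' Modelfile_qwen3_32b_fp16_tc
```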

Build the Model

# First, pull the base model
ollama pull qwen3:32b-fp16

# Build custom model with extended context
ollama create qwen3:32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc

# Verify the model was created
ollama list | grep qwen3

Configure PentAGI

Add to your .env file:
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=qwen3:32b-fp16-tc
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml

Example: QwQ 32B with Extended Context

QwQ is optimized for reasoning and complex problem-solving tasks.

Create the Modelfile

Create a file named Modelfile_qwq_32b_fp16_tc:
FROM qwq:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.2
PARAMETER top_p 0.7
PARAMETER min_p 0.0
PARAMETER top_k 40
PARAMETER repeat_penalty 1.2

Build the Model

# First, pull the base model
ollama pull qwq:32b-fp16

# Build custom model with extended context
ollama create qwq:32b-fp16-tc -f Modelfile_qwq_32b_fp16_tc

# Verify the model was created
ollama list | grep qwq

Hardware Requirements: The QwQ 32B FP16 model requires approximately 71.3 GB of VRAM for inference. Ensure your system has sufficient GPU memory before attempting to use this model.

Configure PentAGI

OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=qwq:32b-fp16-tc
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

Example: Llama 3.1 8B with Extended Context

A more resource-friendly option for smaller systems.

Create the Modelfile

Create a file named Modelfile_llama31_8b_instruct_tc:
FROM llama3.1:8b-instruct-q8_0
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1

Build the Model

ollama pull llama3.1:8b-instruct-q8_0
ollama create llama3.1:8b-instruct-tc -f Modelfile_llama31_8b_instruct_tc

Configure PentAGI

OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-tc
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml

Provider Configuration Files

PentAGI includes pre-built provider configuration files for custom Ollama models:
  • /opt/pentagi/conf/ollama-llama318b.provider.yml
  • /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
  • /opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml
These files map agent types to specific models with optimized settings.

Example Provider Configuration

simple:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.2
  top_p: 0.3
  n: 1
  max_tokens: 4000

simple_json:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.7
  top_p: 1.0
  n: 1
  max_tokens: 4000
  json: true

primary_agent:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.3
  top_p: 0.95
  n: 1
  max_tokens: 12000

pentester:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.3
  top_p: 0.8
  n: 1
  max_tokens: 8000
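The same pattern extends to the other agent types, such as coder. The entry below is an illustrative sketch following the schema of the entries above, not a shipped default; check the bundled provider files for the actual values:

```yaml
coder:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.2
  top_p: 0.8
  n: 1
  max_tokens: 8000
```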

Testing Your Custom Model

Use the ctester utility to validate your custom model:
# Test with custom Ollama configuration
docker run --rm \
  -v $(pwd)/.env:/opt/pentagi/.env \
  vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml

# Test specific agents
docker exec -it pentagi /opt/pentagi/bin/ctester \
  -agents pentester,coder,primary_agent \
  -config /opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

Model Management

Auto-Pull Configuration

Configure automatic model downloads:
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900  # 15 minutes
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true

Performance Consideration: Model discovery adds 1-2 s of startup latency. For the fastest startup, disable both flags and specify models explicitly in the provider configuration file.

List Available Models

# List all Ollama models
ollama list

# Remove unused models
ollama rm model-name

# Show model information
ollama show model-name

Hardware Requirements

| Model | Quantization | VRAM Required | Recommended GPU |
|---|---|---|---|
| Llama 3.1 8B | Q8_0 | ~9 GB | RTX 3090, RTX 4080 |
| Llama 3.1 8B | FP16 | ~18 GB | RTX 3090, A5000 |
| Qwen3 32B | Q4_0 | ~20 GB | RTX 4090, A5000 |
| Qwen3 32B | FP16 | ~70 GB | A100 40GB (x2), H100 |
| QwQ 32B | FP16 | ~71 GB | A100 40GB (x2), H100 |
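A quick way to sanity-check these figures: FP16 stores two bytes per parameter, Q8_0 roughly one, and Q4_0 roughly half a byte, so the weights alone account for most of each number; the remainder is KV cache and runtime overhead. A back-of-envelope sketch (the helper name is ours, not an Ollama tool):

```shell
# Rough VRAM estimate for model weights only; KV cache and runtime overhead add more.
# bytes per parameter: FP16 = 2, Q8_0 ~ 1, Q4_0 ~ 0.5
estimate_gb() {  # usage: estimate_gb <params_in_billions> <bytes_per_param>
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f GB\n", p * b }'
}
estimate_gb 32 2    # Qwen3/QwQ 32B FP16 -> 64 GB for weights alone
estimate_gb 8 1     # Llama 3.1 8B Q8_0  -> 8 GB for weights alone
```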

Best Practices

Match Context to Provider

Set num_ctx to 110000 for consistency with PentAGI’s context management

Start Small

Begin with 8B models and scale up as needed for your hardware

Test Before Production

Use ctester to validate model performance before deployment

Monitor Resource Usage

Watch GPU memory and adjust batch sizes if needed

Troubleshooting

Model creation fails:
  • Verify the base model is pulled: ollama list
  • Check the Modelfile syntax for typos
  • Ensure sufficient disk space for model storage

Out of GPU memory:
  • Use quantized models (Q4_0, Q8_0) instead of FP16
  • Reduce batch size in the provider config
  • Close other GPU-intensive applications

Context size not applied:
  • Verify the model was created with num_ctx=110000
  • Check ollama show model-name for the actual context size
  • Rebuild the model if num_ctx is incorrect

Related Documentation

  • Context Management: optimize token usage and memory
  • Performance Tuning: resource management and scaling
  • Chain Summarization: advanced context compression
