Overview
PentAGI requires models with larger context windows than the default Ollama configurations provide. You can create custom models with an increased context size through Modelfiles to handle complex penetration testing scenarios.
Why Extended Context?
While typical agent workflows consume around 64K tokens, PentAGI uses a 110K context size for:
- Safety margin: Handle unexpected context growth during long sessions
- Complex scenarios: Support multi-step penetration testing workflows
- Tool call preservation: Maintain full tool execution history
- Reasoning chains: Preserve extended thinking content from providers
Modelfile Basics
A Modelfile defines model configuration using a simple syntax.
Key Parameters
| Parameter | Description | Recommended Value |
|---|---|---|
| num_ctx | Context window size | 110000 (110K tokens) |
| temperature | Randomness in output | 0.2-0.3 for pentesting |
| top_p | Nucleus sampling | 0.7-0.8 |
| top_k | Top-k sampling | 20-40 |
| repeat_penalty | Penalize repetition | 1.1-1.2 |
| min_p | Minimum probability | 0.0 |
Example: Qwen3 32B with Extended Context
Qwen3 is a powerful model for security analysis and code generation.
Create the Modelfile
Create a file named Modelfile_qwen3_32b_fp16_tc:
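A minimal sketch of the Modelfile follows. The base-model tag qwen3:32b-fp16 is an assumption; substitute whichever tag ollama list reports on your system. Parameter values follow the recommendations in the table above.

```
# Base model tag is an assumption; adjust to the tag available locally
FROM qwen3:32b-fp16

# Extended context to match PentAGI's context management
PARAMETER num_ctx 110000

# Conservative sampling settings recommended for pentesting
PARAMETER temperature 0.2
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
PARAMETER min_p 0.0
```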
Build the Model
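Build the model with ollama create, pointing at the Modelfile. The model name qwen3-32b-fp16-tc is an assumption; use whatever name your provider configuration references.

```bash
# Create the custom model from the Modelfile in the current directory
ollama create qwen3-32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc
```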
Configure PentAGI
Add to your .env file:
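A hedged sketch of the .env entries. The variable names are assumptions to verify against your PentAGI .env template; the config path is one of the pre-built provider files shipped in /opt/pentagi/conf.

```
# Variable names are assumptions; verify against your PentAGI .env template
OLLAMA_SERVER_URL=http://ollama:11434
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
```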
Example: QwQ 32B with Extended Context
QwQ is optimized for reasoning and complex problem-solving tasks.
Create the Modelfile
Create a file named Modelfile_qwq_32b_fp16_tc:
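A minimal sketch of the Modelfile. The base-model tag qwq:32b-fp16 is an assumption; check ollama list for the tag actually available on your system. Values stay within the ranges recommended in the parameter table.

```
# Base model tag is an assumption; adjust to the tag available locally
FROM qwq:32b-fp16

# Extended context to match PentAGI's context management
PARAMETER num_ctx 110000

# Slightly higher temperature within the recommended range for reasoning tasks
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER min_p 0.0
```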
Build the Model
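Build the model from the Modelfile; the model name qwq-32b-fp16-tc is an assumption, so use whatever name your provider configuration references.

```bash
# Create the custom QwQ model from the Modelfile
ollama create qwq-32b-fp16-tc -f Modelfile_qwq_32b_fp16_tc
```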
Configure PentAGI
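A hedged .env sketch (variable names are assumptions; check your PentAGI .env template). The config path is one of the pre-built provider files shipped in /opt/pentagi/conf.

```
# Variable names are assumptions; verify against your PentAGI .env template
OLLAMA_SERVER_URL=http://ollama:11434
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml
```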
Example: Llama 3.1 8B with Extended Context
A more resource-friendly option for smaller systems.
Create the Modelfile
Create a file named Modelfile_llama31_8b_instruct_tc:
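A minimal sketch of the Modelfile. The base-model tag llama3.1:8b-instruct-q8_0 is an assumption matching the Q8_0 row in the hardware table below; adjust to the tag your system actually has.

```
# Base model tag is an assumption; adjust to the tag available locally
FROM llama3.1:8b-instruct-q8_0

# Extended context to match PentAGI's context management
PARAMETER num_ctx 110000

# Conservative sampling settings recommended for pentesting
PARAMETER temperature 0.2
PARAMETER top_p 0.7
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER min_p 0.0
```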
Build the Model
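Build the model from the Modelfile; the model name llama31-8b-instruct-tc is an assumption, so use whatever name your provider configuration references.

```bash
# Create the custom Llama 3.1 model from the Modelfile
ollama create llama31-8b-instruct-tc -f Modelfile_llama31_8b_instruct_tc
```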
Configure PentAGI
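A hedged .env sketch (variable names are assumptions; check your PentAGI .env template). The config path is one of the pre-built provider files shipped in /opt/pentagi/conf.

```
# Variable names are assumptions; verify against your PentAGI .env template
OLLAMA_SERVER_URL=http://ollama:11434
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml
```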
Provider Configuration Files
PentAGI includes pre-built provider configuration files for custom Ollama models:
- /opt/pentagi/conf/ollama-llama318b.provider.yml
- /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
- /opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml
Example Provider Configuration
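The exact schema of these provider files is defined by PentAGI itself, so the sketch below is illustrative only; every field name is an assumption, and the shipped files in /opt/pentagi/conf are the authoritative reference.

```yaml
# Illustrative sketch only; field names are assumptions, not the
# authoritative PentAGI schema. Compare with the shipped files in
# /opt/pentagi/conf before use.
simple:
  model: qwen3-32b-fp16-tc
  temperature: 0.2
  top_p: 0.8
agent:
  model: qwen3-32b-fp16-tc
  temperature: 0.3
```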
Testing Your Custom Model
Use the ctester utility to validate your custom model:
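The flags below are assumptions for illustration only; consult the utility's help output for its real interface.

```bash
# Flag names are assumptions; run ctester's help output for the real interface
ctester --provider custom --model qwen3-32b-fp16-tc
```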
Model Management
Auto-Pull Configuration
Configure automatic model downloads:
Performance Consideration: Model discovery adds 1-2s of startup latency. Disable both flags and specify models in the config file for the fastest startup.
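A hedged sketch of the relevant .env flags; the variable names are assumptions and should be checked against your PentAGI .env template.

```
# Variable names are assumptions; verify against your .env template
OLLAMA_MODEL_AUTO_PULL=true    # pull missing models on startup
OLLAMA_MODEL_DISCOVERY=true    # discover models from the Ollama server
# Fastest startup: set both to false and list models in the provider config
```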
List Available Models
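To see which models, including your custom builds, are available to the local Ollama server:

```bash
# Lists all local models with name, size, and modification time
ollama list
```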
Hardware Requirements
| Model | Quantization | VRAM Required | Recommended GPU |
|---|---|---|---|
| Llama 3.1 8B | Q8_0 | ~9 GB | RTX 3090, RTX 4080 |
| Llama 3.1 8B | FP16 | ~18 GB | RTX 3090, A5000 |
| Qwen3 32B | Q4_0 | ~20 GB | RTX 4090, A5000 |
| Qwen3 32B | FP16 | ~70 GB | A100 40GB (x2), H100 |
| QwQ 32B | FP16 | ~71 GB | A100 40GB (x2), H100 |
Best Practices
Match Context to Provider
Set num_ctx to 110000 for consistency with PentAGI’s context management
Start Small
Begin with 8B models and scale up as needed for your hardware
Test Before Production
Use ctester to validate model performance before deployment
Monitor Resource Usage
Watch GPU memory and adjust batch sizes if needed
Troubleshooting
Model Creation Fails
- Verify the base model is pulled: ollama list
- Check Modelfile syntax for typos
- Ensure sufficient disk space for model storage
Out of Memory Errors
- Use quantized models (Q4_0, Q8_0) instead of FP16
- Reduce batch size in provider config
- Close other GPU-intensive applications
Context Still Truncated
- Verify the model was created with num_ctx=110000
- Check ollama show model-name for the actual context size
- Rebuild the model if num_ctx is incorrect
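For example, to confirm the context size baked into a custom model (the model name here is an assumption; substitute your own):

```bash
# Prints model details; check that num_ctx appears under the parameters section
ollama show qwen3-32b-fp16-tc
```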
Related Resources
- Context Management: Optimize token usage and memory
- Performance Tuning: Resource management and scaling
- Chain Summarization: Advanced context compression