## Overview
Effective context management is essential for long-running penetration testing sessions. PentAGI implements sophisticated memory systems and token optimization strategies to maintain conversation coherence while staying within LLM provider limits.

## Memory Architecture
PentAGI uses a multi-layered memory system to balance context retention with performance.

### Context Window Limits by Provider
Understand your provider’s token limits:

| Provider | Model | Context Window | Recommended Usage |
|---|---|---|---|
| OpenAI | GPT-4 Turbo | 128K tokens | Full penetration testing sessions |
| OpenAI | GPT-4o | 128K tokens | Complex multi-step workflows |
| Anthropic | Claude 3.5 Sonnet | 200K tokens | Extended research and analysis |
| Anthropic | Claude 4 Opus | 200K tokens | Comprehensive security assessments |
| Google | Gemini 2.5 Pro | 2M tokens | Large codebase analysis |
| AWS Bedrock | Claude via Bedrock | 200K tokens | Enterprise security workflows |
| Ollama | Llama 3.1 70B | 128K tokens | Local inference with custom context |
Typical PentAGI agent workflows consume around 64K tokens, but the system uses a 110K context size as a safety margin for handling complex scenarios.
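As a back-of-the-envelope check, the figures above leave substantial headroom; the sketch below just reproduces that arithmetic (the 64K and 110K numbers come from this guide, the calculation is illustrative):

```python
# Token budget figures quoted in this guide.
CONFIGURED_WINDOW = 110_000   # context size PentAGI uses for safety margin
TYPICAL_WORKLOAD = 64_000     # tokens a typical agent workflow consumes

headroom = CONFIGURED_WINDOW - TYPICAL_WORKLOAD
utilization = TYPICAL_WORKLOAD / CONFIGURED_WINDOW

print(f"headroom: {headroom} tokens ({utilization:.0%} typical utilization)")
```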
## Token Optimization Strategies
### 1. Chain Summarization
Automatically condenses older conversation history while preserving critical context. See the Chain Summarization guide for detailed configuration.

### 2. Vector Store Retrieval
Stores and retrieves relevant information semantically.

### 3. Selective Context Loading
Loads only relevant memories based on the current task:

- Task-specific retrieval: Query the vector store for similar past operations
- Tool pattern matching: Load successful command sequences
- Vulnerability knowledge: Retrieve known exploit techniques
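The retrieval strategies above can be illustrated with a toy in-memory store. PentAGI's actual store is a vector database; the embeddings, memory kinds, and scoring below are simplified stand-ins, not its real API:

```python
import math

# Toy memory store: each entry has a kind matching the strategies above
# (task, tool pattern, vulnerability) and a toy embedding vector.
MEMORIES = [
    {"kind": "task", "text": "full TCP scan of 10.0.0.0/24",        "vec": [0.9, 0.1, 0.0]},
    {"kind": "tool", "text": "directory brute-force sequence",      "vec": [0.2, 0.9, 0.1]},
    {"kind": "vuln", "text": "known exploit technique for service", "vec": [0.1, 0.2, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve(query_vec, kind=None, top_k=2):
    """Selective loading: optionally filter by memory kind, then rank by similarity."""
    pool = [m for m in MEMORIES if kind is None or m["kind"] == kind]
    return sorted(pool, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)[:top_k]

# A query vector close to the "task" embedding surfaces the scan memory first.
best = retrieve([0.8, 0.2, 0.1], top_k=1)[0]
print(best["text"])
```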
## Context Management Workflows

### Initialization Phase

### Execution Phase
During penetration testing:

- Context Rotation: Older messages are summarized, recent ones preserved
- Memory Storage: Important findings stored in vector database
- Knowledge Retrieval: Similar past scenarios loaded on demand
- State Tracking: Current task state maintained in working memory
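The rotation step can be sketched as follows. The `keep` count and one-line summary string are illustrative placeholders; PentAGI's real summarizer is LLM-driven and configured via the `SUMMARIZER_*` variables discussed below:

```python
KEEP_RECENT = 3  # illustrative; echoes the "keep at least 3 recent sections" practice

def rotate_context(messages, keep=KEEP_RECENT):
    """Summarize older messages into one entry, preserving the most recent ones."""
    if len(messages) <= keep:
        return messages
    older, recent = messages[:-keep], messages[-keep:]
    summary = f"[summary of {len(older)} earlier messages]"  # stand-in for an LLM summary
    return [summary] + recent

history = [f"msg {i}" for i in range(1, 7)]
print(rotate_context(history))
# folds "msg 1".."msg 3" into a summary entry and keeps "msg 4".."msg 6"
```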
## Configuration Guidelines

### For Short Sessions (< 30 minutes)

### For Extended Sessions (1-2 hours)

### For Complex Multi-Day Assessments
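The concrete settings for each profile are not reproduced in this excerpt. The sketch below only illustrates the general direction of tuning: the variable names `SUMMARIZER_KEEP_QA_SECTIONS` and `SUMMARIZER_LAST_SEC_BYTES` come from this guide, but every number is a hypothetical placeholder to validate against your own setup:

```python
# Hypothetical tuning profiles: longer sessions keep more history and larger
# recent sections. The numeric values are placeholders, not recommendations.
PROFILES = {
    "short":     {"SUMMARIZER_KEEP_QA_SECTIONS": 3,  "SUMMARIZER_LAST_SEC_BYTES": 16_384},
    "extended":  {"SUMMARIZER_KEEP_QA_SECTIONS": 5,  "SUMMARIZER_LAST_SEC_BYTES": 32_768},
    "multi_day": {"SUMMARIZER_KEEP_QA_SECTIONS": 10, "SUMMARIZER_LAST_SEC_BYTES": 65_536},
}

for name, env in PROFILES.items():
    print(name, env)
```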
## Monitoring Context Usage

### Monitor Token Metrics
Access the Langfuse dashboard at http://localhost:4000 to view:

- Token usage per agent
- Summarization frequency
- Context window utilization
## Vector Store Management

### Embedding Providers
PentAGI supports multiple embedding providers:

- OpenAI: text-embedding-3-small, text-embedding-3-large
- Ollama: Local embedding models (nomic-embed-text)
- Mistral: Mistral AI embedding models
- Jina: Jina AI embedding service
- HuggingFace: Open source embedding models
- GoogleAI: Google’s embedding models
- VoyageAI: VoyageAI embedding service
### Testing Embeddings
Use the `etester` utility to verify your embedding configuration.
## Best Practices
### Preserve Tool Calls

Keep at least 3 recent sections to maintain tool call context for multi-step operations.
### Monitor Token Usage

Use Langfuse to track token consumption and identify optimization opportunities.
### Regular Cleanup

Periodically flush old embeddings that are no longer relevant to current assessments.
### Consistent Embeddings

Never change embedding providers mid-assessment, as it breaks semantic search.
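Why this breaks search: vectors produced by different embedding models live in different spaces (often with different dimensions), so similarity scores against previously stored vectors become meaningless or simply impossible to compute. A toy illustration with made-up vectors:

```python
# Toy vectors standing in for embeddings from two different providers.
old_vec = [0.1, 0.9, 0.3]        # stored with provider A (3-dimensional)
new_vec = [0.2, 0.8, 0.1, 0.5]   # queried with provider B (4-dimensional)

def dot(a, b):
    """Dot product; refuses vectors from incompatible embedding spaces."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ; vectors are incomparable")
    return sum(x * y for x, y in zip(a, b))

try:
    dot(old_vec, new_vec)
except ValueError as err:
    print(err)
```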
## Troubleshooting
### Context Window Exceeded

If you encounter token limit errors:
- Increase `SUMMARIZER_LAST_SEC_BYTES` to allow more aggressive summarization
- Reduce `SUMMARIZER_KEEP_QA_SECTIONS` to compress more history
- Switch to a provider with a larger context window (e.g., Gemini 2.5 Pro)
### Poor Memory Recall

If the agent forgets important context:
- Increase `SUMMARIZER_KEEP_QA_SECTIONS` to preserve more history
- Verify the vector store is functioning with `etester`
- Check that relevant information is being stored in episodic memory
### Slow Performance

If summarization is causing latency:
- Reduce summarization frequency by increasing section size limits
- Use a faster LLM model for summarization tasks
- Disable QA summarization if not needed: `SUMMARIZER_USE_QA=false`
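A component reading this setting might gate QA summarization as sketched below. Only the variable name `SUMMARIZER_USE_QA` comes from this guide; the parsing convention (treating `"false"`/`"0"` as off, anything else as on) is an assumption for illustration:

```python
import os

def qa_summarization_enabled(env=os.environ):
    # SUMMARIZER_USE_QA is documented in this guide; defaulting to enabled
    # and the accepted falsy spellings are assumptions for this sketch.
    return env.get("SUMMARIZER_USE_QA", "true").lower() not in ("false", "0")

print(qa_summarization_enabled({"SUMMARIZER_USE_QA": "false"}))  # False
```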
## Related Resources

- Chain Summarization: deep dive into the summarization algorithm
- Performance Tuning: optimize resource usage and scaling
- Custom Models: create Ollama models with extended context