
Overview

Effective context management is essential for long-running penetration testing sessions. PentAGI implements sophisticated memory systems and token optimization strategies to maintain conversation coherence while staying within LLM provider limits.

Memory Architecture

PentAGI uses a multi-layered memory system to balance context retention with performance:

Context Window Limits by Provider

Understand your provider’s token limits:
| Provider | Model | Context Window | Recommended Usage |
|---|---|---|---|
| OpenAI | GPT-4 Turbo | 128K tokens | Full penetration testing sessions |
| OpenAI | GPT-4o | 128K tokens | Complex multi-step workflows |
| Anthropic | Claude 3.5 Sonnet | 200K tokens | Extended research and analysis |
| Anthropic | Claude 4 Opus | 200K tokens | Comprehensive security assessments |
| Google | Gemini 2.5 Pro | 2M tokens | Large codebase analysis |
| AWS Bedrock | Claude via Bedrock | 200K tokens | Enterprise security workflows |
| Ollama | Llama 3.1 70B | 128K tokens | Local inference with custom context |
A typical PentAGI agent workflow consumes around 64K tokens, but the system reserves a 110K-token context size to provide a safety margin for complex scenarios.
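As a quick sanity check on those figures, the headroom between a typical ~64K-token workflow and the reserved 110K context can be computed directly (a rough sketch; real usage varies per session):

```shell
# Rough headroom between typical usage and the reserved context size
typical_usage=64000    # ~64K tokens for a typical agent workflow
context_size=110000    # context size PentAGI reserves
echo "headroom: $((context_size - typical_usage)) tokens"   # prints "headroom: 46000 tokens"
```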

Token Optimization Strategies

1. Chain Summarization

Automatically condenses older conversation history while preserving critical context. See the Chain Summarization guide for detailed configuration.

2. Vector Store Retrieval

Stores and retrieves relevant information semantically:
# Embedding configuration
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_BATCH_SIZE=100
EMBEDDING_STRIP_NEW_LINES=true
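EMBEDDING_BATCH_SIZE controls how many texts are embedded per request, so it determines how many requests a given corpus generates. A quick ceiling-division estimate (the document count of 1250 is purely illustrative):

```shell
# With a batch size of 100, embedding 1250 documents takes ceil(1250/100) requests
docs=1250
batch=100
echo $(( (docs + batch - 1) / batch ))   # prints 13
```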

3. Selective Context Loading

Loads only relevant memories based on current task:
  • Task-specific retrieval: Query vector store for similar past operations
  • Tool pattern matching: Load successful command sequences
  • Vulnerability knowledge: Retrieve known exploit techniques

Context Management Workflows

Initialization Phase

Execution Phase

During penetration testing:
  1. Context Rotation: Older messages are summarized, recent ones preserved
  2. Memory Storage: Important findings stored in vector database
  3. Knowledge Retrieval: Similar past scenarios loaded on demand
  4. State Tracking: Current task state maintained in working memory

Configuration Guidelines

For Short Sessions (< 30 minutes)

SUMMARIZER_PRESERVE_LAST=true
SUMMARIZER_LAST_SEC_BYTES=51200  # 50KB
SUMMARIZER_KEEP_QA_SECTIONS=1

For Extended Sessions (1-2 hours)

SUMMARIZER_PRESERVE_LAST=true
SUMMARIZER_LAST_SEC_BYTES=76800  # 75KB
SUMMARIZER_KEEP_QA_SECTIONS=3
SUMMARIZER_MAX_QA_BYTES=98304    # 96KB

For Complex Multi-Day Assessments

SUMMARIZER_PRESERVE_LAST=true
SUMMARIZER_LAST_SEC_BYTES=102400  # 100KB
SUMMARIZER_KEEP_QA_SECTIONS=5
SUMMARIZER_MAX_QA_BYTES=131072    # 128KB
SUMMARIZER_USE_QA=true
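The byte values in these snippets are KiB multiples, so you can derive your own sizes with simple arithmetic rather than guessing raw byte counts:

```shell
# Byte sizes above are KiB multiples, e.g. 100KB and 128KB:
echo $((100 * 1024))   # prints 102400 (SUMMARIZER_LAST_SEC_BYTES for multi-day assessments)
echo $((128 * 1024))   # prints 131072 (SUMMARIZER_MAX_QA_BYTES)
```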

Monitoring Context Usage

1. Enable Langfuse Integration

Track token usage and conversation structure:
LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
2. Monitor Token Metrics

Access Langfuse dashboard at http://localhost:4000 to view:
  • Token usage per agent
  • Summarization frequency
  • Context window utilization
3. Adjust Configuration

Based on metrics, tune summarizer settings to optimize performance.

Vector Store Management

Embedding Providers

PentAGI supports multiple embedding providers:
  • OpenAI: text-embedding-3-small, text-embedding-3-large
  • Ollama: Local embedding models (nomic-embed-text)
  • Mistral: Mistral AI embedding models
  • Jina: Jina AI embedding service
  • HuggingFace: Open source embedding models
  • GoogleAI: Google’s embedding models
  • VoyageAI: VoyageAI embedding service
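For example, pointing the embedding variables shown earlier at a local Ollama model might look like the following. Note that the provider value ollama is an assumption based on the list above; verify the exact accepted values against your deployment's configuration reference.

```shell
# Hypothetical: local embeddings via Ollama using the variables shown earlier
EMBEDDING_PROVIDER=ollama          # assumed value; confirm against your config reference
EMBEDDING_MODEL=nomic-embed-text   # local embedding model from the list above
```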

Testing Embeddings

Use the etester utility to verify embedding configuration:
# Test embedding provider
docker exec -it pentagi /opt/pentagi/bin/etester test -verbose

# Show database statistics
docker exec -it pentagi /opt/pentagi/bin/etester info

# Search for documents
docker exec -it pentagi /opt/pentagi/bin/etester search -query "SQL injection" -limit 5
Important: Use the same embedding provider consistently. Changing providers requires re-indexing the entire knowledge base as vectors are incompatible across different models.

Best Practices

Preserve Tool Calls

Keep at least 3 recent sections to maintain tool call context for multi-step operations

Monitor Token Usage

Use Langfuse to track token consumption and identify optimization opportunities

Regular Cleanup

Periodically flush old embeddings that are no longer relevant to current assessments

Consistent Embeddings

Never change embedding providers mid-assessment as it breaks semantic search

Troubleshooting

If you encounter token limit errors:
  • Reduce SUMMARIZER_LAST_SEC_BYTES so more history is summarized instead of preserved verbatim
  • Reduce SUMMARIZER_KEEP_QA_SECTIONS to compress more history
  • Switch to a provider with larger context windows (e.g., Gemini 2.5 Pro)
If the agent forgets important context:
  • Increase SUMMARIZER_KEEP_QA_SECTIONS to preserve more history
  • Verify vector store is functioning with etester
  • Check that relevant information is being stored in episodic memory
If summarization is causing latency:
  • Reduce summarization frequency by increasing section size limits
  • Use a faster LLM model for summarization tasks
  • Disable QA summarization if not needed: SUMMARIZER_USE_QA=false

Related Guides
  • Chain Summarization: Deep dive into the summarization algorithm
  • Performance Tuning: Optimize resource usage and scaling
  • Custom Models: Create Ollama models with extended context
