## Overview
Effective context management is essential for long-running penetration testing sessions. PentAGI implements sophisticated memory systems and token optimization strategies to maintain conversation coherence while staying within LLM provider limits.

## Memory Architecture
PentAGI uses a multi-layered memory system to balance context retention with performance.

### Context Window Limits by Provider
Understand your provider’s token limits:

| Provider | Model | Context Window | Recommended Usage |
|---|---|---|---|
| OpenAI | GPT-4 Turbo | 128K tokens | Full penetration testing sessions |
| OpenAI | GPT-4o | 128K tokens | Complex multi-step workflows |
| Anthropic | Claude 3.5 Sonnet | 200K tokens | Extended research and analysis |
| Anthropic | Claude 4 Opus | 200K tokens | Comprehensive security assessments |
| Google | Gemini 2.5 Pro | 2M tokens | Large codebase analysis |
| AWS Bedrock | Claude via Bedrock | 200K tokens | Enterprise security workflows |
| Ollama | Llama 3.1 70B | 128K tokens | Local inference with custom context |
Typical PentAGI agent workflows consume around 64K tokens, but the system uses a 110K context size as a safety margin for handling complex scenarios.
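As a back-of-the-envelope check, the figures above leave substantial headroom; the sketch below just reproduces that arithmetic (the 64K and 110K numbers come from this guide, the calculation is illustrative):

```python
# Token budget figures quoted in this guide.
CONFIGURED_WINDOW = 110_000   # context size PentAGI uses for safety margin
TYPICAL_WORKLOAD = 64_000     # tokens a typical agent workflow consumes

headroom = CONFIGURED_WINDOW - TYPICAL_WORKLOAD
utilization = TYPICAL_WORKLOAD / CONFIGURED_WINDOW

print(f"headroom: {headroom} tokens ({utilization:.0%} typical utilization)")
```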
## Token Optimization Strategies
### 1. Chain Summarization
Automatically condenses older conversation history while preserving critical context. See the Chain Summarization guide for detailed configuration.

### 2. Vector Store Retrieval
Stores and retrieves relevant information semantically.

### 3. Selective Context Loading
Loads only relevant memories based on the current task:

- Task-specific retrieval: Query the vector store for similar past operations
- Tool pattern matching: Load successful command sequences
- Vulnerability knowledge: Retrieve known exploit techniques
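The retrieval strategies above can be illustrated with a toy in-memory store. PentAGI's actual store is a vector database; the embeddings, memory kinds, and scoring below are simplified stand-ins, not its real API:

```python
import math

# Toy memory store: each entry has a kind matching the strategies above
# (task, tool pattern, vulnerability) and a toy embedding vector.
MEMORIES = [
    {"kind": "task", "text": "full TCP scan of 10.0.0.0/24",        "vec": [0.9, 0.1, 0.0]},
    {"kind": "tool", "text": "directory brute-force sequence",      "vec": [0.2, 0.9, 0.1]},
    {"kind": "vuln", "text": "known exploit technique for service", "vec": [0.1, 0.2, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve(query_vec, kind=None, top_k=2):
    """Selective loading: optionally filter by memory kind, then rank by similarity."""
    pool = [m for m in MEMORIES if kind is None or m["kind"] == kind]
    return sorted(pool, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)[:top_k]

# A query vector close to the "task" embedding surfaces the scan memory first.
best = retrieve([0.8, 0.2, 0.1], top_k=1)[0]
print(best["text"])
```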
## Context Management Workflows

### Initialization Phase

### Execution Phase
During penetration testing:

- Context Rotation: Older messages are summarized, recent ones preserved
- Memory Storage: Important findings stored in vector database
- Knowledge Retrieval: Similar past scenarios loaded on demand
- State Tracking: Current task state maintained in working memory
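The rotation step can be sketched as follows. The `keep` count and one-line summary string are illustrative placeholders; PentAGI's real summarizer is LLM-driven and configured via the `SUMMARIZER_*` variables discussed below:

```python
KEEP_RECENT = 3  # illustrative; echoes the "keep at least 3 recent sections" practice

def rotate_context(messages, keep=KEEP_RECENT):
    """Summarize older messages into one entry, preserving the most recent ones."""
    if len(messages) <= keep:
        return messages
    older, recent = messages[:-keep], messages[-keep:]
    summary = f"[summary of {len(older)} earlier messages]"  # stand-in for an LLM summary
    return [summary] + recent

history = [f"msg {i}" for i in range(1, 7)]
print(rotate_context(history))
# folds "msg 1".."msg 3" into a summary entry and keeps "msg 4".."msg 6"
```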
## Configuration Guidelines

### For Short Sessions (< 30 minutes)

### For Extended Sessions (1-2 hours)

### For Complex Multi-Day Assessments
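The concrete settings for each profile are not reproduced in this excerpt. The sketch below only illustrates the general direction of tuning: the variable names `SUMMARIZER_KEEP_QA_SECTIONS` and `SUMMARIZER_LAST_SEC_BYTES` come from this guide, but every number is a hypothetical placeholder to validate against your own setup:

```python
# Hypothetical tuning profiles: longer sessions keep more history and larger
# recent sections. The numeric values are placeholders, not recommendations.
PROFILES = {
    "short":     {"SUMMARIZER_KEEP_QA_SECTIONS": 3,  "SUMMARIZER_LAST_SEC_BYTES": 16_384},
    "extended":  {"SUMMARIZER_KEEP_QA_SECTIONS": 5,  "SUMMARIZER_LAST_SEC_BYTES": 32_768},
    "multi_day": {"SUMMARIZER_KEEP_QA_SECTIONS": 10, "SUMMARIZER_LAST_SEC_BYTES": 65_536},
}

for name, env in PROFILES.items():
    print(name, env)
```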
## Monitoring Context Usage

### Monitor Token Metrics
Access the Langfuse dashboard at http://localhost:4000 to view:

- Token usage per agent
- Summarization frequency
- Context window utilization
## Vector Store Management

### Embedding Providers
PentAGI supports multiple embedding providers:

- OpenAI: text-embedding-3-small, text-embedding-3-large
- Ollama: Local embedding models (nomic-embed-text)
- Mistral: Mistral AI embedding models
- Jina: Jina AI embedding service
- HuggingFace: Open source embedding models
- GoogleAI: Google’s embedding models
- VoyageAI: VoyageAI embedding service
### Testing Embeddings
Use the `etester` utility to verify your embedding configuration.
## Best Practices
### Preserve Tool Calls

Keep at least 3 recent sections to maintain tool call context for multi-step operations.
### Monitor Token Usage

Use Langfuse to track token consumption and identify optimization opportunities.
### Regular Cleanup

Periodically flush old embeddings that are no longer relevant to current assessments.
### Consistent Embeddings

Never change embedding providers mid-assessment, as it breaks semantic search.
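Why this breaks search: vectors produced by different embedding models live in different spaces (often with different dimensions), so similarity scores against previously stored vectors become meaningless or simply impossible to compute. A toy illustration with made-up vectors:

```python
# Toy vectors standing in for embeddings from two different providers.
old_vec = [0.1, 0.9, 0.3]        # stored with provider A (3-dimensional)
new_vec = [0.2, 0.8, 0.1, 0.5]   # queried with provider B (4-dimensional)

def dot(a, b):
    """Dot product; refuses vectors from incompatible embedding spaces."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ; vectors are incomparable")
    return sum(x * y for x, y in zip(a, b))

try:
    dot(old_vec, new_vec)
except ValueError as err:
    print(err)
```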
## Troubleshooting
### Context Window Exceeded

If you encounter token limit errors:
- Increase `SUMMARIZER_LAST_SEC_BYTES` to allow more aggressive summarization
- Reduce `SUMMARIZER_KEEP_QA_SECTIONS` to compress more history
- Switch to a provider with a larger context window (e.g., Gemini 2.5 Pro)
### Poor Memory Recall

If the agent forgets important context:
- Increase `SUMMARIZER_KEEP_QA_SECTIONS` to preserve more history
- Verify the vector store is functioning with `etester`
- Check that relevant information is being stored in episodic memory
### Slow Performance

If summarization is causing latency:
- Reduce summarization frequency by increasing section size limits
- Use a faster LLM model for summarization tasks
- Disable QA summarization if not needed: `SUMMARIZER_USE_QA=false`
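A component reading this setting might gate QA summarization as sketched below. Only the variable name `SUMMARIZER_USE_QA` comes from this guide; the parsing convention (treating `"false"`/`"0"` as off, anything else as on) is an assumption for illustration:

```python
import os

def qa_summarization_enabled(env=os.environ):
    # SUMMARIZER_USE_QA is documented in this guide; defaulting to enabled
    # and the accepted falsy spellings are assumptions for this sketch.
    return env.get("SUMMARIZER_USE_QA", "true").lower() not in ("false", "0")

print(qa_summarization_enabled({"SUMMARIZER_USE_QA": "false"}))  # False
```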
## Related Resources

- Chain Summarization: deep dive into the summarization algorithm
- Performance Tuning: optimize resource usage and scaling
- Custom Models: create Ollama models with extended context