
Overview

PentAGI implements a sophisticated memory system that enables agents to learn from past experiences, maintain context across conversations, and retrieve relevant information for decision-making. The system combines vector-based semantic search with graph-based relationship tracking.

Memory Architecture

The memory system is organized into three distinct layers:

Long-term Memory

  • Vector Store: PostgreSQL with the pgvector extension, storing semantic embeddings for similarity-based retrieval.
  • Knowledge Base: Structured domain expertise, including vulnerability databases, tool capabilities, and security techniques.
  • Tools Knowledge: Historical patterns of tool usage, success rates, and optimal parameter configurations.

Working Memory

  • Current Context: Active conversation state, recent messages, and immediate task information.
  • Active Goals: Objectives being pursued in the current penetration testing session.
  • System State: Available resources, loaded tools, and environmental constraints.

Episodic Memory

  • Past Actions: Complete history of commands executed, searches performed, and analyses conducted.
  • Action Results: Outputs, success/failure status, and outcome details from past operations.
  • Success Patterns: Learned strategies and techniques that have proven effective in similar scenarios.

Vector Storage Implementation

PentAGI uses PostgreSQL with the pgvector extension for efficient semantic search:

Storage Schema

CREATE TABLE IF NOT EXISTS langchain_pg_embedding (
    id UUID PRIMARY KEY,
    collection_id UUID NOT NULL,
    embedding vector(1536),  -- Dimension depends on embedding model
    document TEXT NOT NULL,
    cmetadata JSONB,
    custom_id TEXT
);

CREATE INDEX ON langchain_pg_embedding 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);

Metadata Structure

Each memory entry includes rich metadata for filtering and context:
{
  "flow_id": "12345",
  "task_id": "67890",
  "subtask_id": "11111",
  "doc_type": "memory",
  "tool_name": "run_nmap",
  "tool_description": "Network scanning and service discovery",
  "agent_type": "executor",
  "created_at": "2024-03-15T10:30:00Z"
}
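As a sketch, this metadata might be assembled with a small Go helper; the function name and signature below are hypothetical, but the keys mirror the JSON fields above:

```go
package main

import (
	"fmt"
	"time"
)

// buildMetadata is a hypothetical helper; the map keys mirror the
// metadata fields shown in the example entry above.
func buildMetadata(flowID, taskID, subtaskID, toolName, agentType string) map[string]any {
	return map[string]any{
		"flow_id":    flowID,
		"task_id":    taskID,
		"subtask_id": subtaskID,
		"doc_type":   "memory",
		"tool_name":  toolName,
		"agent_type": agentType,
		"created_at": time.Now().UTC().Format(time.RFC3339),
	}
}

func main() {
	md := buildMetadata("12345", "67890", "11111", "run_nmap", "executor")
	fmt.Println(md["doc_type"], md["tool_name"])
}
```

Consistent keys matter here: every field you store becomes a filter you can later apply at query time.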

Memory Operations

Storing Memories

When agents perform actions, results are automatically stored as memory entries:
// Simplified from actual implementation
func StoreMemory(ctx context.Context, content string, metadata map[string]any) error {
    // Wrap the content and metadata as a document; the vector store
    // generates the embedding via its configured embedder.
    doc := schema.Document{
        PageContent: content,
        Metadata:    metadata,
    }

    if _, err := vectorStore.AddDocuments(ctx, []schema.Document{doc}); err != nil {
        return fmt.Errorf("failed to store memory: %w", err)
    }
    return nil
}

Retrieving Memories

Agents query memory using natural language questions that are converted to vectors:
// From memory.go implementation
func (m *memory) SearchMemory(ctx context.Context, query string, filters map[string]any) (string, error) {
    // Perform similarity search
    docs, err := m.store.SimilaritySearch(
        ctx,
        query,
        memoryVectorStoreResultLimit,  // Default: 3
        vectorstores.WithScoreThreshold(memoryVectorStoreThreshold),  // Default: 0.2
        vectorstores.WithFilters(filters),
    )
    if err != nil {
        return "", fmt.Errorf("failed to search for similar documents: %w", err)
    }
    
    // Format results for agent consumption
    return formatMemoryResults(docs), nil
}

Similarity Threshold

The system uses a similarity threshold of 0.2 (configurable) to filter relevant memories. Scores closer to 1.0 indicate higher similarity.

Result Limits

By default, the system returns the top 3 most similar memories to avoid overwhelming the agent with context.
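The interplay of score threshold and result limit can be illustrated with a self-contained sketch; `ScoredDoc` and `filterResults` are invented names for illustration, not part of PentAGI's API:

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredDoc is an illustrative stand-in for a search hit; Score is a
// cosine similarity, where values nearer 1.0 mean closer matches.
type ScoredDoc struct {
	Content string
	Score   float64
}

// filterResults keeps at most `limit` documents whose similarity meets
// `threshold`, highest scores first, mirroring the defaults of 3 and 0.2.
func filterResults(docs []ScoredDoc, threshold float64, limit int) []ScoredDoc {
	sort.Slice(docs, func(i, j int) bool { return docs[i].Score > docs[j].Score })
	out := make([]ScoredDoc, 0, limit)
	for _, d := range docs {
		if d.Score < threshold {
			break // sorted, so everything after is below the threshold too
		}
		out = append(out, d)
		if len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	docs := []ScoredDoc{
		{"port 22 open", 0.85},
		{"unrelated note", 0.10},
		{"ssh banner", 0.60},
	}
	for _, d := range filterResults(docs, 0.2, 3) {
		fmt.Println(d.Content, d.Score)
	}
}
```

Here the 0.10 document is dropped by the threshold even though the limit of 3 has room for it, which is exactly the noise-filtering behavior described above.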

Memory Search Patterns

Hierarchical Fallback

Memory searches follow a hierarchical pattern:
  1. Specific Search: Query memories from current subtask
  2. Task-Level Search: If no results, expand to current task
  3. Flow-Level Search: If still no results, search entire flow
  4. Global Search: Optionally search across all flows (planned)
// From memory.go implementation
if isSpecificFilters && len(docs) == 0 {
    // Fallback to global filters (flow-level only)
    docs, err = m.store.SimilaritySearch(
        ctx,
        query,
        memoryVectorStoreResultLimit,
        vectorstores.WithScoreThreshold(memoryVectorStoreThreshold),
        vectorstores.WithFilters(globalFilters),
    )
}
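The full hierarchy can be generalized into a loop over filter sets, from most specific to broadest. The following is a sketch under assumed names; the `search` callback stands in for the vector store's similarity search:

```go
package main

import "fmt"

// searchWithFallback is a sketch of the hierarchical pattern: try the
// most specific filter set first and widen until something matches.
func searchWithFallback(query string, filterLevels []map[string]any,
	search func(string, map[string]any) []string) []string {
	for _, filters := range filterLevels {
		if docs := search(query, filters); len(docs) > 0 {
			return docs
		}
	}
	return nil
}

func main() {
	// Fake store: only flow-level results exist for this query.
	store := func(q string, f map[string]any) []string {
		if _, ok := f["subtask_id"]; ok {
			return nil // nothing stored at the subtask level
		}
		return []string{"port 22 open on 192.168.1.10"}
	}
	levels := []map[string]any{
		{"flow_id": "12345", "subtask_id": "11111"}, // specific
		{"flow_id": "12345"},                        // flow-level fallback
	}
	fmt.Println(searchWithFallback("open ports", levels, store))
}
```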

Agent-Specific Filtering

Agents can filter memories by their type to retrieve role-specific experiences:
// Researcher searching for past reconnaissance
{
  "flow_id": "12345",
  "doc_type": "memory",
  "agent_type": "researcher"
}

// Executor searching for past command results
{
  "flow_id": "12345",
  "doc_type": "memory",
  "agent_type": "executor",
  "tool_name": "run_nmap"
}

Embedding Models

PentAGI supports multiple embedding providers for generating vector representations:

Supported Providers

# Example: OpenAI (other providers such as Ollama and Jina are configured analogously)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-large  # 3072 dimensions
EMBEDDING_KEY=${OPEN_AI_KEY}

Embedding Configuration

# Common settings for all providers
EMBEDDING_STRIP_NEW_LINES=true  # Remove newlines before embedding
EMBEDDING_BATCH_SIZE=512        # Batch multiple texts for efficiency
Changing embedding models requires re-indexing all existing memories, as vectors from different models are not compatible.

Context Management

As conversations grow longer, PentAGI implements intelligent context management to stay within model token limits:

Chain Summarization

The system automatically summarizes older messages while preserving critical information:

Summarization Configuration

# Default values for global summarizer
SUMMARIZER_PRESERVE_LAST=true        # Keep recent messages intact
SUMMARIZER_USE_QA=true               # Use QA pair strategy
SUMMARIZER_SUM_MSG_HUMAN_IN_QA=false # Don't summarize human messages
SUMMARIZER_LAST_SEC_BYTES=51200      # 50KB for last section
SUMMARIZER_MAX_BP_BYTES=16384        # 16KB per body pair
SUMMARIZER_MAX_QA_SECTIONS=10        # Max QA sections
SUMMARIZER_MAX_QA_BYTES=65536        # 64KB for QA sections
SUMMARIZER_KEEP_QA_SECTIONS=1        # Recent QA sections to preserve
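The byte-budget idea behind SUMMARIZER_LAST_SEC_BYTES can be sketched as follows; the function name and split logic are illustrative, not PentAGI's actual implementation:

```go
package main

import "fmt"

// splitForSummarization walks messages newest-first, keeps them
// verbatim while they fit the byte budget, and hands the rest to the
// summarizer — an illustrative take on the "last section" cutoff.
func splitForSummarization(messages []string, lastSectionBytes int) (toSummarize, keepVerbatim []string) {
	budget := lastSectionBytes
	cut := len(messages)
	for i := len(messages) - 1; i >= 0; i-- {
		budget -= len(messages[i])
		if budget < 0 {
			break
		}
		cut = i
	}
	return messages[:cut], messages[cut:]
}

func main() {
	msgs := []string{"old question", "old answer", "recent question", "recent answer"}
	older, recent := splitForSummarization(msgs, 30)
	fmt.Println(len(older), len(recent))
}
```

With a real 50KB budget, this keeps far more history verbatim; the small budget above just makes the cutoff visible.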

What Gets Preserved

Always Kept:
  • System messages and initial prompts
  • Recent messages (configurable count)
  • Tool call structures and identifiers
  • Critical error messages
Summarized:
  • Older human questions
  • Previous assistant responses
  • Tool outputs (while preserving key findings)
  • Redundant information from early conversation
Never Summarized:
  • The last N QA pairs (configurable)
  • Messages in the last section (configurable size)
  • Tool call and response pairs (structure preserved)

Memory Types

Observation Memories

Raw factual information gathered during operations:
{
  "type": "observation",
  "content": "Target 192.168.1.10 has port 22 (SSH) open, running OpenSSH 7.4",
  "tool": "run_nmap",
  "timestamp": "2024-03-15T10:30:00Z"
}

Conclusion Memories

Higher-level insights derived from observations:
{
  "type": "conclusion",
  "content": "Target is vulnerable to CVE-2023-12345 based on SSH version banner",
  "reasoning": "OpenSSH 7.4 is affected by authentication bypass vulnerability",
  "timestamp": "2024-03-15T10:32:00Z"
}

Success Pattern Memories

Recorded techniques that achieved objectives:
{
  "type": "success_pattern",
  "content": "SQL injection in login form parameter 'username' with payload: admin' OR '1'='1",
  "target_type": "PHP web application",
  "tool_chain": ["run_nikto", "run_sqlmap"],
  "timestamp": "2024-03-15T11:00:00Z"
}

Integration with Knowledge Graph

While the vector store provides semantic search, the knowledge graph adds structured relationships. See Knowledge Graph for details on how these systems complement each other:
  • Vector Store: “What memories are semantically similar to this query?”
  • Knowledge Graph: “How are these entities related? What patterns connect them?”

Performance Considerations

Indexing Strategy

PentAGI uses IVFFlat indexing for approximate nearest neighbor search:
CREATE INDEX ON langchain_pg_embedding 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);
Trade-offs:
  • Faster queries at the cost of slight accuracy loss
  • Lists parameter tuned for typical pentest memory sizes
  • Cosine similarity for normalized vectors
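For reference, cosine similarity (the metric behind vector_cosine_ops) can be computed in a few lines of Go; for already-normalized embeddings it reduces to a plain dot product:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two vectors:
// 1 for identical directions, 0 for orthogonal ones.
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{1, 0})) // identical
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1})) // orthogonal
}
```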

Query Optimization

  • Metadata Filtering: Apply filters before vector search to reduce candidates.
  • Result Limiting: The default limit of 3 results balances context richness with token usage.
  • Threshold Tuning: The 0.2 similarity threshold filters out noise while retaining relevant memories.

Memory Cleanup

Long-running flows may accumulate large memory stores. Consider:
  • Periodic archiving of old memories
  • Consolidating similar memories through clustering
  • Removing low-value observations after success patterns are extracted

Best Practices

Choosing an Embedding Model

  • OpenAI text-embedding-3-large: best accuracy, higher cost
  • OpenAI text-embedding-3-small: balanced performance
  • Ollama nomic-embed-text: free, local, good for development
  • Jina v2: optimized for long documents

Writing Memory Content

  • Include context in memory text (target, tool used, outcome)
  • Use descriptive metadata for filtering
  • Store atomic facts rather than large text blocks
  • Maintain consistent formatting for similar memory types

Tuning Retrieval

  • Lower threshold (0.1-0.2) for broader recall
  • Higher threshold (0.3-0.5) for precision
  • Increase the result limit for research agents
  • Decrease the result limit for executor agents

Monitoring

  • Track vector store size growth
  • Monitor query performance over time
  • Review memory retrieval relevance
  • Adjust summarization settings based on context window usage
