
Overview

PentAGI implements a sophisticated memory system that enables agents to learn from past experiences, maintain context across conversations, and retrieve relevant information for decision-making. The system combines vector-based semantic search with graph-based relationship tracking.

Memory Architecture

The memory system is organized into three distinct layers:

Long-term Memory

  • Vector Store: PostgreSQL with the pgvector extension, storing semantic embeddings for similarity-based retrieval.
  • Knowledge Base: Structured domain expertise, including vulnerability databases, tool capabilities, and security techniques.
  • Tools Knowledge: Historical patterns of tool usage, success rates, and optimal parameter configurations.

Working Memory

  • Current Context: Active conversation state, recent messages, and immediate task information.
  • Active Goals: Objectives being pursued in the current penetration testing session.
  • System State: Available resources, loaded tools, and environmental constraints.

Episodic Memory

  • Past Actions: Complete history of commands executed, searches performed, and analyses conducted.
  • Action Results: Outputs, success/failure status, and outcome details from past operations.
  • Success Patterns: Learned strategies and techniques that have proven effective in similar scenarios.

Vector Storage Implementation

PentAGI uses PostgreSQL with the pgvector extension for efficient semantic search:

Storage Schema

CREATE TABLE IF NOT EXISTS langchain_pg_embedding (
    id UUID PRIMARY KEY,
    collection_id UUID NOT NULL,
    embedding vector(1536),  -- Dimension depends on embedding model
    document TEXT NOT NULL,
    cmetadata JSONB,
    custom_id TEXT
);

CREATE INDEX ON langchain_pg_embedding 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);

Metadata Structure

Each memory entry includes rich metadata for filtering and context:
{
  "flow_id": "12345",
  "task_id": "67890",
  "subtask_id": "11111",
  "doc_type": "memory",
  "tool_name": "run_nmap",
  "tool_description": "Network scanning and service discovery",
  "agent_type": "executor",
  "created_at": "2024-03-15T10:30:00Z"
}
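As a sketch, this metadata might be assembled with a small Go helper; the function name and signature below are hypothetical, but the keys mirror the JSON fields above:

```go
package main

import (
	"fmt"
	"time"
)

// buildMetadata is a hypothetical helper; the map keys mirror the
// metadata fields shown in the example entry above.
func buildMetadata(flowID, taskID, subtaskID, toolName, agentType string) map[string]any {
	return map[string]any{
		"flow_id":    flowID,
		"task_id":    taskID,
		"subtask_id": subtaskID,
		"doc_type":   "memory",
		"tool_name":  toolName,
		"agent_type": agentType,
		"created_at": time.Now().UTC().Format(time.RFC3339),
	}
}

func main() {
	md := buildMetadata("12345", "67890", "11111", "run_nmap", "executor")
	fmt.Println(md["doc_type"], md["tool_name"])
}
```

Consistent keys matter here: every field you store becomes a filter you can later apply at query time.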

Memory Operations

Storing Memories

When agents perform actions, results are automatically stored as memory entries:
// Simplified from actual implementation
func StoreMemory(ctx context.Context, content string, metadata map[string]any) error {
    // Wrap the content and metadata as a document; the vector store
    // generates the embedding via its configured embedder.
    doc := schema.Document{
        PageContent: content,
        Metadata:    metadata,
    }

    if _, err := vectorStore.AddDocuments(ctx, []schema.Document{doc}); err != nil {
        return fmt.Errorf("failed to store memory: %w", err)
    }
    return nil
}

Retrieving Memories

Agents query memory using natural language questions that are converted to vectors:
// From memory.go implementation
func (m *memory) SearchMemory(ctx context.Context, query string, filters map[string]any) (string, error) {
    // Perform similarity search
    docs, err := m.store.SimilaritySearch(
        ctx,
        query,
        memoryVectorStoreResultLimit,  // Default: 3
        vectorstores.WithScoreThreshold(memoryVectorStoreThreshold),  // Default: 0.2
        vectorstores.WithFilters(filters),
    )
    if err != nil {
        return "", fmt.Errorf("failed to search for similar documents: %w", err)
    }
    
    // Format results for agent consumption
    return formatMemoryResults(docs), nil
}

Similarity Threshold

The system uses a similarity threshold of 0.2 (configurable) to filter relevant memories. Scores closer to 1.0 indicate higher similarity.

Result Limits

By default, the system returns the top 3 most similar memories to avoid overwhelming the agent with context.
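The interplay of score threshold and result limit can be illustrated with a self-contained sketch; `ScoredDoc` and `filterResults` are invented names for illustration, not part of PentAGI's API:

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredDoc is an illustrative stand-in for a search hit; Score is a
// cosine similarity, where values nearer 1.0 mean closer matches.
type ScoredDoc struct {
	Content string
	Score   float64
}

// filterResults keeps at most `limit` documents whose similarity meets
// `threshold`, highest scores first, mirroring the defaults of 3 and 0.2.
func filterResults(docs []ScoredDoc, threshold float64, limit int) []ScoredDoc {
	sort.Slice(docs, func(i, j int) bool { return docs[i].Score > docs[j].Score })
	out := make([]ScoredDoc, 0, limit)
	for _, d := range docs {
		if d.Score < threshold {
			break // sorted, so everything after is below the threshold too
		}
		out = append(out, d)
		if len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	docs := []ScoredDoc{
		{"port 22 open", 0.85},
		{"unrelated note", 0.10},
		{"ssh banner", 0.60},
	}
	for _, d := range filterResults(docs, 0.2, 3) {
		fmt.Println(d.Content, d.Score)
	}
}
```

Here the 0.10 document is dropped by the threshold even though the limit of 3 has room for it, which is exactly the noise-filtering behavior described above.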

Memory Search Patterns

Hierarchical Fallback

Memory searches follow a hierarchical pattern:
  1. Specific Search: Query memories from current subtask
  2. Task-Level Search: If no results, expand to current task
  3. Flow-Level Search: If still no results, search entire flow
  4. Global Search: Optionally search across all flows (planned)
// From memory.go implementation
if isSpecificFilters && len(docs) == 0 {
    // Fallback to global filters (flow-level only)
    docs, err = m.store.SimilaritySearch(
        ctx,
        query,
        memoryVectorStoreResultLimit,
        vectorstores.WithScoreThreshold(memoryVectorStoreThreshold),
        vectorstores.WithFilters(globalFilters),
    )
}
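The full hierarchy can be generalized into a loop over filter sets, from most specific to broadest. The following is a sketch under assumed names; the `search` callback stands in for the vector store's similarity search:

```go
package main

import "fmt"

// searchWithFallback is a sketch of the hierarchical pattern: try the
// most specific filter set first and widen until something matches.
func searchWithFallback(query string, filterLevels []map[string]any,
	search func(string, map[string]any) []string) []string {
	for _, filters := range filterLevels {
		if docs := search(query, filters); len(docs) > 0 {
			return docs
		}
	}
	return nil
}

func main() {
	// Fake store: only flow-level results exist for this query.
	store := func(q string, f map[string]any) []string {
		if _, ok := f["subtask_id"]; ok {
			return nil // nothing stored at the subtask level
		}
		return []string{"port 22 open on 192.168.1.10"}
	}
	levels := []map[string]any{
		{"flow_id": "12345", "subtask_id": "11111"}, // specific
		{"flow_id": "12345"},                        // flow-level fallback
	}
	fmt.Println(searchWithFallback("open ports", levels, store))
}
```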

Agent-Specific Filtering

Agents can filter memories by their type to retrieve role-specific experiences:
// Researcher searching for past reconnaissance
{
  "flow_id": "12345",
  "doc_type": "memory",
  "agent_type": "researcher"
}

// Executor searching for past command results
{
  "flow_id": "12345",
  "doc_type": "memory",
  "agent_type": "executor",
  "tool_name": "run_nmap"
}

Embedding Models

PentAGI supports multiple embedding providers for generating vector representations:

Supported Providers

# Example: OpenAI (other providers such as Ollama and Jina are configured analogously)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-large  # 3072 dimensions
EMBEDDING_KEY=${OPEN_AI_KEY}

Embedding Configuration

# Common settings for all providers
EMBEDDING_STRIP_NEW_LINES=true  # Remove newlines before embedding
EMBEDDING_BATCH_SIZE=512        # Batch multiple texts for efficiency
Changing embedding models requires re-indexing all existing memories, as vectors from different models are not compatible.

Context Management

As conversations grow longer, PentAGI implements intelligent context management to stay within model token limits:

Chain Summarization

The system automatically summarizes older messages while preserving critical information:

Summarization Configuration

# Default values for global summarizer
SUMMARIZER_PRESERVE_LAST=true        # Keep recent messages intact
SUMMARIZER_USE_QA=true               # Use QA pair strategy
SUMMARIZER_SUM_MSG_HUMAN_IN_QA=false # Don't summarize human messages
SUMMARIZER_LAST_SEC_BYTES=51200      # 50KB for last section
SUMMARIZER_MAX_BP_BYTES=16384        # 16KB per body pair
SUMMARIZER_MAX_QA_SECTIONS=10        # Max QA sections
SUMMARIZER_MAX_QA_BYTES=65536        # 64KB for QA sections
SUMMARIZER_KEEP_QA_SECTIONS=1        # Recent QA sections to preserve
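The byte-budget idea behind SUMMARIZER_LAST_SEC_BYTES can be sketched as follows; the function name and split logic are illustrative, not PentAGI's actual implementation:

```go
package main

import "fmt"

// splitForSummarization walks messages newest-first, keeps them
// verbatim while they fit the byte budget, and hands the rest to the
// summarizer — an illustrative take on the "last section" cutoff.
func splitForSummarization(messages []string, lastSectionBytes int) (toSummarize, keepVerbatim []string) {
	budget := lastSectionBytes
	cut := len(messages)
	for i := len(messages) - 1; i >= 0; i-- {
		budget -= len(messages[i])
		if budget < 0 {
			break
		}
		cut = i
	}
	return messages[:cut], messages[cut:]
}

func main() {
	msgs := []string{"old question", "old answer", "recent question", "recent answer"}
	older, recent := splitForSummarization(msgs, 30)
	fmt.Println(len(older), len(recent))
}
```

With a real 50KB budget, this keeps far more history verbatim; the small budget above just makes the cutoff visible.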

What Gets Preserved

Always Kept:
  • System messages and initial prompts
  • Recent messages (configurable count)
  • Tool call structures and identifiers
  • Critical error messages
Summarized:
  • Older human questions
  • Previous assistant responses
  • Tool outputs (while preserving key findings)
  • Redundant information from early conversation
Never Summarized:
  • The last N QA pairs (configurable)
  • Messages in the last section (configurable size)
  • Tool call and response pairs (structure preserved)

Memory Types

Observation Memories

Raw factual information gathered during operations:
{
  "type": "observation",
  "content": "Target 192.168.1.10 has port 22 (SSH) open, running OpenSSH 7.4",
  "tool": "run_nmap",
  "timestamp": "2024-03-15T10:30:00Z"
}

Conclusion Memories

Higher-level insights derived from observations:
{
  "type": "conclusion",
  "content": "Target is vulnerable to CVE-2023-12345 based on SSH version banner",
  "reasoning": "OpenSSH 7.4 is affected by authentication bypass vulnerability",
  "timestamp": "2024-03-15T10:32:00Z"
}

Success Pattern Memories

Recorded techniques that achieved objectives:
{
  "type": "success_pattern",
  "content": "SQL injection in login form parameter 'username' with payload: admin' OR '1'='1",
  "target_type": "PHP web application",
  "tool_chain": ["run_nikto", "run_sqlmap"],
  "timestamp": "2024-03-15T11:00:00Z"
}

Integration with Knowledge Graph

While the vector store provides semantic search, the knowledge graph adds structured relationships. See Knowledge Graph for details on how these systems complement each other:
  • Vector Store: “What memories are semantically similar to this query?”
  • Knowledge Graph: “How are these entities related? What patterns connect them?”

Performance Considerations

Indexing Strategy

PentAGI uses IVFFlat indexing for approximate nearest neighbor search:
CREATE INDEX ON langchain_pg_embedding 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);
Trade-offs:
  • Faster queries at the cost of slight accuracy loss
  • Lists parameter tuned for typical pentest memory sizes
  • Cosine similarity for normalized vectors
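For reference, cosine similarity (the metric behind vector_cosine_ops) can be computed in a few lines of Go; for already-normalized embeddings it reduces to a plain dot product:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two vectors:
// 1 for identical directions, 0 for orthogonal ones.
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{1, 0})) // identical
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1})) // orthogonal
}
```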

Query Optimization

  • Metadata Filtering: Apply filters before vector search to reduce candidates.
  • Result Limiting: The default limit of 3 results balances context richness with token usage.
  • Threshold Tuning: The 0.2 similarity threshold filters out noise while retaining relevant memories.

Memory Cleanup

Long-running flows may accumulate large memory stores. Consider:
  • Periodic archiving of old memories
  • Consolidating similar memories through clustering
  • Removing low-value observations after success patterns are extracted

Best Practices

Choosing an Embedding Model

  • OpenAI text-embedding-3-large: best accuracy, higher cost
  • OpenAI text-embedding-3-small: balanced performance
  • Ollama nomic-embed-text: free, local, good for development
  • Jina v2: optimized for long documents

Writing Memory Content

  • Include context in memory text (target, tool used, outcome)
  • Use descriptive metadata for filtering
  • Store atomic facts rather than large text blocks
  • Maintain consistent formatting for similar memory types

Tuning Retrieval

  • Lower threshold (0.1-0.2) for broader recall
  • Higher threshold (0.3-0.5) for precision
  • Increase the result limit for research agents
  • Decrease the result limit for executor agents

Monitoring

  • Track vector store size growth
  • Monitor query performance over time
  • Review memory retrieval relevance
  • Adjust summarization settings based on context window usage
