Knowledge Graph

Overview

PentAGI integrates Graphiti, a temporal knowledge graph system powered by Neo4j, to provide advanced semantic understanding and relationship tracking. While the vector store (pgvector) handles similarity-based memory retrieval, the knowledge graph captures structured relationships between entities, actions, and outcomes.

The knowledge graph is optional and disabled by default. Enable it for enhanced contextual learning and relationship discovery.

What is Graphiti?

Graphiti is a temporal knowledge graph system that:

Automatically extracts entities and relationships from agent interactions
Tracks temporal context showing how knowledge evolves over time
Stores semantic relationships between tools, targets, vulnerabilities, and techniques
Enables complex queries like “What tools were effective against similar targets?”
Learns patterns from successful penetration tests

PentAGI Fork

PentAGI uses a customized fork (vxcontrol/pentagi-graphiti) with:

Custom entity types specific to penetration testing:
- Targets (hosts, networks, applications)
- Tools (nmap, metasploit, sqlmap, etc.)
- Vulnerabilities (CVEs, misconfigurations)
- Techniques (MITRE ATT&CK mappings)
- Credentials (discovered during testing)
Custom edge types for security relationships:
- EXPLOITS (tool → vulnerability)
- DISCOVERED_ON (vulnerability → target)
- EXECUTED_AGAINST (tool → target)
- REQUIRES (technique → tool)
- MITIGATED_BY (vulnerability → defense)

Knowledge Graph vs Vector Store

Aspect	Vector Store (pgvector)	Knowledge Graph (Neo4j)
Search Type	Semantic similarity	Relationship traversal
Data Structure	Embeddings, unstructured text	Nodes and edges
Query Type	“Find similar memories”	“Find connected entities”
Strength	Fast approximate search	Complex relationship queries
Use Case	Contextual memory retrieval	Pattern discovery
Storage	PostgreSQL	Neo4j graph database

Both systems work together: the vector store provides quick context retrieval, while the knowledge graph uncovers deeper relationships and patterns.

Architecture Integration

Data Flow

1. Action Execution: When an agent executes a tool or generates a response, the content is captured.

2. Episode Creation: The action is sent to Graphiti as an “episode” containing:

Agent response text
Tool name and parameters
Execution results
Timestamp and context

3. Entity Extraction: Graphiti uses an LLM to extract:

Entity nodes (targets, tools, vulnerabilities)
Relationship edges (execution, discovery, exploitation)
Temporal context (when relationships formed)

4. Graph Storage: Extracted knowledge is stored in Neo4j with:

Node properties (name, type, summary, attributes)
Edge properties (fact, type, confidence, timestamps)
Group associations (flow ID for isolation)

5. Query Interface: Agents query the graph through specialized search functions.

Search Types

Graphiti provides multiple search strategies optimized for different scenarios:

Temporal Window Search

Find entities and relationships within a specific time range:

{
  "search_type": "temporal_window",
  "query": "What vulnerabilities were discovered on 192.168.1.10?",
  "time_start": "2024-03-15T10:00:00Z",
  "time_end": "2024-03-15T12:00:00Z",
  "max_results": 15
}

Use Cases:

Review discoveries during a specific testing phase
Analyze timeline of attack progression
Correlate activities across time windows

Entity Relationships Search

Explore connections from a center entity:

{
  "search_type": "entity_relationships",
  "query": "What tools successfully exploited this vulnerability?",
  "center_node_uuid": "uuid-of-vulnerability-node",
  "max_depth": 2,
  "node_labels": ["Tool", "Technique"],
  "edge_types": ["EXPLOITS", "REQUIRES"],
  "max_results": 20
}

Use Cases:

Map attack chains from initial access to privilege escalation
Find tools related to a specific vulnerability
Discover technique dependencies
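To illustrate what `max_depth` and `edge_types` mean for a search like this, here is a depth-limited breadth-first traversal over a small in-memory edge list. It is only a stand-in: the real search runs against Neo4j through Graphiti, and the `Edge` type, the `Neighbors` function, and the choice to treat edges as undirected are assumptions of this sketch.

```go
package main

// Edge is a simplified, in-memory stand-in for a graph relationship.
type Edge struct {
	From, To, Type string
}

// Neighbors returns the nodes reachable from center within maxDepth hops,
// following only edges whose type is in edgeTypes (all types if empty).
// Edges are treated as undirected; the result maps node name to hop count.
func Neighbors(edges []Edge, center string, maxDepth int, edgeTypes []string) map[string]int {
	allowed := make(map[string]bool, len(edgeTypes))
	for _, t := range edgeTypes {
		allowed[t] = true
	}
	adj := make(map[string][]string)
	for _, e := range edges {
		if len(allowed) > 0 && !allowed[e.Type] {
			continue
		}
		adj[e.From] = append(adj[e.From], e.To)
		adj[e.To] = append(adj[e.To], e.From)
	}

	depth := map[string]int{center: 0}
	queue := []string{center}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if depth[node] >= maxDepth {
			continue // do not expand past the depth limit
		}
		for _, next := range adj[node] {
			if _, seen := depth[next]; !seen {
				depth[next] = depth[node] + 1
				queue = append(queue, next)
			}
		}
	}
	delete(depth, center) // report neighbours only
	return depth
}
```

Each extra hop can multiply the number of visited nodes, which is why a small `max_depth` is a sensible starting point.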

Diverse Results Search

Get non-redundant, varied results to avoid tunnel vision. The diversity_level field accepts low, medium, or high:

{
  "search_type": "diverse_results",
  "query": "Alternative methods to gain remote code execution",
  "diversity_level": "high",
  "max_results": 10
}

Use Cases:

Explore alternative attack vectors
Avoid fixating on a single approach
Discover unconventional techniques
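This page does not describe how `diversity_level` is implemented. One simple way to trade relevance for variety is a greedy selection that penalizes candidates from categories already picked, sketched below. The `Candidate` type, the `Diversify` function, and the `penalty` parameter are all hypothetical; a higher penalty would correspond loosely to a higher diversity level.

```go
package main

// Candidate is a search hit with a relevance score and a coarse category
// (for example, the attack vector it belongs to).
type Candidate struct {
	Name      string
	Category  string
	Relevance float64
}

// Diversify greedily picks up to k candidates. Each round selects the
// candidate with the highest relevance minus penalty times the number of
// already-picked candidates in the same category.
func Diversify(cands []Candidate, k int, penalty float64) []Candidate {
	if k <= 0 {
		return nil
	}
	picked := make([]Candidate, 0, k)
	used := make([]bool, len(cands))
	perCategory := make(map[string]int)
	for len(picked) < k {
		best := -1
		bestScore := 0.0
		for i, c := range cands {
			if used[i] {
				continue
			}
			score := c.Relevance - penalty*float64(perCategory[c.Category])
			if best == -1 || score > bestScore {
				best, bestScore = i, score
			}
		}
		if best == -1 {
			break // fewer than k candidates available
		}
		used[best] = true
		perCategory[cands[best].Category]++
		picked = append(picked, cands[best])
	}
	return picked
}
```

With a penalty of 0 the function degenerates to plain relevance ranking, which makes the effect of the penalty easy to compare.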

Episode Context Search

Search through historical agent responses and tool executions:

{
  "search_type": "episode_context",
  "query": "How did we bypass the WAF in past tests?",
  "max_results": 10
}

Use Cases:

Learn from past successful operations
Review agent reasoning for similar scenarios
Understand command sequences that worked

Successful Tools Search

Find tools and techniques that led to successful exploitation. The min_mentions field sets the minimum number of successful uses:

{
  "search_type": "successful_tools",
  "query": "What tools successfully exploited SQL injection?",
  "min_mentions": 2,
  "max_results": 15
}

Use Cases:

Prioritize tools with proven success rates
Identify reliable exploitation techniques
Learn from patterns of successful attacks
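The effect of `min_mentions` can be pictured as a count-and-filter step over recorded successes. The sketch below works on a plain list of tool names; `ToolCount` and `FilterByMentions` are hypothetical names, and the actual counting happens inside the search implementation, which this page does not show.

```go
package main

import "sort"

// ToolCount pairs a tool name with how often it was recorded as successful.
type ToolCount struct {
	Tool     string
	Mentions int
}

// FilterByMentions counts successful uses per tool and keeps the tools seen
// at least minMentions times, most-mentioned first (ties broken by name).
func FilterByMentions(successes []string, minMentions int) []ToolCount {
	counts := make(map[string]int)
	for _, tool := range successes {
		counts[tool]++
	}
	var out []ToolCount
	for tool, n := range counts {
		if n >= minMentions {
			out = append(out, ToolCount{Tool: tool, Mentions: n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Mentions != out[j].Mentions {
			return out[i].Mentions > out[j].Mentions
		}
		return out[i].Tool < out[j].Tool
	})
	return out
}
```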

Recent Context Search

Retrieve recently discovered entities and relationships. The recency_window field takes a duration (1h, 6h, 24h, 7d):

{
  "search_type": "recent_context",
  "query": "What did we recently learn about the target network?",
  "recency_window": "24h",
  "max_results": 10
}

Use Cases:

Maintain awareness of recent discoveries
Avoid redundant reconnaissance
Build on latest findings

Entity by Label Search

Find entities of specific types matching criteria:

{
  "search_type": "entity_by_label",
  "query": "Windows servers in the DMZ",
  "node_labels": ["Target", "Host"],
  "edge_types": ["LOCATED_IN"],
  "max_results": 25
}

Use Cases:

Inventory targets by type
Find all instances of a vulnerability class
List available tools for a specific purpose

Entity Types

The PentAGI fork defines custom entity types for penetration testing:

Targets
Tools
Vulnerabilities
Techniques

Targets

Node Labels: Target, Host, Network, WebApp, Service

Properties:

name: Identifier (IP, hostname, URL)
summary: Description of the target
attributes: OS, version, technology stack

Example:

{
  "labels": ["Target", "Host"],
  "name": "192.168.1.10",
  "summary": "Linux web server running Apache 2.4",
  "attributes": {
    "os": "Ubuntu 20.04",
    "open_ports": [22, 80, 443]
  }
}

Tools

Node Labels: Tool, Scanner, Exploit, Framework

Properties:

name: Tool name (nmap, metasploit, sqlmap)
summary: Tool purpose and capabilities
attributes: Version, parameters used

Example:

{
  "labels": ["Tool", "Scanner"],
  "name": "nmap",
  "summary": "Network scanner for port discovery and service detection",
  "attributes": {
    "version": "7.94",
    "common_flags": "-sV -sC -p-"
  }
}

Vulnerabilities

Node Labels: Vulnerability, CVE, Misconfiguration

Properties:

name: CVE ID or vulnerability name
summary: Description and impact
attributes: CVSS score, affected versions

Example:

{
  "labels": ["Vulnerability", "CVE"],
  "name": "CVE-2023-12345",
  "summary": "Remote code execution in Apache Struts",
  "attributes": {
    "cvss": 9.8,
    "affected_versions": "2.3.x - 2.5.26"
  }
}

Techniques

Node Labels: Technique, MITRE_ATT&CK

Properties:

name: Technique name or MITRE ID
summary: Description of the technique
attributes: Tactic, platform

Example:

{
  "labels": ["Technique", "MITRE_ATT&CK"],
  "name": "T1059.001",
  "summary": "Command and Scripting Interpreter: PowerShell",
  "attributes": {
    "tactic": "Execution",
    "platform": "Windows"
  }
}
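The entity examples above all share one JSON shape (`labels`, `name`, `summary`, `attributes`), so a single Go type can decode any of them. The sketch below assumes exactly that shape; the `Node` type, `ParseNode`, and `HasLabel` are illustrative names, not part of the fork's API.

```go
package main

import "encoding/json"

// Node mirrors the JSON shape used by the entity examples above.
// Attributes stay loosely typed because each entity type carries different keys.
type Node struct {
	Labels     []string               `json:"labels"`
	Name       string                 `json:"name"`
	Summary    string                 `json:"summary"`
	Attributes map[string]interface{} `json:"attributes"`
}

// ParseNode decodes one entity document.
func ParseNode(raw []byte) (Node, error) {
	var n Node
	err := json.Unmarshal(raw, &n)
	return n, err
}

// HasLabel reports whether the node carries the given label.
func (n Node) HasLabel(label string) bool {
	for _, l := range n.Labels {
		if l == label {
			return true
		}
	}
	return false
}
```

Note that `encoding/json` decodes numbers in an `interface{}` map as `float64`, so a value such as `cvss` needs a type assertion before use.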

Edge Types

Relationships in the knowledge graph capture how entities interact:

Edge Type	Source → Target	Meaning
`EXPLOITS`	Tool → Vulnerability	Tool can exploit this vulnerability
`DISCOVERED_ON`	Vulnerability → Target	Vulnerability found on target
`EXECUTED_AGAINST`	Tool → Target	Tool was run against target
`REQUIRES`	Technique → Tool	Technique needs this tool
`MITIGATED_BY`	Vulnerability → Defense	Defense mechanism blocks exploit
`LEADS_TO`	Technique → Technique	Chaining attack techniques
`MENTIONS`	Episode → Entity	Agent discussion references entity

Each edge includes:

fact: Natural language description
created_at: When relationship was discovered
valid_at: When relationship was confirmed
expired_at: When relationship became invalid (optional)

Configuration

Enabling Graphiti

In your .env file:

# Graphiti knowledge graph settings
GRAPHITI_ENABLED=true
GRAPHITI_TIMEOUT=30
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_MODEL_NAME=gpt-5-mini  # LLM for entity extraction

# Neo4j settings (used by Graphiti)
NEO4J_USER=neo4j
NEO4J_DATABASE=neo4j
NEO4J_PASSWORD=your_secure_password
NEO4J_URI=bolt://neo4j:7687

# OpenAI key required for Graphiti entity extraction
OPEN_AI_KEY=your_openai_key

Starting Graphiti Stack

# Start PentAGI with knowledge graph
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml up -d

# Verify services
docker compose ps graphiti neo4j

# Check logs
docker compose logs -f graphiti

Accessing Neo4j Browser

Visit http://localhost:7474 and login with NEO4J_USER / NEO4J_PASSWORD to:

Visualize the knowledge graph
Run Cypher queries manually
Explore entity relationships
Debug graph structure

In production, restrict Neo4j browser access to trusted networks only.

Implementation Details

Episode Capture

From the source code (graphiti_search.go), episodes are captured with:

// Observation is referenced from other packages as graphiti.Observation.
type Observation struct {
    ID      string    // Observation ID for tracing
    TraceID string    // Trace ID for correlation
    Time    time.Time // Timestamp
}

Each episode includes:

Agent response content
Tool execution details
Group ID (flow isolation)
Observation metadata

Group Isolation

Graphiti isolates knowledge by flow:

groupID := fmt.Sprintf("flow-%d", flowID)

This ensures:

Different penetration tests don’t interfere
Each flow has its own knowledge context
Multi-tenant isolation (planned feature)
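The group ID format above can be wrapped in a pair of helpers, which is convenient when mapping a group identifier seen in logs back to its flow. `GroupID` reproduces the `flow-%d` format from this page; `FlowID`, its inverse, is a hypothetical addition for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const groupPrefix = "flow-"

// GroupID returns the group identifier for a flow, matching the format above.
func GroupID(flowID int64) string {
	return fmt.Sprintf("%s%d", groupPrefix, flowID)
}

// FlowID is the inverse of GroupID (hypothetical helper). It fails if the
// prefix is missing or the remainder is not a base-10 integer.
func FlowID(groupID string) (int64, error) {
	if !strings.HasPrefix(groupID, groupPrefix) {
		return 0, fmt.Errorf("group ID %q does not start with %q", groupID, groupPrefix)
	}
	return strconv.ParseInt(strings.TrimPrefix(groupID, groupPrefix), 10, 64)
}
```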

Search Implementation

From graphiti_search.go, the tool provides:

type GraphitiSearchTool struct {
    flowID         int64
    taskID         *int64
    subtaskID      *int64
    graphitiClient graphitiSearcher
}

func (t *GraphitiSearchTool) Handle(ctx context.Context, name string, args json.RawMessage) (string, error) {
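    // Excerpt: decoding args into searchArgs and building groupID and
    // observationObject are omitted here.
    // Route to the appropriate search method based on search_type.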
    switch searchArgs.SearchType {
    case "temporal_window":
        return t.handleTemporalWindowSearch(ctx, groupID, searchArgs, observationObject)
    case "entity_relationships":
        return t.handleEntityRelationshipsSearch(ctx, groupID, searchArgs, observationObject)
    // ... other search types
    }
}

Best Practices

Choose appropriate search types

Temporal window: When time context matters
Entity relationships: For attack chain analysis
Diverse results: When exploring alternatives
Episode context: To learn from past operations
Successful tools: For proven techniques
Recent context: To stay current
Entity by label: For inventory and classification

Optimize LLM for entity extraction

Use cost-effective models (gpt-4o-mini, claude-haiku)
Entity extraction doesn’t need reasoning capabilities
Fast models reduce graph update latency
Monitor token usage in Langfuse

Tune search parameters

Start with max_depth=2 for relationship searches
Use diversity_level="medium" as default
Set min_mentions=2 to filter noise in successful tools
Adjust recency_window based on test duration

Maintain graph quality

Review extracted entities periodically
Merge duplicate nodes manually if needed
Expire outdated relationships
Archive completed flows to separate graphs

Performance Considerations

Entity Extraction Latency: Each episode requires an LLM call to extract entities. Use fast models to minimize delay.
Graph Query Performance: Neo4j is optimized for relationship traversal. Complex queries (depth > 3) may be slower.
Storage Growth: Graph size grows with the number of entities and relationships. Monitor Neo4j storage usage.
Concurrent Access: Multiple agents can query the graph simultaneously without conflicts.

Troubleshooting

Graphiti returns 'not enabled' message

Verify GRAPHITI_ENABLED=true in .env
Check Graphiti service is running: docker compose ps graphiti
Ensure Neo4j is accessible from Graphiti container
Review Graphiti logs for connection errors

Entity extraction fails

Confirm OPEN_AI_KEY is valid and has credits
Check GRAPHITI_MODEL_NAME is a supported model
Verify Graphiti can reach OpenAI API (check proxy settings)
Increase GRAPHITI_TIMEOUT if requests are slow

Search returns no results

Ensure episodes have been captured (check Neo4j browser)
Verify groupID matches the flow being queried
Relax search parameters (lower min_mentions, higher max_results)
Check if time windows exclude relevant data

Neo4j performance degrades

Create indexes on frequently queried properties
Limit result sizes with appropriate max_results
Archive old flows to reduce graph size
Consider increasing Neo4j memory allocation

Future Enhancements

Planned improvements to the knowledge graph:

Cross-flow learning: Query patterns across multiple penetration tests
Attack playbooks: Auto-generate sequences from successful attack chains
Confidence scoring: Track reliability of entity relationships
Community detection: Cluster related entities for higher-level insights
Graph visualization: Interactive UI for exploring relationships
Export formats: Generate reports from graph queries

Related Concepts

Architecture - How the knowledge graph integrates with the system
Agent System - How agents use graph queries
Memory System - Complementary vector-based memory
