Overview
PentAGI integrates Graphiti, a temporal knowledge graph system powered by Neo4j, to provide advanced semantic understanding and relationship tracking. While the vector store (pgvector) handles similarity-based memory retrieval, the knowledge graph captures structured relationships between entities, actions, and outcomes.
The knowledge graph is optional and disabled by default. Enable it for enhanced contextual learning and relationship discovery.
What is Graphiti?
Graphiti is a temporal knowledge graph system that:
- Automatically extracts entities and relationships from agent interactions
- Tracks temporal context showing how knowledge evolves over time
- Stores semantic relationships between tools, targets, vulnerabilities, and techniques
- Enables complex queries like “What tools were effective against similar targets?”
- Learns patterns from successful penetration tests
PentAGI Fork
PentAGI uses a customized fork (vxcontrol/pentagi-graphiti) with:
- Custom entity types specific to penetration testing:
  - Targets (hosts, networks, applications)
  - Tools (nmap, metasploit, sqlmap, etc.)
  - Vulnerabilities (CVEs, misconfigurations)
  - Techniques (MITRE ATT&CK mappings)
  - Credentials (discovered during testing)
- Custom edge types for security relationships:
  - EXPLOITS (tool → vulnerability)
  - DISCOVERED_ON (vulnerability → target)
  - EXECUTED_AGAINST (tool → target)
  - REQUIRES (technique → tool)
  - MITIGATED_BY (vulnerability → defense)
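The entity and edge types listed above can be pictured as a small typed schema. The sketch below is illustrative only: EntityType, ALLOWED_EDGES, and is_valid_edge are hypothetical names, not taken from the vxcontrol/pentagi-graphiti fork, and the Defense type is inferred solely from the MITIGATED_BY edge.

```python
from enum import Enum


class EntityType(Enum):
    TARGET = "Target"
    TOOL = "Tool"
    VULNERABILITY = "Vulnerability"
    TECHNIQUE = "Technique"
    CREDENTIAL = "Credential"
    DEFENSE = "Defense"  # assumption: only implied by the MITIGATED_BY edge


# Edge type -> (source entity type, target entity type), as listed above.
ALLOWED_EDGES = {
    "EXPLOITS": (EntityType.TOOL, EntityType.VULNERABILITY),
    "DISCOVERED_ON": (EntityType.VULNERABILITY, EntityType.TARGET),
    "EXECUTED_AGAINST": (EntityType.TOOL, EntityType.TARGET),
    "REQUIRES": (EntityType.TECHNIQUE, EntityType.TOOL),
    "MITIGATED_BY": (EntityType.VULNERABILITY, EntityType.DEFENSE),
}


def is_valid_edge(edge_type: str, source: EntityType, target: EntityType) -> bool:
    """Return True if edge_type is known and connects the expected entity types."""
    return ALLOWED_EDGES.get(edge_type) == (source, target)
```

A check like this could reject malformed extractions (for example, an EXPLOITS edge pointing from a target to a tool) before they reach the graph.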
Knowledge Graph vs Vector Store
| Aspect | Vector Store (pgvector) | Knowledge Graph (Neo4j) |
|---|---|---|
| Search Type | Semantic similarity | Relationship traversal |
| Data Structure | Embeddings, unstructured text | Nodes and edges |
| Query Type | “Find similar memories” | “Find connected entities” |
| Strength | Fast approximate search | Complex relationship queries |
| Use Case | Contextual memory retrieval | Pattern discovery |
| Storage | PostgreSQL | Neo4j graph database |
Both systems work together: the vector store provides quick context retrieval, while the knowledge graph uncovers deeper relationships and patterns.
Architecture Integration
Data Flow
1. Action Execution: When an agent executes a tool or generates a response, the content is captured.
2. Episode Creation: The action is sent to Graphiti as an “episode” containing:
   - Agent response text
   - Tool name and parameters
   - Execution results
   - Timestamp and context
3. Entity Extraction: Graphiti extracts from the episode:
   - Entity nodes (targets, tools, vulnerabilities)
   - Relationship edges (execution, discovery, exploitation)
   - Temporal context (when relationships formed)
4. Graph Storage: The extracted knowledge is stored in Neo4j with:
   - Node properties (name, type, summary, attributes)
   - Edge properties (fact, type, confidence, timestamps)
   - Group associations (flow ID for isolation)
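To make the episode described in step 2 concrete, here is a minimal sketch of how one might be modeled before it is handed to Graphiti. The Episode class, its field names, and the payload layout are assumptions for illustration; they do not reproduce the actual PentAGI or Graphiti request format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class Episode:
    """Hypothetical container for one captured agent action."""
    group_id: str                 # flow ID, used for isolation
    response_text: str            # agent response text
    tool_name: str
    tool_params: Dict[str, Any]
    result: str                   # execution results
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def to_payload(self) -> Dict[str, Any]:
        """Serialize to a plain dict; the key names are illustrative only."""
        return {
            "group_id": self.group_id,
            "content": self.response_text,
            "tool": {"name": self.tool_name, "params": self.tool_params},
            "result": self.result,
            "timestamp": self.timestamp.isoformat(),
        }
```

Keeping the flow ID on every episode is what lets later searches be scoped to a single penetration test.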
Search Types
Graphiti provides multiple search strategies optimized for different scenarios:
Temporal Window Search
Find entities and relationships within a specific time range:
- Review discoveries during a specific testing phase
- Analyze timeline of attack progression
- Correlate activities across time windows
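Conceptually, a temporal window search is a filter on when a relationship was recorded. The sketch below shows the idea over an in-memory list; Fact and facts_in_window are hypothetical names, and the real search runs inside Graphiti against Neo4j rather than in Python.

```python
from datetime import datetime
from typing import List, NamedTuple


class Fact(NamedTuple):
    fact: str             # natural-language description of the relationship
    created_at: datetime  # when the relationship was discovered


def facts_in_window(facts: List[Fact], start: datetime, end: datetime) -> List[Fact]:
    """Return facts with start <= created_at < end, oldest first."""
    hits = [f for f in facts if start <= f.created_at < end]
    return sorted(hits, key=lambda f: f.created_at)
```

Using a half-open interval means adjacent windows (for example, one per testing phase) never count the same fact twice.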
Entity Relationships Search
Explore connections from a center entity:
- Map attack chains from initial access to privilege escalation
- Find tools related to a specific vulnerability
- Discover technique dependencies
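Exploring connections from a center entity amounts to a depth-limited graph traversal. The following sketch uses a breadth-first search over a plain adjacency dictionary; neighbors_within and the graph layout are assumptions made for illustration, not Graphiti's implementation.

```python
from collections import deque
from typing import Dict, List, Tuple

# node -> list of (edge_type, neighbor); directed edges only
Graph = Dict[str, List[Tuple[str, str]]]


def neighbors_within(graph: Graph, center: str, max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first search from center; returns {node: hop distance}."""
    depth_of = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for _edge_type, neighbor in graph.get(node, []):
            if neighbor not in depth_of:
                depth_of[neighbor] = depth_of[node] + 1
                queue.append(neighbor)
    # The center itself is not a result.
    return {n: d for n, d in depth_of.items() if n != center}
```

Each extra level of depth can multiply the number of nodes visited, which is why a small limit is a sensible starting point for attack-chain queries.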
Diverse Results Search
Get non-redundant, varied results to avoid tunnel vision:
- Explore alternative attack vectors
- Avoid fixating on a single approach
- Discover unconventional techniques
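How Graphiti diversifies results is not described here. As a stand-in, the sketch below uses a greedy selection in the style of maximal marginal relevance, with word overlap as the similarity measure: each pick trades relevance against redundancy with what has already been chosen. The function names, the trade_off parameter, and the scoring are all assumptions.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diverse_top_k(
    candidates: List[Tuple[str, float]], k: int, trade_off: float = 0.5
) -> List[str]:
    """Greedily pick k texts, balancing relevance against redundancy.

    candidates: (text, relevance) pairs with relevance in [0, 1].
    """
    tokens = {text: set(text.lower().split()) for text, _ in candidates}
    remaining = list(candidates)
    selected: List[str] = []

    def score(item: Tuple[str, float]) -> float:
        text, relevance = item
        redundancy = max(
            (jaccard(tokens[text], tokens[s]) for s in selected), default=0.0
        )
        return trade_off * relevance - (1 - trade_off) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best[0])
        remaining.remove(best)
    return selected
```

With a near-duplicate in the candidate list, the second pick skips it in favor of a less relevant but different approach, which is the behavior this search type is meant to encourage.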
Episode Context Search
Search through historical agent responses and tool executions:
- Learn from past successful operations
- Review agent reasoning for similar scenarios
- Understand command sequences that worked
Successful Tools Search
Find tools and techniques that led to successful exploitation:
- Prioritize tools with proven success rates
- Identify reliable exploitation techniques
- Learn from patterns of successful attacks
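One way to think about this search is as a tally of successful uses per tool, with a minimum-mentions threshold to drop one-off results. The sketch below works on simple (tool, succeeded) pairs; successful_tools and that input shape are hypothetical and only illustrate the filtering idea.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def successful_tools(
    outcomes: Iterable[Tuple[str, bool]], min_mentions: int = 2
) -> List[Tuple[str, int]]:
    """Count successful uses per tool and keep those seen at least min_mentions times.

    Results are ordered from most to least frequently successful.
    """
    counts = Counter(tool for tool, succeeded in outcomes if succeeded)
    return [(tool, n) for tool, n in counts.most_common() if n >= min_mentions]
```

Raising min_mentions filters noise at the cost of hiding tools that have only been tried once.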
Recent Context Search
Retrieve recently discovered entities and relationships:
- Maintain awareness of recent discoveries
- Avoid redundant reconnaissance
- Build on latest findings
Entity by Label Search
Find entities of specific types matching criteria:
- Inventory targets by type
- Find all instances of a vulnerability class
- List available tools for a specific purpose
Entity Types
The PentAGI fork defines custom entity types for penetration testing:
- Targets
- Tools
- Vulnerabilities
- Techniques
Node Labels (Targets): Target, Host, Network, WebApp, Service
Properties:
- name: Identifier (IP, hostname, URL)
- summary: Description of the target
- attributes: OS, version, technology stack
Edge Types
Relationships in the knowledge graph capture how entities interact:
| Edge Type | Source → Target | Meaning |
|---|---|---|
| EXPLOITS | Tool → Vulnerability | Tool can exploit this vulnerability |
| DISCOVERED_ON | Vulnerability → Target | Vulnerability found on target |
| EXECUTED_AGAINST | Tool → Target | Tool was run against target |
| REQUIRES | Technique → Tool | Technique needs this tool |
| MITIGATED_BY | Vulnerability → Defense | Defense mechanism blocks exploit |
| LEADS_TO | Technique → Technique | Chaining attack techniques |
| MENTIONS | Episode → Entity | Agent discussion references entity |
Edge Properties:
- fact: Natural language description
- created_at: When relationship was discovered
- valid_at: When relationship was confirmed
- expired_at: When relationship became invalid (optional)
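The valid_at and expired_at properties make it possible to ask whether a relationship held at a given moment. The sketch below shows one plausible reading of those fields; EdgeFact and is_active are hypothetical names, and treating a missing expired_at as "still valid" is an assumption based on the field being optional.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EdgeFact:
    fact: str
    created_at: datetime                   # when the relationship was discovered
    valid_at: datetime                     # when the relationship was confirmed
    expired_at: Optional[datetime] = None  # None: assumed to be still valid


def is_active(edge: EdgeFact, at: datetime) -> bool:
    """Return True if the edge was confirmed by `at` and had not yet expired."""
    if at < edge.valid_at:
        return False
    return edge.expired_at is None or at < edge.expired_at
```

For example, a vulnerability that was patched mid-test would carry an expired_at, so later queries can exclude it without deleting the historical record.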
Configuration
Enabling Graphiti
In your .env file, set GRAPHITI_ENABLED=true.
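Before starting the stack, it can help to sanity-check the variables this page refers to. The sketch below parses .env-style text and reports likely problems. The variable names are the ones mentioned on this page, but which of them are strictly required is an assumption, and parse_env and graphiti_problems are hypothetical helpers rather than part of PentAGI.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def graphiti_problems(settings: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if settings.get("GRAPHITI_ENABLED", "").lower() != "true":
        problems.append("GRAPHITI_ENABLED is not set to true")
    # Assumption: these are needed for Neo4j access and entity extraction.
    for key in ("NEO4J_USER", "NEO4J_PASSWORD", "OPEN_AI_KEY"):
        if not settings.get(key):
            problems.append(f"{key} is empty or missing")
    return problems
```

Running graphiti_problems(parse_env(open(".env").read())) would list anything missing; the check never prints values, so secrets stay out of logs.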
Starting Graphiti Stack
Accessing Neo4j Browser
Visit http://localhost:7474 and log in with NEO4J_USER / NEO4J_PASSWORD to:
- Visualize the knowledge graph
- Run Cypher queries manually
- Explore entity relationships
- Debug graph structure
Implementation Details
Episode Capture
From the source code (graphiti_search.go), episodes are captured with:
- Agent response content
- Tool execution details
- Group ID (flow isolation)
- Observation metadata
Group Isolation
Graphiti isolates knowledge by flow:
- Different penetration tests don’t interfere
- Each flow has its own knowledge context
- Multi-tenant isolation (planned feature)
Search Implementation
From graphiti_search.go, the tool provides:
Best Practices
Choose appropriate search types
- Temporal window: When time context matters
- Entity relationships: For attack chain analysis
- Diverse results: When exploring alternatives
- Episode context: To learn from past operations
- Successful tools: For proven techniques
- Recent context: To stay current
- Entity by label: For inventory and classification
Optimize LLM for entity extraction
- Use cost-effective models (gpt-4o-mini, claude-haiku)
- Entity extraction doesn’t need reasoning capabilities
- Fast models reduce graph update latency
- Monitor token usage in Langfuse
Tune search parameters
- Start with max_depth=2 for relationship searches
- Use diversity_level="medium" as default
- Set min_mentions=2 to filter noise in successful tools
- Adjust recency_window based on test duration
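The starting values above can be gathered into a single defaults object so that individual searches override only what they need. This is a sketch under assumptions: SearchDefaults is a hypothetical class, the set of allowed diversity levels and the 24-hour recency window are placeholders, and the real tool may represent these parameters differently.

```python
from dataclasses import dataclass, replace
from datetime import timedelta

# Assumption: only "medium" is named on this page; "low" and "high" are guesses.
DIVERSITY_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class SearchDefaults:
    max_depth: int = 2
    diversity_level: str = "medium"
    min_mentions: int = 2
    recency_window: timedelta = timedelta(hours=24)  # placeholder value

    def __post_init__(self) -> None:
        if self.max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        if self.diversity_level not in DIVERSITY_LEVELS:
            raise ValueError(f"unknown diversity_level: {self.diversity_level!r}")
        if self.min_mentions < 1:
            raise ValueError("min_mentions must be at least 1")


DEFAULTS = SearchDefaults()
# Example override for a multi-day engagement: widen only the recency window.
LONG_TEST = replace(DEFAULTS, recency_window=timedelta(days=3))
```

Because the dataclass is frozen, every variation is an explicit copy, which keeps per-search tuning from leaking into other searches.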
Maintain graph quality
- Review extracted entities periodically
- Merge duplicate nodes manually if needed
- Expire outdated relationships
- Archive completed flows to separate graphs
Performance Considerations
Entity Extraction Latency: Each episode requires an LLM call to extract entities. Use fast models to minimize delay.
Graph Query Performance: Neo4j is optimized for relationship traversal. Complex queries (depth > 3) may be slower.
Storage Growth: Graph size grows with the number of entities and relationships. Monitor Neo4j storage usage.
Concurrent Access: Multiple agents can query the graph simultaneously without conflicts.
Troubleshooting
Graphiti returns 'not enabled' message
- Verify GRAPHITI_ENABLED=true in .env
- Check Graphiti service is running: docker compose ps graphiti
- Ensure Neo4j is accessible from Graphiti container
- Review Graphiti logs for connection errors
Entity extraction fails
- Confirm OPEN_AI_KEY is valid and has credits
- Check GRAPHITI_MODEL_NAME is a supported model
- Verify Graphiti can reach OpenAI API (check proxy settings)
- Increase GRAPHITI_TIMEOUT if requests are slow
Search returns no results
- Ensure episodes have been captured (check Neo4j browser)
- Verify groupID matches the flow being queried
- Relax search parameters (lower min_mentions, higher max_results)
- Check if time windows exclude relevant data
Neo4j performance degrades
- Create indexes on frequently queried properties
- Limit result sizes with appropriate max_results
- Archive old flows to reduce graph size
- Consider increasing Neo4j memory allocation
Future Enhancements
Planned improvements to the knowledge graph:
- Cross-flow learning: Query patterns across multiple penetration tests
- Attack playbooks: Auto-generate sequences from successful attack chains
- Confidence scoring: Track reliability of entity relationships
- Community detection: Cluster related entities for higher-level insights
- Graph visualization: Interactive UI for exploring relationships
- Export formats: Generate reports from graph queries
Related Concepts
- Architecture - How knowledge graph integrates with system
- Agent System - How agents use graph queries
- Memory System - Complementary vector-based memory