Neo4j is the graph database powering PentAGI’s Graphiti knowledge graph system. It provides high-performance storage and querying of complex relationships between entities, enabling semantic memory and contextual understanding.

Overview

Neo4j is a native graph database that stores and queries data as nodes and relationships. In PentAGI, it serves as:
  • Knowledge Storage: Persistent graph database for entities and relationships
  • Relationship Querying: Fast traversal of complex entity connections
  • Pattern Matching: Cypher query language for graph patterns
  • Temporal Tracking: Time-based relationship management
  • Visualization: Built-in browser for graph exploration

Architecture

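In the PentAGI stack, Graphiti connects to Neo4j over the Bolt protocol (bolt://neo4j:7687), while the Neo4j Browser is served over HTTP on port 7474.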

Setup

1. Configure Neo4j Credentials

Set Neo4j authentication in your .env file:
.env
# Neo4j settings
NEO4J_USER=neo4j
NEO4J_PASSWORD=devpassword
NEO4J_DATABASE=neo4j
NEO4J_URI=bolt://neo4j:7687
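Security: Change NEO4J_PASSWORD to a strong password before deploying. The Neo4j Docker image refuses to start when the password is the default neo4j, and Neo4j 5 requires passwords of at least 8 characters by default.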
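As a rough illustration of how a client script might consume these settings, the sketch below reads the four NEO4J_* variables and refuses the sample and default passwords. The names Neo4jSettings and load_neo4j_settings are hypothetical, and the 16-character minimum is an assumption borrowed from the Security recommendations on this page, not something Neo4j enforces.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    user: str
    password: str
    database: str


def load_neo4j_settings(env=None):
    """Read the NEO4J_* variables and reject obviously weak passwords."""
    env = os.environ if env is None else env
    settings = Neo4jSettings(
        uri=env.get("NEO4J_URI", "bolt://neo4j:7687"),
        user=env.get("NEO4J_USER", "neo4j"),
        password=env.get("NEO4J_PASSWORD", ""),
        database=env.get("NEO4J_DATABASE", "neo4j"),
    )
    # Assumption: the sample password from this page counts as a default too.
    if settings.password in ("", "neo4j", "devpassword"):
        raise ValueError("NEO4J_PASSWORD must be changed from its default value")
    # Assumption: 16 characters, following the Security recommendations.
    if len(settings.password) < 16:
        raise ValueError("NEO4J_PASSWORD should be at least 16 characters")
    return settings
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without touching the real environment.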
2. Deploy Neo4j with Graphiti

Neo4j is included in the Graphiti stack:
curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-graphiti.yml
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml up -d
3. Verify Neo4j is Running

Check Neo4j service status:
# Check service health
docker compose ps neo4j

# Verify HTTP endpoint
curl http://localhost:7474

# Check Bolt connection
docker exec neo4j cypher-shell -u neo4j -p devpassword "RETURN 'Connected' as status;"
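The same reachability check can be scripted. The sketch below is a minimal example using only the Python standard library; port_open is a hypothetical helper name. It only confirms that something accepts TCP connections on the HTTP and Bolt ports, and says nothing about authentication or database health.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the compose file: 7474 (HTTP) and 7687 (Bolt).
    for name, port in (("HTTP", 7474), ("Bolt", 7687)):
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name} port {port}: {state}")
```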
4. Access Neo4j Browser

Open the Neo4j Browser interface:
http://localhost:7474
Log in with:
  • Username: neo4j
  • Password: devpassword (or your configured password)
  • Database: neo4j

Configuration

Docker Compose Settings

Neo4j service configuration:
docker-compose-graphiti.yml
neo4j:
  image: neo4j:5.26.2
  restart: unless-stopped
  container_name: neo4j
  hostname: neo4j
  ports:
    - "127.0.0.1:7474:7474"  # HTTP (Browser)
    - "127.0.0.1:7687:7687"  # Bolt (Protocol)
  volumes:
    - neo4j_data:/data         # Database storage
  environment:
    - NEO4J_AUTH=neo4j/devpassword
  shm_size: 4g                 # Shared memory for transactions
  healthcheck:
    test: ["CMD-SHELL", "wget -qO- http://localhost:7474 || exit 1"]
    interval: 1s
    timeout: 10s
    retries: 10

Environment Variables

Key Neo4j configuration options:
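| Variable | Description | Default |
| --- | --- | --- |
| NEO4J_AUTH | Authentication (user/password) | neo4j/devpassword |
| NEO4J_server_memory_heap_initial__size | Initial heap size | 512m |
| NEO4J_server_memory_heap_max__size | Maximum heap size | 1G |
| NEO4J_server_memory_pagecache_size | Page cache size | 512m |
| NEO4J_dbms_security_procedures_unrestricted | Allowed procedures | gds.* |

The memory settings use the Neo4j 5 names (server.memory.*), which replaced the 4.x dbms.memory.* names.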

Performance Tuning

For production deployments, increase memory limits:
docker-compose-graphiti.yml
neo4j:
  environment:
    # Neo4j 5 setting names (server.memory.* replaced the 4.x dbms.memory.*)
    - NEO4J_server_memory_heap_initial__size=2G
    - NEO4J_server_memory_heap_max__size=4G
    - NEO4J_server_memory_pagecache_size=2G
  shm_size: 8g
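When raising these values, the heap and the page cache together still have to fit in the memory available to the container. The sketch below is one way to sanity-check a combination before editing the compose file. parse_size and fits_in_memory are hypothetical helpers, and the 1 GiB reserve for the operating system and other JVM overhead is an assumption, not a Neo4j recommendation.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '512m' or '4G' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg])?", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)


def fits_in_memory(heap_max, pagecache, total, reserve="1g"):
    """Check that heap + page cache + a reserve fit in the total memory."""
    needed = parse_size(heap_max) + parse_size(pagecache) + parse_size(reserve)
    return needed <= parse_size(total)
```

For example, fits_in_memory("4G", "2G", "8G") checks the values shown above against a host with 8 GiB available.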

Cypher Query Language

Neo4j uses Cypher for querying graph data.

Basic Queries

Create a node:
CREATE (t:Target {name: "target.com", ip: "192.168.1.1"})
RETURN t
Create a relationship:
MATCH (t:Target {name: "target.com"})
CREATE (s:Service {name: "HTTP", port: 80})
CREATE (t)-[:HAS_SERVICE]->(s)
RETURN t, s
Find nodes:
MATCH (t:Target)
WHERE t.name CONTAINS "example"
RETURN t.name, t.ip

Pattern Matching

Find related entities:
// Find all services on a target
MATCH (t:Target)-[:HAS_SERVICE]->(s:Service)
WHERE t.name = "target.com"
RETURN s.name, s.port

// Find vulnerability chain
MATCH path = (tool:Tool)-[:DISCOVERS]->(vuln:Vulnerability)-[:AFFECTS]->(target:Target)
RETURN path

Aggregation

Count and aggregate:
// Count vulnerabilities by severity
MATCH (v:Vulnerability)
RETURN v.severity, count(*) as count
ORDER BY count DESC

// Most used tools
MATCH (a:Agent)-[:USED]->(t:Tool)
RETURN t.name, count(a) as usage_count
ORDER BY usage_count DESC
LIMIT 10

Graph Algorithms

Shortest path:
MATCH path = shortestPath(
  (start:Target {name: "entry.com"})-[*]-(end:Target {name: "internal.com"})
)
RETURN path
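To make clear what this query computes, here is a small pure-Python sketch of the same idea: a breadth-first search that finds a path with the fewest hops between two nodes, ignoring relationship direction as the undirected -[*]- pattern does. The edge list and the function name are illustrative only; Neo4j's own implementation is not shown here.

```python
from collections import deque


def shortest_path(edges, start, end):
    """Breadth-first search over undirected edges; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

Because breadth-first search visits nodes in order of distance, the first time it reaches the end node it has found a path with the minimum number of relationships.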

Usage

Neo4j Browser

The built-in browser provides:
  1. Query Editor: Write and execute Cypher queries
  2. Graph Visualization: Interactive node and relationship display
  3. Data Browser: Explore database schema and contents
  4. Query History: Review previous queries
  5. Favorites: Save frequently-used queries

Command Line Access

Use cypher-shell for CLI queries:
# Connect to Neo4j
docker exec -it neo4j cypher-shell -u neo4j -p devpassword

# Run a query
neo4j@neo4j> MATCH (n) RETURN count(n);

# Exit
neo4j@neo4j> :exit

Python Client

Query Neo4j from Python:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "devpassword")
)

with driver.session(database="neo4j") as session:
    result = session.run("""
        MATCH (t:Target)-[:HAS_VULNERABILITY]->(v:Vulnerability)
        WHERE v.severity = 'HIGH'
        RETURN t.name, v.name, v.description
    """)
    
    for record in result:
        print(f"Target: {record['t.name']}")
        print(f"Vulnerability: {record['v.name']}")
        print(f"Description: {record['v.description']}")
        print()

driver.close()
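When the severity comes from user input or another system, it is safer to pass it as a query parameter than to build the Cypher string by hand. The sketch below assumes session is a neo4j driver session like the one opened above; findings_by_severity, QUERY, and the column aliases are hypothetical names chosen for this example.

```python
QUERY = """
MATCH (t:Target)-[:HAS_VULNERABILITY]->(v:Vulnerability)
WHERE v.severity = $severity
RETURN t.name AS target, v.name AS vulnerability
"""


def findings_by_severity(session, severity):
    """Run the query with a bound parameter and return plain dictionaries."""
    # Keyword arguments to session.run() are sent as query parameters,
    # so the value is never spliced into the Cypher text.
    result = session.run(QUERY, severity=severity)
    return [
        {"target": record["target"], "vulnerability": record["vulnerability"]}
        for record in result
    ]
```

Inside the with block above, this could be called as findings_by_severity(session, "HIGH").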

Maintenance

Backup

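Back up Neo4j data. neo4j-admin database dump needs the database to be offline, so stop the service first and run the dump in a one-off container that mounts the same data volume:
# Stop Neo4j (the dump fails while the database is in use)
docker compose stop neo4j

# Create the dump inside the data volume (/data)
docker compose run --rm neo4j sh -c \
  "mkdir -p /data/backups && neo4j-admin database dump neo4j --to-path=/data/backups"

# Copy the dump to the host, then start Neo4j again
docker cp neo4j:/data/backups/neo4j.dump ./
docker compose start neo4j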
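The dump file is named after the database (neo4j.dump), so each copy on the host needs its own name if more than one backup is to be kept. The sketch below shows one possible naming and retention scheme. dump_name and prune_dumps are hypothetical helpers, and keeping seven dumps is an arbitrary assumption.

```python
from datetime import datetime, timezone
from pathlib import Path


def dump_name(database, when=None):
    """Build a timestamped file name such as neo4j-20240131T020000Z.dump."""
    when = when or datetime.now(timezone.utc)
    return f"{database}-{when.strftime('%Y%m%dT%H%M%SZ')}.dump"


def prune_dumps(directory, database, keep=7):
    """Delete all but the newest `keep` dumps; returns the removed paths."""
    # The timestamp format sorts lexicographically in chronological order.
    dumps = sorted(Path(directory).glob(f"{database}-*.dump"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for path in stale:
        path.unlink()
    return stale
```

A backup script could rename the copied neo4j.dump to dump_name("neo4j") and then call prune_dumps on the backup directory.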

Restore

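Restore from backup. The load must run while the database is offline, and docker exec does not work against a stopped container, so use a one-off container for the load step:
# Stop Neo4j
docker compose stop neo4j

# Copy backup into the data volume (the /data/backups directory must already exist)
docker cp neo4j.dump neo4j:/data/backups/

# Restore database, replacing the existing one
docker compose run --rm neo4j neo4j-admin database load neo4j \
  --from-path=/data/backups --overwrite-destination=true

# Start Neo4j
docker compose start neo4j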

Indexes

Create indexes for better performance:
// Index on Target name
CREATE INDEX target_name FOR (t:Target) ON (t.name)

// Index on Vulnerability type
CREATE INDEX vulnerability_type FOR (v:Vulnerability) ON (v.type)

// Composite index
CREATE INDEX target_composite FOR (t:Target) ON (t.name, t.ip)

// Full-text search index
CREATE FULLTEXT INDEX target_search FOR (t:Target) ON EACH [t.name, t.description]
View existing indexes:
SHOW INDEXES
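Running a plain CREATE INDEX a second time fails because the index already exists, which is awkward in setup scripts. Cypher's IF NOT EXISTS clause makes the statement safe to repeat. The sketch below generates such statements from Python; index_statement is a hypothetical helper, and the identifier check is a deliberately strict assumption, since labels and property names cannot be passed as query parameters.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def index_statement(name, label, properties):
    """Build a CREATE INDEX ... IF NOT EXISTS statement for one label."""
    if not properties:
        raise ValueError("at least one property is required")
    # Identifiers are interpolated into the statement, so validate them first.
    for identifier in (name, label, *properties):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    columns = ", ".join(f"n.{prop}" for prop in properties)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({columns})"
```

For instance, index_statement("target_composite", "Target", ["name", "ip"]) yields a repeatable version of the composite index above.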

Constraints

Ensure data integrity:
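// Unique constraint (creates its own backing index; an existing index on
// Target.name, such as target_name, must be dropped first or creation fails)
CREATE CONSTRAINT target_unique FOR (t:Target) REQUIRE t.name IS UNIQUE

// Existence constraint (Enterprise only)
CREATE CONSTRAINT vulnerability_name FOR (v:Vulnerability) REQUIRE v.name IS NOT NULL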

Monitoring

Database Metrics

Query database statistics:
// Database size (this kernel JMX bean dates from Neo4j 3.x; on 4.x and
// later the query may return no rows)
CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Store file sizes')
YIELD attributes
RETURN attributes.TotalStoreSize.value as total_size

// Node and relationship counts
MATCH (n)
RETURN labels(n) as label, count(*) as count
UNION
MATCH ()-[r]->()
RETURN type(r) as label, count(*) as count

// Transaction statistics (dbms.listTransactions() was removed in Neo4j 5)
SHOW TRANSACTIONS
YIELD transactionId, currentQueryId, elapsedTime
RETURN transactionId, currentQueryId, elapsedTime

Performance Profiling

Profile slow queries:
// Explain query plan
EXPLAIN
MATCH (t:Target)-[:HAS_SERVICE]->(s:Service)
WHERE t.name = "target.com"
RETURN s

// Profile query execution
PROFILE
MATCH (t:Target)-[:HAS_SERVICE]->(s:Service)
WHERE t.name = "target.com"
RETURN s

Logs

View Neo4j logs:
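# Query log (query logging is an Enterprise Edition feature, so this file
# may be absent on the Community image)
docker exec neo4j cat /var/lib/neo4j/logs/query.log

# Debug log (the Docker image keeps logs under $NEO4J_HOME/logs)
docker exec neo4j cat /var/lib/neo4j/logs/debug.log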

# Follow logs
docker compose logs -f neo4j

Troubleshooting

Connection Issues

Verify Neo4j is accessible:
# Check if Neo4j is listening
docker exec neo4j netstat -tlnp | grep 7687

# Test Bolt connection
telnet localhost 7687

# Test from Graphiti container
docker exec graphiti nc -zv neo4j 7687

Authentication Errors

Reset password:
# Stop Neo4j
docker compose stop neo4j

# Set the initial password (Neo4j 5 syntax; this only takes effect if
# no password has been set yet on this data volume)
docker compose run --rm neo4j neo4j-admin dbms set-initial-password newpassword

# Update the .env file to match
NEO4J_PASSWORD=newpassword

# Restart
docker compose up -d neo4j

Performance Issues

Diagnose slow queries:
  1. Enable query logging (Neo4j 5 setting names; query logging is an Enterprise Edition feature):
    NEO4J_db_logs_query_enabled=INFO
    NEO4J_db_logs_query_threshold=100ms
    
  2. Analyze query plans with PROFILE
  3. Add missing indexes
  4. Increase memory allocation

Data Corruption

Recover from corruption:
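# Check database consistency (Neo4j 5 syntax; run it with the database offline)
docker compose stop neo4j
docker compose run --rm neo4j neo4j-admin database check neo4j
docker compose start neo4j

# neo4j-admin has no repair command. If the check reports inconsistencies,
# restore the most recent dump as described under Restore.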

Best Practices

Schema Design

  • Use meaningful node labels and relationship types
  • Normalize properties across similar nodes
  • Avoid deeply nested queries (> 5 levels)
  • Use indexes on frequently queried properties
  • Model relationships as first-class entities

Query Optimization

  • Always use indexes for lookups
  • Limit result sets with LIMIT
  • Use WITH to pipeline queries
  • Avoid Cartesian products
  • Profile queries before production

Security

  • Change default password immediately
  • Use strong passwords (16+ characters)
  • Restrict network access to trusted IPs
  • Enable TLS/SSL in production
  • Regularly update Neo4j version
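A password that meets the length recommendation above can be generated rather than invented. The sketch below uses Python's secrets module; generate_password is a hypothetical helper, and restricting the alphabet to letters and digits is an assumption made so the value needs no quoting in .env files or in NEO4J_AUTH.

```python
import secrets
import string


def generate_password(length=24):
    """Return a random password drawn from letters and digits."""
    if length < 16:
        raise ValueError("use at least 16 characters")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```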

Data Management

  • Regular backups (daily minimum)
  • Monitor disk usage
  • Archive old data periodically
  • Clean up unused nodes and relationships
  • Document schema and queries
