
Understanding Infrahub’s graph database foundation

Infrahub is built on a graph database foundation using Neo4j (or Memgraph as an alternative). This architectural choice enables Infrahub to model complex, interconnected infrastructure naturally, query relationships efficiently, and implement advanced features like branching and version control at the database level.

Why a graph database for infrastructure

Infrastructure is inherently interconnected. Devices connect to interfaces. Interfaces connect to networks. Networks span locations. Locations contain racks. Racks contain devices. These relationships form a complex web that is difficult to model in traditional relational databases. Graph databases excel at representing and querying these interconnected relationships:
  • Natural modeling: Infrastructure maps directly to graph concepts: devices are nodes, connections are edges. You model infrastructure as it actually exists rather than forcing it into rigid table structures.
  • Efficient traversals: Finding all devices in a location, or all interfaces connected to a specific VLAN, becomes a simple graph traversal. Queries that require multiple joins in relational databases become single-hop or few-hop graph queries.
  • Flexible schema: As your infrastructure evolves, the graph adapts. Adding new device types or relationship kinds doesn't require complex migrations. You simply add nodes and edges with new properties.
  • Relationship properties: Edges can carry data. An interface connection can store VLAN assignments, link speed, and status directly on the relationship.
  • Complex queries: Graph query languages like Cypher enable sophisticated pattern matching. Find all redundant paths between two routers, identify single points of failure, or trace packet flows through the network.

Core concepts

Nodes, edges, and properties

Graph databases consist of three fundamental elements: nodes, edges, and properties.
Nodes represent entities in your infrastructure. Each node has:
  • Labels that categorize it (Device, Interface, IPAddress)
  • Properties that describe it (name, description, status)
  • A unique identifier
Infrahub creates nodes for:
  • Infrastructure objects (devices, interfaces, locations)
  • Schema definitions (node types, attributes, relationships)
  • System objects (branches, accounts, groups)
  • Attribute values (hostname values, IP address values)
Edges represent relationships between nodes. Each edge has:
  • A type that describes the relationship (HAS_INTERFACE, LOCATED_IN)
  • Properties that provide context (created_at, status)
  • Direction (from source node to target node)
Infrahub creates edges for:
  • Data relationships (device → interface, interface → IP address)
  • Schema relationships (node schema → attribute schema)
  • Temporal relationships (current value → previous value)
  • Branch relationships (branch → parent branch)
Properties are key-value pairs attached to nodes or edges:
Node: Device
Properties:
  uuid: "a1b2c3d4-..."
  kind: "InfraDevice"
  
Edge: HAS_INTERFACE
Properties:
  branch: "main"
  from: "2024-03-01T00:00:00Z"
  status: "active"
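The three elements can be sketched as plain Python data structures. This is only an illustration of the concepts above, not Infrahub's internal model: the class names `GraphNode` and `GraphEdge` and the identifiers `dev-1` and `if-1` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GraphNode:
    """A node: labels categorize it, properties describe it."""
    uuid: str
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphEdge:
    """A directed, typed edge that carries its own properties."""
    edge_type: str
    source: str  # uuid of the source node
    target: str  # uuid of the target node
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical device and interface joined by a HAS_INTERFACE edge
device = GraphNode(uuid="dev-1", labels={"Node", "InfraDevice"}, properties={"kind": "InfraDevice"})
interface = GraphNode(uuid="if-1", labels={"Node", "InfraInterface"}, properties={"kind": "InfraInterface"})
link = GraphEdge(
    edge_type="HAS_INTERFACE",
    source=device.uuid,
    target=interface.uuid,
    properties={"branch": "main", "from": "2024-03-01T00:00:00Z", "status": "active"},
)
```

Keeping the properties on the edge rather than on either node is what later allows the same pair of nodes to be linked differently per branch or per point in time.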

Neo4j and Memgraph

Infrahub supports two graph database engines.
Neo4j is the default and most mature option. It provides:
  • Robust ACID transactions
  • Comprehensive Cypher language support
  • Mature clustering and replication
  • Extensive monitoring and tooling
  • Large community and ecosystem
Neo4j is recommended for production deployments where stability and ecosystem matter.
Memgraph is an alternative that offers:
  • Higher write performance
  • In-memory operation for speed
  • OpenCypher compatibility
  • Smaller resource footprint
Memgraph works well for development environments and specific performance-critical use cases.
Infrahub's database abstraction layer supports both engines, allowing you to choose based on your requirements. The same Infrahub code works with either database.

The graph schema

Infrahub implements a specific graph schema: a pattern for how data is stored in the graph. Understanding this schema helps when debugging, optimizing queries, or extending Infrahub.
Vertex types (node types in the graph):
  • Root: The single root node that anchors the database
  • Branch: Represents a branch (main, feature branches)
  • Node: Represents an infrastructure object instance
  • Relationship: Represents a relationship between objects
  • Attribute: Represents an attribute of an object
  • AttributeValue: Stores actual attribute values with timestamps
Edge types (relationship types in the graph):
  • HAS_ATTRIBUTE: Connects Node → Attribute
  • HAS_VALUE: Connects Attribute → AttributeValue
  • IS_RELATED: Connects Node → Relationship → Node
  • IS_PART_OF: Connects elements to branches
  • HAS_SOURCE/HAS_DESTINATION: Relationship endpoints
Temporal properties: Every edge carries properties that record when, and in which branch, it is valid:
from: "2024-03-01T00:00:00Z"  # When this edge became active
to: null                        # When this edge became inactive (null = still active)
branch: "main"                  # Which branch this edge exists in
status: "active"                # Current status
These properties enable Infrahub’s time travel and branching capabilities.
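A minimal sketch of how these four properties decide whether an edge is visible, assuming edges are represented as plain dictionaries. The helper names `parse_ts` and `edge_is_active` are hypothetical. The logic mirrors the property semantics described above rather than Infrahub's actual code.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-03-01T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def edge_is_active(edge: dict, at: datetime, branches: list[str]) -> bool:
    """Return True when the edge is visible at `at` in one of `branches`."""
    starts = parse_ts(edge["from"])
    ends: Optional[datetime] = parse_ts(edge["to"]) if edge.get("to") else None
    return (
        edge["branch"] in branches
        and edge["status"] == "active"
        and starts <= at
        and (ends is None or ends > at)  # a null `to` means still active
    )


edge = {"from": "2024-03-01T00:00:00Z", "to": None, "branch": "main", "status": "active"}
after = datetime(2024, 6, 1, tzinfo=timezone.utc)
before = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(edge_is_active(edge, after, ["main"]))   # True
print(edge_is_active(edge, before, ["main"]))  # False: the edge did not exist yet
```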

Architecture and implementation

How infrastructure objects map to the graph

When you create a device in Infrahub, the system creates a subgraph:
[Device Node]
  ├─[HAS_ATTRIBUTE]→[Attribute: hostname]
  │                   └─[HAS_VALUE]→[Value: "router-1"]
  ├─[HAS_ATTRIBUTE]→[Attribute: description]
  │                   └─[HAS_VALUE]→[Value: "Core router"]
  └─[IS_RELATED]→[Relationship: location]
                  └─[HAS_DESTINATION]→[Location Node]
This structure enables three capabilities.
Attribute-level versioning: Each attribute can have multiple values over time, connected in a chain:
[Attribute: hostname]
  └─[HAS_VALUE]→[Value: "router-1" (from: 2024-01-01)]
                 └─[PREVIOUS]→[Value: "old-router" (from: 2023-12-01, to: 2024-01-01)]
Branch isolation: Each edge has a branch property. Querying a branch only follows edges tagged with that branch or its ancestors:
# Main branch sees:
[Device]─[HAS_ATTRIBUTE (branch: main)]→[Attribute]

# Feature branch sees:
[Device]─[HAS_ATTRIBUTE (branch: feature)]→[Attribute]  # Branch-specific
[Device]─[HAS_ATTRIBUTE (branch: main)]→[Attribute]     # Inherited from main
Relationship traversal: Relationships are first-class entities in the graph, allowing rich metadata:
[Device A]─[IS_RELATED]→[Relationship: bgp_peer]─[HAS_DESTINATION]→[Device B]
                         └─[HAS_ATTRIBUTE]→[Attribute: as_number]
                                            └─[HAS_VALUE]→[Value: "65001"]
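The mapping from one device to its subgraph can be sketched as a function that emits (source, edge type, target) triples. This is a simplified illustration: the function name `build_device_subgraph` and the string identifiers are hypothetical, and the edge properties (branch, from, to, status) are omitted for brevity.

```python
def build_device_subgraph(attributes: dict[str, str], location_id: str) -> list[tuple[str, str, str]]:
    """Return (source, edge_type, target) triples for a single device."""
    device = "device"
    triples: list[tuple[str, str, str]] = []
    for name, value in attributes.items():
        attr = f"attribute:{name}"
        # Each attribute is its own node, and its value is a separate node again
        triples.append((device, "HAS_ATTRIBUTE", attr))
        triples.append((attr, "HAS_VALUE", f"value:{value}"))
    # The relationship is a node too, sitting between the two objects
    rel = "relationship:location"
    triples.append((device, "IS_RELATED", rel))
    triples.append((rel, "HAS_DESTINATION", location_id))
    return triples


subgraph = build_device_subgraph(
    {"hostname": "router-1", "description": "Core router"},
    location_id="location:site-1",
)
for triple in subgraph:
    print(triple)
```

Two attributes and one relationship already produce six edges, which illustrates the storage overhead this design accepts in exchange for per-attribute versioning.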

Query patterns

Infrahub uses Cypher queries to interact with the graph database (Neo4j or Memgraph). The query patterns follow specific conventions.
Pattern matching: Find nodes matching criteria:
MATCH (n:Node {kind: "InfraDevice"})
  -[:HAS_ATTRIBUTE]->(a:Attribute {name: "hostname"})
  -[:HAS_VALUE]->(v:AttributeValue)
WHERE v.value = "router-1"
RETURN n
Temporal filtering: Filter by time and branch:
MATCH (n:Node)-[r:HAS_ATTRIBUTE]->(a:Attribute)
WHERE r.branch IN ["main", "global"]
  AND r.from <= $timestamp
  AND (r.to IS NULL OR r.to > $timestamp)
  AND r.status = "active"
RETURN n, a
Relationship traversal: Follow relationships across the graph:
MATCH (device:Node {kind: "InfraDevice"})
  -[:IS_RELATED]->(rel:Relationship {name: "interfaces"})
  -[:HAS_DESTINATION]->(interface:Node {kind: "InfraInterface"})
RETURN device, interface
Infrahub encapsulates these query patterns in Query classes, which provide a consistent interface for database operations.
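One way to picture the idea of wrapping a query pattern in a class is a small object that renders Cypher text together with its parameters. The class `AttributeLookupQuery` and its `render` method are hypothetical and are not Infrahub's Query API. The sketch only shows why parameterized, reusable query objects are convenient.

```python
from dataclasses import dataclass


@dataclass
class AttributeLookupQuery:
    """Hypothetical query object: builds Cypher text plus its parameters."""
    kind: str
    attribute: str
    branches: list[str]
    at: str  # ISO 8601 timestamp

    def render(self) -> tuple[str, dict]:
        # Values travel as parameters ($name) instead of being pasted into the text
        query = (
            "MATCH (n:Node {kind: $kind})"
            "-[r:HAS_ATTRIBUTE]->(a:Attribute {name: $attribute})\n"
            "WHERE r.branch IN $branches\n"
            "  AND r.from <= $at\n"
            "  AND (r.to IS NULL OR r.to > $at)\n"
            '  AND r.status = "active"\n'
            "RETURN n, a"
        )
        params = {
            "kind": self.kind,
            "attribute": self.attribute,
            "branches": self.branches,
            "at": self.at,
        }
        return query, params


query, params = AttributeLookupQuery(
    kind="InfraDevice",
    attribute="hostname",
    branches=["main", "global"],
    at="2024-03-01T00:00:00Z",
).render()
print(query)
```

Passing values as parameters keeps the query text constant across calls, which lets the database reuse its query plan and avoids injection through user-supplied values.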

Indexes and performance

Neo4j and Memgraph use indexes to optimize query performance. Infrahub creates the following indexes.
Node indexes:
  • UUID (unique identifier)
  • Kind (node type)
  • Combinations for common queries
Edge indexes:
  • Branch + from + to (temporal queries)
  • Status (active/inactive filtering)
  • Relationship types (traversal optimization)
Composite indexes:
  • (kind, uuid) for fast node lookups
  • (branch, from, to, status) for temporal + branch queries
Proper indexing is critical for performance. Without indexes, temporal queries would require full graph scans.
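As a rough illustration, the helpers below generate index statements in Neo4j 5 Cypher syntax for the composite indexes listed above. The helper functions and index names are hypothetical, the indexes Infrahub actually creates may differ, and Memgraph uses a different index syntax.

```python
def node_index(name: str, label: str, props: list[str]) -> str:
    """Build a Neo4j 5 CREATE INDEX statement for node properties."""
    cols = ", ".join(f"n.`{p}`" for p in props)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({cols})"


def edge_index(name: str, rel_type: str, props: list[str]) -> str:
    """Build a Neo4j 5 CREATE INDEX statement for relationship properties."""
    cols = ", ".join(f"r.`{p}`" for p in props)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR ()-[r:{rel_type}]-() ON ({cols})"


statements = [
    node_index("node_kind_uuid", "Node", ["kind", "uuid"]),
    edge_index("has_attribute_temporal", "HAS_ATTRIBUTE", ["branch", "from", "to", "status"]),
]
for statement in statements:
    print(statement)
```

Property names are wrapped in backticks so that names such as `from` and `to` cannot be mistaken for keywords.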

Transaction model

Infrahub uses ACID transactions to ensure data consistency:
  • Atomicity: All changes in a transaction succeed or all fail. When creating a device with interfaces, either the complete object graph is created or nothing is.
  • Consistency: Transactions enforce constraints. Unique attributes can't have duplicates. Required relationships must exist.
  • Isolation: Concurrent transactions don't interfere. Multiple users can modify different objects simultaneously without seeing partial changes.
  • Durability: Committed changes are permanent. Once a transaction commits, the changes survive system failures.
The default branch uses stricter transaction isolation to prevent conflicts. Feature branches use more relaxed isolation for better performance.
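The atomicity guarantee can be demonstrated with a toy in-memory store. This is a conceptual sketch only: `InMemoryGraph` is hypothetical and stands in for the database, which implements transactions very differently (for example with write-ahead logging rather than snapshots).

```python
import copy
from contextlib import contextmanager


class InMemoryGraph:
    """Toy node store with all-or-nothing transactions."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.nodes)
        try:
            yield self
        except Exception:
            self.nodes = snapshot  # roll back every change made in the block
            raise

    def add_node(self, uuid: str, **props) -> None:
        if uuid in self.nodes:
            raise ValueError(f"duplicate uuid: {uuid}")
        self.nodes[uuid] = props


graph = InMemoryGraph()
try:
    with graph.transaction() as tx:
        tx.add_node("dev-1", kind="InfraDevice")
        tx.add_node("dev-1", kind="InfraDevice")  # violates uniqueness
except ValueError:
    pass

print(len(graph.nodes))  # 0: the first insert was rolled back as well
```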

Design decisions and trade-offs

Why Neo4j over PostgreSQL

Infrahub could have been built on a relational database like PostgreSQL. Why choose Neo4j?
  • Relationship-first: Infrastructure is about relationships as much as entities. Graph databases treat relationships as first-class citizens. In PostgreSQL, relationships are implicit (foreign keys) and expensive (joins). In Neo4j, relationships are explicit edges that can be traversed efficiently.
  • Flexible schema: Infrastructure schemas evolve. Adding a new device type in Neo4j means adding nodes with new labels. In PostgreSQL, it means creating tables, indexes, foreign keys, and migration scripts.
  • Query expressiveness: Cypher queries for path finding, pattern matching, and graph algorithms are more intuitive than equivalent SQL with recursive CTEs.
  • Branching and versioning: Implementing Git-like branching in a relational database requires complex triggers, views, and application logic. Neo4j's edge properties enable branching as a natural graph pattern.
The trade-off: Neo4j has a smaller ecosystem than PostgreSQL, with fewer tools, less operational experience in the community, and a steeper learning curve.

Graph schema complexity

Infrahub's graph schema (with Attribute nodes and AttributeValue nodes) adds complexity compared to storing attributes directly on object nodes. Why this design?
  • Versioning: Storing attribute values in separate nodes enables clean versioning. Each value node has a timestamp. Previous values remain in the graph, creating a version chain.
  • Branch isolation: Attribute values can differ between branches. Storing values separately allows branch-specific values while sharing the attribute definition.
  • Metadata: Attributes have metadata (source, owner, protection status). Separate attribute nodes provide a place to store this metadata consistently.
The cost: More nodes and edges mean more complex queries and higher storage overhead. Infrahub mitigates this through:
  • Efficient query patterns that fetch entire object graphs in single queries
  • Indexing strategies that optimize common access patterns
  • Caching layers that reduce database round-trips

Neo4j vs. Memgraph support

Supporting two database engines increases complexity. Why maintain both?
  • Production robustness: Neo4j is mature and battle-tested for production workloads. It's the safe choice for critical infrastructure.
  • Development speed: Memgraph's in-memory operation and faster writes benefit development workflows where write performance matters more than durability.
  • Flexibility: Different users have different requirements. Some prioritize performance, others prioritize stability. Supporting both enables choice.
The database abstraction layer isolates database-specific code. The core Infrahub logic works with either engine, minimizing maintenance overhead.

Implementation examples

Creating objects in the graph

When you create a device through the API, Infrahub generates Cypher queries to build the graph structure:
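# Simplified example of device creation.
# Assumptions: db.execute_query(query=..., params=...) runs one Cypher statement,
# and the branch argument is added here only to supply $branch. Verify both
# against the InfrahubDatabase API of your Infrahub version.
from datetime import datetime, timezone
from uuid import uuid4

from infrahub.database import InfrahubDatabase


async def create_device(db: InfrahubDatabase, hostname: str, location_id: str, branch: str = "main") -> str:
    device_uuid = str(uuid4())
    # Edge properties shared by the queries below
    edge_params = {"branch": branch, "timestamp": datetime.now(timezone.utc).isoformat()}

    # Create the device node
    device_query = """
    CREATE (n:Node:InfraDevice {
        uuid: $uuid,
        kind: "InfraDevice"
    })
    RETURN n.uuid as uuid
    """
    await db.execute_query(query=device_query, params={"uuid": device_uuid})

    # Create attribute nodes and values
    attr_query = """
    MATCH (n:Node {uuid: $device_uuid})
    CREATE (a:Attribute {name: "hostname", uuid: $attr_uuid})
    CREATE (v:AttributeValue {value: $hostname, uuid: $value_uuid})
    CREATE (n)-[:HAS_ATTRIBUTE {branch: $branch, from: $timestamp}]->(a)
    CREATE (a)-[:HAS_VALUE {branch: $branch, from: $timestamp}]->(v)
    """
    attr_params = {"device_uuid": device_uuid, "hostname": hostname,
                   "attr_uuid": str(uuid4()), "value_uuid": str(uuid4())}
    await db.execute_query(query=attr_query, params={**edge_params, **attr_params})

    # Create relationship to location (every hop must use the -[...]-> form)
    rel_query = """
    MATCH (device:Node {uuid: $device_uuid})
    MATCH (location:Node {uuid: $location_uuid})
    CREATE (device)-[:IS_RELATED {branch: $branch, from: $timestamp}]->
           (r:Relationship {name: "location", uuid: $rel_uuid})
           -[:HAS_DESTINATION]->(location)
    """
    rel_params = {"device_uuid": device_uuid, "location_uuid": location_id, "rel_uuid": str(uuid4())}
    await db.execute_query(query=rel_query, params={**edge_params, **rel_params})
    return device_uuid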
The actual implementation is more complex, handling validation, defaults, indexes, and edge cases.

Querying across branches

Querying a feature branch requires following edges from both the feature branch and its parent:
// Get all devices visible in feature branch
MATCH (n:Node {kind: "InfraDevice"})
WHERE EXISTS {
    MATCH (n)-[r:HAS_ATTRIBUTE]->(a:Attribute)
    WHERE r.branch IN ["feature-network", "main", "global"]
      AND r.from <= $timestamp
      AND (r.to IS NULL OR r.to > $timestamp)
      AND r.status = "active"
}
RETURN n
The branch hierarchy determines which branches to include. The query pattern ensures only visible objects are returned.
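Deriving the list of branches to include can be sketched as a walk up the hierarchy. This assumes each branch records a single parent and that the shared "global" branch seen in the query above is always appended. The function `branches_to_query` and the `parents` mapping are hypothetical.

```python
from typing import Optional


def branches_to_query(
    branch: str,
    parents: dict[str, Optional[str]],
    shared: str = "global",
) -> list[str]:
    """Walk from `branch` up to the root, then append the shared branch."""
    chain: list[str] = []
    current: Optional[str] = branch
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in branch hierarchy at {current!r}")
        chain.append(current)
        current = parents.get(current)
    chain.append(shared)
    return chain


parents = {"feature-network": "main", "main": None}
print(branches_to_query("feature-network", parents))
# ['feature-network', 'main', 'global']
```

The resulting list is what would be passed as the value for the `r.branch IN [...]` filter.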

Time travel queries

Accessing historical data requires filtering edges by timestamp:
// Get device hostname as it was on 2024-01-15
MATCH (n:Node {uuid: $device_uuid})
  -[r1:HAS_ATTRIBUTE]->(a:Attribute {name: "hostname"})
  -[r2:HAS_VALUE]->(v:AttributeValue)
WHERE r1.from <= $historical_timestamp
  AND (r1.to IS NULL OR r1.to > $historical_timestamp)
  AND r2.from <= $historical_timestamp
  AND (r2.to IS NULL OR r2.to > $historical_timestamp)
RETURN v.value
ORDER BY r2.from DESC
LIMIT 1
The query finds the most recent attribute value before or at the requested timestamp.
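The same selection logic can be expressed in plain Python over a list of value versions. This is a sketch under the assumption that each version carries `from` and `to` timestamps as in the version chain shown earlier. The function `value_at` and the sample data are illustrative.

```python
from datetime import datetime
from typing import Optional


def value_at(versions: list[dict], at: str) -> Optional[str]:
    """Return the value that was active at `at`, or None if none existed yet."""
    moment = datetime.fromisoformat(at)
    candidates = [
        v for v in versions
        if datetime.fromisoformat(v["from"]) <= moment
        and (v["to"] is None or datetime.fromisoformat(v["to"]) > moment)
    ]
    if not candidates:
        return None
    # Equivalent of ORDER BY from DESC LIMIT 1
    newest = max(candidates, key=lambda v: datetime.fromisoformat(v["from"]))
    return newest["value"]


hostname_versions = [
    {"value": "old-router", "from": "2023-12-01", "to": "2024-01-01"},
    {"value": "router-1", "from": "2024-01-01", "to": None},
]
print(value_at(hostname_versions, "2023-12-15"))  # old-router
print(value_at(hostname_versions, "2024-01-15"))  # router-1
```

Because `to` is treated as exclusive and `from` as inclusive, a timestamp exactly on a change boundary resolves to the newer value.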
