Understanding Infrahub’s graph database foundation
Infrahub is built on a graph database foundation using Neo4j (or Memgraph as an alternative). This architectural choice enables Infrahub to model complex, interconnected infrastructure naturally, query relationships efficiently, and implement advanced features like branching and version control at the database level.
Why a graph database for infrastructure
Infrastructure is inherently interconnected. Devices have interfaces. Interfaces connect to networks. Networks span locations. Locations contain racks. Racks contain devices. These relationships form a complex web that is difficult to model in traditional relational databases. Graph databases excel at representing and querying interconnected data:
- Natural modeling: Infrastructure maps directly to graph concepts, with devices as nodes and connections as edges. You model infrastructure as it actually exists rather than forcing it into rigid table structures.
- Efficient traversals: Finding all devices in a location, or all interfaces connected to a specific VLAN, becomes a simple graph traversal. Queries that require multiple joins in a relational database become single-hop or few-hop graph queries.
- Flexible schema: As your infrastructure evolves, the graph adapts. Adding new device types or relationship kinds doesn’t require complex migrations; you add nodes and edges with new labels and properties.
- Relationship properties: Edges can carry data. An interface connection can store VLAN assignments, link speed, and status directly on the relationship.
- Complex queries: Graph query languages like Cypher support sophisticated pattern matching, such as finding all redundant paths between two routers, identifying single points of failure, or tracing packet flows through the network.
Core concepts
Nodes, edges, and properties
Graph databases consist of three fundamental elements: nodes, edges, and properties. Nodes represent entities in your infrastructure. Each node has:
- Labels that categorize it (Device, Interface, IPAddress)
- Properties that describe it (name, description, status)
- A unique identifier
In Infrahub, nodes represent:
- Infrastructure objects (devices, interfaces, locations)
- Schema definitions (node types, attributes, relationships)
- System objects (branches, accounts, groups)
- Attribute values (hostname values, IP address values)
Edges represent relationships between nodes. Each edge has:
- A type that describes the relationship (HAS_INTERFACE, LOCATED_IN)
- Properties that provide context (created_at, status)
- Direction (from source node to target node)
In Infrahub, edges represent:
- Data relationships (device → interface, interface → IP address)
- Schema relationships (node schema → attribute schema)
- Temporal relationships (current value → previous value)
- Branch relationships (branch → parent branch)
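To make nodes, edges, and properties concrete, here is a minimal in-memory property graph in Python. It is an illustrative sketch, not how Infrahub or Neo4j store data internally; the class names, the labels (`Location`, `Rack`, `Device`), and the `CONTAINS` edge type are assumptions chosen for the example. The last lines show why a lookup such as "all devices in a location" reduces to following edges hop by hop.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    labels: set[str]               # categories, e.g. {"Device"}
    properties: dict[str, object]  # descriptive data, e.g. {"name": "spine1"}
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier


@dataclass
class GraphEdge:
    rel_type: str                  # what the relationship means (hypothetical "CONTAINS")
    source: str                    # uid of the source node (edges are directed)
    target: str                    # uid of the target node
    properties: dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, GraphNode] = {}
        self.edges: list[GraphEdge] = []

    def add_node(self, label: str, **props: object) -> GraphNode:
        node = GraphNode(labels={label}, properties=dict(props))
        self.nodes[node.uid] = node
        return node

    def add_edge(self, rel_type: str, source: GraphNode, target: GraphNode,
                 **props: object) -> GraphEdge:
        edge = GraphEdge(rel_type, source.uid, target.uid, dict(props))
        self.edges.append(edge)
        return edge

    def neighbors(self, node: GraphNode, rel_type: str) -> list[GraphNode]:
        """One hop: targets of the node's outgoing edges of the given type."""
        return [self.nodes[e.target] for e in self.edges
                if e.source == node.uid and e.rel_type == rel_type]


# Location -> Rack -> Device, linked by hypothetical CONTAINS edges.
g = PropertyGraph()
site = g.add_node("Location", name="paris")
rack = g.add_node("Rack", name="r1")
spine = g.add_node("Device", name="spine1", status="active")
leaf = g.add_node("Device", name="leaf1", status="active")
g.add_edge("CONTAINS", site, rack)
g.add_edge("CONTAINS", rack, spine, position=42)  # the edge carries its own data
g.add_edge("CONTAINS", rack, leaf, position=40)

# "All devices in a location" is a two-hop traversal rather than a multi-table join.
devices = [d.properties["name"]
           for r in g.neighbors(site, "CONTAINS")
           for d in g.neighbors(r, "CONTAINS")
           if "Device" in d.labels]
print(devices)  # ['spine1', 'leaf1']
```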
Neo4j and Memgraph
Infrahub supports two graph database engines. Neo4j is the default and most mature option. It provides:
- Robust ACID transactions
- Comprehensive Cypher language support
- Mature clustering and replication
- Extensive monitoring and tooling
- Large community and ecosystem
Memgraph is the alternative engine. It offers:
- Higher write performance
- In-memory operation for speed
- openCypher compatibility
- Smaller resource footprint
The graph schema
Infrahub implements a specific graph schema: a pattern for how data is stored in the graph. Understanding this schema helps when debugging, optimizing queries, or extending Infrahub.
Vertex types (node types in the graph):
- Root: The single root node that anchors the database
- Branch: Represents a branch (main, feature branches)
- Node: Represents an infrastructure object instance
- Relationship: Represents a relationship between objects
- Attribute: Represents an attribute of an object
- AttributeValue: Stores actual attribute values with timestamps
Edge types (relationships in the graph):
- HAS_ATTRIBUTE: Connects Node → Attribute
- HAS_VALUE: Connects Attribute → AttributeValue
- IS_RELATED: Connects Node → Relationship → Node
- IS_PART_OF: Connects elements to branches
- HAS_SOURCE/HAS_DESTINATION: Relationship endpoints
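The vertex and edge types fit together in a fixed pattern. The sketch below is a simplified in-memory model, not Infrahub's storage code. It builds the subgraph for one device with two attributes and checks every edge against a table of allowed (source label, edge type, target label) triples. The table covers only the HAS_ATTRIBUTE, HAS_VALUE, and IS_RELATED patterns listed above. The helper names, the `Device` kind, and the `branch`/`status` edge properties are assumptions for illustration.

```python
import uuid

# Allowed (source label, edge type, target label) triples, following the vertex
# and edge types above (simplified: Root, Branch, and metadata edges are omitted).
ALLOWED_EDGES = {
    ("Node", "HAS_ATTRIBUTE", "Attribute"),
    ("Attribute", "HAS_VALUE", "AttributeValue"),
    ("Node", "IS_RELATED", "Relationship"),
    ("Relationship", "IS_RELATED", "Node"),
}


def build_object_subgraph(kind, attributes, branch="main"):
    """Return (vertices, edges) for one object. Hypothetical helper."""
    vertices = {}  # vertex id -> {"label": ..., plus properties}
    edges = []     # (source id, edge type, target id, edge properties)

    def add_vertex(label, **props):
        vid = str(uuid.uuid4())
        vertices[vid] = {"label": label, **props}
        return vid

    node_id = add_vertex("Node", kind=kind)
    for name, value in attributes.items():
        attr_id = add_vertex("Attribute", name=name)
        value_id = add_vertex("AttributeValue", value=value)
        edge_props = {"branch": branch, "status": "active"}
        edges.append((node_id, "HAS_ATTRIBUTE", attr_id, dict(edge_props)))
        edges.append((attr_id, "HAS_VALUE", value_id, dict(edge_props)))
    return vertices, edges


def invalid_edges(vertices, edges):
    """List the (source label, type, target label) of edges the schema forbids."""
    return [
        (vertices[s]["label"], t, vertices[d]["label"])
        for s, t, d, _ in edges
        if (vertices[s]["label"], t, vertices[d]["label"]) not in ALLOWED_EDGES
    ]


vertices, edges = build_object_subgraph("Device", {"name": "spine1", "status": "active"})
print(len(vertices), len(edges))       # 5 4
print(invalid_edges(vertices, edges))  # []
```

One device with two attributes already yields five vertices and four edges, which is the storage overhead discussed later under graph schema complexity.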
Architecture and implementation
How infrastructure objects map to the graph
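When you create a device in Infrahub, the system creates a subgraph rather than a single vertex: a Node vertex for the device, an Attribute vertex for each of its attributes (linked by HAS_ATTRIBUTE), and an AttributeValue vertex for each value (linked by HAS_VALUE). Relationships to other objects, such as the device’s interfaces, go through Relationship vertices connected by IS_RELATED edges.
Query patterns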
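In Cypher, a pattern such as `MATCH (d:Device {status: $status}) RETURN d` selects nodes by label and property values. The Python sketch below mimics that single-node pattern over plain dictionaries so the matching rule is easy to see. It is an illustration only, not how Neo4j evaluates patterns or how Infrahub phrases its queries, and the sample data is made up.

```python
def match_nodes(nodes, label, **criteria):
    """Return nodes that carry `label` and whose properties equal every criterion,
    mirroring MATCH (n:Label {key: value, ...})."""
    return [
        n for n in nodes
        if label in n["labels"]
        and all(key in n["props"] and n["props"][key] == value
                for key, value in criteria.items())
    ]


nodes = [
    {"labels": {"Device"}, "props": {"name": "spine1", "status": "active"}},
    {"labels": {"Device"}, "props": {"name": "leaf1", "status": "maintenance"}},
    {"labels": {"Interface"}, "props": {"name": "eth0", "status": "active"}},
]

# Roughly: MATCH (d:Device {status: "active"}) RETURN d.name
active_devices = match_nodes(nodes, "Device", status="active")
print([n["props"]["name"] for n in active_devices])  # ['spine1']
```

The label narrows the candidates first and the property map filters them; this is the same reason the label and property indexes described under indexes and performance matter for real queries.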
Infrahub uses Cypher queries to interact with the graph database. The query patterns follow specific conventions. Pattern matching, for example, finds nodes whose labels and properties match given criteria.
Indexes and performance
Neo4j and Memgraph use indexes to optimize query performance. Infrahub creates the following indexes.
Node indexes:
- UUID (unique identifier)
- Kind (node type)
- Combinations for common queries
Relationship (edge) indexes:
- Branch + from + to (temporal queries)
- Status (active/inactive filtering)
- Relationship types (traversal optimization)
Composite indexes:
- (kind, uuid) for fast node lookups
- (branch, from, to, status) for temporal + branch queries
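A composite index behaves like a lookup table keyed on several properties at once. The sketch below uses a Python dict keyed on `(kind, uuid)` to show the idea: a lookup on both values is a single hash probe instead of a scan over every record. It illustrates the concept only; Neo4j and Memgraph implement their indexes differently, and the sample records are made up. In Neo4j, a composite index is declared in Cypher with a statement of the form `CREATE INDEX FOR (n:Label) ON (n.prop1, n.prop2)`.

```python
from collections import defaultdict

records = [
    {"kind": "Device", "uuid": "a1", "name": "spine1"},
    {"kind": "Device", "uuid": "b2", "name": "leaf1"},
    {"kind": "Interface", "uuid": "c3", "name": "eth0"},
]

# Composite index on (kind, uuid): one hash lookup finds the record.
by_kind_uuid = {(r["kind"], r["uuid"]): r for r in records}

# Single-property index on kind: all records of one kind without a scan.
by_kind = defaultdict(list)
for r in records:
    by_kind[r["kind"]].append(r)


def find_by_scan(kind, node_uuid):
    """Without a suitable index, every record has to be checked."""
    return next(
        (r for r in records if r["kind"] == kind and r["uuid"] == node_uuid), None
    )


print(by_kind_uuid[("Device", "b2")]["name"])  # leaf1
print(find_by_scan("Device", "b2")["name"])    # leaf1
print([r["name"] for r in by_kind["Device"]])  # ['spine1', 'leaf1']
```

Both lookups return the same record; the difference is cost. The scan grows with the number of records, while the indexed lookup does not, at the price of keeping the index up to date on every write.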
Transaction model
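Atomicity, the all-or-nothing guarantee of an ACID transaction, can be illustrated without a database. The sketch stages writes in a working copy and publishes them only if every constraint check passes, so a device and its interfaces are stored together or not at all. The `Store` class, its uniqueness rule on names, and the sample data are hypothetical. Against a real Neo4j server, the official Python driver provides transaction functions such as `session.execute_write` for the same purpose.

```python
import copy
from contextlib import contextmanager


class ConstraintError(Exception):
    pass


class Store:
    """Toy store: name -> object properties, with a uniqueness rule on names."""

    def __init__(self):
        self.objects = {}

    @contextmanager
    def transaction(self):
        staged = copy.deepcopy(self.objects)  # isolated working copy
        yield staged                          # the caller writes into the copy
        self.objects = staged                 # reached only if no exception: commit

    def create(self, staged, name, **props):
        if name in staged:                    # consistency: enforce uniqueness
            raise ConstraintError(f"duplicate name: {name}")
        staged[name] = props


store = Store()

with store.transaction() as tx:
    store.create(tx, "spine1", kind="Device")
    store.create(tx, "spine1:eth0", kind="Interface")

try:
    with store.transaction() as tx:
        store.create(tx, "leaf1", kind="Device")
        store.create(tx, "spine1", kind="Device")  # violates uniqueness
except ConstraintError:
    pass  # the whole second transaction is discarded, including leaf1

print(sorted(store.objects))  # ['spine1', 'spine1:eth0']
```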
Infrahub uses ACID transactions to ensure data consistency:
- Atomicity: All changes in a transaction succeed or all fail. When creating a device with interfaces, either the complete object graph is created or nothing is.
- Consistency: Transactions enforce constraints. Unique attributes can’t have duplicates, and required relationships must exist.
- Isolation: Concurrent transactions don’t interfere. Multiple users can modify different objects simultaneously without seeing each other’s partial changes.
- Durability: Committed changes are permanent. Once a transaction commits, the changes survive system failures.
The default branch uses stricter transaction isolation to prevent conflicts, while feature branches use more relaxed isolation for better performance.
Design decisions and trade-offs
Why Neo4j over PostgreSQL
Infrahub could have been built on a relational database like PostgreSQL. Why choose Neo4j?
- Relationship-first: Infrastructure is about relationships as much as entities, and graph databases treat relationships as first-class citizens. In PostgreSQL, relationships are represented indirectly through foreign keys and resolved at query time with joins, which become expensive as queries span more tables. In Neo4j, relationships are explicit edges that can be traversed efficiently.
- Flexible schema: Infrastructure schemas evolve. Adding a new device type in Neo4j means adding nodes with new labels. In PostgreSQL, it means creating tables, indexes, foreign keys, and migration scripts.
- Query expressiveness: Cypher queries for path finding, pattern matching, and graph algorithms are more intuitive than the equivalent SQL with recursive CTEs.
- Branching and versioning: Implementing Git-like branching in a relational database requires complex triggers, views, and application logic. Neo4j’s edge properties enable branching as a natural graph pattern.
The trade-off: Neo4j has a smaller ecosystem than PostgreSQL, with fewer tools, less operational experience in the community, and a steeper learning curve.
Graph schema complexity
Infrahub’s graph schema (with Attribute nodes and AttributeValue nodes) adds complexity compared to storing attributes directly on object nodes. Why this design?
- Versioning: Storing attribute values in separate nodes enables clean versioning. Each value node has a timestamp, and previous values remain in the graph, forming a version chain.
- Branch isolation: Attribute values can differ between branches. Storing values separately allows branch-specific values while sharing the attribute definition.
- Metadata: Attributes have metadata (source, owner, protection status). Separate attribute nodes provide a consistent place to store it.
The cost: More nodes and edges mean more complex queries and higher storage overhead. Infrahub mitigates this through:
- Efficient query patterns that fetch entire object graphs in single queries
- Indexing strategies that optimize common access patterns
- Caching layers that reduce database round-trips
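Versioning through separate value nodes means an update never overwrites data. In the simplified sketch below, changing an attribute closes the currently active HAS_VALUE edge by setting its end timestamp and adds a new edge to a new value, leaving the old value in place. Keeping the validity window on edges follows the `from`/`to` edge properties listed in the index section. The integer timestamps, the `Attribute` and `ValueEdge` classes, and their methods are assumptions for illustration, not Infrahub's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueEdge:
    value: str                 # stands in for the AttributeValue node
    start: int                 # the edge's "from" timestamp
    end: Optional[int] = None  # the edge's "to" timestamp; None while current


@dataclass
class Attribute:
    name: str
    history: List[ValueEdge] = field(default_factory=list)  # oldest first

    def set_value(self, value: str, at: int) -> None:
        for edge in self.history:
            if edge.end is None:
                edge.end = at  # close the edge to the previous value
        self.history.append(ValueEdge(value, start=at))

    def current(self) -> str:
        return next(e.value for e in self.history if e.end is None)


hostname = Attribute("hostname")
hostname.set_value("spine1", at=100)
hostname.set_value("spine1-par", at=200)

print(hostname.current())  # spine1-par
print([(e.value, e.start, e.end) for e in hostname.history])
# [('spine1', 100, 200), ('spine1-par', 200, None)]
```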
Neo4j vs. Memgraph support
Supporting two database engines increases complexity. Why maintain both?
- Production robustness: Neo4j is mature and battle-tested for production workloads. It’s the safe choice for critical infrastructure.
- Development speed: Memgraph’s in-memory operation and faster writes benefit development workflows where write performance matters more than durability.
- Flexibility: Different users have different requirements. Some prioritize performance, others prioritize stability. Supporting both enables choice.
The database abstraction layer isolates database-specific code, so the core Infrahub logic works with either engine and maintenance overhead stays low.
Implementation examples
Creating objects in the graph
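The sketch below assembles a parameterized Cypher `CREATE` statement of the kind involved when an object is created, following the Node → HAS_ATTRIBUTE → Attribute → HAS_VALUE → AttributeValue pattern from the graph schema. The function name, parameter names, and edge properties (`branch`, `status`, `from`) are assumptions; Infrahub's real queries are more involved and also handle relationships, metadata, and constraints. Values are passed as query parameters rather than spliced into the query text, which is the usual Cypher practice for safety and query-plan reuse.

```python
def build_create_query(node_uuid, kind, attributes, branch, at):
    """Return (cypher, params) that create one object subgraph.

    Hypothetical helper: shows the shape of the statement, not Infrahub's query.
    """
    lines = ["CREATE (n:Node {uuid: $uuid, kind: $kind})"]
    params = {"uuid": node_uuid, "kind": kind, "branch": branch, "at": at}
    # Every new edge records its branch, its status, and when it became valid.
    edge = "{branch: $branch, status: 'active', from: $at}"
    for i, (name, value) in enumerate(attributes.items()):
        lines.append(
            f"CREATE (n)-[:HAS_ATTRIBUTE {edge}]->"
            f"(:Attribute {{name: $name{i}}})"
            f"-[:HAS_VALUE {edge}]->"
            f"(:AttributeValue {{value: $value{i}}})"
        )
        # Values travel as parameters, never concatenated into the query text.
        params[f"name{i}"] = name
        params[f"value{i}"] = value
    return "\n".join(lines), params


query, params = build_create_query(
    "0001", "Device", {"name": "spine1", "status": "active"},
    branch="main", at="2024-01-01T00:00:00Z",
)
print(query)
# With the official neo4j Python driver, this could be run as session.run(query, params).
```

Generating one statement for the whole subgraph lets the database create the node, its attributes, and their values in a single transaction.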
When you create a device through the API, Infrahub generates Cypher queries to build the corresponding graph structure.
Querying across branches
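Conceptually, reading an attribute on a feature branch means looking for a value recorded on that branch first and falling back to the parent branch otherwise. The sketch below applies that rule to HAS_VALUE-style edges tagged with a `branch` property. It is a simplified illustration under stated assumptions: one level of branching, edges kept as plain tuples, and a hypothetical `resolve` helper. Real queries also have to account for timestamps and deleted edges.

```python
# (attribute, value, branch) for the currently active HAS_VALUE edges.
EDGES = [
    ("spine1.description", "core switch", "main"),
    ("spine1.status", "active", "main"),
    ("spine1.status", "maintenance", "feature-1"),  # changed only on the branch
]


def resolve(attribute, branch, parent="main"):
    """Prefer a value set on `branch`; otherwise fall back to `parent`."""
    for wanted in (branch, parent):
        for attr, value, edge_branch in EDGES:
            if attr == attribute and edge_branch == wanted:
                return value
    return None


print(resolve("spine1.status", "feature-1"))       # maintenance
print(resolve("spine1.description", "feature-1"))  # core switch (inherited from main)
print(resolve("spine1.status", "main"))            # active
```

The branch only stores what changed; everything else is read through from the parent, which is why a new branch costs almost nothing to create.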
Querying a feature branch requires following edges from both the feature branch and its parent, so that data not changed on the branch remains visible from the parent.
Time travel queries
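A time-travel read keeps only the edges whose validity window contains the requested time: the edge must have started at or before that time and must not have ended by then. The sketch below applies that test to HAS_VALUE-style edges carrying start and end timestamps. The integer timestamps and the `active_at`/`value_at` helpers are assumptions for illustration. A Cypher query could express the same condition in a clause such as `WHERE r.from <= $at AND (r.to IS NULL OR r.to > $at)`.

```python
from typing import Optional

# (value, from_ts, to_ts) for one attribute; to_ts is None while the value is current.
HISTORY = [
    ("spine1", 100, 200),
    ("spine1-par", 200, None),
]


def active_at(from_ts: int, to_ts: Optional[int], at: int) -> bool:
    """True if an edge that became valid at from_ts and closed at to_ts was live at `at`."""
    return from_ts <= at and (to_ts is None or at < to_ts)


def value_at(at: int) -> Optional[str]:
    matches = [value for value, start, end in HISTORY if active_at(start, end, at)]
    return matches[0] if matches else None


print(value_at(150))  # spine1
print(value_at(200))  # spine1-par (the old edge closed exactly at 200)
print(value_at(50))   # None (the attribute did not exist yet)
```

Treating the end timestamp as exclusive guarantees that at most one value is active at any instant, even at the exact moment of a change.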
Accessing historical data requires filtering edges by timestamp, keeping only the edges that were active at the requested point in time.
Related topics
- Schema System - How schemas are stored in the graph
- Branching - How branches are implemented in the graph
- Version Control - How immutability is achieved in the graph
- Architecture - Overall system architecture