Overview

The Engineering Knowledge Graph (EKG) uses a graph data model to represent your engineering infrastructure as interconnected nodes and relationships. This model makes it straightforward to trace dependencies, establish ownership, and run impact analysis across your services, databases, teams, and deployments.

Graph Data Model

The knowledge graph consists of two fundamental building blocks:

  • Nodes: Represent entities in your infrastructure (services, databases, teams, etc.)
  • Edges: Represent relationships between nodes (calls, owns, uses, depends_on)

Node Structure

Nodes are defined using the Node class in connectors/base.py:13-18:
connectors/base.py
class Node(BaseModel):
    """Represents a node in the knowledge graph."""
    id: str
    type: str
    name: str
    properties: Dict[str, Any] = {}
  • id: Unique identifier in format type:name (e.g., service:payment-service)
  • type: Node classification (service, database, cache, team, deployment)
  • name: Human-readable name
  • properties: Flexible dictionary for additional metadata (team, port, image, etc.)
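
As an illustration of the id convention, the sketch below builds a node from a type and a name. It uses a standard-library dataclass as a stand-in for the Pydantic Node model so that it runs without dependencies; make_node is a hypothetical helper and is not part of connectors/base.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """Stand-in for the Pydantic Node model (assumption: same four fields)."""
    id: str
    type: str
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


def make_node(node_type: str, name: str, **properties: Any) -> Node:
    """Hypothetical helper: derive the id from the type:name convention."""
    return Node(
        id=f"{node_type}:{name}",
        type=node_type,
        name=name,
        properties=dict(properties),
    )


node = make_node("service", "payment-service", team="payments", port=8083)
print(node.id)  # service:payment-service
```

Deriving the id in one place keeps it consistent with the type and name fields, which matters because the storage layer matches nodes by id.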

Edge Structure

Edges connect nodes and represent the relationships between them. They are defined using the Edge class in connectors/base.py:21-27:
connectors/base.py
class Edge(BaseModel):
    """Represents an edge in the knowledge graph."""
    id: str
    type: str
    source: str
    target: str
    properties: Dict[str, Any] = {}
  • id: Unique identifier in format edge:source-type-target
  • type: Relationship classification (calls, owns, uses, depends_on, exposes)
  • source: Source node ID
  • target: Target node ID
  • properties: Flexible dictionary for relationship metadata
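
The edge id format can be sketched the same way. This example assumes that the source and target parts of the id are the full node IDs; the page does not say whether the real connectors use full IDs or bare names. The dataclass is a stand-in for the Pydantic Edge model and make_edge is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Edge:
    """Stand-in for the Pydantic Edge model (assumption: same five fields)."""
    id: str
    type: str
    source: str
    target: str
    properties: Dict[str, Any] = field(default_factory=dict)


def make_edge(source: str, edge_type: str, target: str, **properties: Any) -> Edge:
    """Hypothetical helper: build the id as edge:source-type-target."""
    return Edge(
        id=f"edge:{source}-{edge_type}-{target}",
        type=edge_type,
        source=source,
        target=target,
        properties=dict(properties),
    )


edge = make_edge("service:api", "calls", "service:payment-service")
print(edge.id)  # edge:service:api-calls-service:payment-service
```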

Node Types

The knowledge graph supports several node types, each representing different infrastructure entities:
Service

Represents microservices, APIs, or application components.

Common Properties:
  • team: Owning team name
  • port: Exposed port number
  • image: Docker/container image
  • oncall: On-call contact
Example:
{
  "id": "service:payment-service",
  "type": "service",
  "name": "payment-service",
  "properties": {
    "team": "payments",
    "port": 8083,
    "image": "payment-service:latest"
  }
}

Edge Types

Relationships between nodes are classified by their semantic meaning:
Edge Type    Description                         Example
CALLS        Service-to-service communication    service:api → service:payment-service
USES         Service using a database or cache   service:api → database:users-db
DEPENDS_ON   Explicit dependency declaration     service:frontend → service:api
OWNS         Team ownership of an asset          team:payments → service:payment-service
EXPOSES      Service exposing a deployment       service:payment-service → deployment:payment-service
Edge types are stored in uppercase in Neo4j (e.g., CALLS, USES) but can be specified in lowercase when querying.
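
One way to accept lowercase input while keeping the stored form uppercase is to normalize against a fixed set of types. The sketch below is an assumption about how such a lookup could look; EdgeType and normalize_edge_type are hypothetical names, and the member list simply mirrors the edge types listed above.

```python
from enum import Enum


class EdgeType(str, Enum):
    """Hypothetical enum mirroring the edge types listed above."""
    CALLS = "CALLS"
    USES = "USES"
    DEPENDS_ON = "DEPENDS_ON"
    OWNS = "OWNS"
    EXPOSES = "EXPOSES"


def normalize_edge_type(value: str) -> EdgeType:
    """Map input such as 'calls' or ' Depends_On ' to the stored uppercase form.

    Raises ValueError for a type that is not in the enum.
    """
    return EdgeType(value.strip().upper())


print(normalize_edge_type("calls").value)  # CALLS
```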

Neo4j Storage Layer

The knowledge graph is persisted in Neo4j, a native graph database optimized for traversals and relationship queries.

GraphStorage Class

The GraphStorage class (graph/storage.py:14) provides the storage abstraction:
graph/storage.py
class GraphStorage:
    """Neo4j-based graph storage implementation."""
    
    def __init__(self, uri: str = None, user: str = None, password: str = None):
        """Initialize Neo4j connection."""
        self.uri = uri or os.getenv('NEO4J_URI', 'bolt://localhost:7687')
        self.user = user or os.getenv('NEO4J_USER', 'neo4j')
        self.password = password or os.getenv('NEO4J_PASSWORD', 'password')
        
        self.driver: Optional[Driver] = None
        self._connect()
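
The constructor resolves each setting in the same order: explicit argument, then environment variable, then a built-in default. The sketch below isolates that precedence in a standalone function so it can be tried without a database. resolve_neo4j_config and its env parameter are hypothetical and exist only for illustration.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_neo4j_config(
    uri: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str, str]:
    """Explicit argument first, then environment variable, then default."""
    return (
        uri or env.get("NEO4J_URI", "bolt://localhost:7687"),
        user or env.get("NEO4J_USER", "neo4j"),
        password or env.get("NEO4J_PASSWORD", "password"),
    )


# A plain dict stands in for the process environment here.
example_env = {"NEO4J_URI": "bolt://graph.internal:7687"}
print(resolve_neo4j_config(user="ekg", env=example_env))
# ('bolt://graph.internal:7687', 'ekg', 'password')
```

Because the fallback uses `or`, an empty string passed as an argument is treated the same as None and falls through to the environment.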

Key Storage Operations

1. Adding Nodes

Nodes are added using MERGE to support upsert behavior:
graph/storage.py
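def add_node(self, node: Node):
    """Add or update a node in the graph."""
    # MERGE matches the node by id or creates it; SET then overwrites
    # type and name and merges the properties map into the node.
    query = """
    MERGE (n {id: $id})
    SET n.type = $type,
        n.name = $name,
        n += $properties
    RETURN n
    """
    # Session handling is assumed to match execute_cypher.
    with self.driver.session() as session:
        session.run(query, {
            'id': node.id,
            'type': node.type,
            'name': node.name,
            'properties': node.properties
        })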
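
To make the upsert behavior concrete, the sketch below mimics the MERGE and SET steps with a plain dictionary instead of Neo4j. It is only a model of the semantics: upsert_node and the graph dict are hypothetical, and nothing here talks to a database.

```python
from typing import Any, Dict

# In-memory stand-in for the graph, keyed by node id (not Neo4j).
graph: Dict[str, Dict[str, Any]] = {}


def upsert_node(node_id: str, node_type: str, name: str,
                properties: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic MERGE (n {id: $id}) followed by SET ... n += $properties."""
    # MERGE: reuse the existing record or create one holding only the id.
    record = graph.setdefault(node_id, {"id": node_id})
    # SET n.type, n.name: always overwritten.
    record["type"] = node_type
    record["name"] = name
    # n += $properties: keys in the map are added or replaced,
    # keys that are absent from the map are left untouched.
    record.update(properties)
    return record


upsert_node("service:api", "service", "api", {"team": "platform", "port": 8080})
upsert_node("service:api", "service", "api", {"port": 9090})
print(graph["service:api"])
# {'id': 'service:api', 'type': 'service', 'name': 'api', 'team': 'platform', 'port': 9090}
```

The second call updates port but keeps team, and no duplicate node appears. One consequence of merging the map last is that a properties key named id, type, or name would overwrite the corresponding top-level field.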

2. Adding Edges

Before creating the relationship, add_edge ensures that both endpoint nodes exist:
graph/storage.py
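def add_edge(self, edge: Edge):
    """Add or update an edge in the graph."""
    # Session handling is assumed to match execute_cypher.
    with self.driver.session() as session:
        # First ensure both nodes exist
        session.run("MERGE (n {id: $id})", {'id': edge.source})
        session.run("MERGE (n {id: $id})", {'id': edge.target})

        # Create relationship with dynamic type. The type is interpolated
        # into the query text, so it must come from trusted input.
        query = f"""
        MATCH (source {{id: $source}})
        MATCH (target {{id: $target}})
        MERGE (source)-[r:{edge.type.upper()}]->(target)
        SET r += $properties
        RETURN r
        """
        # Assumed completion: the excerpt stops before the query is run.
        session.run(query, {
            'source': edge.source,
            'target': edge.target,
            'properties': edge.properties
        })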
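
Because the relationship type is written into the query text rather than passed as a parameter, it is worth checking it before building the query. The sketch below is one possible guard, not the project's implementation; build_edge_query and the identifier pattern are assumptions.

```python
import re
from typing import Any, Dict, Tuple

# Assumption: relationship types are plain identifiers such as CALLS or DEPENDS_ON.
_REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def build_edge_query(edge_type: str, source: str, target: str,
                     properties: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (query, params) for an edge upsert, rejecting unsafe type names."""
    rel_type = edge_type.upper()
    if not _REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"unsafe relationship type: {edge_type!r}")
    query = (
        "MATCH (source {id: $source}) "
        "MATCH (target {id: $target}) "
        f"MERGE (source)-[r:{rel_type}]->(target) "
        "SET r += $properties "
        "RETURN r"
    )
    params = {"source": source, "target": target, "properties": properties}
    return query, params


query, params = build_edge_query(
    "calls", "service:api", "service:payment-service", {"protocol": "http"})
print(query)
```

Node IDs and properties still travel as query parameters, so only the type needs this check.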

3. Querying Nodes

Retrieve nodes by type and filters:
graph/storage.py
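def get_nodes(self, node_type: str = None, filters: Dict[str, Any] = None):
    """Retrieve nodes by type and optional filters."""
    query = "MATCH (n)"
    params = {}
    
    conditions = []
    if node_type:
        conditions.append("n.type = $type")
        params['type'] = node_type
    
    if filters:
        # Filter values are passed as parameters; filter keys are
        # interpolated into the query text and must be trusted names.
        for key, value in filters.items():
            param_name = f"filter_{key}"
            conditions.append(f"n.{key} = ${param_name}")
            params[param_name] = value
    
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    
    query += " RETURN n"

    # Assumed completion: the excerpt stops before the query is run,
    # and the return shape shown here is a guess.
    with self.driver.session() as session:
        result = session.run(query, params)
        return [dict(record['n']) for record in result]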
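
To see what query text and parameters this logic produces, the sketch below reimplements the query-building part as a standalone function. build_node_query is a hypothetical name, and the check on filter keys is an addition for safety that the excerpt does not show.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Assumption: property keys are simple identifiers such as team or port.
_PROPERTY_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_node_query(
    node_type: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Build the Cypher text and parameter map for a node lookup."""
    query = "MATCH (n)"
    params: Dict[str, Any] = {}
    conditions: List[str] = []

    if node_type:
        conditions.append("n.type = $type")
        params["type"] = node_type

    for key, value in (filters or {}).items():
        # Keys end up in the query text, so reject anything unusual.
        if not _PROPERTY_KEY.fullmatch(key):
            raise ValueError(f"unsafe property key: {key!r}")
        param_name = f"filter_{key}"
        conditions.append(f"n.{key} = ${param_name}")
        params[param_name] = value

    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query + " RETURN n", params


query, params = build_node_query("service", {"team": "payments"})
print(query)   # MATCH (n) WHERE n.type = $type AND n.team = $filter_team RETURN n
print(params)  # {'type': 'service', 'filter_team': 'payments'}
```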

Cypher Query Execution

For advanced queries, you can execute custom Cypher directly (graph/storage.py:175):
graph/storage.py
def execute_cypher(self, query: str, parameters: Dict[str, Any] = None):
    """Execute a custom Cypher query."""
    with self.driver.session() as session:
        result = session.run(query, parameters or {})
        return [dict(record) for record in result]
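
As an example of a custom query, the sketch below builds a traversal that lists everything reachable from a node through CALLS, USES, or DEPENDS_ON relationships. downstream_query is a hypothetical helper, and the depth limits are arbitrary choices for illustration. Running the result requires a live Neo4j instance, so that call is left as a comment.

```python
from typing import Any, Dict, Tuple


def downstream_query(node_id: str, max_depth: int = 3) -> Tuple[str, Dict[str, Any]]:
    """Build a query for nodes reachable from node_id within max_depth hops."""
    max_depth = int(max_depth)
    if not 1 <= max_depth <= 10:
        raise ValueError("max_depth must be between 1 and 10")
    # The hop bound is written into the query text as a validated integer;
    # the node id is passed as a parameter.
    query = (
        "MATCH (start {id: $id})"
        f"-[:CALLS|USES|DEPENDS_ON*1..{max_depth}]->(dep) "
        "RETURN DISTINCT dep.id AS id, dep.type AS type"
    )
    return query, {"id": node_id}


query, params = downstream_query("service:api", max_depth=2)
print(query)
# With a GraphStorage instance named storage (assumed to be connected):
# rows = storage.execute_cypher(query, params)
```

Since execute_cypher returns each record as a dict, every row from this query would carry an id key and a type key.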

Graph Initialization

The system initializes the graph in main.py:78-149 in four steps:

1. Connect to Neo4j

Establish a connection to the Neo4j database using environment variables.

2. Load Connectors

Initialize connectors for Docker Compose, Teams, and Kubernetes configurations.

3. Parse Configuration Files

Each connector parses its respective configuration files to extract nodes and edges.

4. Populate Graph

Clear the existing graph and load all parsed nodes and edges into Neo4j.

main.py
# Load Docker Compose
connector = DockerComposeConnector()
nodes, edges = connector.parse(str(docker_compose_file))
all_nodes.extend(nodes)
all_edges.extend(edges)

# Clear and populate graph
storage.clear_graph()
storage.add_nodes(all_nodes)
storage.add_edges(all_edges)
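
The overall flow can be sketched end to end without Neo4j. In the example below, InMemoryStorage stands in for GraphStorage and StubConnector stands in for a real connector; both are hypothetical, as is build_graph. Only the order of operations (collect from every connector, clear, add nodes, add edges) is taken from the excerpt.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


class InMemoryStorage:
    """Stand-in for GraphStorage with the three methods the excerpt calls."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Record] = {}
        self.edges: Dict[str, Record] = {}

    def clear_graph(self) -> None:
        self.nodes.clear()
        self.edges.clear()

    def add_nodes(self, nodes: List[Record]) -> None:
        for node in nodes:
            self.nodes[node["id"]] = node

    def add_edges(self, edges: List[Record]) -> None:
        for edge in edges:
            self.edges[edge["id"]] = edge


class StubConnector:
    """Hypothetical connector that returns fixed data instead of parsing a file."""

    def __init__(self, nodes: List[Record], edges: List[Record]) -> None:
        self._nodes, self._edges = nodes, edges

    def parse(self, path: str) -> Tuple[List[Record], List[Record]]:
        return self._nodes, self._edges


def build_graph(storage: InMemoryStorage,
                sources: List[Tuple[StubConnector, str]]) -> None:
    """Collect nodes and edges from every connector, then reload the graph."""
    all_nodes: List[Record] = []
    all_edges: List[Record] = []
    for connector, path in sources:
        nodes, edges = connector.parse(path)
        all_nodes.extend(nodes)
        all_edges.extend(edges)
    storage.clear_graph()          # start from an empty graph
    storage.add_nodes(all_nodes)   # nodes first, then the edges between them
    storage.add_edges(all_edges)


compose = StubConnector(
    nodes=[{"id": "service:api"}, {"id": "database:users-db"}],
    edges=[{"id": "edge:service:api-uses-database:users-db"}],
)
teams = StubConnector(
    nodes=[{"id": "team:platform"}],
    edges=[{"id": "edge:team:platform-owns-service:api"}],
)
storage = InMemoryStorage()
build_graph(storage, [(compose, "docker-compose.yml"), (teams, "teams.yaml")])
print(len(storage.nodes), len(storage.edges))  # 3 2
```

Clearing before loading means every run rebuilds the graph from the configuration files, so entities removed from those files do not linger in the graph.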

Why Graph Databases?

Natural Relationships

Graph databases natively model relationships without complex JOINs or denormalization.

Fast Traversals

Neo4j optimizes for graph traversals, making dependency analysis and pathfinding efficient.

Flexible Schema

Add new node types, edge types, and properties without schema migrations.

Query Language

Cypher provides an intuitive, SQL-like language designed specifically for graphs.

Next Steps

Connectors

Learn how connectors parse configuration files into graph data.

Query Engine

Explore powerful graph traversal and analysis capabilities.