The Engineering Knowledge Graph uses connectors to parse configuration files from various sources and build the knowledge graph. This guide covers how to set up and manage data sources.

Overview

EKG supports three built-in connectors:
  • Docker Compose Connector: Parses docker-compose.yml files for service definitions
  • Teams Connector: Parses teams.yaml files for team ownership
  • Kubernetes Connector: Parses Kubernetes deployment YAML files
All connectors inherit from BaseConnector and return standardized Node and Edge objects.
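The BaseConnector contract isn't reproduced in this guide, but a minimal sketch of what each connector fulfills might look like the following. The class and field names here are illustrative reconstructions based on the snippets later in this page, not the project's actual definitions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    id: str                                   # e.g. "service:api-gateway"
    type: str                                 # "service", "database", "cache", "team"
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Edge:
    type: str                                 # "DEPENDS_ON", "CALLS", "USES", "OWNS"
    source: str                               # source node ID
    target: str                               # target node ID

class BaseConnector(ABC):
    @abstractmethod
    def parse(self, file_path: str) -> Tuple[List[Node], List[Edge]]:
        """Parse a configuration file into graph nodes and edges."""
```

Each concrete connector (Docker Compose, Teams, Kubernetes) implements parse() against its own file format and returns the same two lists, which is what lets the loader merge them into one graph.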

Docker Compose Data Source

The Docker Compose connector extracts services, dependencies, and environment variables from docker-compose.yml files.

Setting Up Docker Compose Data

1. Create data directory

Ensure you have a data/ directory in your project root:
mkdir -p data
2. Add docker-compose.yml

Create or copy your docker-compose.yml file to data/docker-compose.yml:
cp your-docker-compose.yml data/docker-compose.yml
3. Reload data

The system automatically loads data on startup. To reload without restarting:
curl -X POST http://localhost:8000/api/reload

Docker Compose Format

The connector parses standard Docker Compose v3.8+ format:
version: '3.8'

services:
  api-gateway:
    build: ./services/api-gateway
    ports:
      - "8080:8080"
    environment:
      - AUTH_SERVICE_URL=http://auth-service:8081
      - ORDER_SERVICE_URL=http://order-service:8082
      - PAYMENT_SERVICE_URL=http://payment-service:8083
    depends_on:
      - auth-service
      - order-service
    labels:
      team: platform-team
      oncall: "@alice"

  auth-service:
    build: ./services/auth-service
    ports:
      - "8081:8081"
    environment:
      - DATABASE_URL=postgresql://postgres:secret@users-db:5432/users
      - REDIS_URL=redis://redis-main:6379
    depends_on:
      - users-db
      - redis-main
    labels:
      team: identity-team
      oncall: "@bob"

  users-db:
    image: postgres:15
    environment:
      - POSTGRES_DB=users
      - POSTGRES_PASSWORD=secret
    labels:
      team: identity-team
      type: database

  redis-main:
    image: redis:7-alpine
    labels:
      team: platform-team
      type: cache

Extracted Information

The Docker Compose connector extracts:
| Field               | Description              | Graph Element                  |
|---------------------|--------------------------|--------------------------------|
| Service name        | Unique identifier        | Node ID: service:api-gateway   |
| depends_on          | Explicit dependencies    | Edge: DEPENDS_ON               |
| environment URLs    | Service-to-service calls | Edge: CALLS                    |
| environment DB URLs | Database connections     | Edge: USES                     |
| labels.team         | Team ownership           | Property on node               |
| labels.oncall       | Oncall contact           | Property on node               |
| labels.type         | Node type override       | Node type                      |
| ports               | Exposed ports            | Property on node               |
| image               | Docker image             | Property on node               |
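For the api-gateway service in the compose file above, the extracted graph elements would look roughly like this. The dictionaries below are for illustration only; the real connector returns Node and Edge objects:

```python
# Illustrative output for the api-gateway service defined above.
node = {
    "id": "service:api-gateway",
    "type": "service",
    "properties": {
        "team": "platform-team",
        "oncall": "@alice",
        "ports": ["8080:8080"],
    },
}

# depends_on entries become DEPENDS_ON edges; *_SERVICE_URL environment
# variables become CALLS edges to the referenced services.
edges = [
    {"type": "DEPENDS_ON", "source": "service:api-gateway", "target": "service:auth-service"},
    {"type": "DEPENDS_ON", "source": "service:api-gateway", "target": "service:order-service"},
    {"type": "CALLS", "source": "service:api-gateway", "target": "service:auth-service"},
    {"type": "CALLS", "source": "service:api-gateway", "target": "service:order-service"},
    {"type": "CALLS", "source": "service:api-gateway", "target": "service:payment-service"},
]
```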

Dependency Detection

The connector detects dependencies from environment variables by pattern-matching keys that end in _URL and extracting the host from the value:
def _extract_service_dependencies_from_env(self, env_vars: Dict[str, str]) -> List[str]:
    """Extract service dependencies from environment variables."""
    dependencies = []
    
    for key, value in env_vars.items():
        # Keys ending in '_SERVICE_URL' also end in '_URL', so one check covers both
        if key.endswith('_URL') and '://' in value:
            # Extract the host from a URL like http://payment-service:8083
            host_part = value.split('://')[1]
            # Strip credentials from URLs like postgresql://user@orders-db:5432/db
            if '@' in host_part:
                host_part = host_part.split('@')[1]
            if ':' in host_part:
                dependencies.append(host_part.split(':')[0])
    
    return dependencies
Examples:
  • AUTH_SERVICE_URL=http://auth-service:8081 → Creates CALLS edge to auth-service
  • DATABASE_URL=postgresql://user@orders-db:5432/db → Creates USES edge to orders-db
  • REDIS_URL=redis://redis-main:6379 → Creates USES edge to redis-main
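The extraction logic can be exercised standalone. The function below is a self-contained adaptation of the method above (without the class context), run against the three example variables:

```python
def extract_service_dependencies(env_vars):
    """Standalone adaptation of the dependency-extraction logic above."""
    dependencies = []
    for key, value in env_vars.items():
        if key.endswith('_URL') and '://' in value:
            host_part = value.split('://')[1]
            # Drop credentials ("user@host") so DB URLs resolve to the host
            if '@' in host_part:
                host_part = host_part.split('@')[1]
            if ':' in host_part:
                dependencies.append(host_part.split(':')[0])
    return dependencies

env = {
    "AUTH_SERVICE_URL": "http://auth-service:8081",
    "DATABASE_URL": "postgresql://user@orders-db:5432/db",
    "REDIS_URL": "redis://redis-main:6379",
}
print(extract_service_dependencies(env))
# → ['auth-service', 'orders-db', 'redis-main']
```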

Teams Data Source

The Teams connector parses team ownership and contact information from YAML files.

Setting Up Teams Data

1. Create teams.yaml

Create a teams.yaml file in the data/ directory:
touch data/teams.yaml
2. Define teams

Add team definitions with ownership information:
teams:
  - name: platform-team
    lead: Alice Chen
    slack_channel: "#platform"
    pagerduty_schedule: "platform-oncall"
    owns:
      - api-gateway
      - notification-service
      - redis-main
  
  - name: identity-team
    lead: Bob Smith
    slack_channel: "#identity"
    pagerduty_schedule: "identity-oncall"
    owns:
      - auth-service
      - users-db
  
  - name: orders-team
    lead: David Lee
    slack_channel: "#orders"
    owns:
      - order-service
      - inventory-service
      - orders-db
      - inventory-db
  
  - name: payments-team
    lead: Frank Wilson
    slack_channel: "#payments"
    pagerduty_schedule: "payments-oncall"
    owns:
      - payment-service
      - payments-db
3. Reload data

Reload the configuration to apply changes:
curl -X POST http://localhost:8000/api/reload

Teams Format

The teams connector supports these fields:
| Field              | Required | Description                      | Example                   |
|--------------------|----------|----------------------------------|---------------------------|
| name               | Yes      | Unique team identifier           | platform-team             |
| lead               | No       | Team lead name                   | Alice Chen                |
| slack_channel      | No       | Slack channel for team           | #platform                 |
| pagerduty_schedule | No       | PagerDuty schedule ID            | platform-oncall           |
| owns               | Yes      | List of owned services/databases | [api-gateway, redis-main] |

Ownership Relationships

The connector creates OWNS edges from teams to their assets:
def parse(self, file_path: str) -> tuple[List[Node], List[Edge]]:
    """Parse teams.yaml file."""
    with open(file_path) as f:
        teams_data = yaml.safe_load(f)
    
    nodes: List[Node] = []
    edges: List[Edge] = []
    teams = teams_data.get('teams', [])
    
    for team_config in teams:
        # Create team node
        team_node = self._create_team_node(team_config)
        nodes.append(team_node)
        
        # Create ownership edges
        owned_services = team_config.get('owns', [])
        for service_name in owned_services:
            # Infer service type from name
            service_type = self._infer_service_type(service_name)
            
            edge = self._create_edge(
                'owns',
                team_node.id,                      # team:platform-team
                f"{service_type}:{service_name}"   # service:api-gateway
            )
            edges.append(edge)
    
    return nodes, edges
The connector automatically infers node types from names. Services ending in -db or containing database are typed as database, names containing redis or cache are typed as cache, and everything else is typed as service.
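Those inference rules can be sketched as a small function. The body below is a plausible reconstruction of _infer_service_type based on the rules just described, not the project's exact code:

```python
def infer_service_type(name: str) -> str:
    """Infer a node type from an asset name, per the rules described above."""
    lowered = name.lower()
    if lowered.endswith('-db') or 'database' in lowered:
        return 'database'
    if 'redis' in lowered or 'cache' in lowered:
        return 'cache'
    return 'service'

print(infer_service_type('users-db'))      # database
print(infer_service_type('redis-main'))    # cache
print(infer_service_type('api-gateway'))   # service
```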

Kubernetes Data Source

The Kubernetes connector parses Kubernetes deployment manifests for additional context.

Setting Up Kubernetes Data

1. Export Kubernetes configurations

Export your deployments to a YAML file:
kubectl get deployments -o yaml > data/k8s-deployments.yaml
2. Or create manually

Create a deployment manifest manually:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
    team: platform-team
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:v1.2.3
        ports:
        - containerPort: 8080
        env:
        - name: AUTH_SERVICE_URL
          value: "http://auth-service:8081"
        - name: ORDER_SERVICE_URL
          value: "http://order-service:8082"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
  labels:
    app: auth-service
    team: identity-team
spec:
  replicas: 2
  selector:
    matchLabels:
      app: auth-service
  template:
    spec:
      containers:
      - name: auth-service
        image: auth-service:v2.1.0
        env:
        - name: DATABASE_URL
          value: "postgresql://user@users-db:5432/users"
3. Reload configuration

curl -X POST http://localhost:8000/api/reload

Kubernetes Format

The connector extracts information from standard Kubernetes manifests:
  • Deployment name: Used as service identifier
  • Labels: Team ownership and metadata
  • Container environment: Service dependencies
  • Image tags: Version information
  • Replicas: Scaling configuration
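A minimal sketch of pulling those fields out of an exported manifest with PyYAML (safe_load_all handles the `---`-separated multi-document files kubectl produces; the field paths follow the standard Deployment schema):

```python
import yaml

# A trimmed single-Deployment manifest, inlined here for illustration.
manifest = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    team: platform-team
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:v1.2.3
        env:
        - name: AUTH_SERVICE_URL
          value: "http://auth-service:8081"
"""

deployments = []
for doc in yaml.safe_load_all(manifest):
    if doc and doc.get("kind") == "Deployment":
        container = doc["spec"]["template"]["spec"]["containers"][0]
        deployments.append({
            "name": doc["metadata"]["name"],                        # service identifier
            "team": doc["metadata"].get("labels", {}).get("team"),  # team ownership
            "replicas": doc["spec"].get("replicas"),                # scaling configuration
            "image": container["image"],                            # version information
            "env": {e["name"]: e.get("value", "")                   # service dependencies
                    for e in container.get("env", [])},
        })

print(deployments)
```

The env dictionary can then be fed through the same URL-based dependency detection used by the Docker Compose connector.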

Data Loading Process

The system loads data on startup and when the reload API is called:
async def load_configuration_data():
    """Load and parse configuration files into the graph."""
    logger.info("Loading configuration data...")
    
    # Clear existing data
    storage.clear_graph()
    
    data_dir = Path("data")
    all_nodes = []
    all_edges = []
    
    # Load Docker Compose data
    docker_compose_file = data_dir / "docker-compose.yml"
    if docker_compose_file.exists():
        connector = DockerComposeConnector()
        nodes, edges = connector.parse(str(docker_compose_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Docker Compose")
    
    # Load Teams data
    teams_file = data_dir / "teams.yaml"
    if teams_file.exists():
        connector = TeamsConnector()
        nodes, edges = connector.parse(str(teams_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Teams")
    
    # Load Kubernetes data (optional)
    k8s_file = data_dir / "k8s-deployments.yaml"
    if k8s_file.exists():
        connector = KubernetesConnector()
        nodes, edges = connector.parse(str(k8s_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Kubernetes")
    
    # Store all data in graph
    storage.add_nodes(all_nodes)
    storage.add_edges(all_edges)
    
    logger.info(f"Total loaded: {len(all_nodes)} nodes and {len(all_edges)} edges")

Data Validation

Validate your configuration files before loading:
python scripts/validate_config.py
The validator checks:
  • YAML syntax is valid
  • Required fields are present
  • Service names are consistent across files
  • Team ownership is defined
  • No circular dependencies
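One of those checks, name consistency across files, can be sketched as follows. This is a standalone illustration operating on already-parsed YAML, not the actual scripts/validate_config.py:

```python
def check_ownership_consistency(compose: dict, teams: dict) -> list:
    """Report names in teams.yaml 'owns' lists with no matching
    service in docker-compose.yml."""
    known = set(compose.get("services", {}))
    problems = []
    for team in teams.get("teams", []):
        for owned in team.get("owns", []):
            if owned not in known:
                problems.append(f"{team['name']} owns unknown service '{owned}'")
    return problems

compose = {"services": {"api-gateway": {}, "redis-main": {}}}
teams = {"teams": [{"name": "platform-team", "owns": ["api-gateway", "ghost-svc"]}]}
print(check_ownership_consistency(compose, teams))
# → ["platform-team owns unknown service 'ghost-svc'"]
```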

Reloading Data

You can reload data without restarting the application:
curl -X POST http://localhost:8000/api/reload
Reloading data clears the entire graph and rebuilds it from scratch. This operation may take a few seconds for large datasets.

Best Practices

1. Keep data files in version control

Store your configuration files in Git to track changes:
git add data/docker-compose.yml data/teams.yaml
git commit -m "Update service dependencies"
2. Use consistent naming

Ensure service names match across all configuration files:
  • Docker Compose: api-gateway
  • Teams YAML: api-gateway (in owns list)
  • Kubernetes: api-gateway (in deployment name)
3. Document team ownership

Always specify team ownership using labels:
labels:
  team: platform-team
  oncall: "@alice"
4. Validate before deploying

Run validation before committing changes:
python scripts/validate_config.py && git commit

Troubleshooting

Data Not Appearing in Graph

If your data isn’t showing up:
  1. Check file paths: Files must be in the data/ directory
  2. Validate YAML syntax: yamllint data/*.yaml
  3. Check logs: docker-compose logs ekg-app | grep ERROR
  4. Reload data: curl -X POST http://localhost:8000/api/reload

Duplicate Nodes

If you see duplicate nodes with the same name:
  1. Ensure node IDs are consistent (format: type:name)
  2. Check for naming inconsistencies across files
  3. Use the same connector for the same data source

Missing Relationships

If edges aren’t created:
  1. Verify service names match exactly
  2. Check environment variable format (must contain URLs)
  3. Ensure target services exist as nodes
  4. Review connector logs for parsing errors
