The Engineering Knowledge Graph uses connectors to parse configuration files from various sources and build the knowledge graph. This guide covers how to set up and manage data sources.
## Overview

EKG supports three built-in connectors:

- **Docker Compose Connector**: parses `docker-compose.yml` files for service definitions
- **Teams Connector**: parses `teams.yaml` files for team ownership
- **Kubernetes Connector**: parses Kubernetes deployment YAML files

All connectors inherit from `BaseConnector` and return standardized `Node` and `Edge` objects.
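The `BaseConnector` interface itself isn't reproduced in this guide; a rough sketch of what it might look like follows. The field names on `Node` and `Edge` are illustrative, inferred from the node IDs (`service:api-gateway`) and edge types (`DEPENDS_ON`, `OWNS`) used below:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: str        # e.g. "service:api-gateway"
    type: str      # service, database, cache, team
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    type: str      # DEPENDS_ON, CALLS, USES, OWNS
    source: str    # source node ID
    target: str    # target node ID


class BaseConnector(ABC):
    @abstractmethod
    def parse(self, file_path: str) -> Tuple[List[Node], List[Edge]]:
        """Parse a configuration file into graph nodes and edges."""
        ...
```

Each concrete connector then only has to implement `parse` for its own file format.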
## Docker Compose Data Source
The Docker Compose connector extracts services, dependencies, and environment variables from docker-compose.yml files.
### Setting Up Docker Compose Data

1. **Create the data directory.** Ensure you have a `data/` directory in your project root.

2. **Add docker-compose.yml.** Create or copy your `docker-compose.yml` file to `data/docker-compose.yml`:

   ```bash
   cp your-docker-compose.yml data/docker-compose.yml
   ```

3. **Reload data.** The system automatically loads data on startup. To reload without restarting:

   ```bash
   curl -X POST http://localhost:8000/api/reload
   ```
The connector parses standard Docker Compose v3.8+ format:

```yaml
version: '3.8'

services:
  api-gateway:
    build: ./services/api-gateway
    ports:
      - "8080:8080"
    environment:
      - AUTH_SERVICE_URL=http://auth-service:8081
      - ORDER_SERVICE_URL=http://order-service:8082
      - PAYMENT_SERVICE_URL=http://payment-service:8083
    depends_on:
      - auth-service
      - order-service
    labels:
      team: platform-team
      oncall: "@alice"

  auth-service:
    build: ./services/auth-service
    ports:
      - "8081:8081"
    environment:
      - DATABASE_URL=postgresql://postgres:secret@users-db:5432/users
      - REDIS_URL=redis://redis-main:6379
    depends_on:
      - users-db
      - redis-main
    labels:
      team: identity-team
      oncall: "@bob"

  users-db:
    image: postgres:15
    environment:
      - POSTGRES_DB=users
      - POSTGRES_PASSWORD=secret
    labels:
      team: identity-team
      type: database

  redis-main:
    image: redis:7-alpine
    labels:
      team: platform-team
      type: cache
```
The Docker Compose connector extracts:
| Field | Description | Graph Element |
|---|---|---|
| Service name | Unique identifier | Node ID: `service:api-gateway` |
| `depends_on` | Explicit dependencies | Edge: `DEPENDS_ON` |
| `environment` URLs | Service-to-service calls | Edge: `CALLS` |
| `environment` DB URLs | Database connections | Edge: `USES` |
| `labels.team` | Team ownership | Property on node |
| `labels.oncall` | Oncall contact | Property on node |
| `labels.type` | Node type override | Node type |
| `ports` | Exposed ports | Property on node |
| `image` | Docker image | Property on node |
### Dependency Detection
The connector intelligently detects dependencies from environment variables:
`connectors/docker_compose.py`:

```python
def _extract_service_dependencies_from_env(self, env_vars: Dict[str, str]) -> List[str]:
    """Extract service dependencies from environment variables."""
    dependencies = []
    for key, value in env_vars.items():
        if key.endswith('_URL') or key.endswith('_SERVICE_URL'):
            # Extract the host from a URL like http://payment-service:8083
            if '://' in value:
                url_part = value.split('://')[1]
                # Strip credentials (user:pass@host) so database URLs
                # resolve to the host, not the username
                host_part = url_part.rsplit('@', 1)[-1]
                if ':' in host_part:
                    service_name = host_part.split(':')[0]
                    dependencies.append(service_name)
    return dependencies
```
Examples:

- `AUTH_SERVICE_URL=http://auth-service:8081` → creates a `CALLS` edge to `auth-service`
- `DATABASE_URL=postgresql://user@orders-db:5432/db` → creates a `USES` edge to `orders-db`
- `REDIS_URL=redis://redis-main:6379` → creates a `USES` edge to `redis-main`
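For illustration, here is a standalone, runnable version of that logic. Note it strips any `user:pass@` credentials before reading the host, so database URLs like the one above resolve to the service name rather than the username:

```python
from typing import Dict, List


def extract_service_dependencies_from_env(env_vars: Dict[str, str]) -> List[str]:
    """Standalone version of the connector's dependency detection."""
    dependencies = []
    for key, value in env_vars.items():
        if key.endswith('_URL') or key.endswith('_SERVICE_URL'):
            if '://' in value:
                url_part = value.split('://')[1]
                # Drop credentials (user:pass@host) if present
                host_part = url_part.rsplit('@', 1)[-1]
                if ':' in host_part:
                    dependencies.append(host_part.split(':')[0])
    return dependencies


env = {
    "AUTH_SERVICE_URL": "http://auth-service:8081",
    "DATABASE_URL": "postgresql://user:secret@orders-db:5432/db",
    "REDIS_URL": "redis://redis-main:6379",
    "LOG_LEVEL": "debug",  # ignored: not a *_URL variable
}
print(extract_service_dependencies_from_env(env))
# → ['auth-service', 'orders-db', 'redis-main']
```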
## Teams Data Source
The Teams connector parses team ownership and contact information from YAML files.
### Setting Up Teams Data

1. **Create teams.yaml.** Create a `teams.yaml` file in the `data/` directory.

2. **Define teams.** Add team definitions with ownership information:

   ```yaml
   teams:
     - name: platform-team
       lead: Alice Chen
       slack_channel: "#platform"
       pagerduty_schedule: "platform-oncall"
       owns:
         - api-gateway
         - notification-service
         - redis-main

     - name: identity-team
       lead: Bob Smith
       slack_channel: "#identity"
       pagerduty_schedule: "identity-oncall"
       owns:
         - auth-service
         - users-db

     - name: orders-team
       lead: David Lee
       slack_channel: "#orders"
       owns:
         - order-service
         - inventory-service
         - orders-db
         - inventory-db

     - name: payments-team
       lead: Frank Wilson
       slack_channel: "#payments"
       pagerduty_schedule: "payments-oncall"
       owns:
         - payment-service
         - payments-db
   ```

3. **Reload data.** Reload the configuration to apply changes:

   ```bash
   curl -X POST http://localhost:8000/api/reload
   ```
The teams connector supports these fields:
| Field | Required | Description | Example |
|---|---|---|---|
| `name` | Yes | Unique team identifier | `platform-team` |
| `lead` | No | Team lead name | `Alice Chen` |
| `slack_channel` | No | Slack channel for team | `#platform` |
| `pagerduty_schedule` | No | PagerDuty schedule ID | `platform-oncall` |
| `owns` | Yes | List of owned services/databases | `[api-gateway, redis-main]` |
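Checking the two required fields before loading is straightforward. This `validate_team` helper is illustrative, not part of the project:

```python
from typing import Dict, List


def validate_team(team: Dict) -> List[str]:
    """Return a list of problems with a single team entry."""
    errors = []
    if not team.get('name'):
        errors.append("missing required field: name")
    owns = team.get('owns')
    if not isinstance(owns, list) or not owns:
        errors.append("missing or empty required field: owns")
    return errors


print(validate_team({'name': 'platform-team', 'owns': ['api-gateway']}))  # → []
print(validate_team({'lead': 'Alice Chen'}))  # → two errors: no name, no owns
```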
### Ownership Relationships
The connector creates OWNS edges from teams to their assets:
```python
def parse(self, file_path: str) -> tuple[List[Node], List[Edge]]:
    """Parse teams.yaml file."""
    with open(file_path) as f:
        teams_data = yaml.safe_load(f) or {}

    nodes: List[Node] = []
    edges: List[Edge] = []

    teams = teams_data.get('teams', [])
    for team_config in teams:
        # Create team node
        team_node = self._create_team_node(team_config)
        nodes.append(team_node)

        # Create ownership edges
        owned_services = team_config.get('owns', [])
        for service_name in owned_services:
            # Infer service type from name
            service_type = self._infer_service_type(service_name)
            edge = self._create_edge(
                'owns',
                team_node.id,                      # team:platform-team
                f"{service_type}:{service_name}"   # service:api-gateway
            )
            edges.append(edge)

    return nodes, edges
```
The connector automatically infers node types from names: names ending in `-db` or containing `database` are typed as `database`, names containing `redis` or `cache` are typed as `cache`, and everything else is typed as `service`.
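The inference helper itself isn't shown in this guide; a sketch that implements the rules as just described (`infer_service_type` is an assumed name):

```python
def infer_service_type(service_name: str) -> str:
    """Infer a node type from a service name, per the rules above."""
    name = service_name.lower()
    if name.endswith('-db') or 'database' in name:
        return 'database'
    if 'redis' in name or 'cache' in name:
        return 'cache'
    return 'service'


print(infer_service_type('orders-db'))    # → database
print(infer_service_type('redis-main'))   # → cache
print(infer_service_type('api-gateway'))  # → service
```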
## Kubernetes Data Source
The Kubernetes connector parses Kubernetes deployment manifests for additional context.
### Setting Up Kubernetes Data

1. **Export Kubernetes configurations.** Export your deployments to a YAML file:

   ```bash
   kubectl get deployments -o yaml > data/k8s-deployments.yaml
   ```

2. **Or create manually.** Create a deployment manifest by hand at `data/k8s-deployments.yaml`:

   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: api-gateway
     labels:
       app: api-gateway
       team: platform-team
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: api-gateway
     template:
       metadata:
         labels:
           app: api-gateway
       spec:
         containers:
           - name: api-gateway
             image: api-gateway:v1.2.3
             ports:
               - containerPort: 8080
             env:
               - name: AUTH_SERVICE_URL
                 value: "http://auth-service:8081"
               - name: ORDER_SERVICE_URL
                 value: "http://order-service:8082"
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: auth-service
     labels:
       app: auth-service
       team: identity-team
   spec:
     replicas: 2
     selector:
       matchLabels:
         app: auth-service
     template:
       metadata:
         labels:
           app: auth-service
       spec:
         containers:
           - name: auth-service
             image: auth-service:v2.1.0
             env:
               - name: DATABASE_URL
                 value: "postgresql://user@users-db:5432/users"
   ```

3. **Reload configuration.**

   ```bash
   curl -X POST http://localhost:8000/api/reload
   ```
The connector extracts information from standard Kubernetes manifests:

- **Deployment name**: used as the service identifier
- **Labels**: team ownership and metadata
- **Container environment**: service dependencies
- **Image tags**: version information
- **Replicas**: scaling configuration
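The connector's internals aren't shown here, but the extraction above can be sketched with PyYAML, assuming it is available: `yaml.safe_load_all` handles the `---`-separated documents, and the reduced `info` dict is illustrative:

```python
import yaml

MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    team: platform-team
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:v1.2.3
"""

# Walk every document in the multi-doc YAML stream
for doc in yaml.safe_load_all(MANIFEST):
    if doc.get('kind') != 'Deployment':
        continue
    containers = doc['spec']['template']['spec']['containers']
    info = {
        'name': doc['metadata']['name'],
        'team': doc['metadata'].get('labels', {}).get('team'),
        'replicas': doc['spec'].get('replicas'),
        'image': containers[0]['image'],
    }
    print(info)
```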
## Data Loading Process
The system loads data on startup and when the reload API is called:
```python
async def load_configuration_data():
    """Load and parse configuration files into the graph."""
    logger.info("Loading configuration data...")

    # Clear existing data
    storage.clear_graph()

    data_dir = Path("data")
    all_nodes = []
    all_edges = []

    # Load Docker Compose data
    docker_compose_file = data_dir / "docker-compose.yml"
    if docker_compose_file.exists():
        connector = DockerComposeConnector()
        nodes, edges = connector.parse(str(docker_compose_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Docker Compose")

    # Load Teams data
    teams_file = data_dir / "teams.yaml"
    if teams_file.exists():
        connector = TeamsConnector()
        nodes, edges = connector.parse(str(teams_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Teams")

    # Load Kubernetes data (optional)
    k8s_file = data_dir / "k8s-deployments.yaml"
    if k8s_file.exists():
        connector = KubernetesConnector()
        nodes, edges = connector.parse(str(k8s_file))
        all_nodes.extend(nodes)
        all_edges.extend(edges)
        logger.info(f"Loaded {len(nodes)} nodes and {len(edges)} edges from Kubernetes")

    # Store all data in graph
    storage.add_nodes(all_nodes)
    storage.add_edges(all_edges)
    logger.info(f"Total loaded: {len(all_nodes)} nodes and {len(all_edges)} edges")
```
## Data Validation
Validate your configuration files before loading:

```bash
python scripts/validate_config.py
```
The validator checks that:

- YAML syntax is valid
- Required fields are present
- Service names are consistent across files
- Team ownership is defined
- There are no circular dependencies
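The circular-dependency check can be sketched as a depth-first search over the `depends_on` graph. This `find_cycle` helper is an illustration, not the validator's actual implementation:

```python
from typing import Dict, List


def find_cycle(deps: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of service names, or [] if none."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color = {s: WHITE for s in deps}
    path: List[str] = []

    def visit(service: str) -> List[str]:
        color[service] = GRAY
        path.append(service)
        for dep in deps.get(service, []):
            if color.get(dep, WHITE) == GRAY:
                # dep is already on the current path: found a cycle
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in deps:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[service] = BLACK
        return []

    for service in deps:
        if color[service] == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return []


print(find_cycle({'a': ['b'], 'b': ['c'], 'c': ['a']}))  # → ['a', 'b', 'c', 'a']
print(find_cycle({'api-gateway': ['auth-service'], 'auth-service': []}))  # → []
```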
## Reloading Data
You can reload data without restarting the application:

```bash
curl -X POST http://localhost:8000/api/reload
```
Reloading data clears the entire graph and rebuilds it from scratch. This operation may take a few seconds for large datasets.
## Best Practices
**Keep data files in version control.** Store your configuration files in Git to track changes:

```bash
git add data/docker-compose.yml data/teams.yaml
git commit -m "Update service dependencies"
```

**Use consistent naming.** Ensure service names match across all configuration files:

- Docker Compose: `api-gateway`
- Teams YAML: `api-gateway` (in the `owns` list)
- Kubernetes: `api-gateway` (in the deployment name)

**Document team ownership.** Always specify team ownership using labels:

```yaml
labels:
  team: platform-team
  oncall: "@alice"
```

**Validate before deploying.** Run validation before committing changes:

```bash
python scripts/validate_config.py && git commit
```
## Troubleshooting
### Data Not Appearing in Graph
If your data isn't showing up:

1. **Check file paths**: files must be in the `data/` directory
2. **Validate YAML syntax**: `yamllint data/*.yaml`
3. **Check logs**: `docker-compose logs ekg-app | grep ERROR`
4. **Reload data**: `curl -X POST http://localhost:8000/api/reload`
### Duplicate Nodes
If you see duplicate nodes with the same name:

- Ensure node IDs are consistent (format: `type:name`)
- Check for naming inconsistencies across files
- Use the same connector for the same data source
### Missing Relationships
If edges aren't created:

- Verify that service names match exactly
- Check the environment variable format (values must contain URLs)
- Ensure target services exist as nodes
- Review connector logs for parsing errors
## Next Steps