The Engineering Knowledge Graph is configured through environment variables and YAML data files. This guide covers all configuration options.
Environment Variables
EKG uses environment variables for database connections, API keys, and system settings. These are typically stored in a .env file.
Required Variables
These environment variables must be set for EKG to function:
GEMINI_API_KEY=your_gemini_api_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
| Variable | Description | Example |
| --- | --- | --- |
| GEMINI_API_KEY | Google Gemini API key for natural language processing | AIzaSyD... |
| NEO4J_URI | Neo4j database connection URI | bolt://localhost:7687 |
| NEO4J_USER | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | password |
Get your Gemini API key from Google AI Studio. The free tier provides 1,500 requests per day.
Loading Environment Variables
The application loads environment variables from the .env file using python-dotenv:
import logging
import os

from dotenv import load_dotenv

logger = logging.getLogger(__name__)

# Load environment variables from .env file
load_dotenv()


def check_environment():
    """Check that required environment variables are set."""
    required_vars = ['GEMINI_API_KEY', 'NEO4J_URI', 'NEO4J_USER', 'NEO4J_PASSWORD']
    missing_vars = []
    for var in required_vars:
        # Unset and empty values both count as missing
        if not os.getenv(var):
            missing_vars.append(var)
    if missing_vars:
        logger.error(f"Missing required environment variables: {', '.join(missing_vars)}")
        return False
    return True
Docker Environment Variables
When using Docker, environment variables are configured in docker-compose.yml:
services:
  ekg-app:
    environment:
      - NEO4J_URI=bolt://neo4j:7687
      - NEO4J_USER=neo4j
      - NEO4J_PASSWORD=password
      - GEMINI_API_KEY=${GEMINI_API_KEY}  # From host .env file
The GEMINI_API_KEY is read from your host machine’s .env file and passed to the container. Never commit your .env file to version control.
Neo4j Configuration
The Neo4j graph database stores all knowledge graph data. Configuration happens in both the connection string and Neo4j server settings.
Connection Configuration
The GraphStorage class reads Neo4j configuration from environment variables:
import logging
import os
from typing import Optional

from neo4j import Driver, GraphDatabase

logger = logging.getLogger(__name__)


class GraphStorage:
    """Neo4j-based graph storage implementation."""

    def __init__(self, uri: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None):
        """Initialize Neo4j connection."""
        # Explicit arguments win; otherwise fall back to the environment, then to local defaults
        self.uri = uri or os.getenv('NEO4J_URI', 'bolt://localhost:7687')
        self.user = user or os.getenv('NEO4J_USER', 'neo4j')
        self.password = password or os.getenv('NEO4J_PASSWORD', 'password')
        self.driver: Optional[Driver] = None
        self._connect()

    def _connect(self):
        """Establish connection to Neo4j."""
        try:
            self.driver = GraphDatabase.driver(
                self.uri,
                auth=(self.user, self.password)
            )
            # Test connection
            with self.driver.session() as session:
                session.run("RETURN 1")
            logger.info(f"Connected to Neo4j at {self.uri}")
        except Exception as e:
            logger.error(f"Failed to connect to Neo4j: {e}")
            raise
Neo4j Server Configuration
For Docker deployments, Neo4j server settings are in docker-compose.yml:
services:
  neo4j:
    image: neo4j:5.15
    environment:
      - NEO4J_AUTH=neo4j/password  # Set initial credentials
      - NEO4J_PLUGINS=["apoc"]  # Enable APOC plugin
    ports:
      - "7474:7474"  # HTTP browser interface
      - "7687:7687"  # Bolt protocol
    volumes:
      - neo4j_data:/data  # Persist data
      - neo4j_logs:/logs  # Persist logs
    healthcheck:
      test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "password", "RETURN 1"]
      interval: 10s
      timeout: 5s
      retries: 5
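The healthcheck above lets Docker decide when Neo4j is ready. When running outside Docker, a similar wait can be approximated from Python. The following is a minimal sketch using only the standard library; the helper name wait_for_port and its parameters are hypothetical, with defaults chosen to mirror the healthcheck values. A successful TCP connect only shows that the Bolt port is open, not that Neo4j is accepting queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, retries: int = 5,
                  interval: float = 10.0, timeout: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Hypothetical helper: makes up to `retries` attempts, sleeping
    `interval` seconds between failed attempts.
    """
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Port not reachable yet; pause unless this was the last attempt
            if attempt < retries - 1:
                time.sleep(interval)
    return False
```

For example, wait_for_port("localhost", 7687) could be called before constructing GraphStorage.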
Data Files Configuration
EKG reads infrastructure data from YAML files in the data/ directory. The system expects specific file formats for different data sources.
Required Data Files
The application validates that these files exist on startup:
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def validate_data_files():
    """Check that required data files exist."""
    data_dir = Path("data")
    required_files = ["docker-compose.yml", "teams.yaml"]
    missing_files = []
    for file_name in required_files:
        file_path = data_dir / file_name
        if not file_path.exists():
            missing_files.append(str(file_path))
    if missing_files:
        logger.error(f"Missing required data files: {', '.join(missing_files)}")
        return False
    return True
Data Directory Structure
data/
├── docker-compose.yml # Service definitions (required)
├── teams.yaml # Team ownership (required)
└── k8s-deployments.yaml # Kubernetes resources (optional)
Docker Compose Configuration
Services are defined in data/docker-compose.yml. The connector extracts services, dependencies, and environment variables:
services:
  api-gateway:
    build: ./services/api-gateway
    ports:
      - "8080:8080"
    environment:
      - AUTH_SERVICE_URL=http://auth-service:8081
      - ORDER_SERVICE_URL=http://order-service:8082
    depends_on:
      - auth-service
      - order-service
    labels:
      team: platform-team  # Team ownership
      oncall: "@alice"  # Oncall contact
  users-db:
    image: postgres:15
    environment:
      - POSTGRES_DB=users
    labels:
      team: identity-team
      type: database  # Node type override
The labels field is used to specify team ownership and node types. The connector uses these to create proper graph relationships.
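To illustrate how labels and depends_on might map onto the graph, here is a minimal sketch that works on an already-parsed compose dictionary (for example, the result of a YAML loader). The function name extract_graph, the node fields, the "service" default type, and the DEPENDS_ON and OWNED_BY edge names are assumptions for illustration, not the connector's actual schema.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, edge_type, target)


def extract_graph(compose: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[Edge]]:
    """Turn a parsed docker-compose mapping into nodes and edges (illustrative only)."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Edge] = []
    for name, spec in (compose.get("services") or {}).items():
        spec = spec or {}
        # Assumption: labels use the mapping form shown above, not the list form
        labels = spec.get("labels") or {}
        nodes.append({
            "id": name,
            "type": labels.get("type", "service"),  # label overrides the default type
            "team": labels.get("team"),
            "oncall": labels.get("oncall"),
        })
        for dep in spec.get("depends_on") or []:
            edges.append((name, "DEPENDS_ON", dep))
        if labels.get("team"):
            edges.append((name, "OWNED_BY", labels["team"]))
    return nodes, edges


compose = {
    "services": {
        "api-gateway": {
            "depends_on": ["auth-service", "order-service"],
            "labels": {"team": "platform-team", "oncall": "@alice"},
        },
        "users-db": {
            "image": "postgres:15",
            "labels": {"team": "identity-team", "type": "database"},
        },
    }
}
nodes, edges = extract_graph(compose)
```

Here users-db becomes a node of type database because of its type label, while api-gateway falls back to the assumed default.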
Teams Configuration
Team ownership and contact information are defined in data/teams.yaml:
teams:
  - name: platform-team
    lead: Alice Chen
    slack_channel: "#platform"
    pagerduty_schedule: "platform-oncall"
    owns:
      - api-gateway
      - notification-service
      - redis-main
  - name: identity-team
    lead: Bob Smith
    slack_channel: "#identity"
    pagerduty_schedule: "identity-oncall"
    owns:
      - auth-service
      - users-db
  - name: orders-team
    lead: David Lee
    slack_channel: "#orders"
    owns:
      - order-service
      - inventory-service
      - orders-db
      - inventory-db
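One way to consume this file is to invert the owns lists into a service-to-team lookup. The sketch below assumes the file has already been parsed into a dictionary; build_ownership_index is a hypothetical helper, and rejecting a service claimed by two teams is an assumption rather than documented EKG behavior.

```python
from typing import Any, Dict


def build_ownership_index(teams_doc: Dict[str, Any]) -> Dict[str, str]:
    """Map each service name to its owning team (hypothetical helper)."""
    index: Dict[str, str] = {}
    for team in teams_doc.get("teams") or []:
        for service in team.get("owns") or []:
            if service in index:
                # Assumption: a service may have only one owning team
                raise ValueError(
                    f"{service} is owned by both {index[service]} and {team['name']}"
                )
            index[service] = team["name"]
    return index


teams_doc = {
    "teams": [
        {"name": "platform-team", "owns": ["api-gateway", "redis-main"]},
        {"name": "identity-team", "owns": ["auth-service", "users-db"]},
    ]
}
index = build_ownership_index(teams_doc)
```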
Kubernetes Configuration (Optional)
Kubernetes deployments can be added in data/k8s-deployments.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
    team: platform-team
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
        - name: api-gateway
          image: api-gateway:latest
          ports:
            - containerPort: 8080
          env:
            - name: AUTH_SERVICE_URL
              value: "http://auth-service:8081"
Application Configuration
The FastAPI application is configured through code and environment variables.
Server Configuration
Uvicorn server settings are passed via command line:
python -m uvicorn chat.app:app --reload --port 8000
Logging Configuration
Logging is configured in the main entry point:
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
To change the log level:
export LOG_LEVEL=DEBUG
python main.py
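The basicConfig call shown earlier hard-codes logging.INFO, so LOG_LEVEL only takes effect if the entry point reads it. A minimal sketch of how that might look, assuming unrecognized values fall back to INFO; the helper resolve_log_level is hypothetical and not taken from the EKG source.

```python
import logging
import os

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(name: str, default: int = logging.INFO) -> int:
    """Translate a level name such as 'debug' into a logging constant."""
    # Unknown names fall back to the default instead of raising
    return _LEVELS.get(name.strip().upper(), default)


logging.basicConfig(
    level=resolve_log_level(os.getenv("LOG_LEVEL", "INFO")),
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
```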
Query Engine Configuration
Query depth limits bound graph traversals, preventing runaway queries on cyclic or very large graphs:
from typing import List, Optional

# Method of the query engine class (excerpt)
def downstream(self, node_id: str, max_depth: int = 10, edge_types: Optional[List[str]] = None):
    """
    Get all transitive dependencies.

    Args:
        node_id: Starting node ID
        max_depth: Maximum traversal depth to prevent infinite loops (default: 10)
        edge_types: Optional list of edge types to follow
    """
    # max_depth is interpolated into the query text; node_id is passed as a parameter
    query = f"""
    MATCH path = (start {{id: $node_id}})-[r*1..{max_depth}]->(dependency)
    WITH dependency, min(length(path)) as distance
    RETURN dependency, distance
    ORDER BY distance, dependency.name
    """
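Because max_depth is interpolated into the query text rather than passed as a parameter, it is worth validating before use. The sketch below also shows one way the edge_types argument, which the excerpt above accepts but does not use, could be applied to the pattern. The function name build_downstream_query, the MAX_ALLOWED_DEPTH cap, and the identifier check are assumptions, not the project's implementation.

```python
import re
from typing import List, Optional

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_ALLOWED_DEPTH = 25  # hypothetical upper bound


def build_downstream_query(max_depth: int = 10,
                           edge_types: Optional[List[str]] = None) -> str:
    """Build the downstream Cypher query from validated, interpolated parts."""
    if isinstance(max_depth, bool) or not isinstance(max_depth, int):
        raise TypeError("max_depth must be an integer")
    if not 1 <= max_depth <= MAX_ALLOWED_DEPTH:
        raise ValueError(f"max_depth must be between 1 and {MAX_ALLOWED_DEPTH}")

    type_filter = ""
    if edge_types:
        for edge_type in edge_types:
            # Only plain identifiers are allowed, since the name ends up in the query text
            if not _IDENTIFIER.fullmatch(edge_type):
                raise ValueError(f"Invalid edge type: {edge_type!r}")
        type_filter = ":" + "|".join(edge_types)

    return (
        f"MATCH path = (start {{id: $node_id}})"
        f"-[r{type_filter}*1..{max_depth}]->(dependency)\n"
        "WITH dependency, min(length(path)) AS distance\n"
        "RETURN dependency, distance\n"
        "ORDER BY distance, dependency.name"
    )
```

The node ID still travels as the $node_id parameter; only the depth bound and the relationship types are placed in the query string, and both are checked first.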
Configuration Validation
EKG includes a configuration validator to check for common issues:
python scripts/validate_config.py
The validator checks:
- Environment variables are set
- Data files exist and are valid YAML
- Neo4j connection is successful
- Required fields are present in data files
- Team ownership is properly defined
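The contents of scripts/validate_config.py are not shown here, so the following is only a sketch of what the last two checks might involve. It assumes the teams file has already been parsed into a dictionary; the function name validate_teams and the set of required fields are assumptions (pagerduty_schedule is left out because the orders-team example above omits it).

```python
from typing import Any, Dict, List, Set

REQUIRED_TEAM_FIELDS = ("name", "lead", "slack_channel", "owns")  # assumed


def validate_teams(teams_doc: Dict[str, Any], known_services: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means the data looks valid."""
    teams = teams_doc.get("teams")
    if not isinstance(teams, list) or not teams:
        return ["teams.yaml must contain a non-empty 'teams' list"]

    problems: List[str] = []
    for i, team in enumerate(teams):
        if not isinstance(team, dict):
            problems.append(f"teams[{i}]: expected a mapping")
            continue
        label = team.get("name") or f"teams[{i}]"
        for field in REQUIRED_TEAM_FIELDS:
            # Missing keys and empty values are both reported
            if not team.get(field):
                problems.append(f"{label}: missing required field '{field}'")
        for service in team.get("owns") or []:
            if service not in known_services:
                problems.append(f"{label}: owns unknown service '{service}'")
    return problems
```

Collecting every problem before reporting, rather than stopping at the first, lets a single run surface all configuration issues.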
Best Practices
Use separate .env files for environments
Create different environment files for development, staging, and production:
.env.development
.env.staging
.env.production
Load the appropriate one:
ln -s .env.production .env
Never commit secrets
Add .env to .gitignore:
.env
.env.*
!.env.example
Use strong Neo4j passwords
Change the default Neo4j password in production:
NEO4J_PASSWORD=$(openssl rand -base64 32)
Validate configuration on startup
The application automatically validates configuration:
if not check_environment():
    sys.exit(1)
if not validate_data_files():
    sys.exit(1)
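As an alternative to symlinking one of the per-environment files to .env, the file could be chosen at startup. A minimal sketch, assuming a hypothetical EKG_ENV variable names the environment; neither that variable nor the select_env_file helper comes from the EKG source.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ALLOWED_ENVIRONMENTS = ("development", "staging", "production")


def select_env_file(base_dir: Path,
                    environ: Optional[Mapping[str, str]] = None) -> Path:
    """Pick .env.<environment> when EKG_ENV is set, otherwise plain .env."""
    environ = os.environ if environ is None else environ
    env_name = environ.get("EKG_ENV")  # EKG_ENV is a hypothetical variable name
    if not env_name:
        return base_dir / ".env"
    if env_name not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {env_name!r}")
    return base_dir / f".env.{env_name}"
```

The returned path can be handed to python-dotenv as load_dotenv(dotenv_path=select_env_file(Path("."))) before check_environment() runs.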
Next Steps