The Engineering Knowledge Graph is configured through environment variables and YAML data files. This guide covers all configuration options.

Environment Variables

EKG uses environment variables for database connections, API keys, and system settings. These are typically stored in a .env file.

Required Variables

These environment variables must be set for EKG to function:
GEMINI_API_KEY=your_gemini_api_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
| Variable | Description | Example |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key for natural language processing | AIzaSyD... |
| NEO4J_URI | Neo4j database connection URI | bolt://localhost:7687 |
| NEO4J_USER | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | password |
Get your Gemini API key from Google AI Studio. The free tier provides 1,500 requests per day.

Loading Environment Variables

The application loads environment variables from the .env file using python-dotenv:
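import logging
import os

from dotenv import load_dotenv

# Module-level logger used by check_environment() below
logger = logging.getLogger(__name__)

# Load environment variables from .env file
load_dotenv()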

def check_environment():
    """Check that required environment variables are set."""
    required_vars = ['GEMINI_API_KEY', 'NEO4J_URI', 'NEO4J_USER', 'NEO4J_PASSWORD']
    missing_vars = []
    
    for var in required_vars:
        if not os.getenv(var):
            missing_vars.append(var)
    
    if missing_vars:
        logger.error(f"Missing required environment variables: {', '.join(missing_vars)}")
        return False
    
    return True
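
One way to build on check_environment() is to load the four required variables into a small typed object, so the rest of the code does not read os.environ directly. The following is a minimal sketch rather than EKG's actual code: the Settings class and load_settings function are hypothetical names, and the optional mapping argument exists only to make the function easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("GEMINI_API_KEY", "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four required variables."""
    gemini_api_key: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Raises ValueError naming every missing or empty variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        neo4j_uri=env["NEO4J_URI"],
        neo4j_user=env["NEO4J_USER"],
        neo4j_password=env["NEO4J_PASSWORD"],
    )
```

Raising one error that lists every missing variable lets a user fix the whole .env file in a single pass.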

Docker Environment Variables

When using Docker, environment variables are configured in docker-compose.yml:
services:
  ekg-app:
    environment:
      - NEO4J_URI=bolt://neo4j:7687
      - NEO4J_USER=neo4j
      - NEO4J_PASSWORD=password
      - GEMINI_API_KEY=${GEMINI_API_KEY}  # From host .env file
Docker Compose substitutes ${GEMINI_API_KEY} from your shell environment or from a .env file in the same directory as docker-compose.yml, then passes the value to the container. Never commit your .env file to version control.

Neo4j Configuration

The Neo4j graph database stores all knowledge graph data. Configuration happens in both the connection string and Neo4j server settings.

Connection Configuration

The GraphStorage class reads Neo4j configuration from environment variables:
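import logging
import os
from typing import Optional

from neo4j import Driver, GraphDatabase

logger = logging.getLogger(__name__)


class GraphStorage:
    """Neo4j-based graph storage implementation."""

    def __init__(self, uri: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None):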
        """Initialize Neo4j connection."""
        self.uri = uri or os.getenv('NEO4J_URI', 'bolt://localhost:7687')
        self.user = user or os.getenv('NEO4J_USER', 'neo4j')
        self.password = password or os.getenv('NEO4J_PASSWORD', 'password')
        
        self.driver: Optional[Driver] = None
        self._connect()
    
    def _connect(self):
        """Establish connection to Neo4j."""
        try:
            self.driver = GraphDatabase.driver(
                self.uri, 
                auth=(self.user, self.password)
            )
            # Test connection
            with self.driver.session() as session:
                session.run("RETURN 1")
            logger.info(f"Connected to Neo4j at {self.uri}")
        except Exception as e:
            logger.error(f"Failed to connect to Neo4j: {e}")
            raise
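
Because GraphStorage only discovers a bad NEO4J_URI when the driver tries to connect, it can help to reject obviously malformed values first. The sketch below is an assumption about how that could be done, not part of EKG: validate_neo4j_uri is a hypothetical helper, and the scheme list is the set of URI schemes accepted by the official Neo4j drivers.

```python
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def validate_neo4j_uri(uri: str) -> str:
    """Return uri unchanged if it looks usable, otherwise raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme not in NEO4J_SCHEMES:
        raise ValueError(f"Unsupported scheme {parsed.scheme!r} in NEO4J_URI {uri!r}")
    if not parsed.hostname:
        raise ValueError(f"NEO4J_URI {uri!r} has no host")
    return uri
```

A common mistake this catches is pointing NEO4J_URI at the HTTP browser port (http://localhost:7474) instead of the Bolt port.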

Neo4j Server Configuration

For Docker deployments, Neo4j server settings are in docker-compose.yml:
services:
  neo4j:
    image: neo4j:5.15
    environment:
      - NEO4J_AUTH=neo4j/password      # Set initial credentials
      - NEO4J_PLUGINS=["apoc"]         # Enable APOC plugin
    ports:
      - "7474:7474"  # HTTP browser interface
      - "7687:7687"  # Bolt protocol
    volumes:
      - neo4j_data:/data               # Persist data
      - neo4j_logs:/logs               # Persist logs
    healthcheck:
      test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "password", "RETURN 1"]
      interval: 10s
      timeout: 5s
      retries: 5
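
The healthcheck above probes Neo4j every 10 seconds with up to 5 retries. An application that starts without Compose's health gating could apply the same idea in Python. This is a minimal sketch under stated assumptions: wait_until_ready is a hypothetical helper, probe is any callable that raises while the database is unavailable (for example, one that constructs GraphStorage), and the sleep parameter is injectable only so the function can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], None],
    retries: int = 5,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe until it stops raising, at most `retries` times.

    Returns True on the first success, False if every attempt raised.
    """
    for attempt in range(1, retries + 1):
        try:
            probe()
            return True
        except Exception:
            if attempt < retries:
                sleep(interval)  # wait before the next attempt
    return False
```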

Data Files Configuration

EKG reads infrastructure data from YAML files in the data/ directory. The system expects specific file formats for different data sources.

Required Data Files

The application validates that these files exist on startup:
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def validate_data_files():
    """Check that required data files exist."""
    data_dir = Path("data")
    required_files = ["docker-compose.yml", "teams.yaml"]
    missing_files = []
    
    for file_name in required_files:
        file_path = data_dir / file_name
        if not file_path.exists():
            missing_files.append(str(file_path))
    
    if missing_files:
        logger.error(f"Missing required data files: {', '.join(missing_files)}")
        return False
    
    return True

Data Directory Structure

data/
├── docker-compose.yml      # Service definitions (required)
├── teams.yaml              # Team ownership (required)
└── k8s-deployments.yaml    # Kubernetes resources (optional)
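
The required and optional files in this layout can be reported separately, so a missing Kubernetes file is noted without being treated as an error. The sketch below assumes only the three file names shown above; scan_data_dir is a hypothetical helper, not an EKG function.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_FILES = ("docker-compose.yml", "teams.yaml")
OPTIONAL_FILES = ("k8s-deployments.yaml",)


def scan_data_dir(data_dir: Path) -> Dict[str, List[str]]:
    """Classify the expected files in data_dir by presence."""
    return {
        "missing_required": [
            name for name in REQUIRED_FILES if not (data_dir / name).is_file()
        ],
        "present_optional": [
            name for name in OPTIONAL_FILES if (data_dir / name).is_file()
        ],
    }
```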

Docker Compose Configuration

Services are defined in data/docker-compose.yml. The connector extracts services, dependencies, and environment variables:
services:
  api-gateway:
    build: ./services/api-gateway
    ports:
      - "8080:8080"
    environment:
      - AUTH_SERVICE_URL=http://auth-service:8081
      - ORDER_SERVICE_URL=http://order-service:8082
    depends_on:
      - auth-service
      - order-service
    labels:
      team: platform-team      # Team ownership
      oncall: "@alice"         # Oncall contact
  
  users-db:
    image: postgres:15
    environment:
      - POSTGRES_DB=users
    labels:
      team: identity-team
      type: database           # Node type override
The labels field specifies team ownership (team), the on-call contact (oncall), and an optional node type override (type). The connector uses these labels to create ownership relationships and to assign node types in the graph.
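
To illustrate how services, dependencies, and labels could turn into graph edges, here is a minimal sketch that works on the services mapping after the YAML has been parsed into Python dictionaries. It is not the connector's real implementation: extract_edges and the edge type names DEPENDS_ON, CALLS, and OWNS are assumptions, and the code expects environment in list form and labels in mapping form, as in the example above.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

Edge = Tuple[str, str, str]  # (source, edge_type, target)


def extract_edges(services: Dict[str, dict]) -> List[Edge]:
    """Derive edges from an already-parsed `services` mapping.

    Edge type names (DEPENDS_ON, CALLS, OWNS) are illustrative assumptions.
    """
    edges: List[Edge] = []
    for name, spec in services.items():
        for target in spec.get("depends_on", []):
            edges.append((name, "DEPENDS_ON", target))
        for entry in spec.get("environment", []):
            _, _, value = entry.partition("=")
            host = urlparse(value).hostname
            # A URL whose host is another service suggests a runtime call.
            if host in services and host != name:
                edges.append((name, "CALLS", host))
        team = spec.get("labels", {}).get("team")
        if team:
            edges.append((team, "OWNS", name))
    return edges
```

Values that are not URLs, such as POSTGRES_DB=users, have no hostname and therefore produce no CALLS edge.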

Teams Configuration

Team ownership and contact information are defined in data/teams.yaml:
teams:
  - name: platform-team
    lead: Alice Chen
    slack_channel: "#platform"
    pagerduty_schedule: "platform-oncall"
    owns:
      - api-gateway
      - notification-service
      - redis-main
  
  - name: identity-team
    lead: Bob Smith
    slack_channel: "#identity"
    pagerduty_schedule: "identity-oncall"
    owns:
      - auth-service
      - users-db
  
  - name: orders-team
    lead: David Lee
    slack_channel: "#orders"
    owns:
      - order-service
      - inventory-service
      - orders-db
      - inventory-db
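
Since each owns list assigns services to a team, a quick consistency check is to index services by owner and flag any service claimed by more than one team. The sketch below assumes the teams list has already been parsed from YAML; build_ownership_index is a hypothetical helper, and keeping the first claimant on a conflict is an arbitrary choice made for illustration.

```python
from typing import Dict, List, Tuple


def build_ownership_index(teams: List[dict]) -> Tuple[Dict[str, str], List[str]]:
    """Map each service to its owning team and list services claimed twice."""
    owner_of: Dict[str, str] = {}
    conflicts: List[str] = []
    for team in teams:
        for service in team.get("owns", []):
            if service in owner_of and owner_of[service] != team["name"]:
                conflicts.append(service)  # keep the first owner, report the clash
            else:
                owner_of[service] = team["name"]
    return owner_of, conflicts
```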

Kubernetes Configuration (Optional)

Kubernetes deployments can be added in data/k8s-deployments.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
    team: platform-team
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:latest
        ports:
        - containerPort: 8080
        env:
        - name: AUTH_SERVICE_URL
          value: "http://auth-service:8081"
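
A Deployment manifest like this one carries the same kinds of facts as a Compose service: a name, a team label, and environment variables that point at other services. The sketch below shows one way to flatten a parsed manifest into node properties. The function name and the returned keys are assumptions for illustration, since the page does not describe EKG's actual node schema.

```python
from typing import Any, Dict


def deployment_to_node(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a parsed apps/v1 Deployment into node properties.

    The returned keys are illustrative; EKG's real schema may differ.
    """
    metadata = manifest.get("metadata", {})
    spec = manifest.get("spec", {})
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    env = {
        item["name"]: item["value"]
        for container in containers
        for item in container.get("env", [])
        if "value" in item  # skip valueFrom references
    }
    return {
        "id": metadata.get("name"),
        "team": metadata.get("labels", {}).get("team"),
        "replicas": spec.get("replicas", 1),  # Kubernetes defaults to 1 replica
        "images": [c.get("image") for c in containers],
        "env": env,
    }
```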

Application Configuration

The FastAPI application is configured through code and environment variables.

Server Configuration

Uvicorn server settings are passed on the command line. The --reload flag restarts the server when code changes and is intended for development:
python -m uvicorn chat.app:app --reload --port 8000

Logging Configuration

Logging is configured in the main entry point:
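import logging
import os

# LOG_LEVEL selects the level by name (DEBUG, INFO, WARNING, ...); default is INFO
logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)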
To change log level:
export LOG_LEVEL=DEBUG
python main.py

Query Engine Configuration

Query depth limits keep graph traversals bounded, so cyclic or very deep dependency chains cannot turn into runaway queries:
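from typing import List, Optional

def downstream(self, node_id: str, max_depth: int = 10, edge_types: Optional[List[str]] = None):
    """
    Get all transitive dependencies.

    Args:
        node_id: Starting node ID
        max_depth: Maximum traversal depth, which bounds query cost (default: 10)
        edge_types: Optional list of edge types to follow
    """
    # In Neo4j 5, variable-length bounds cannot be passed as query parameters,
    # so max_depth is interpolated into the query text. Coerce it to int first
    # so that only a number can ever reach the query string.
    max_depth = int(max_depth)
    query = f"""
    MATCH path = (start {{id: $node_id}})-[r*1..{max_depth}]->(dependency)
    WITH dependency, min(length(path)) as distance
    RETURN dependency, distance
    ORDER BY distance, dependency.name
    """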

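The Cypher query above returns each reachable dependency together with its minimum distance from the start node, up to max_depth hops. The same result can be sketched in plain Python as a breadth-first search over an adjacency mapping, which may help when reasoning about what the query returns. The function name and the dict-of-lists graph representation are assumptions for illustration, not part of EKG.

```python
from typing import Dict, List


def downstream_distances(
    graph: Dict[str, List[str]], start: str, max_depth: int = 10
) -> Dict[str, int]:
    """Breadth-first search returning the minimum hop count to each reachable node."""
    distances: Dict[str, int] = {}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in distances:  # first visit is the shortest path
                    distances[neighbor] = depth
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # nothing new to explore, even if max_depth is not reached
        frontier = next_frontier
    return distances
```

Because each node is recorded only on its first visit, a cycle in the graph ends the search instead of looping. As with the Cypher pattern, the start node itself appears in the result only if a cycle leads back to it within max_depth hops.
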
Configuration Validation

EKG includes a configuration validator to check for common issues:
python scripts/validate_config.py
The validator checks:
  • Environment variables are set
  • Data files exist and are valid YAML
  • Neo4j connection is successful
  • Required fields are present in data files
  • Team ownership is properly defined
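
The checks listed above can be thought of as independent functions whose problems are collected into one report. The sketch below shows that structure only; it does not reproduce scripts/validate_config.py, and run_checks along with the check names used in the usage note are hypothetical.

```python
from typing import Callable, Dict, List

Check = Callable[[], List[str]]  # each check returns a list of problem descriptions


def run_checks(checks: Dict[str, Check]) -> List[str]:
    """Run every check and collect 'name: problem' strings."""
    problems: List[str] = []
    for name, check in checks.items():
        try:
            problems.extend(f"{name}: {problem}" for problem in check())
        except Exception as exc:  # a crashing check counts as a failure
            problems.append(f"{name}: check raised {exc!r}")
    return problems
```

A caller could pass something like {"environment": check_env, "data files": check_files} and exit with a non-zero status when the returned list is not empty. Running all checks before reporting means one run surfaces every problem rather than only the first.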

Best Practices

1. Use separate .env files for environments

Create different environment files for development, staging, and production:
.env.development
.env.staging
.env.production
Load the appropriate one by linking it to .env (the -f flag replaces an existing link):
ln -sf .env.production .env

2. Never commit secrets

Add .env to .gitignore:
.env
.env.*
!.env.example

3. Use strong Neo4j passwords

Change the default Neo4j password in production:
NEO4J_PASSWORD=$(openssl rand -base64 32)

4. Validate configuration on startup

The application automatically validates configuration:
import sys

if not check_environment():
    sys.exit(1)

if not validate_data_files():
    sys.exit(1)
