
Overview

Aurora uses PostgreSQL as its primary database with Row-Level Security (RLS) policies for multi-tenant data isolation. The schema is organized into several domains:
  • Authentication & Users: User accounts and preferences
  • Chat & Sessions: Chat history and session management
  • Incidents & RCA: Incident tracking and root cause analysis
  • Cloud Resources: Kubernetes and cloud provider resource inventory
  • Monitoring: Alerts from Grafana, Datadog, PagerDuty, etc.
  • Integrations: GitHub, Slack, Jenkins, etc.

Connection Details

Environment Variables:
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DB=aurora_db
POSTGRES_USER=postgres
POSTGRES_PASSWORD=yourpassword
Connection String:
postgresql://postgres:yourpassword@postgres:5432/aurora_db
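Application code can assemble the same DSN from these environment variables. A minimal sketch (the variable names follow the table above; the fallback defaults are illustrative, not Aurora's actual configuration):

```python
import os

def build_dsn() -> str:
    """Assemble a PostgreSQL connection string from the standard env vars."""
    host = os.environ.get("POSTGRES_HOST", "postgres")
    port = os.environ.get("POSTGRES_PORT", "5432")
    db = os.environ.get("POSTGRES_DB", "aurora_db")
    user = os.environ.get("POSTGRES_USER", "postgres")
    password = os.environ.get("POSTGRES_PASSWORD", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```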

Core Tables

users

User authentication and profile information.
CREATE TABLE users (
    id VARCHAR(255) PRIMARY KEY DEFAULT gen_random_uuid()::TEXT,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    name VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_users_email ON users(email);
Security:
  • Passwords hashed with bcrypt
  • Email uniqueness enforced
  • No RLS (public table for authentication)

user_tokens

Cloud provider OAuth tokens and credentials.
CREATE TABLE user_tokens (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    token_data JSONB,
    secret_ref VARCHAR(512),  -- Vault secret reference
    provider VARCHAR(50) NOT NULL,
    tenant_id VARCHAR(255),
    client_id VARCHAR(255),
    client_secret VARCHAR(255),
    subscription_name VARCHAR(255),
    subscription_id VARCHAR(255),
    email VARCHAR(255),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    session_data JSONB,
    last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true,
    UNIQUE(user_id, provider)
);
Providers:
  • gcp: Google Cloud Platform
  • aws: Amazon Web Services
  • azure: Microsoft Azure
  • github: GitHub
  • slack: Slack
  • grafana, datadog, pagerduty: Monitoring platforms
Security:
  • Sensitive data stored in Vault (referenced by secret_ref)
  • RLS enabled (users can only access their own tokens)
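In application code, a row's credentials would typically be resolved by preferring the Vault reference over any inline token_data. A hypothetical sketch (fetch_from_vault is a stand-in for a Vault client call, not an Aurora API):

```python
def resolve_token(row: dict, fetch_from_vault) -> dict:
    """Return credentials for a user_tokens row, preferring the Vault reference.

    `fetch_from_vault` is injected so this sketch stays storage-agnostic.
    """
    if row.get("secret_ref"):
        # Sensitive material lives in Vault; the row only stores a pointer
        return fetch_from_vault(row["secret_ref"])
    # Fallback: inline JSONB token data
    return row.get("token_data") or {}
```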

chat_sessions

Chat conversation history and state.
CREATE TABLE chat_sessions (
    id VARCHAR(50) PRIMARY KEY,
    user_id VARCHAR(1000) NOT NULL,
    title VARCHAR(255) NOT NULL,
    messages JSONB DEFAULT '[]'::jsonb,
    ui_state JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true,
    status VARCHAR(20) DEFAULT 'active',
    incident_id UUID
);
Message Format (JSONB):
[
  {
    "sender": "user",
    "text": "Deploy to GKE",
    "timestamp": "2024-03-01T10:00:00Z"
  },
  {
    "sender": "assistant",
    "text": "I'll help you deploy...",
    "timestamp": "2024-03-01T10:00:05Z",
    "tool_calls": [...]
  }
]
UI State (JSONB):
{
  "selectedModel": "anthropic/claude-3.5-sonnet",
  "selectedMode": "agent",
  "selectedProviders": ["gcp", "aws"]
}
Status Values:
  • active: Session in progress
  • completed: Session finished
  • cancelled: User cancelled
  • in_progress: Background task running
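The message list above can be manipulated in application code before being written back to the messages JSONB column. A minimal sketch (field names follow the format shown; the helper itself is illustrative):

```python
import json
from datetime import datetime, timezone

def append_message(messages_json: str, sender: str, text: str, **extra) -> str:
    """Append one message in the JSONB format above and re-serialize."""
    messages = json.loads(messages_json)
    entry = {
        "sender": sender,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    }
    entry.update(extra)  # e.g. tool_calls for assistant messages
    messages.append(entry)
    return json.dumps(messages)
```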

incidents

Incident tracking for RCA (Root Cause Analysis).
CREATE TABLE incidents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(255) NOT NULL,
    source_type VARCHAR(20) NOT NULL,
    source_alert_id INTEGER NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'investigating',
    severity VARCHAR(20),
    alert_title TEXT,
    alert_service TEXT,
    alert_environment TEXT,
    aurora_status VARCHAR(20) DEFAULT 'idle',
    aurora_summary TEXT,
    aurora_chat_session_id UUID,
    rca_celery_task_id VARCHAR(255),
    started_at TIMESTAMP NOT NULL,
    analyzed_at TIMESTAMP,
    slack_message_ts VARCHAR(50),
    active_tab VARCHAR(10) DEFAULT 'thoughts',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    merged_into_incident_id UUID REFERENCES incidents(id) ON DELETE SET NULL,
    UNIQUE(source_type, source_alert_id, user_id)
);

CREATE INDEX idx_incidents_user_id ON incidents(user_id, started_at DESC);
CREATE INDEX idx_incidents_status ON incidents(status);
CREATE INDEX idx_incidents_source ON incidents(source_type, source_alert_id);
CREATE INDEX idx_incidents_merged ON incidents(merged_into_incident_id) WHERE merged_into_incident_id IS NOT NULL;
Status Values:
  • investigating: Active investigation
  • analyzed: Analysis complete
  • resolved: Issue resolved
  • merged: Merged into another incident
Aurora Status:
  • idle: Not analyzing
  • running: Analysis in progress
  • complete: Analysis finished
  • error: Analysis failed
Source Types:
  • grafana, datadog, netdata, pagerduty, splunk, dynatrace, bigpanda, jenkins, cloudbees
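The UNIQUE(source_type, source_alert_id, user_id) constraint means each alert maps to at most one incident per user. The same natural key can be used to deduplicate alerts in application code before hitting the database; a hypothetical sketch:

```python
def incident_key(user_id: str, source_type: str, source_alert_id: int) -> tuple:
    """Natural key matching UNIQUE(source_type, source_alert_id, user_id)."""
    return (source_type, source_alert_id, user_id)

def dedup_alerts(alerts: list[dict]) -> dict[tuple, dict]:
    """Keep the first alert seen for each incident key (mirrors the constraint)."""
    seen: dict[tuple, dict] = {}
    for a in alerts:
        key = incident_key(a["user_id"], a["source_type"], a["source_alert_id"])
        seen.setdefault(key, a)
    return seen
```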

incident_thoughts

Streaming analysis thoughts during RCA.
CREATE TABLE incident_thoughts (
    id SERIAL PRIMARY KEY,
    incident_id UUID REFERENCES incidents(id) ON DELETE CASCADE,
    timestamp TIMESTAMP NOT NULL,
    content TEXT NOT NULL,
    thought_type VARCHAR(20),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_thoughts_incident_id ON incident_thoughts(incident_id);
Thought Types:
  • analysis: Analytical reasoning
  • hypothesis: Proposed cause
  • evidence: Supporting data
  • conclusion: Final determination

incident_suggestions

Actionable suggestions from RCA.
CREATE TABLE incident_suggestions (
    id SERIAL PRIMARY KEY,
    incident_id UUID REFERENCES incidents(id) ON DELETE CASCADE,
    title TEXT NOT NULL,
    description TEXT,
    type VARCHAR(20),
    risk VARCHAR(20),
    command TEXT,
    -- Fields for fix-type suggestions (code changes)
    file_path TEXT,
    original_content TEXT,
    suggested_content TEXT,
    user_edited_content TEXT,
    repository TEXT,
    pr_url TEXT,
    pr_number INTEGER,
    created_branch TEXT,
    applied_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_suggestions_incident_id ON incident_suggestions(incident_id);
Suggestion Types:
  • diagnostic: Information gathering
  • fix: Code or configuration change
  • mitigation: Temporary fix
  • investigation: Further analysis needed
Risk Levels:
  • safe: No impact (read-only)
  • medium: May affect resources
  • high: Destructive or expensive
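The risk field is the natural place to gate execution, e.g. requiring explicit user confirmation for anything above safe. A hypothetical sketch (the threshold policy is illustrative):

```python
RISK_ORDER = {"safe": 0, "medium": 1, "high": 2}

def requires_confirmation(risk: str, threshold: str = "medium") -> bool:
    """True if a suggestion at this risk level should not auto-run.

    Unknown levels are treated as high risk (fail closed).
    """
    level = RISK_ORDER.get(risk, RISK_ORDER["high"])
    return level >= RISK_ORDER[threshold]
```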

incident_citations

Tool execution results referenced in analysis.
CREATE TABLE incident_citations (
    id SERIAL PRIMARY KEY,
    incident_id UUID NOT NULL REFERENCES incidents(id) ON DELETE CASCADE,
    citation_key VARCHAR(10) NOT NULL,
    tool_name VARCHAR(255),
    command TEXT,
    output TEXT NOT NULL,
    executed_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(incident_id, citation_key)
);

CREATE INDEX idx_incident_citations_incident_id ON incident_citations(incident_id);
Example:
  • Citation key: [1]
  • Referenced in thoughts: “The pod is failing [1] due to resource limits”
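Rendering analysis text then amounts to resolving each [n] marker against this table. A hypothetical sketch that expands markers into footnote-style tool names (the key→tool mapping stands in for a citation_key lookup):

```python
import re

def annotate_citations(text: str, citations: dict[str, str]) -> str:
    """Replace each [n] marker with [n: tool_name]; unknown markers are kept."""
    def repl(m: re.Match) -> str:
        key = m.group(1)
        tool = citations.get(key)
        return f"[{key}: {tool}]" if tool else m.group(0)
    return re.sub(r"\[(\d+)\]", repl, text)
```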

incident_alerts

Correlated alerts merged into an incident.
CREATE TABLE incident_alerts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(1000) NOT NULL,
    incident_id UUID NOT NULL REFERENCES incidents(id) ON DELETE CASCADE,
    source_type VARCHAR(20) NOT NULL,
    source_alert_id INTEGER NOT NULL,
    alert_title TEXT,
    alert_service TEXT,
    alert_severity VARCHAR(20),
    correlation_strategy TEXT,
    correlation_score FLOAT,
    correlation_details JSONB,
    alert_metadata JSONB,
    received_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_alerts_incident_id ON incident_alerts(incident_id);
CREATE INDEX idx_incident_alerts_source ON incident_alerts(source_type, source_alert_id);
CREATE INDEX idx_incident_alerts_incident_received ON incident_alerts(incident_id, received_at);
Correlation Strategies:
  • time_window: Alerts within time window
  • service_match: Same service affected
  • semantic: Similar alert descriptions
  • manual: User-initiated merge
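The simplest of these, time_window, can be sketched as a pure function; the 10-minute default is an illustrative value, not Aurora's actual setting:

```python
from datetime import datetime, timedelta

def within_time_window(incident_started: datetime,
                       alert_received: datetime,
                       window: timedelta = timedelta(minutes=10)) -> bool:
    """time_window strategy: correlate if the alert arrived near incident start."""
    return abs(alert_received - incident_started) <= window
```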

llm_usage_tracking

LLM API usage and cost tracking.
CREATE TABLE llm_usage_tracking (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(1000) NOT NULL,
    session_id VARCHAR(50),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    model_name VARCHAR(255) NOT NULL,
    api_provider VARCHAR(100) DEFAULT 'openrouter',
    request_type VARCHAR(100),
    input_tokens INTEGER NOT NULL DEFAULT 0,
    output_tokens INTEGER NOT NULL DEFAULT 0,
    total_tokens INTEGER GENERATED ALWAYS AS (input_tokens + output_tokens) STORED,
    estimated_cost DECIMAL(10,6) DEFAULT 0.00,
    surcharge_rate DECIMAL(5,4) DEFAULT 0.3000,
    surcharge_amount DECIMAL(10,6) GENERATED ALWAYS AS (estimated_cost * surcharge_rate) STORED,
    total_cost_with_surcharge DECIMAL(10,6) GENERATED ALWAYS AS (estimated_cost * (1 + surcharge_rate)) STORED,
    response_time_ms INTEGER,
    error_message TEXT,
    request_metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Computed Columns:
  • total_tokens: Sum of input + output tokens
  • surcharge_amount: Platform markup (30% default)
  • total_cost_with_surcharge: Final cost to user
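The generated columns encode a simple formula, replicated here in Python with Decimal to match the DECIMAL column types (the input values are illustrative):

```python
from decimal import Decimal

def usage_costs(estimated_cost: Decimal,
                surcharge_rate: Decimal = Decimal("0.3000")) -> dict:
    """Mirror the table's generated columns: surcharge and final user cost."""
    surcharge = estimated_cost * surcharge_rate
    return {
        "surcharge_amount": surcharge,
        "total_cost_with_surcharge": estimated_cost + surcharge,
    }
```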

Kubernetes Tables

k8s_pods

CREATE TABLE k8s_pods (
    id SERIAL PRIMARY KEY,
    namespace VARCHAR(255) NOT NULL,
    pod_name VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,
    project_id VARCHAR(255) NOT NULL,
    cluster_name VARCHAR(255) NOT NULL,
    user_id VARCHAR(1000),
    provider VARCHAR(50),
    UNIQUE (pod_name, namespace, project_id, cluster_name, user_id)
);

k8s_nodes

CREATE TABLE k8s_nodes (
    id SERIAL PRIMARY KEY,
    node_name VARCHAR(255) NOT NULL,
    status VARCHAR(50),
    project_id VARCHAR(255) NOT NULL,
    cluster_name VARCHAR(255) NOT NULL,
    user_id VARCHAR(1000),
    provider VARCHAR(50),
    UNIQUE (node_name, project_id, cluster_name, user_id)
);

k8s_services, k8s_deployments, k8s_ingresses

Similar structure for other Kubernetes resources.

Monitoring Tables

grafana_alerts

CREATE TABLE grafana_alerts (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    alert_uid VARCHAR(255),
    alert_title TEXT,
    alert_state VARCHAR(50),
    rule_name TEXT,
    rule_url TEXT,
    dashboard_url TEXT,
    panel_url TEXT,
    payload JSONB NOT NULL,
    received_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_grafana_alerts_user_id ON grafana_alerts(user_id, received_at DESC);
CREATE INDEX idx_grafana_alerts_state ON grafana_alerts(alert_state);
CREATE INDEX idx_grafana_alerts_received_at ON grafana_alerts(received_at DESC);

datadog_events, pagerduty_events, netdata_alerts, splunk_alerts

Similar structure for other monitoring platforms.

Row-Level Security (RLS)

Aurora uses PostgreSQL RLS for multi-tenant data isolation:
-- Enable RLS on table
ALTER TABLE chat_sessions ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY user_isolation_policy ON chat_sessions
    USING (user_id = current_setting('myapp.current_user_id', TRUE)::TEXT);
Setting User Context:
SET myapp.current_user_id = 'user_abc123';
Protected Tables (RLS Enabled):
  • chat_sessions
  • incidents
  • incident_alerts
  • user_tokens
  • user_preferences
  • llm_usage_tracking
  • k8s_* (all Kubernetes tables)
  • *_alerts (all monitoring tables)

Migrations

The database schema is created and maintained by initialize_tables() in server/utils/db/db_utils.py:
def initialize_tables():
    """Create tables and apply RLS policies."""
    with db_pool.get_admin_connection() as conn:
        cursor = conn.cursor()

        # Advisory lock serializes schema changes across concurrent workers
        cursor.execute("SELECT pg_advisory_lock(1234567890);")
        try:
            # Create tables
            for table_name, create_script in create_tables.items():
                cursor.execute(create_script)

            # Apply RLS policies
            for table in rls_tables:
                cursor.execute(f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;")
            conn.commit()
        finally:
            # Release the lock so other workers can proceed
            cursor.execute("SELECT pg_advisory_unlock(1234567890);")
Run Migrations:
# Automatically runs on server startup
python server/main_compute.py

Connection Pooling

Aurora uses a custom connection pool for efficient database access:
from utils.db.connection_pool import db_pool

# Get admin connection (full privileges)
with db_pool.get_admin_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")

# Get user connection (RLS enforced)
with db_pool.get_user_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SET myapp.current_user_id = %s", (user_id,))
    cursor.execute("SELECT * FROM chat_sessions")

Backup and Recovery

Backup

# Full database backup
pg_dump -h postgres -U postgres -d aurora_db -F c -f backup.dump

# Specific table backup
pg_dump -h postgres -U postgres -d aurora_db -t chat_sessions -F c -f chat_sessions.dump

Restore

# Restore full database
pg_restore -h postgres -U postgres -d aurora_db -c backup.dump

# Restore specific table
pg_restore -h postgres -U postgres -d aurora_db -t chat_sessions chat_sessions.dump

Performance Optimization

Indexes

Key indexes for performance:
-- User lookups
CREATE INDEX idx_users_email ON users(email);

-- Chat session queries
CREATE INDEX idx_chat_sessions_user_updated 
    ON chat_sessions(user_id, updated_at DESC);

-- Incident queries
CREATE INDEX idx_incidents_user_id 
    ON incidents(user_id, started_at DESC);
CREATE INDEX idx_incidents_status 
    ON incidents(status);

-- Monitoring alerts
CREATE INDEX idx_grafana_alerts_user_id 
    ON grafana_alerts(user_id, received_at DESC);

Query Optimization

  1. Use JSONB operators for efficient JSON queries:
    SELECT * FROM chat_sessions 
    WHERE ui_state->>'selectedMode' = 'agent';
    
  2. Limit result sets with pagination:
    SELECT * FROM incidents 
    WHERE user_id = 'user_123' 
    ORDER BY started_at DESC 
    LIMIT 50 OFFSET 0;
    
  3. Use partial indexes for filtered queries:
    CREATE INDEX idx_active_sessions 
    ON chat_sessions(user_id) 
    WHERE is_active = true;
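The pagination in step 2 can be wrapped in a small helper that converts a page number into LIMIT/OFFSET values; an illustrative sketch:

```python
def page_params(page: int, page_size: int = 50) -> tuple[int, int]:
    """Convert a 1-based page number into (LIMIT, OFFSET) values."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    return page_size, (page - 1) * page_size

# Usage with a parameterized query:
# cursor.execute(
#     "SELECT * FROM incidents WHERE user_id = %s "
#     "ORDER BY started_at DESC LIMIT %s OFFSET %s",
#     (user_id, *page_params(page)),
# )
```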
    

Monitoring

Query Statistics

-- View slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 13+ the columns are total_exec_time / mean_exec_time)
SELECT query, calls, total_time, mean_time 
FROM pg_stat_statements 
ORDER BY mean_time DESC 
LIMIT 10;

-- View table sizes
SELECT 
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables 
WHERE schemaname = 'public' 
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Connection Monitoring

-- View active connections
SELECT * FROM pg_stat_activity 
WHERE datname = 'aurora_db';

-- Kill idle connections
SELECT pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE state = 'idle' AND state_change < NOW() - INTERVAL '1 hour';
