
Overview

Aurora uses PostgreSQL as its primary database with Row-Level Security (RLS) policies for multi-tenant data isolation. The schema is organized into several domains:
  • Authentication & Users: User accounts and preferences
  • Chat & Sessions: Chat history and session management
  • Incidents & RCA: Incident tracking and root cause analysis
  • Cloud Resources: Kubernetes and cloud provider resource inventory
  • Monitoring: Alerts from Grafana, Datadog, PagerDuty, etc.
  • Integrations: GitHub, Slack, Jenkins, etc.

Connection Details

Environment Variables:
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DB=aurora_db
POSTGRES_USER=postgres
POSTGRES_PASSWORD=yourpassword
Connection String:
postgresql://postgres:yourpassword@postgres:5432/aurora_db
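Application code can assemble the same DSN from these environment variables. A minimal sketch (the variable names follow the table above; the fallback defaults are illustrative, not Aurora's actual configuration):

```python
import os

def build_dsn() -> str:
    """Assemble a PostgreSQL connection string from the standard env vars."""
    host = os.environ.get("POSTGRES_HOST", "postgres")
    port = os.environ.get("POSTGRES_PORT", "5432")
    db = os.environ.get("POSTGRES_DB", "aurora_db")
    user = os.environ.get("POSTGRES_USER", "postgres")
    password = os.environ.get("POSTGRES_PASSWORD", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```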

Core Tables

users

User authentication and profile information.
CREATE TABLE users (
    id VARCHAR(255) PRIMARY KEY DEFAULT gen_random_uuid()::TEXT,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    name VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_users_email ON users(email);
Security:
  • Passwords hashed with bcrypt
  • Email uniqueness enforced
  • No RLS (public table for authentication)

user_tokens

Cloud provider OAuth tokens and credentials.
CREATE TABLE user_tokens (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    token_data JSONB,
    secret_ref VARCHAR(512),  -- Vault secret reference
    provider VARCHAR(50) NOT NULL,
    tenant_id VARCHAR(255),
    client_id VARCHAR(255),
    client_secret VARCHAR(255),
    subscription_name VARCHAR(255),
    subscription_id VARCHAR(255),
    email VARCHAR(255),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    session_data JSONB,
    last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true,
    UNIQUE(user_id, provider)
);
Providers:
  • gcp: Google Cloud Platform
  • aws: Amazon Web Services
  • azure: Microsoft Azure
  • github: GitHub
  • slack: Slack
  • grafana, datadog, pagerduty: Monitoring platforms
Security:
  • Sensitive data stored in Vault (referenced by secret_ref)
  • RLS enabled (users can only access their own tokens)
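In application code, a row's credentials would typically be resolved by preferring the Vault reference over any inline token_data. A hypothetical sketch (fetch_from_vault is a stand-in for a Vault client call, not an Aurora API):

```python
def resolve_token(row: dict, fetch_from_vault) -> dict:
    """Return credentials for a user_tokens row, preferring the Vault reference.

    `fetch_from_vault` is injected so this sketch stays storage-agnostic.
    """
    if row.get("secret_ref"):
        # Sensitive material lives in Vault; the row only stores a pointer
        return fetch_from_vault(row["secret_ref"])
    # Fallback: inline JSONB token data
    return row.get("token_data") or {}
```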

chat_sessions

Chat conversation history and state.
CREATE TABLE chat_sessions (
    id VARCHAR(50) PRIMARY KEY,
    user_id VARCHAR(1000) NOT NULL,
    title VARCHAR(255) NOT NULL,
    messages JSONB DEFAULT '[]'::jsonb,
    ui_state JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true,
    status VARCHAR(20) DEFAULT 'active',
    incident_id UUID
);
Message Format (JSONB):
[
  {
    "sender": "user",
    "text": "Deploy to GKE",
    "timestamp": "2024-03-01T10:00:00Z"
  },
  {
    "sender": "assistant",
    "text": "I'll help you deploy...",
    "timestamp": "2024-03-01T10:00:05Z",
    "tool_calls": [...]
  }
]
UI State (JSONB):
{
  "selectedModel": "anthropic/claude-3.5-sonnet",
  "selectedMode": "agent",
  "selectedProviders": ["gcp", "aws"]
}
Status Values:
  • active: Session in progress
  • completed: Session finished
  • cancelled: User cancelled
  • in_progress: Background task running
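The message list above can be manipulated in application code before being written back to the messages JSONB column. A minimal sketch (field names follow the format shown; the helper itself is illustrative):

```python
import json
from datetime import datetime, timezone

def append_message(messages_json: str, sender: str, text: str, **extra) -> str:
    """Append one message in the JSONB format above and re-serialize."""
    messages = json.loads(messages_json)
    entry = {
        "sender": sender,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    }
    entry.update(extra)  # e.g. tool_calls for assistant messages
    messages.append(entry)
    return json.dumps(messages)
```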

incidents

Incident tracking for RCA (Root Cause Analysis).
CREATE TABLE incidents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(255) NOT NULL,
    source_type VARCHAR(20) NOT NULL,
    source_alert_id INTEGER NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'investigating',
    severity VARCHAR(20),
    alert_title TEXT,
    alert_service TEXT,
    alert_environment TEXT,
    aurora_status VARCHAR(20) DEFAULT 'idle',
    aurora_summary TEXT,
    aurora_chat_session_id UUID,
    rca_celery_task_id VARCHAR(255),
    started_at TIMESTAMP NOT NULL,
    analyzed_at TIMESTAMP,
    slack_message_ts VARCHAR(50),
    active_tab VARCHAR(10) DEFAULT 'thoughts',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    merged_into_incident_id UUID REFERENCES incidents(id) ON DELETE SET NULL,
    UNIQUE(source_type, source_alert_id, user_id)
);

CREATE INDEX idx_incidents_user_id ON incidents(user_id, started_at DESC);
CREATE INDEX idx_incidents_status ON incidents(status);
CREATE INDEX idx_incidents_source ON incidents(source_type, source_alert_id);
CREATE INDEX idx_incidents_merged ON incidents(merged_into_incident_id) WHERE merged_into_incident_id IS NOT NULL;
Status Values:
  • investigating: Active investigation
  • analyzed: Analysis complete
  • resolved: Issue resolved
  • merged: Merged into another incident
Aurora Status:
  • idle: Not analyzing
  • running: Analysis in progress
  • complete: Analysis finished
  • error: Analysis failed
Source Types:
  • grafana, datadog, netdata, pagerduty, splunk, dynatrace, bigpanda, jenkins, cloudbees
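The UNIQUE(source_type, source_alert_id, user_id) constraint means each alert maps to at most one incident per user. The same natural key can be used to deduplicate alerts in application code before hitting the database; a hypothetical sketch:

```python
def incident_key(user_id: str, source_type: str, source_alert_id: int) -> tuple:
    """Natural key matching UNIQUE(source_type, source_alert_id, user_id)."""
    return (source_type, source_alert_id, user_id)

def dedup_alerts(alerts: list[dict]) -> dict[tuple, dict]:
    """Keep the first alert seen for each incident key (mirrors the constraint)."""
    seen: dict[tuple, dict] = {}
    for a in alerts:
        key = incident_key(a["user_id"], a["source_type"], a["source_alert_id"])
        seen.setdefault(key, a)
    return seen
```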

incident_thoughts

Streaming analysis thoughts during RCA.
CREATE TABLE incident_thoughts (
    id SERIAL PRIMARY KEY,
    incident_id UUID REFERENCES incidents(id) ON DELETE CASCADE,
    timestamp TIMESTAMP NOT NULL,
    content TEXT NOT NULL,
    thought_type VARCHAR(20),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_thoughts_incident_id ON incident_thoughts(incident_id);
Thought Types:
  • analysis: Analytical reasoning
  • hypothesis: Proposed cause
  • evidence: Supporting data
  • conclusion: Final determination

incident_suggestions

Actionable suggestions from RCA.
CREATE TABLE incident_suggestions (
    id SERIAL PRIMARY KEY,
    incident_id UUID REFERENCES incidents(id) ON DELETE CASCADE,
    title TEXT NOT NULL,
    description TEXT,
    type VARCHAR(20),
    risk VARCHAR(20),
    command TEXT,
    -- Fields for fix-type suggestions (code changes)
    file_path TEXT,
    original_content TEXT,
    suggested_content TEXT,
    user_edited_content TEXT,
    repository TEXT,
    pr_url TEXT,
    pr_number INTEGER,
    created_branch TEXT,
    applied_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_suggestions_incident_id ON incident_suggestions(incident_id);
Suggestion Types:
  • diagnostic: Information gathering
  • fix: Code or configuration change
  • mitigation: Temporary fix
  • investigation: Further analysis needed
Risk Levels:
  • safe: No impact (read-only)
  • medium: May affect resources
  • high: Destructive or expensive
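The risk field is the natural place to gate execution, e.g. requiring explicit user confirmation for anything above safe. A hypothetical sketch (the threshold policy is illustrative):

```python
RISK_ORDER = {"safe": 0, "medium": 1, "high": 2}

def requires_confirmation(risk: str, threshold: str = "medium") -> bool:
    """True if a suggestion at this risk level should not auto-run.

    Unknown levels are treated as high risk (fail closed).
    """
    level = RISK_ORDER.get(risk, RISK_ORDER["high"])
    return level >= RISK_ORDER[threshold]
```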

incident_citations

Tool execution results referenced in analysis.
CREATE TABLE incident_citations (
    id SERIAL PRIMARY KEY,
    incident_id UUID NOT NULL REFERENCES incidents(id) ON DELETE CASCADE,
    citation_key VARCHAR(10) NOT NULL,
    tool_name VARCHAR(255),
    command TEXT,
    output TEXT NOT NULL,
    executed_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(incident_id, citation_key)
);

CREATE INDEX idx_incident_citations_incident_id ON incident_citations(incident_id);
Example:
  • Citation key: [1]
  • Referenced in thoughts: “The pod is failing [1] due to resource limits”
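Rendering analysis text then amounts to resolving each [n] marker against this table. A hypothetical sketch that expands markers into footnote-style tool names (the key→tool mapping stands in for a citation_key lookup):

```python
import re

def annotate_citations(text: str, citations: dict[str, str]) -> str:
    """Replace each [n] marker with [n: tool_name]; unknown markers are kept."""
    def repl(m: re.Match) -> str:
        key = m.group(1)
        tool = citations.get(key)
        return f"[{key}: {tool}]" if tool else m.group(0)
    return re.sub(r"\[(\d+)\]", repl, text)
```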

incident_alerts

Correlated alerts merged into an incident.
CREATE TABLE incident_alerts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(1000) NOT NULL,
    incident_id UUID NOT NULL REFERENCES incidents(id) ON DELETE CASCADE,
    source_type VARCHAR(20) NOT NULL,
    source_alert_id INTEGER NOT NULL,
    alert_title TEXT,
    alert_service TEXT,
    alert_severity VARCHAR(20),
    correlation_strategy TEXT,
    correlation_score FLOAT,
    correlation_details JSONB,
    alert_metadata JSONB,
    received_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_incident_alerts_incident_id ON incident_alerts(incident_id);
CREATE INDEX idx_incident_alerts_source ON incident_alerts(source_type, source_alert_id);
CREATE INDEX idx_incident_alerts_incident_received ON incident_alerts(incident_id, received_at);
Correlation Strategies:
  • time_window: Alerts within time window
  • service_match: Same service affected
  • semantic: Similar alert descriptions
  • manual: User-initiated merge
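The simplest of these, time_window, can be sketched as a pure function; the 10-minute default is an illustrative value, not Aurora's actual setting:

```python
from datetime import datetime, timedelta

def within_time_window(incident_started: datetime,
                       alert_received: datetime,
                       window: timedelta = timedelta(minutes=10)) -> bool:
    """time_window strategy: correlate if the alert arrived near incident start."""
    return abs(alert_received - incident_started) <= window
```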

llm_usage_tracking

LLM API usage and cost tracking.
CREATE TABLE llm_usage_tracking (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(1000) NOT NULL,
    session_id VARCHAR(50),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    model_name VARCHAR(255) NOT NULL,
    api_provider VARCHAR(100) DEFAULT 'openrouter',
    request_type VARCHAR(100),
    input_tokens INTEGER NOT NULL DEFAULT 0,
    output_tokens INTEGER NOT NULL DEFAULT 0,
    total_tokens INTEGER GENERATED ALWAYS AS (input_tokens + output_tokens) STORED,
    estimated_cost DECIMAL(10,6) DEFAULT 0.00,
    surcharge_rate DECIMAL(5,4) DEFAULT 0.3000,
    surcharge_amount DECIMAL(10,6) GENERATED ALWAYS AS (estimated_cost * surcharge_rate) STORED,
    total_cost_with_surcharge DECIMAL(10,6) GENERATED ALWAYS AS (estimated_cost * (1 + surcharge_rate)) STORED,
    response_time_ms INTEGER,
    error_message TEXT,
    request_metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Computed Columns:
  • total_tokens: Sum of input + output tokens
  • surcharge_amount: Platform markup (30% default)
  • total_cost_with_surcharge: Final cost to user
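The generated columns encode a simple formula, replicated here in Python with Decimal to match the DECIMAL column types (the input values are illustrative):

```python
from decimal import Decimal

def usage_costs(estimated_cost: Decimal,
                surcharge_rate: Decimal = Decimal("0.3000")) -> dict:
    """Mirror the table's generated columns: surcharge and final user cost."""
    surcharge = estimated_cost * surcharge_rate
    return {
        "surcharge_amount": surcharge,
        "total_cost_with_surcharge": estimated_cost + surcharge,
    }
```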

Kubernetes Tables

k8s_pods

CREATE TABLE k8s_pods (
    id SERIAL PRIMARY KEY,
    namespace VARCHAR(255) NOT NULL,
    pod_name VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,
    project_id VARCHAR(255) NOT NULL,
    cluster_name VARCHAR(255) NOT NULL,
    user_id VARCHAR(1000),
    provider VARCHAR(50),
    UNIQUE (pod_name, namespace, project_id, cluster_name, user_id)
);

k8s_nodes

CREATE TABLE k8s_nodes (
    id SERIAL PRIMARY KEY,
    node_name VARCHAR(255) NOT NULL,
    status VARCHAR(50),
    project_id VARCHAR(255) NOT NULL,
    cluster_name VARCHAR(255) NOT NULL,
    user_id VARCHAR(1000),
    provider VARCHAR(50),
    UNIQUE (node_name, project_id, cluster_name, user_id)
);

k8s_services, k8s_deployments, k8s_ingresses

Similar structure for other Kubernetes resources.

Monitoring Tables

grafana_alerts

CREATE TABLE grafana_alerts (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    alert_uid VARCHAR(255),
    alert_title TEXT,
    alert_state VARCHAR(50),
    rule_name TEXT,
    rule_url TEXT,
    dashboard_url TEXT,
    panel_url TEXT,
    payload JSONB NOT NULL,
    received_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_grafana_alerts_user_id ON grafana_alerts(user_id, received_at DESC);
CREATE INDEX idx_grafana_alerts_state ON grafana_alerts(alert_state);
CREATE INDEX idx_grafana_alerts_received_at ON grafana_alerts(received_at DESC);

datadog_events, pagerduty_events, netdata_alerts, splunk_alerts

Similar structure for other monitoring platforms.

Row-Level Security (RLS)

Aurora uses PostgreSQL RLS for multi-tenant data isolation:
-- Enable RLS on table
ALTER TABLE chat_sessions ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY user_isolation_policy ON chat_sessions
    USING (user_id = current_setting('myapp.current_user_id', TRUE)::TEXT);
Setting User Context:
SET myapp.current_user_id = 'user_abc123';
Protected Tables (RLS Enabled):
  • chat_sessions
  • incidents
  • incident_alerts
  • user_tokens
  • user_preferences
  • llm_usage_tracking
  • k8s_* (all Kubernetes tables)
  • *_alerts (all monitoring tables)

Migrations

The database schema is created and maintained by initialize_tables() in server/utils/db/db_utils.py:
def initialize_tables():
    """Create tables and apply RLS policies."""
    with db_pool.get_admin_connection() as conn:
        cursor = conn.cursor()

        # Advisory lock serializes schema changes across concurrent workers
        cursor.execute("SELECT pg_advisory_lock(1234567890);")
        try:
            # Create tables
            for table_name, create_script in create_tables.items():
                cursor.execute(create_script)

            # Apply RLS policies
            for table in rls_tables:
                cursor.execute(f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;")
            conn.commit()
        finally:
            # Release the lock so other workers can proceed
            cursor.execute("SELECT pg_advisory_unlock(1234567890);")
Run Migrations:
# Automatically runs on server startup
python server/main_compute.py

Connection Pooling

Aurora uses a custom connection pool for efficient database access:
from utils.db.connection_pool import db_pool

# Get admin connection (full privileges)
with db_pool.get_admin_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")

# Get user connection (RLS enforced)
with db_pool.get_user_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SET myapp.current_user_id = %s", (user_id,))
    cursor.execute("SELECT * FROM chat_sessions")

Backup and Recovery

Backup

# Full database backup
pg_dump -h postgres -U postgres -d aurora_db -F c -f backup.dump

# Specific table backup
pg_dump -h postgres -U postgres -d aurora_db -t chat_sessions -F c -f chat_sessions.dump

Restore

# Restore full database
pg_restore -h postgres -U postgres -d aurora_db -c backup.dump

# Restore specific table
pg_restore -h postgres -U postgres -d aurora_db -t chat_sessions chat_sessions.dump

Performance Optimization

Indexes

Key indexes for performance:
-- User lookups
CREATE INDEX idx_users_email ON users(email);

-- Chat session queries
CREATE INDEX idx_chat_sessions_user_updated 
    ON chat_sessions(user_id, updated_at DESC);

-- Incident queries
CREATE INDEX idx_incidents_user_id 
    ON incidents(user_id, started_at DESC);
CREATE INDEX idx_incidents_status 
    ON incidents(status);

-- Monitoring alerts
CREATE INDEX idx_grafana_alerts_user_id 
    ON grafana_alerts(user_id, received_at DESC);

Query Optimization

  1. Use JSONB operators for efficient JSON queries:
    SELECT * FROM chat_sessions 
    WHERE ui_state->>'selectedMode' = 'agent';
    
  2. Limit result sets with pagination:
    SELECT * FROM incidents 
    WHERE user_id = 'user_123' 
    ORDER BY started_at DESC 
    LIMIT 50 OFFSET 0;
    
  3. Use partial indexes for filtered queries:
    CREATE INDEX idx_active_sessions 
    ON chat_sessions(user_id) 
    WHERE is_active = true;
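The pagination in step 2 can be wrapped in a small helper that converts a page number into LIMIT/OFFSET values; an illustrative sketch:

```python
def page_params(page: int, page_size: int = 50) -> tuple[int, int]:
    """Convert a 1-based page number into (LIMIT, OFFSET) values."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    return page_size, (page - 1) * page_size

# Usage with a parameterized query:
# cursor.execute(
#     "SELECT * FROM incidents WHERE user_id = %s "
#     "ORDER BY started_at DESC LIMIT %s OFFSET %s",
#     (user_id, *page_params(page)),
# )
```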
    

Monitoring

Query Statistics

-- View slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 13+ the columns are total_exec_time / mean_exec_time)
SELECT query, calls, total_time, mean_time 
FROM pg_stat_statements 
ORDER BY mean_time DESC 
LIMIT 10;

-- View table sizes
SELECT 
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables 
WHERE schemaname = 'public' 
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Connection Monitoring

-- View active connections
SELECT * FROM pg_stat_activity 
WHERE datname = 'aurora_db';

-- Kill idle connections
SELECT pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE state = 'idle' AND state_change < NOW() - INTERVAL '1 hour';
