
Overview

Adapt uses PostgreSQL as its primary database with a normalised schema designed for efficiency and data integrity. The system leverages PostgreSQL-specific features like FOR UPDATE SKIP LOCKED for lock-free concurrent task processing.
As of July 26, 2025, the database schema is managed via migrations. Never run supabase db push manually; let the GitHub integration apply migrations.

Connection Configuration

Environment Variables

# Recommended for production
DATABASE_URL="postgres://user:password@host:port/database?sslmode=require"

Connection Pool Settings

Optimised for high-concurrency workloads:
// Located in internal/db/db.go
client.SetMaxOpenConns(45)                      // Maximum open connections
client.SetMaxIdleConns(18)                      // Maximum idle connections
client.SetConnMaxLifetime(5 * time.Minute)      // Connection lifetime
client.SetConnMaxIdleTime(2 * time.Minute)      // Idle connection timeout

Connection Pool Sizing Strategy

Adapt uses conservative connection pool limits tuned for Supabase's shared infrastructure.

Current Configuration:
  • MaxOpenConns: 45 - Stays just below the Supabase pool limit of 48
  • MaxIdleConns: 18 - 40% idle buffer to keep ready connections without exhausting the pool
Environment-Based Tuning:
  • Production: 45 max open, 18 idle
  • Development: 15 max open, 5 idle (reduced for local testing)
General Formula (for future scaling):
  • Target MaxOpenConns ≤ ~90% × database_max_connections
  • Set MaxIdleConns to 30-40% of MaxOpenConns
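The scaling formula above can be sketched as a small helper. This is illustrative only; the function name and the 35% idle midpoint are assumptions, not code from the Adapt repository:

```go
package main

import "fmt"

// poolSizes applies the sizing formula above: cap open connections at
// roughly 90% of the database's max_connections, and keep 30-40% of
// those as idle connections (35% is used here as a midpoint).
func poolSizes(dbMaxConnections int) (maxOpen, maxIdle int) {
	maxOpen = dbMaxConnections * 90 / 100
	maxIdle = maxOpen * 35 / 100
	return maxOpen, maxIdle
}

func main() {
	// 48 is the Supabase pool limit mentioned above.
	open, idle := poolSizes(48)
	fmt.Println(open, idle) // 43 15
}
```

Note that the formula yields 43/15 for a 48-connection pool, while the hand-tuned production values (45/18) sit slightly above it; the formula is a conservative starting point for future scaling, not a description of the current config.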

Connection Timeout Configuration

// Located in internal/db/db.go - automatically appended to connection strings
statement_timeout=60000                      // 60 seconds - abort long queries
idle_in_transaction_session_timeout=30000    // 30 seconds - terminate idle transactions
Rationale:
  • statement_timeout - Prevents long-running queries from consuming resources indefinitely
  • idle_in_transaction_session_timeout - Prevents “zombie” transactions that hold locks
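A minimal sketch of how such parameters can be appended to a connection URL. The helper name is hypothetical and this is not the actual internal/db/db.go code, just the general technique using the standard library:

```go
package main

import (
	"fmt"
	"net/url"
)

// withTimeouts appends statement_timeout and
// idle_in_transaction_session_timeout (both in milliseconds) to a
// postgres connection URL as query parameters, preserving any
// parameters already present (e.g. sslmode).
func withTimeouts(dsn string, statementMs, idleTxMs int) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("statement_timeout", fmt.Sprintf("%d", statementMs))
	q.Set("idle_in_transaction_session_timeout", fmt.Sprintf("%d", idleTxMs))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	dsn, err := withTimeouts("postgres://user:pass@host:5432/db?sslmode=require", 60000, 30000)
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```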

Database Schema

Core Tables

Domains Table

Stores unique domain names with integer primary keys for normalisation.
CREATE TABLE domains (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_domains_name ON domains(name);

Pages Table

Stores page paths with domain references to reduce redundancy.
CREATE TABLE pages (
    id SERIAL PRIMARY KEY,
    domain_id INTEGER REFERENCES domains(id),
    path TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(domain_id, path)
);

CREATE INDEX idx_pages_domain_path ON pages(domain_id, path);

Jobs Table

Stores job metadata and progress tracking.
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    domain_id INTEGER REFERENCES domains(id),
    user_id TEXT,
    organisation_id TEXT,
    status TEXT NOT NULL,
    progress REAL DEFAULT 0.0,
    total_tasks INTEGER DEFAULT 0,
    completed_tasks INTEGER DEFAULT 0,
    failed_tasks INTEGER DEFAULT 0,
    skipped_tasks INTEGER DEFAULT 0,
    found_tasks INTEGER DEFAULT 0,
    sitemap_tasks INTEGER DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    concurrency INTEGER DEFAULT 1,
    find_links BOOLEAN DEFAULT FALSE,
    max_pages INTEGER DEFAULT 100,
    include_paths TEXT,
    exclude_paths TEXT,
    required_workers INTEGER DEFAULT 1
);

-- Indexes for performance
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_user_org ON jobs(user_id, organisation_id);
CREATE INDEX idx_jobs_created_at ON jobs(created_at);
Status Values:
  • pending - Job created but not started
  • running - Job actively processing tasks
  • completed - All tasks finished successfully
  • cancelled - Job manually cancelled
  • failed - Job failed due to system error
Task Counters:
  • total_tasks = sitemap_tasks + found_tasks
  • sitemap_tasks - URLs from sitemap processing
  • found_tasks - URLs discovered through link crawling
  • completed_tasks - Successfully processed URLs
  • failed_tasks - URLs that failed processing
  • skipped_tasks - URLs skipped due to limits or filters
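The progress percentage derived from these counters is straightforward arithmetic; a hedged sketch (function name is illustrative, not from the codebase):

```go
package main

import "fmt"

// jobProgress mirrors the counter semantics above: progress is the share
// of terminal tasks (completed + failed) out of total_tasks, expressed as
// a percentage. Skipped tasks are assumed not to count toward progress,
// matching the progress-update query later on this page.
func jobProgress(completed, failed, total int) float64 {
	if total <= 0 {
		return 0.0 // avoid division by zero before any tasks exist
	}
	return float64(completed+failed) / float64(total) * 100.0
}

func main() {
	fmt.Println(jobProgress(80, 5, 100)) // 85
}
```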

Tasks Table

Stores individual URL processing tasks.
CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    job_id TEXT REFERENCES jobs(id),
    domain_id INTEGER REFERENCES domains(id),
    page_id INTEGER REFERENCES pages(id),
    status TEXT NOT NULL,
    source_type TEXT,
    source_url TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    status_code INTEGER,
    response_time INTEGER,
    cache_status TEXT,
    content_type TEXT,
    error TEXT,
    retry_count INTEGER DEFAULT 0
);

-- Indexes for task processing
CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_job_id ON tasks(job_id);
CREATE INDEX idx_tasks_job_status ON tasks(job_id, status);
CREATE INDEX idx_tasks_pending ON tasks(created_at) WHERE status = 'pending';
Status Values:
  • pending - Task waiting to be processed
  • running - Task currently being processed
  • completed - Task successfully completed
  • failed - Task failed after retries
  • skipped - Task skipped due to limits or filters
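These statuses imply a simple lifecycle (pending → running → completed/failed/skipped). A sketch of the implied transition table, as an illustration rather than the actual queue code; in particular, the retry edge (running → pending) is an assumption based on the retry_count column:

```go
package main

import "fmt"

// validTaskTransitions sketches the lifecycle implied by the status list
// above. completed, failed, and skipped are terminal and have no entries.
var validTaskTransitions = map[string][]string{
	"pending": {"running", "skipped"},
	"running": {"completed", "failed", "pending"}, // pending = retry (assumed)
}

// canTransition reports whether moving a task from one status to another
// is allowed under this sketch.
func canTransition(from, to string) bool {
	for _, next := range validTaskTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("pending", "running"))   // true
	fmt.Println(canTransition("completed", "running")) // false
}
```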

Scheduler Tables

Schedulers Table

Manages recurring job schedules for automatic cache warming.
CREATE TABLE schedulers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    organisation_id TEXT NOT NULL REFERENCES organisations(id), -- TEXT to match organisations.id
    schedule_interval_hours INTEGER NOT NULL CHECK (schedule_interval_hours IN (6, 12, 24, 48)),
    next_run_at TIMESTAMPTZ NOT NULL,
    is_enabled BOOLEAN NOT NULL DEFAULT TRUE,
    
    -- Job configuration template
    concurrency INTEGER NOT NULL DEFAULT 20,
    find_links BOOLEAN NOT NULL DEFAULT TRUE,
    max_pages INTEGER NOT NULL DEFAULT 0,
    include_paths TEXT,
    exclude_paths TEXT,
    required_workers INTEGER NOT NULL DEFAULT 1,
    
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    CONSTRAINT unique_domain_org UNIQUE(domain_id, organisation_id)
);

CREATE INDEX idx_schedulers_next_run ON schedulers(next_run_at) WHERE is_enabled = TRUE;
CREATE INDEX idx_schedulers_organisation ON schedulers(organisation_id);
Key Features:
  • Interval Constraints - Only allows 6, 12, 24, or 48-hour intervals
  • Automatic Execution - Background service polls next_run_at every 30 seconds
  • Job Templates - Stores full job configuration for automatic creation
  • Organisation Isolation - One active scheduler per domain per organisation

Authentication Tables

Users Table

Extends Supabase auth.users with application-specific data.
CREATE TABLE users (
    id TEXT PRIMARY KEY, -- Matches Supabase auth.users(id)
    email TEXT NOT NULL UNIQUE,
    full_name TEXT,
    organisation_id TEXT REFERENCES organisations(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_organisation ON users(organisation_id);

Organisations Table

CREATE TABLE organisations (
    id TEXT PRIMARY KEY DEFAULT gen_random_uuid()::text,
    name TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

PostgreSQL-Specific Features

Lock-Free Task Processing

Uses FOR UPDATE SKIP LOCKED to allow multiple workers to claim tasks without blocking:
-- Task acquisition query (internal/db/queue.go)
SELECT t.id, t.job_id, d.name as domain, p.path, t.source_type, t.source_url
FROM tasks t
JOIN pages p ON t.page_id = p.id
JOIN domains d ON p.domain_id = d.id
WHERE t.job_id = ANY($1)
  AND t.status = 'pending'
ORDER BY t.created_at ASC
LIMIT 1
FOR UPDATE OF t SKIP LOCKED;  -- lock only the claimed task row, not the joined page/domain rows

Batch Operations

Efficient bulk inserts using PostgreSQL arrays:
-- Batch task creation (internal/db/queue.go)
INSERT INTO tasks (id, job_id, domain_id, page_id, status, source_type, source_url, created_at)
SELECT
    gen_random_uuid()::text,
    $1,
    $2,
    unnest($3::integer[]),
    'pending',
    $4,
    $5,
    NOW();

Progress Tracking

Atomic progress updates with conditional status changes:
-- Job progress update (internal/db/queue.go)
-- The counts are computed once in a subquery and used for both the
-- counters and the progress figure; in a plain UPDATE, every SET
-- expression sees the row's pre-update values, so deriving progress from
-- completed_tasks/failed_tasks directly would use stale counts.
UPDATE jobs
SET
    completed_tasks = c.completed,
    failed_tasks = c.failed,
    progress = CASE
        WHEN total_tasks > 0 THEN
            ((c.completed + c.failed)::REAL / total_tasks::REAL) * 100.0
        ELSE 0.0
    END
FROM (
    SELECT
        COUNT(*) FILTER (WHERE status = 'completed') AS completed,
        COUNT(*) FILTER (WHERE status = 'failed') AS failed
    FROM tasks
    WHERE job_id = $1
) AS c
WHERE jobs.id = $1;

Performance Optimisation

Composite Indexes for Hot Paths

Adapt uses composite indexes optimised for actual query patterns:

Task Claiming (Worker Pool)

Index: idx_tasks_claim_optimised
CREATE INDEX CONCURRENTLY idx_tasks_claim_optimised
ON tasks(status, job_id, priority_score DESC, created_at ASC)
WHERE status = 'pending';
Why Composite:
  • Eliminates “Incremental Sort” step (was sorting ~777 rows per claim)
  • Index already sorted by priority_score DESC, created_at ASC
  • 50-70% latency reduction on task claiming
  • Partial index (WHERE status = 'pending') keeps index size small

Dashboard Job Listing

Indexes: idx_jobs_org_status_created and idx_jobs_org_created
-- For queries WITH status filter
CREATE INDEX CONCURRENTLY idx_jobs_org_status_created
ON jobs(organisation_id, status, created_at DESC);

-- For queries WITHOUT status filter
CREATE INDEX CONCURRENTLY idx_jobs_org_created
ON jobs(organisation_id, created_at DESC);
Why Two Indexes:
  • PostgreSQL can only use an index's leading columns efficiently; when the status filter is omitted, the three-column index cannot return rows in created_at order
  • Each index is only ~100KB but provides 90%+ improvement (11ms → <1ms)
  • Eliminated sequential scans that were reading 5899 buffers for 164 rows

Monitoring Queries

-- Check worker efficiency
SELECT
    status,
    COUNT(*) as count,
    AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) as avg_duration
FROM tasks
WHERE started_at > NOW() - INTERVAL '1 hour'
GROUP BY status;

-- Monitor job completion rates
SELECT
    DATE_TRUNC('hour', created_at) as hour,
    COUNT(*) as jobs_created,
    COUNT(completed_at) as jobs_completed
FROM jobs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- Check database performance
SELECT
    schemaname,
    tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch
FROM pg_stat_user_tables
WHERE schemaname = 'public';

Migration Management

Creating Migrations

1. Generate migration file

supabase migration new your_migration_name

This creates a timestamped file in supabase/migrations/.

2. Write migration SQL

-- Add new columns safely
ALTER TABLE jobs
ADD COLUMN IF NOT EXISTS new_field TEXT DEFAULT '';

-- Create indexes concurrently (non-blocking)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_new_field
ON jobs(new_field);

-- Remove old columns only after confirming they are unused
ALTER TABLE jobs DROP COLUMN IF EXISTS old_field;

3. Test locally (optional)

supabase start
supabase db reset  # Applies all migrations

4. Deploy via GitHub

  • Push to a feature branch
  • Create a PR to test-branch (migrations auto-apply)
  • After testing, merge to main (migrations auto-apply)

Migration Files

All migrations are in supabase/migrations/:
  • 20240101000000_initial_schema.sql - Base schema creation
  • 20250720013915_remote_schema.sql - Initial remote state
  • 20250727212804_add_job_duration_fields.sql - Calculated duration fields
  • New migrations get timestamped names automatically

Seed Data for Preview Branches

The supabase/seed.sql file provides test data for Supabase preview branches. When seed runs:
  • Only on branch creation - when a PR is first opened
  • Does NOT re-run on subsequent pushes to the same PR
  • To re-seed: close the PR and reopen, or reset the branch via Supabase dashboard/API
What’s seeded:
  • auth.users and auth.identities - test user accounts (Google OAuth)
  • organisations, users, domains, pages - core entities
  • jobs, tasks - job history for dashboard testing
  • schedulers - scheduled jobs

Row Level Security (RLS)

User Data Isolation

-- Enable RLS on user tables
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE jobs ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

-- Users can only access their own data
CREATE POLICY "users_own_data" ON users
FOR ALL USING (auth.uid()::text = id);

-- Organisation members can access shared jobs
CREATE POLICY "org_jobs_access" ON jobs
FOR ALL USING (
    organisation_id IN (
        SELECT organisation_id FROM users
        WHERE id = auth.uid()::text
    )
);

-- Tasks inherit job access permissions
CREATE POLICY "job_tasks_access" ON tasks
FOR ALL USING (
    job_id IN (
        SELECT id FROM jobs
        WHERE organisation_id IN (
            SELECT organisation_id FROM users
            WHERE id = auth.uid()::text
        )
    )
);

Performance Observability

Adapt ships with the pg_stat_statements extension enabled:
  • The migration 20251012070000_enable_pg_stat_statements.sql enables the extension
  • The view observability.pg_stat_statements_top_total_time lists the top 50 statements by total execution time
  • Query it via the Supabase SQL Editor using a service key
Recommended workflow:
-- Review top queries
SELECT * FROM observability.pg_stat_statements_top_total_time LIMIT 20;

-- Optional: Reset statistics after exporting for analysis
SELECT pg_stat_statements_reset();

Backup & Recovery

Backup Strategy

# Full database backup
pg_dump -h host -U user -d database > backup_$(date +%Y%m%d_%H%M%S).sql

# Schema-only backup
pg_dump -h host -U user -d database --schema-only > schema_backup.sql

# Data-only backup
pg_dump -h host -U user -d database --data-only > data_backup.sql

Recovery Testing

# Restore from backup
psql -h host -U user -d new_database < backup_file.sql

# Verify data integrity (row counts should match the source database)
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM jobs;"
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM tasks;"
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM users;"
