
Overview

Adapt uses PostgreSQL as its primary database with a normalised schema designed for efficiency and data integrity. The system leverages PostgreSQL-specific features like FOR UPDATE SKIP LOCKED for lock-free concurrent task processing.
As of July 26, 2025, the database schema is managed via migrations. Never run supabase db push manually; let the GitHub integration apply migrations.

Connection Configuration

Environment Variables

# Recommended for production
DATABASE_URL="postgres://user:password@host:port/database?sslmode=require"

Connection Pool Settings

Optimised for high-concurrency workloads:
// Located in internal/db/db.go
client.SetMaxOpenConns(45)                      // Maximum open connections
client.SetMaxIdleConns(18)                      // Maximum idle connections
client.SetConnMaxLifetime(5 * time.Minute)      // Connection lifetime
client.SetConnMaxIdleTime(2 * time.Minute)      // Idle connection timeout

Connection Pool Sizing Strategy

Adapt uses conservative connection pool limits tuned for Supabase's shared infrastructure.

Current Configuration:
  • MaxOpenConns: 45 - Stays just below the Supabase pool limit of 48
  • MaxIdleConns: 18 - 40% idle buffer to keep ready connections without exhausting the pool
Environment-Based Tuning:
  • Production: 45 max open, 18 idle
  • Development: 15 max open, 5 idle (reduced for local testing)
General Formula (for future scaling):
  • Target MaxOpenConns ≤ ~90% × database_max_connections
  • Set MaxIdleConns to 30-40% of MaxOpenConns
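The scaling formula above can be sketched as a small helper. This is illustrative only; the function name and the 35% idle midpoint are assumptions, not code from the Adapt repository:

```go
package main

import "fmt"

// poolSizes applies the sizing formula above: cap open connections at
// roughly 90% of the database's max_connections, and keep 30-40% of
// those as idle connections (35% is used here as a midpoint).
func poolSizes(dbMaxConnections int) (maxOpen, maxIdle int) {
	maxOpen = dbMaxConnections * 90 / 100
	maxIdle = maxOpen * 35 / 100
	return maxOpen, maxIdle
}

func main() {
	// 48 is the Supabase pool limit mentioned above.
	open, idle := poolSizes(48)
	fmt.Println(open, idle) // 43 15
}
```

Note that the formula yields 43/15 for a 48-connection pool, while the hand-tuned production values (45/18) sit slightly above it; the formula is a conservative starting point for future scaling, not a description of the current config.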

Connection Timeout Configuration

// Located in internal/db/db.go - automatically appended to connection strings
statement_timeout=60000                      // 60 seconds - abort long queries
idle_in_transaction_session_timeout=30000    // 30 seconds - terminate idle transactions
Rationale:
  • statement_timeout - Prevents long-running queries from consuming resources indefinitely
  • idle_in_transaction_session_timeout - Prevents “zombie” transactions that hold locks
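A minimal sketch of how such parameters can be appended to a connection URL. The helper name is hypothetical and this is not the actual internal/db/db.go code, just the general technique using the standard library:

```go
package main

import (
	"fmt"
	"net/url"
)

// withTimeouts appends statement_timeout and
// idle_in_transaction_session_timeout (both in milliseconds) to a
// postgres connection URL as query parameters, preserving any
// parameters already present (e.g. sslmode).
func withTimeouts(dsn string, statementMs, idleTxMs int) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("statement_timeout", fmt.Sprintf("%d", statementMs))
	q.Set("idle_in_transaction_session_timeout", fmt.Sprintf("%d", idleTxMs))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	dsn, err := withTimeouts("postgres://user:pass@host:5432/db?sslmode=require", 60000, 30000)
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```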

Database Schema

Core Tables

Domains Table

Stores unique domain names with integer primary keys for normalisation.
CREATE TABLE domains (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_domains_name ON domains(name);

Pages Table

Stores page paths with domain references to reduce redundancy.
CREATE TABLE pages (
    id SERIAL PRIMARY KEY,
    domain_id INTEGER REFERENCES domains(id),
    path TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(domain_id, path)
);

CREATE INDEX idx_pages_domain_path ON pages(domain_id, path);

Jobs Table

Stores job metadata and progress tracking.
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    domain_id INTEGER REFERENCES domains(id),
    user_id TEXT,
    organisation_id TEXT,
    status TEXT NOT NULL,
    progress REAL DEFAULT 0.0,
    total_tasks INTEGER DEFAULT 0,
    completed_tasks INTEGER DEFAULT 0,
    failed_tasks INTEGER DEFAULT 0,
    skipped_tasks INTEGER DEFAULT 0,
    found_tasks INTEGER DEFAULT 0,
    sitemap_tasks INTEGER DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    concurrency INTEGER DEFAULT 1,
    find_links BOOLEAN DEFAULT FALSE,
    max_pages INTEGER DEFAULT 100,
    include_paths TEXT,
    exclude_paths TEXT,
    required_workers INTEGER DEFAULT 1
);

-- Indexes for performance
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_user_org ON jobs(user_id, organisation_id);
CREATE INDEX idx_jobs_created_at ON jobs(created_at);
Status Values:
  • pending - Job created but not started
  • running - Job actively processing tasks
  • completed - All tasks finished successfully
  • cancelled - Job manually cancelled
  • failed - Job failed due to system error
Task Counters:
  • total_tasks = sitemap_tasks + found_tasks
  • sitemap_tasks - URLs from sitemap processing
  • found_tasks - URLs discovered through link crawling
  • completed_tasks - Successfully processed URLs
  • failed_tasks - URLs that failed processing
  • skipped_tasks - URLs skipped due to limits or filters
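The progress percentage derived from these counters is straightforward arithmetic; a hedged sketch (function name is illustrative, not from the codebase):

```go
package main

import "fmt"

// jobProgress mirrors the counter semantics above: progress is the share
// of terminal tasks (completed + failed) out of total_tasks, expressed as
// a percentage. Skipped tasks are assumed not to count toward progress,
// matching the progress-update query later on this page.
func jobProgress(completed, failed, total int) float64 {
	if total <= 0 {
		return 0.0 // avoid division by zero before any tasks exist
	}
	return float64(completed+failed) / float64(total) * 100.0
}

func main() {
	fmt.Println(jobProgress(80, 5, 100)) // 85
}
```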

Tasks Table

Stores individual URL processing tasks.
CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    job_id TEXT REFERENCES jobs(id),
    domain_id INTEGER REFERENCES domains(id),
    page_id INTEGER REFERENCES pages(id),
    status TEXT NOT NULL,
    source_type TEXT,
    source_url TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    status_code INTEGER,
    response_time INTEGER,
    cache_status TEXT,
    content_type TEXT,
    error TEXT,
    retry_count INTEGER DEFAULT 0
);

-- Indexes for task processing
CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_job_id ON tasks(job_id);
CREATE INDEX idx_tasks_job_status ON tasks(job_id, status);
CREATE INDEX idx_tasks_pending ON tasks(created_at) WHERE status = 'pending';
Status Values:
  • pending - Task waiting to be processed
  • running - Task currently being processed
  • completed - Task successfully completed
  • failed - Task failed after retries
  • skipped - Task skipped due to limits or filters
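These statuses imply a simple lifecycle (pending → running → completed/failed/skipped). A sketch of the implied transition table, as an illustration rather than the actual queue code; in particular, the retry edge (running → pending) is an assumption based on the retry_count column:

```go
package main

import "fmt"

// validTaskTransitions sketches the lifecycle implied by the status list
// above. completed, failed, and skipped are terminal and have no entries.
var validTaskTransitions = map[string][]string{
	"pending": {"running", "skipped"},
	"running": {"completed", "failed", "pending"}, // pending = retry (assumed)
}

// canTransition reports whether moving a task from one status to another
// is allowed under this sketch.
func canTransition(from, to string) bool {
	for _, next := range validTaskTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("pending", "running"))   // true
	fmt.Println(canTransition("completed", "running")) // false
}
```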

Scheduler Tables

Schedulers Table

Manages recurring job schedules for automatic cache warming.
CREATE TABLE schedulers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    organisation_id TEXT NOT NULL REFERENCES organisations(id), -- TEXT to match organisations.id
    schedule_interval_hours INTEGER NOT NULL CHECK (schedule_interval_hours IN (6, 12, 24, 48)),
    next_run_at TIMESTAMPTZ NOT NULL,
    is_enabled BOOLEAN NOT NULL DEFAULT TRUE,
    
    -- Job configuration template
    concurrency INTEGER NOT NULL DEFAULT 20,
    find_links BOOLEAN NOT NULL DEFAULT TRUE,
    max_pages INTEGER NOT NULL DEFAULT 0,
    include_paths TEXT,
    exclude_paths TEXT,
    required_workers INTEGER NOT NULL DEFAULT 1,
    
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    CONSTRAINT unique_domain_org UNIQUE(domain_id, organisation_id)
);

CREATE INDEX idx_schedulers_next_run ON schedulers(next_run_at) WHERE is_enabled = TRUE;
CREATE INDEX idx_schedulers_organisation ON schedulers(organisation_id);
Key Features:
  • Interval Constraints - Only allows 6, 12, 24, or 48-hour intervals
  • Automatic Execution - Background service polls next_run_at every 30 seconds
  • Job Templates - Stores full job configuration for automatic creation
  • Organisation Isolation - One active scheduler per domain per organisation

Authentication Tables

Users Table

Extends Supabase auth.users with application-specific data.
CREATE TABLE users (
    id TEXT PRIMARY KEY, -- Matches Supabase auth.users(id)
    email TEXT NOT NULL UNIQUE,
    full_name TEXT,
    organisation_id TEXT REFERENCES organisations(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_organisation ON users(organisation_id);

Organisations Table

CREATE TABLE organisations (
    id TEXT PRIMARY KEY DEFAULT gen_random_uuid()::text,
    name TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

PostgreSQL-Specific Features

Lock-Free Task Processing

Uses FOR UPDATE SKIP LOCKED to allow multiple workers to claim tasks without blocking:
-- Task acquisition query (internal/db/queue.go)
SELECT t.id, t.job_id, d.name as domain, p.path, t.source_type, t.source_url
FROM tasks t
JOIN pages p ON t.page_id = p.id
JOIN domains d ON p.domain_id = d.id
WHERE t.job_id = ANY($1)
  AND t.status = 'pending'
ORDER BY t.created_at ASC
LIMIT 1
FOR UPDATE OF t SKIP LOCKED;  -- lock only the claimed task row, not the joined page/domain rows

Batch Operations

Efficient bulk inserts using PostgreSQL arrays:
-- Batch task creation (internal/db/queue.go)
INSERT INTO tasks (id, job_id, domain_id, page_id, status, source_type, source_url, created_at)
SELECT
    gen_random_uuid()::text,
    $1,
    $2,
    unnest($3::integer[]),
    'pending',
    $4,
    $5,
    NOW();

Progress Tracking

Atomic progress updates with conditional status changes:
-- Job progress update (internal/db/queue.go)
-- The counts are computed once in a subquery and used for both the
-- counters and the progress figure; in a plain UPDATE, every SET
-- expression sees the row's pre-update values, so deriving progress from
-- completed_tasks/failed_tasks directly would use stale counts.
UPDATE jobs
SET
    completed_tasks = c.completed,
    failed_tasks = c.failed,
    progress = CASE
        WHEN total_tasks > 0 THEN
            ((c.completed + c.failed)::REAL / total_tasks::REAL) * 100.0
        ELSE 0.0
    END
FROM (
    SELECT
        COUNT(*) FILTER (WHERE status = 'completed') AS completed,
        COUNT(*) FILTER (WHERE status = 'failed') AS failed
    FROM tasks
    WHERE job_id = $1
) AS c
WHERE jobs.id = $1;

Performance Optimisation

Composite Indexes for Hot Paths

Adapt uses composite indexes optimised for actual query patterns:

Task Claiming (Worker Pool)

Index: idx_tasks_claim_optimised
CREATE INDEX CONCURRENTLY idx_tasks_claim_optimised
ON tasks(status, job_id, priority_score DESC, created_at ASC)
WHERE status = 'pending';
Why Composite:
  • Eliminates “Incremental Sort” step (was sorting ~777 rows per claim)
  • Index already sorted by priority_score DESC, created_at ASC
  • 50-70% latency reduction on task claiming
  • Partial index (WHERE status = 'pending') keeps index size small

Dashboard Job Listing

Indexes: idx_jobs_org_status_created and idx_jobs_org_created
-- For queries WITH status filter
CREATE INDEX CONCURRENTLY idx_jobs_org_status_created
ON jobs(organisation_id, status, created_at DESC);

-- For queries WITHOUT status filter
CREATE INDEX CONCURRENTLY idx_jobs_org_created
ON jobs(organisation_id, created_at DESC);
Why Two Indexes:
  • PostgreSQL can only use an index's leading columns efficiently; when the status filter is omitted, the three-column index cannot return rows in created_at order
  • Each index is only ~100KB but provides 90%+ improvement (11ms → <1ms)
  • Eliminated sequential scans that were reading 5899 buffers for 164 rows

Monitoring Queries

-- Check worker efficiency
SELECT
    status,
    COUNT(*) as count,
    AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) as avg_duration
FROM tasks
WHERE started_at > NOW() - INTERVAL '1 hour'
GROUP BY status;

-- Monitor job completion rates
SELECT
    DATE_TRUNC('hour', created_at) as hour,
    COUNT(*) as jobs_created,
    COUNT(completed_at) as jobs_completed
FROM jobs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- Check database performance
SELECT
    schemaname,
    tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch
FROM pg_stat_user_tables
WHERE schemaname = 'public';

Migration Management

Creating Migrations

1. Generate migration file

supabase migration new your_migration_name

This creates a timestamped file in supabase/migrations/.

2. Write migration SQL

-- Add new columns safely
ALTER TABLE jobs
ADD COLUMN IF NOT EXISTS new_field TEXT DEFAULT '';

-- Create indexes concurrently (non-blocking)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_new_field
ON jobs(new_field);

-- Remove old columns only after confirming they are unused
ALTER TABLE jobs DROP COLUMN IF EXISTS old_field;

3. Test locally (optional)

supabase start
supabase db reset  # Applies all migrations

4. Deploy via GitHub

  • Push to a feature branch
  • Create a PR to test-branch (migrations auto-apply)
  • After testing, merge to main (migrations auto-apply)

Migration Files

All migrations are in supabase/migrations/:
  • 20240101000000_initial_schema.sql - Base schema creation
  • 20250720013915_remote_schema.sql - Initial remote state
  • 20250727212804_add_job_duration_fields.sql - Calculated duration fields
  • New migrations get timestamped names automatically

Seed Data for Preview Branches

The supabase/seed.sql file provides test data for Supabase preview branches. When seed runs:
  • Only on branch creation - when a PR is first opened
  • Does NOT re-run on subsequent pushes to the same PR
  • To re-seed: close the PR and reopen, or reset the branch via Supabase dashboard/API
What’s seeded:
  • auth.users and auth.identities - test user accounts (Google OAuth)
  • organisations, users, domains, pages - core entities
  • jobs, tasks - job history for dashboard testing
  • schedulers - scheduled jobs

Row Level Security (RLS)

User Data Isolation

-- Enable RLS on user tables
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE jobs ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

-- Users can only access their own data
CREATE POLICY "users_own_data" ON users
FOR ALL USING (auth.uid()::text = id);

-- Organisation members can access shared jobs
CREATE POLICY "org_jobs_access" ON jobs
FOR ALL USING (
    organisation_id IN (
        SELECT organisation_id FROM users
        WHERE id = auth.uid()::text
    )
);

-- Tasks inherit job access permissions
CREATE POLICY "job_tasks_access" ON tasks
FOR ALL USING (
    job_id IN (
        SELECT id FROM jobs
        WHERE organisation_id IN (
            SELECT organisation_id FROM users
            WHERE id = auth.uid()::text
        )
    )
);

Performance Observability

Adapt ships with the pg_stat_statements extension enabled:
  • The migration 20251012070000_enable_pg_stat_statements.sql enables the extension
  • The view observability.pg_stat_statements_top_total_time lists the top 50 statements by total execution time
  • Query it via the Supabase SQL Editor using a service key
Recommended workflow:
-- Review top queries
SELECT * FROM observability.pg_stat_statements_top_total_time LIMIT 20;

-- Optional: Reset statistics after exporting for analysis
SELECT pg_stat_statements_reset();

Backup & Recovery

Backup Strategy

# Full database backup
pg_dump -h host -U user -d database > backup_$(date +%Y%m%d_%H%M%S).sql

# Schema-only backup
pg_dump -h host -U user -d database --schema-only > schema_backup.sql

# Data-only backup
pg_dump -h host -U user -d database --data-only > data_backup.sql

Recovery Testing

# Restore from backup
psql -h host -U user -d new_database < backup_file.sql

# Verify data integrity (row counts should match the source database)
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM jobs;"
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM tasks;"
psql -h host -U user -d new_database -c "SELECT COUNT(*) FROM users;"
