
System Overview

Adapt is a web cache warming service built in Go, designed for Webflow sites and other web applications. It uses a worker pool architecture for efficient URL crawling and cache warming, with a focus on reliability, performance, and observability.

Core Components

Worker Pool System

The worker pool is the heart of Adapt’s concurrent processing system:
  • Concurrent Processing - Multiple workers process tasks simultaneously using PostgreSQL’s FOR UPDATE SKIP LOCKED
  • Job Management - Jobs are broken down into individual URL tasks and distributed across workers
  • Recovery System - Automatic recovery of stalled or failed tasks with exponential backoff
  • Task Monitoring - Real-time monitoring of task progress and status

Database Layer (PostgreSQL)

Adapt uses PostgreSQL with Supabase:
  • Normalised Schema - Separate tables for domains, pages, jobs, and tasks to reduce redundancy
  • Row-Level Locking - Uses FOR UPDATE SKIP LOCKED for efficient concurrent task acquisition
  • Connection Pooling - Optimised pool settings (45 max open, 18 max idle connections)
  • Data Integrity - Maintains job history, statistics, and task relationships

API Layer

RESTful API with standard practices:
  • RESTful Design - /v1/* endpoints with standardised responses and error handling
  • Authentication - JWT-based auth with Supabase Auth integration
  • Middleware Stack - CORS, logging, rate limiting, request tracking
  • Request IDs - Every request tracked with unique identifier

Crawler System

Efficient URL crawling and cache validation:
  • Concurrent URL Processing - Configurable concurrency with rate limiting
  • Cache Validation - Monitors cache status and performance metrics
  • Response Tracking - Records response times, status codes, and cache hits
  • Link Discovery - Optional extraction of additional URLs from crawled pages

Technical Concepts

Jobs and Tasks

Job

A collection of URLs from a single domain to be crawled
  • Contains metadata: domain, user/organisation, concurrency settings
  • Tracks progress: total/completed/failed task counts
  • Has lifecycle: pending → running → completed/cancelled

Task

Individual URL processing unit within a job
  • References a specific page within the job’s domain
  • Tracks execution: status, timing, response metrics, errors
  • Can be: pending → running → completed/failed/skipped

Worker

Process that executes tasks concurrently
  • Claims tasks atomically using database locking
  • Handles retries and error reporting
  • Updates task and job progress
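The task lifecycle above can be captured as a small status type with an explicit transition table. Names are illustrative, mirroring the transitions listed; the real definitions live in `jobs/types.go`.

```go
package main

import "fmt"

// Status values for tasks, mirroring the lifecycle described above
// (names illustrative).
type Status string

const (
	StatusPending   Status = "pending"
	StatusRunning   Status = "running"
	StatusCompleted Status = "completed"
	StatusFailed    Status = "failed"
	StatusSkipped   Status = "skipped"
)

// validTransitions encodes the allowed moves in the task lifecycle:
// pending → running, running → completed/failed/skipped.
var validTransitions = map[Status][]Status{
	StatusPending: {StatusRunning, StatusSkipped},
	StatusRunning: {StatusCompleted, StatusFailed, StatusSkipped},
}

// canTransition reports whether a task may move from one status to another.
func canTransition(from, to Status) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StatusPending, StatusRunning))   // true
	fmt.Println(canTransition(StatusCompleted, StatusRunning)) // false
}
```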

Job Lifecycle

1. Job Creation
  • Validate domain and create domain/page records
  • Insert job with pending status
  • Optionally process sitemap or create root task
2. Job Start
  • Update status to running
  • Reset any stalled tasks from previous runs
  • Add job to worker pool for processing
3. Task Processing
  • Workers claim pending tasks atomically
  • Crawl URLs with retry logic and rate limiting
  • Store results and update task status
  • Update job progress counters
4. Job Completion
  • Automatic detection when all tasks are finished
  • Calculate final statistics
  • Mark job as completed with timestamp
5. Recovery & Cleanup
  • Periodic cleanup of stuck jobs
  • Task recovery after server restarts
  • Failed task retry with exponential backoff

Codebase Structure

Architectural Principles

Adapt follows focused, testable function design:
  • Function Size - Functions kept under 50 lines where possible
  • Single Responsibility - Each function has one clear purpose
  • Testing - Strategic test coverage for critical paths and complex logic
  • Extract + Test + Commit - Proven methodology for safe refactoring

Package Organisation

cmd/
├── app/              # Main application entry point
└── test_jobs/        # Job queue testing utility

internal/
├── api/              # HTTP handlers and middleware
│   ├── handlers.go   # Route handlers
│   ├── auth.go       # JWT authentication
│   ├── jobs.go       # Job management (refactored)
│   └── response.go   # Standardised responses
├── db/               # Database operations
│   ├── db.go         # PostgreSQL connection (refactored)
│   ├── queue.go      # Queue operations
│   ├── pages.go      # Page/domain management
│   └── users.go      # User/organisation data
├── jobs/             # Job system
│   ├── manager.go    # Job lifecycle (refactored)
│   ├── worker.go     # Worker pool
│   └── types.go      # Job/task types
├── crawler/          # Web crawling
│   ├── crawler.go    # HTTP client (refactored)
│   ├── sitemap.go    # Sitemap parsing
│   └── config.go     # Crawler configuration
└── util/             # Shared utilities
    └── url.go        # URL normalisation

System Monitoring

Sentry Integration Strategy

Adapt uses Sentry for both error tracking and performance monitoring:
sentry.Init(sentry.ClientOptions{
    Dsn:              config.SentryDSN,
    Environment:      config.Env,
    TracesSampleRate: 0.1, // 0.1 (10%) in production; raised to 1.0 in development
    AttachStacktrace: true,
    Debug:            config.Env == "development",
})
Critical Business Logic Failures:
  • Job creation, start, and cancellation failures
  • Worker startup failures and task status update failures
  • Transaction failures and stuck job cleanup failures
  • Database connection and server startup/shutdown failures
Performance Monitoring Spans:
  • manager.create_job, manager.start_job, manager.cancel_job - Job operations
  • manager.get_job, manager.get_job_status - Job queries
  • manager.process_sitemap - Sitemap processing
  • db.cleanup_stuck_jobs, db.create_page_records - Database operations
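The span pattern behind these names can be illustrated with a stdlib-only sketch. The production code uses sentry-go spans (start, run the operation, finish); this mimics only the timing shape, without the Sentry dependency.

```go
package main

import (
	"fmt"
	"time"
)

// withSpan times the given operation and reports its duration. It mimics the
// start/finish shape of a tracing span; in the real code, the span name
// (e.g. "manager.create_job") appears in the Sentry trace.
func withSpan(name string, fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	d := time.Since(start)
	fmt.Printf("span %s finished in %s\n", name, d)
	return d, err
}

func main() {
	_, err := withSpan("manager.create_job", func() error {
		time.Sleep(10 * time.Millisecond) // stand-in for real work
		return nil
	})
	fmt.Println(err == nil) // true
}
```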

Health Monitoring

  • Database Health - Connection status and query performance
  • Worker Status - Active worker count and task processing rates
  • Job Progress - Real-time completion tracking and statistics
  • API Performance - Request timing and error rates

Frontend Integration

Template + Data Binding System

Adapt uses a template-based approach with attribute-based event handling:
<!-- Dashboard with attribute-based event handling -->
<div class="dashboard">
  <button bb-action="refresh-dashboard">↻ Refresh</button>
  <button bb-action="create-job">+ New Job</button>
  <div bb-action="view-job-details" bb-data-job-id="123">View Details</div>
</div>

<!-- Data binding for dynamic content -->
<div class="stats">
  <span data-bb-bind="stats.total_jobs">0</span>
  <div data-bb-template="job">
    <h4 data-bb-bind="domain">Loading...</h4>
    <div data-bb-bind-style="width:{progress}%"></div>
    <span data-bb-bind="status">pending</span>
  </div>
</div>

<!-- JavaScript handles bb-action attributes automatically -->
<script src="/js/bb-data-binder.min.js"></script>
Integration Benefits:
  • Users control all HTML structure and CSS styling
  • No CSS conflicts with existing designs
  • Works with any frontend framework (Webflow, custom sites)
  • Lightweight JavaScript library (~50KB)
  • Complete form handling with validation and authentication
  • Real-time data binding with template engine

Security & Authentication

JWT Authentication

  • Supabase Auth Integration - Validates JWT tokens from Supabase
  • User Context - Extracts user and organisation IDs from tokens
  • Protected Endpoints - Requires authentication for job operations
  • Row Level Security - PostgreSQL RLS policies for data isolation

Rate Limiting

  • IP-Based Limiting - Token bucket algorithm (5 requests/second default)
  • Client IP Detection - Supports X-Forwarded-For headers for proxies
  • Crawler Rate Limiting - Configurable delays between URL requests
  • Concurrency Controls - Per-job worker limits
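A token bucket can be sketched with the standard library alone. This is illustrative only; production Go code often reaches for golang.org/x/time/rate instead of rolling its own.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket: up to capacity tokens, refilled at
// rate tokens per second based on elapsed time.
type bucket struct {
	tokens, capacity float64
	rate             float64 // tokens per second
	last             time.Time
}

func newBucket(rate float64, capacity int) *bucket {
	return &bucket{
		tokens:   float64(capacity),
		capacity: float64(capacity),
		rate:     rate,
		last:     time.Now(),
	}
}

// allow consumes one token if available, refilling first based on the time
// elapsed since the last call.
func (b *bucket) allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	b.last = now
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := newBucket(5, 5) // 5 req/s with a burst of 5
	allowed := 0
	for i := 0; i < 10; i++ {
		if b.allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // 5 — burst drained; refill within the loop is negligible
}
```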

Request Security

  • Input Validation - URL and parameter sanitisation
  • Error Sanitisation - Prevents information leakage
  • CORS Configuration - Controlled cross-origin access
  • Request Tracking - Unique request IDs for audit trails

Deployment Architecture

Infrastructure

  • Hosting - Fly.io with auto-scaling
  • Database - PostgreSQL with connection pooling (Supabase)
  • CDN - Cloudflare for caching and protection
  • Monitoring - Sentry (errors), Grafana Cloud (traces), Codecov (coverage)
  • Authentication - Supabase Auth with custom domain
  • Real-time - Supabase Realtime for live job progress updates
  • Storage - Supabase Storage (hot) + Cloudflare R2 (cold archive)

Data Storage Strategy

Hot Storage

Supabase Storage for recent and frequently accessed files:
  • Temporary assets
  • Crawler logs for active jobs
  • Recent HTML page captures for debugging
  • Fast access for day-to-day operations

Cold Storage

Cloudflare R2 for long-term archival:
  • Historical data older than 30-90 days
  • Automated Go background job moves data from hot to cold
  • Significantly lower storage costs with no egress fees
  • Ideal for large volumes of infrequently accessed data

Performance Optimisation

Database Optimisations

  • Connection Pooling - 45 max open, 18 max idle connections
  • Query Optimisation - Indexed queries and efficient joins
  • Batch Operations - Reduce individual database calls
  • Lock-Free Task Claiming - FOR UPDATE SKIP LOCKED prevents contention

Crawler Optimisations

  • Concurrent Processing - Multiple workers process URLs simultaneously
  • Connection Reuse - HTTP client connection pooling
  • Rate Limiting - Prevents overwhelming target servers
  • Response Streaming - Efficient memory usage for large responses

Memory Management

  • Resource Cleanup - Proper goroutine and connection cleanup
  • Buffer Management - Controlled memory allocation
  • Garbage Collection - Optimised for low-latency operations

Supabase Integration

Real-time Features

Adapt uses Postgres Changes subscriptions via Supabase Realtime.

Implemented:
  • Notification Badge - Real-time updates when jobs complete (v0.20.0)
    • Postgres Changes subscription on notifications table
    • WebSocket CSP configured for wss://adapt.auth.goodnative.co
    • 200ms query delay to avoid transaction visibility race condition
Planned:
  • Live Job Progress - Postgres Changes on jobs table for instant updates
  • Dashboard Stats - Real-time totals without page refresh
  • Team Presence - Live indicators for multi-user organisations

Future Enhancements

  • Database Functions (Stage 5) - Move CPU-intensive queries to PostgreSQL functions
  • Edge Functions (Stage 6+) - Handle webhooks and scheduled jobs
  • File Storage (Stage 5) - Store crawler logs and screenshots
  • Enhanced RLS (Stage 6) - Replace Go auth middleware with database-level policies

Recent Refactoring Success

5 monster functions eliminated:
  • getJobTasks: 216 → 56 lines (74% reduction)
  • CreateJob: 232 → 42 lines (82% reduction)
  • setupJobURLDiscovery: 108 → 17 lines (84% reduction)
  • setupSchema: 216 → 27 lines (87% reduction)
  • WarmURL: 377 → 68 lines (82% reduction)
Results: 80% complexity reduction, 350+ tests created during refactoring
