
System Overview

Adapt is a web cache warming service built in Go, designed for Webflow sites and other web applications. It uses a worker pool architecture for efficient URL crawling and cache warming, with a focus on reliability, performance, and observability.

Core Components

Worker Pool System

The worker pool is the heart of Adapt’s concurrent processing system:
  • Concurrent Processing - Multiple workers process tasks simultaneously using PostgreSQL’s FOR UPDATE SKIP LOCKED
  • Job Management - Jobs are broken down into individual URL tasks and distributed across workers
  • Recovery System - Automatic recovery of stalled or failed tasks with exponential backoff
  • Task Monitoring - Real-time monitoring of task progress and status

Database Layer (PostgreSQL)

Adapt uses PostgreSQL with Supabase:
  • Normalised Schema - Separate tables for domains, pages, jobs, and tasks to reduce redundancy
  • Row-Level Locking - Uses FOR UPDATE SKIP LOCKED for efficient concurrent task acquisition
  • Connection Pooling - Optimised pool settings (45 max open, 18 max idle connections)
  • Data Integrity - Maintains job history, statistics, and task relationships

API Layer

RESTful API with standard practices:
  • RESTful Design - /v1/* endpoints with standardised responses and error handling
  • Authentication - JWT-based auth with Supabase Auth integration
  • Middleware Stack - CORS, logging, rate limiting, request tracking
  • Request IDs - Every request tracked with unique identifier

Crawler System

Efficient URL crawling and cache validation:
  • Concurrent URL Processing - Configurable concurrency with rate limiting
  • Cache Validation - Monitors cache status and performance metrics
  • Response Tracking - Records response times, status codes, and cache hits
  • Link Discovery - Optional extraction of additional URLs from crawled pages

Technical Concepts

Jobs and Tasks

Job

A collection of URLs from a single domain to be crawled
  • Contains metadata: domain, user/organisation, concurrency settings
  • Tracks progress: total/completed/failed task counts
  • Has lifecycle: pending → running → completed/cancelled

Task

Individual URL processing unit within a job
  • References a specific page within the job’s domain
  • Tracks execution: status, timing, response metrics, errors
  • Can be: pending → running → completed/failed/skipped

Worker

Process that executes tasks concurrently
  • Claims tasks atomically using database locking
  • Handles retries and error reporting
  • Updates task and job progress
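The task lifecycle above can be captured as a small status type with an explicit transition table. Names are illustrative, mirroring the transitions listed; the real definitions live in `jobs/types.go`.

```go
package main

import "fmt"

// Status values for tasks, mirroring the lifecycle described above
// (names illustrative).
type Status string

const (
	StatusPending   Status = "pending"
	StatusRunning   Status = "running"
	StatusCompleted Status = "completed"
	StatusFailed    Status = "failed"
	StatusSkipped   Status = "skipped"
)

// validTransitions encodes the allowed moves in the task lifecycle:
// pending → running, running → completed/failed/skipped.
var validTransitions = map[Status][]Status{
	StatusPending: {StatusRunning, StatusSkipped},
	StatusRunning: {StatusCompleted, StatusFailed, StatusSkipped},
}

// canTransition reports whether a task may move from one status to another.
func canTransition(from, to Status) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StatusPending, StatusRunning))   // true
	fmt.Println(canTransition(StatusCompleted, StatusRunning)) // false
}
```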

Job Lifecycle

1. Job Creation
  • Validate domain and create domain/page records
  • Insert job with pending status
  • Optionally process sitemap or create root task
2. Job Start
  • Update status to running
  • Reset any stalled tasks from previous runs
  • Add job to worker pool for processing
3. Task Processing
  • Workers claim pending tasks atomically
  • Crawl URLs with retry logic and rate limiting
  • Store results and update task status
  • Update job progress counters
4. Job Completion
  • Automatic detection when all tasks are finished
  • Calculate final statistics
  • Mark job as completed with timestamp
5. Recovery & Cleanup
  • Periodic cleanup of stuck jobs
  • Task recovery after server restarts
  • Failed task retry with exponential backoff

Codebase Structure

Architectural Principles

Adapt follows focused, testable function design:
  • Function Size - Functions kept under 50 lines where possible
  • Single Responsibility - Each function has one clear purpose
  • Testing - Strategic test coverage for critical paths and complex logic
  • Extract + Test + Commit - Proven methodology for safe refactoring

Package Organisation

cmd/
├── app/              # Main application entry point
└── test_jobs/        # Job queue testing utility

internal/
├── api/              # HTTP handlers and middleware
│   ├── handlers.go   # Route handlers
│   ├── auth.go       # JWT authentication
│   ├── jobs.go       # Job management (refactored)
│   └── response.go   # Standardised responses
├── db/               # Database operations
│   ├── db.go         # PostgreSQL connection (refactored)
│   ├── queue.go      # Queue operations
│   ├── pages.go      # Page/domain management
│   └── users.go      # User/organisation data
├── jobs/             # Job system
│   ├── manager.go    # Job lifecycle (refactored)
│   ├── worker.go     # Worker pool
│   └── types.go      # Job/task types
├── crawler/          # Web crawling
│   ├── crawler.go    # HTTP client (refactored)
│   ├── sitemap.go    # Sitemap parsing
│   └── config.go     # Crawler configuration
└── util/             # Shared utilities
    └── url.go        # URL normalisation

System Monitoring

Sentry Integration Strategy

Adapt uses Sentry for both error tracking and performance monitoring:
sentry.Init(sentry.ClientOptions{
    Dsn:              config.SentryDSN,
    Environment:      config.Env,
    TracesSampleRate: 0.1, // 0.1 (10%) in production; raised to 1.0 in development
    AttachStacktrace: true,
    Debug:            config.Env == "development",
})
Critical Business Logic Failures:
  • Job creation, start, and cancellation failures
  • Worker startup failures and task status update failures
  • Transaction failures and stuck job cleanup failures
  • Database connection and server startup/shutdown failures
Performance Monitoring Spans:
  • manager.create_job, manager.start_job, manager.cancel_job - Job operations
  • manager.get_job, manager.get_job_status - Job queries
  • manager.process_sitemap - Sitemap processing
  • db.cleanup_stuck_jobs, db.create_page_records - Database operations
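The span pattern behind these names can be illustrated with a stdlib-only sketch. The production code uses sentry-go spans (start, run the operation, finish); this mimics only the timing shape, without the Sentry dependency.

```go
package main

import (
	"fmt"
	"time"
)

// withSpan times the given operation and reports its duration. It mimics the
// start/finish shape of a tracing span; in the real code, the span name
// (e.g. "manager.create_job") appears in the Sentry trace.
func withSpan(name string, fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	d := time.Since(start)
	fmt.Printf("span %s finished in %s\n", name, d)
	return d, err
}

func main() {
	_, err := withSpan("manager.create_job", func() error {
		time.Sleep(10 * time.Millisecond) // stand-in for real work
		return nil
	})
	fmt.Println(err == nil) // true
}
```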

Health Monitoring

  • Database Health - Connection status and query performance
  • Worker Status - Active worker count and task processing rates
  • Job Progress - Real-time completion tracking and statistics
  • API Performance - Request timing and error rates

Frontend Integration

Template + Data Binding System

Adapt uses a template-based approach with attribute-based event handling:
<!-- Dashboard with attribute-based event handling -->
<div class="dashboard">
  <button bb-action="refresh-dashboard">↻ Refresh</button>
  <button bb-action="create-job">+ New Job</button>
  <div bb-action="view-job-details" bb-data-job-id="123">View Details</div>
</div>

<!-- Data binding for dynamic content -->
<div class="stats">
  <span data-bb-bind="stats.total_jobs">0</span>
  <div data-bb-template="job">
    <h4 data-bb-bind="domain">Loading...</h4>
    <div data-bb-bind-style="width:{progress}%"></div>
    <span data-bb-bind="status">pending</span>
  </div>
</div>

<!-- JavaScript handles bb-action attributes automatically -->
<script src="/js/bb-data-binder.min.js"></script>
Integration Benefits:
  • Users control all HTML structure and CSS styling
  • No CSS conflicts with existing designs
  • Works with any frontend framework (Webflow, custom sites)
  • Lightweight JavaScript library (~50KB)
  • Complete form handling with validation and authentication
  • Real-time data binding with template engine

Security & Authentication

JWT Authentication

  • Supabase Auth Integration - Validates JWT tokens from Supabase
  • User Context - Extracts user and organisation IDs from tokens
  • Protected Endpoints - Requires authentication for job operations
  • Row Level Security - PostgreSQL RLS policies for data isolation

Rate Limiting

  • IP-Based Limiting - Token bucket algorithm (5 requests/second default)
  • Client IP Detection - Supports X-Forwarded-For headers for proxies
  • Crawler Rate Limiting - Configurable delays between URL requests
  • Concurrency Controls - Per-job worker limits
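A token bucket can be sketched with the standard library alone. This is illustrative only; production Go code often reaches for golang.org/x/time/rate instead of rolling its own.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket: up to capacity tokens, refilled at
// rate tokens per second based on elapsed time.
type bucket struct {
	tokens, capacity float64
	rate             float64 // tokens per second
	last             time.Time
}

func newBucket(rate float64, capacity int) *bucket {
	return &bucket{
		tokens:   float64(capacity),
		capacity: float64(capacity),
		rate:     rate,
		last:     time.Now(),
	}
}

// allow consumes one token if available, refilling first based on the time
// elapsed since the last call.
func (b *bucket) allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	b.last = now
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := newBucket(5, 5) // 5 req/s with a burst of 5
	allowed := 0
	for i := 0; i < 10; i++ {
		if b.allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // 5 — burst drained; refill within the loop is negligible
}
```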

Request Security

  • Input Validation - URL and parameter sanitisation
  • Error Sanitisation - Prevents information leakage
  • CORS Configuration - Controlled cross-origin access
  • Request Tracking - Unique request IDs for audit trails

Deployment Architecture

Infrastructure

  • Hosting - Fly.io with auto-scaling
  • Database - PostgreSQL with connection pooling (Supabase)
  • CDN - Cloudflare for caching and protection
  • Monitoring - Sentry (errors), Grafana Cloud (traces), Codecov (coverage)
  • Authentication - Supabase Auth with custom domain
  • Real-time - Supabase Realtime for live job progress updates
  • Storage - Supabase Storage (hot) + Cloudflare R2 (cold archive)

Data Storage Strategy

Hot Storage

Supabase Storage for recent and frequently accessed files:
  • Temporary assets
  • Crawler logs for active jobs
  • Recent HTML page captures for debugging
  • Fast access for day-to-day operations

Cold Storage

Cloudflare R2 for long-term archival:
  • Historical data older than 30-90 days
  • Automated Go background job moves data from hot to cold
  • Significantly lower storage costs with no egress fees
  • Ideal for large volumes of infrequently accessed data

Performance Optimisation

Database Optimisations

  • Connection Pooling - 45 max open, 18 max idle connections
  • Query Optimisation - Indexed queries and efficient joins
  • Batch Operations - Reduce individual database calls
  • Lock-Free Task Claiming - FOR UPDATE SKIP LOCKED prevents contention

Crawler Optimisations

  • Concurrent Processing - Multiple workers process URLs simultaneously
  • Connection Reuse - HTTP client connection pooling
  • Rate Limiting - Prevents overwhelming target servers
  • Response Streaming - Efficient memory usage for large responses

Memory Management

  • Resource Cleanup - Proper goroutine and connection cleanup
  • Buffer Management - Controlled memory allocation
  • Garbage Collection - Optimised for low-latency operations

Supabase Integration

Real-time Features

Adapt uses Postgres Changes subscriptions via Supabase Realtime.

Implemented:
  • Notification Badge - Real-time updates when jobs complete (v0.20.0)
    • Postgres Changes subscription on notifications table
    • WebSocket CSP configured for wss://adapt.auth.goodnative.co
    • 200ms query delay to avoid transaction visibility race condition
Planned:
  • Live Job Progress - Postgres Changes on jobs table for instant updates
  • Dashboard Stats - Real-time totals without page refresh
  • Team Presence - Live indicators for multi-user organisations

Future Enhancements

  • Database Functions (Stage 5) - Move CPU-intensive queries to PostgreSQL functions
  • Edge Functions (Stage 6+) - Handle webhooks and scheduled jobs
  • File Storage (Stage 5) - Store crawler logs and screenshots
  • Enhanced RLS (Stage 6) - Replace Go auth middleware with database-level policies

Recent Refactoring Success

5 monster functions eliminated:
  • getJobTasks: 216 → 56 lines (74% reduction)
  • CreateJob: 232 → 42 lines (82% reduction)
  • setupJobURLDiscovery: 108 → 17 lines (84% reduction)
  • setupSchema: 216 → 27 lines (87% reduction)
  • WarmURL: 377 → 68 lines (82% reduction)
Results: 80% complexity reduction, 350+ tests created during refactoring
