
Overview

Scheduled crawls enable automated, recurring site health monitoring and cache warming. Set it up once, and Adapt continuously monitors your site at your preferred interval.

How Scheduling Works

Scheduler Architecture

Each scheduler is a persistent configuration that automatically creates jobs:
type Scheduler struct {
    ID                    string
    DomainID              int
    OrganisationID        string
    ScheduleIntervalHours int        // 6, 12, 24, or 48 hours
    NextRunAt             time.Time  // When next job will be created
    IsEnabled             bool       // Can be paused/resumed
    
    // Job configuration
    Concurrency           int
    FindLinks             bool
    MaxPages              int
    IncludePaths          []string
    ExcludePaths          []string
}

Scheduling Intervals

Four predefined intervals optimised for different use cases:

6 Hours

High-frequency monitoring. 4 crawls per day. Ideal for: e-commerce sites, news sites, frequently updated content.

12 Hours

Twice-daily monitoring. 2 crawls per day. Ideal for: business sites, marketing pages, daily content updates.

24 Hours

Daily monitoring. 1 crawl per day. Ideal for: corporate sites, documentation, weekly content updates.

48 Hours

Low-frequency monitoring. Every 2 days (3-4 crawls per week). Ideal for: static sites, infrequently updated content.

Creating a Scheduler

Via API

curl -X POST "https://adapt.beehivebusiness.builders/v1/schedulers" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "schedule_interval_hours": 12,
    "concurrency": 20,
    "find_links": true,
    "max_pages": 1000,
    "include_paths": ["/blog/*"],
    "exclude_paths": ["/admin/*"],
    "is_enabled": true
  }'

Response Format

{
  "id": "scheduler-uuid-123",
  "domain": "example.com",
  "schedule_interval_hours": 12,
  "next_run_at": "2024-01-15T22:00:00Z",
  "is_enabled": true,
  "concurrency": 20,
  "find_links": true,
  "max_pages": 1000,
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/admin/*"],
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z"
}
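For programmatic use, the curl request above can also be issued from Go. The sketch below builds the same request body and headers; the `CreateSchedulerRequest` struct and `buildCreateRequest` helper are illustrative names mirroring the JSON shown, not part of an official client library:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// CreateSchedulerRequest mirrors the JSON body from the curl example above.
type CreateSchedulerRequest struct {
	Domain                string   `json:"domain"`
	ScheduleIntervalHours int      `json:"schedule_interval_hours"`
	Concurrency           int      `json:"concurrency"`
	FindLinks             bool     `json:"find_links"`
	MaxPages              int      `json:"max_pages"`
	IncludePaths          []string `json:"include_paths"`
	ExcludePaths          []string `json:"exclude_paths"`
	IsEnabled             bool     `json:"is_enabled"`
}

// buildCreateRequest marshals the payload and sets the auth and
// content-type headers expected by the API.
func buildCreateRequest(token string, payload CreateSchedulerRequest) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://adapt.beehivebusiness.builders/v1/schedulers", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildCreateRequest("YOUR_JWT_TOKEN", CreateSchedulerRequest{
		Domain:                "example.com",
		ScheduleIntervalHours: 12,
		Concurrency:           20,
		FindLinks:             true,
		MaxPages:              1000,
		IncludePaths:          []string{"/blog/*"},
		ExcludePaths:          []string{"/admin/*"},
		IsEnabled:             true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /v1/schedulers
	// Send with http.DefaultClient.Do(req) and decode the response body.
}
```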

Scheduler Lifecycle

Automatic Job Creation

The scheduler daemon checks for ready schedulers every minute:
func (s *SchedulerService) checkSchedulers(ctx context.Context) {
    // Query schedulers that are enabled and due to run
    schedulers, err := s.db.GetSchedulersReadyToRun(ctx, limit)
    if err != nil {
        return
    }

    for _, scheduler := range schedulers {
        // Create a job with the scheduler's configuration
        _, err = s.jobManager.CreateJob(ctx, &JobOptions{
            Domain:       scheduler.Domain,
            Concurrency:  scheduler.Concurrency,
            FindLinks:    scheduler.FindLinks,
            MaxPages:     scheduler.MaxPages,
            IncludePaths: scheduler.IncludePaths,
            ExcludePaths: scheduler.ExcludePaths,
            SchedulerID:  &scheduler.ID, // Link job to scheduler
        })
        if err != nil {
            continue
        }

        // Calculate the next run time from now
        nextRun := time.Now().Add(
            time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
        )

        // Update the scheduler for its next cycle
        s.db.UpdateSchedulerNextRun(ctx, scheduler.ID, nextRun)
    }
}

Next Run Calculation

Schedulers calculate the next run from the current time (not from the last job's completion):
// Initial creation
scheduler.NextRunAt = now.Add(
    time.Duration(scheduleIntervalHours) * time.Hour,
)

// After job creation
scheduler.NextRunAt = time.Now().Add(
    time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
)
This ensures consistent intervals regardless of job duration.

Scheduler States

  1. Enabled & Ready: is_enabled = true and next_run_at <= NOW(). A job will be created automatically.
  2. Enabled & Scheduled: is_enabled = true and next_run_at > NOW(). Waiting for the next run time.
  3. Disabled: is_enabled = false. Scheduler paused, no jobs created.

Managing Schedulers

List All Schedulers

Get schedulers for your organisation:
curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "scheduler-1",
    "domain": "example.com",
    "schedule_interval_hours": 12,
    "next_run_at": "2024-01-15T22:00:00Z",
    "is_enabled": true
  },
  {
    "id": "scheduler-2",
    "domain": "blog.example.com",
    "schedule_interval_hours": 24,
    "next_run_at": "2024-01-16T10:00:00Z",
    "is_enabled": false
  }
]

Update Scheduler

Modify interval or configuration:
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schedule_interval_hours": 24,
    "concurrency": 30,
    "is_enabled": true
  }'
Updating schedule_interval_hours recalculates next_run_at from current time.

Pause/Resume Scheduler

# Pause
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_enabled": false}'

# Resume
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_enabled": true}'

Delete Scheduler

curl -X DELETE "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Deleting a scheduler does not cancel existing jobs created by it.

Viewing Scheduler Jobs

Get Jobs for a Scheduler

curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}/jobs" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "job-123",
    "domain": "example.com",
    "status": "completed",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T10:00:00Z",
    "completed_at": "2024-01-15T10:05:32Z",
    "total_tasks": 250,
    "failed_tasks": 2
  },
  {
    "id": "job-124",
    "domain": "example.com",
    "status": "running",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T22:00:00Z",
    "total_tasks": 250,
    "completed_tasks": 143
  }
]

Job History Analysis

Track scheduler performance over time:
SELECT 
    scheduler_id,
    COUNT(*) as total_runs,
    AVG(duration_seconds) as avg_duration,
    AVG(completed_tasks) as avg_pages_crawled,
    AVG(failed_tasks) as avg_failures
FROM jobs
WHERE scheduler_id = $1
    AND status = 'completed'
GROUP BY scheduler_id;

Webflow Integration

Schedulers integrate with Webflow site settings:

Site-Specific Scheduling

-- Webflow site settings link to schedulers
CREATE TABLE webflow_site_settings (
    organisation_id TEXT,
    webflow_site_id TEXT,
    schedule_interval_hours INTEGER,  -- 6, 12, 24, or 48
    scheduler_id TEXT,                -- Links to schedulers table
    auto_publish_enabled BOOLEAN,
    webhook_id TEXT,
    PRIMARY KEY (organisation_id, webflow_site_id)
);

Automatic Scheduler Creation

When enabling scheduling for a Webflow site:
  1. Create Scheduler: New scheduler for site domain
  2. Link to Site: Store scheduler_id in site settings
  3. Configure Interval: Set user’s preferred frequency
  4. Enable Webhooks: Optional publish webhook for immediate crawls
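The four steps above can be sketched with an in-memory stand-in for the webflow_site_settings table. The `SiteSettings` struct, map storage, and ID format here are illustrative assumptions, not the actual implementation:

```go
package main

import "fmt"

// SiteSettings is a simplified in-memory stand-in for a
// webflow_site_settings row.
type SiteSettings struct {
	SchedulerID           string
	ScheduleIntervalHours int
	AutoPublishEnabled    bool
}

// enableScheduling sketches the four steps: create a scheduler, link it
// to the site, store the chosen interval, and flag webhook usage.
func enableScheduling(settings map[string]*SiteSettings, siteID string, intervalHours int, webhooks bool) string {
	schedulerID := "scheduler-" + siteID // real system persists a new scheduler row
	settings[siteID] = &SiteSettings{
		SchedulerID:           schedulerID,
		ScheduleIntervalHours: intervalHours,
		AutoPublishEnabled:    webhooks,
	}
	return schedulerID
}

func main() {
	settings := map[string]*SiteSettings{}
	id := enableScheduling(settings, "site-abc", 12, true)
	fmt.Println(id, settings["site-abc"].ScheduleIntervalHours) // scheduler-site-abc 12
}
```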

Combined Triggers

Webflow sites can have both:
  • Scheduled Crawls: Regular monitoring (e.g. every 12 hours)
  • Publish Webhooks: Immediate crawl on content update
This ensures:
  • Fresh cache after publishing
  • Regular monitoring between publishes
  • Automatic link checking

Configuration Best Practices

Choosing an Interval

Shorter intervals (6-12 hours) are recommended when you have:
  • Frequent content updates
  • Many concurrent users
  • Cache that expires quickly
  • A need for tight monitoring

Concurrency Settings

Balance speed against server load:
  • concurrency: 10 (conservative: large sites, shared hosting)
  • concurrency: 20 (balanced: most sites, dedicated hosting)
  • concurrency: 50 (aggressive: small sites, robust infrastructure)

Path Filtering

Optimise crawl scope:
{
  "include_paths": [
    "/blog/*",      // Include blog section
    "/products/*"   // Include product pages
  ],
  "exclude_paths": [
    "/admin/*",     // Skip admin pages
    "/search*",     // Skip search results
    "/checkout/*"   // Skip checkout flow
  ]
}

Monitoring Scheduler Health

Key Metrics

Track scheduler reliability:
  • Success Rate: % of jobs completed successfully
  • Average Duration: Time to complete crawls
  • Failure Patterns: Recurring issues to investigate
  • Schedule Drift: How closely jobs match schedule

Alerts & Notifications

Scheduled jobs trigger Supabase notifications:
  • Job Completion: Notify when scheduled job finishes
  • Failures: Alert on failed scheduled crawls
  • Broken Links: Report newly discovered issues
  • Performance: Warn if crawl duration increases

Use Cases

  • Detect broken links, slow pages, and errors before users do.
  • Keep CDN cache fresh with regular warming before expiration.
  • Track site performance trends over days, weeks, and months.
  • Identify when pages change or new issues appear.
  • Prove site availability and performance for service agreements.

Scheduler Limitations

One Scheduler Per Domain: Each domain can have only one scheduler per organisation. To change intervals, update the existing scheduler rather than creating a new one.
Concurrent Jobs: Schedulers will not create a new job if the previous job is still running. Consider increasing concurrency or lengthening the interval if jobs take longer than the schedule interval.
