
Overview

Scheduled crawls enable automated, recurring site health monitoring and cache warming. Set it up once, and Adapt continuously monitors your site at your preferred interval.

How Scheduling Works

Scheduler Architecture

Each scheduler is a persistent configuration that automatically creates jobs:
type Scheduler struct {
    ID                    string
    DomainID              int
    OrganisationID        string
    ScheduleIntervalHours int        // 6, 12, 24, or 48 hours
    NextRunAt             time.Time  // When next job will be created
    IsEnabled             bool       // Can be paused/resumed
    
    // Job configuration
    Concurrency           int
    FindLinks             bool
    MaxPages              int
    IncludePaths          []string
    ExcludePaths          []string
}

Scheduling Intervals

Four predefined intervals optimised for different use cases:

6 Hours

High-frequency monitoring. 4 crawls per day. Ideal for: e-commerce sites, news sites, frequently updated content.

12 Hours

Twice-daily monitoring. 2 crawls per day. Ideal for: business sites, marketing pages, daily content updates.

24 Hours

Daily monitoring. 1 crawl per day. Ideal for: corporate sites, documentation, weekly content updates.

48 Hours

Low-frequency monitoring. Every 2 days (3-4 crawls per week). Ideal for: static sites, infrequently updated content.

Creating a Scheduler

Via API

curl -X POST "https://adapt.beehivebusiness.builders/v1/schedulers" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "schedule_interval_hours": 12,
    "concurrency": 20,
    "find_links": true,
    "max_pages": 1000,
    "include_paths": ["/blog/*"],
    "exclude_paths": ["/admin/*"],
    "is_enabled": true
  }'

Response Format

{
  "id": "scheduler-uuid-123",
  "domain": "example.com",
  "schedule_interval_hours": 12,
  "next_run_at": "2024-01-15T22:00:00Z",
  "is_enabled": true,
  "concurrency": 20,
  "find_links": true,
  "max_pages": 1000,
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/admin/*"],
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z"
}
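For programmatic use, the curl request above can also be issued from Go. The sketch below builds the same request body and headers; the `CreateSchedulerRequest` struct and `buildCreateRequest` helper are illustrative names mirroring the JSON shown, not part of an official client library:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// CreateSchedulerRequest mirrors the JSON body from the curl example above.
type CreateSchedulerRequest struct {
	Domain                string   `json:"domain"`
	ScheduleIntervalHours int      `json:"schedule_interval_hours"`
	Concurrency           int      `json:"concurrency"`
	FindLinks             bool     `json:"find_links"`
	MaxPages              int      `json:"max_pages"`
	IncludePaths          []string `json:"include_paths"`
	ExcludePaths          []string `json:"exclude_paths"`
	IsEnabled             bool     `json:"is_enabled"`
}

// buildCreateRequest marshals the payload and sets the auth and
// content-type headers expected by the API.
func buildCreateRequest(token string, payload CreateSchedulerRequest) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://adapt.beehivebusiness.builders/v1/schedulers", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildCreateRequest("YOUR_JWT_TOKEN", CreateSchedulerRequest{
		Domain:                "example.com",
		ScheduleIntervalHours: 12,
		Concurrency:           20,
		FindLinks:             true,
		MaxPages:              1000,
		IncludePaths:          []string{"/blog/*"},
		ExcludePaths:          []string{"/admin/*"},
		IsEnabled:             true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /v1/schedulers
	// Send with http.DefaultClient.Do(req) and decode the response body.
}
```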

Scheduler Lifecycle

Automatic Job Creation

The scheduler daemon checks for ready schedulers every minute:
func (s *SchedulerService) checkSchedulers(ctx context.Context) {
    // Query schedulers that are enabled and due to run
    schedulers, err := s.db.GetSchedulersReadyToRun(ctx, limit)
    if err != nil {
        return
    }

    for _, scheduler := range schedulers {
        // Create a job with the scheduler's configuration
        _, err = s.jobManager.CreateJob(ctx, &JobOptions{
            Domain:       scheduler.Domain,
            Concurrency:  scheduler.Concurrency,
            FindLinks:    scheduler.FindLinks,
            MaxPages:     scheduler.MaxPages,
            IncludePaths: scheduler.IncludePaths,
            ExcludePaths: scheduler.ExcludePaths,
            SchedulerID:  &scheduler.ID, // Link job to scheduler
        })
        if err != nil {
            continue
        }

        // Calculate the next run time from now
        nextRun := time.Now().Add(
            time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
        )

        // Update the scheduler for its next cycle
        s.db.UpdateSchedulerNextRun(ctx, scheduler.ID, nextRun)
    }
}

Next Run Calculation

Schedulers calculate the next run from the current time (not from the last job's completion):
// Initial creation
scheduler.NextRunAt = now.Add(
    time.Duration(scheduleIntervalHours) * time.Hour,
)

// After job creation
scheduler.NextRunAt = time.Now().Add(
    time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
)
This ensures consistent intervals regardless of job duration.

Scheduler States

  1. Enabled & Ready: is_enabled = true and next_run_at <= NOW(). A job will be created automatically.
  2. Enabled & Scheduled: is_enabled = true and next_run_at > NOW(). Waiting for the next run time.
  3. Disabled: is_enabled = false. Scheduler paused, no jobs created.

Managing Schedulers

List All Schedulers

Get schedulers for your organisation:
curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "scheduler-1",
    "domain": "example.com",
    "schedule_interval_hours": 12,
    "next_run_at": "2024-01-15T22:00:00Z",
    "is_enabled": true
  },
  {
    "id": "scheduler-2",
    "domain": "blog.example.com",
    "schedule_interval_hours": 24,
    "next_run_at": "2024-01-16T10:00:00Z",
    "is_enabled": false
  }
]

Update Scheduler

Modify interval or configuration:
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schedule_interval_hours": 24,
    "concurrency": 30,
    "is_enabled": true
  }'
Updating schedule_interval_hours recalculates next_run_at from current time.

Pause/Resume Scheduler

# Pause
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_enabled": false}'

# Resume
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_enabled": true}'

Delete Scheduler

curl -X DELETE "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Deleting a scheduler does not cancel existing jobs created by it.

Viewing Scheduler Jobs

Get Jobs for a Scheduler

curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}/jobs" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "job-123",
    "domain": "example.com",
    "status": "completed",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T10:00:00Z",
    "completed_at": "2024-01-15T10:05:32Z",
    "total_tasks": 250,
    "failed_tasks": 2
  },
  {
    "id": "job-124",
    "domain": "example.com",
    "status": "running",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T22:00:00Z",
    "total_tasks": 250,
    "completed_tasks": 143
  }
]

Job History Analysis

Track scheduler performance over time:
SELECT 
    scheduler_id,
    COUNT(*) as total_runs,
    AVG(duration_seconds) as avg_duration,
    AVG(completed_tasks) as avg_pages_crawled,
    AVG(failed_tasks) as avg_failures
FROM jobs
WHERE scheduler_id = $1
    AND status = 'completed'
GROUP BY scheduler_id;

Webflow Integration

Schedulers integrate with Webflow site settings:

Site-Specific Scheduling

-- Webflow site settings link to schedulers
CREATE TABLE webflow_site_settings (
    organisation_id TEXT,
    webflow_site_id TEXT,
    schedule_interval_hours INTEGER,  -- 6, 12, 24, or 48
    scheduler_id TEXT,                -- Links to schedulers table
    auto_publish_enabled BOOLEAN,
    webhook_id TEXT,
    PRIMARY KEY (organisation_id, webflow_site_id)
);

Automatic Scheduler Creation

When enabling scheduling for a Webflow site:
  1. Create Scheduler: New scheduler for site domain
  2. Link to Site: Store scheduler_id in site settings
  3. Configure Interval: Set user’s preferred frequency
  4. Enable Webhooks: Optional publish webhook for immediate crawls
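The four steps above can be sketched with an in-memory stand-in for the webflow_site_settings table. The `SiteSettings` struct, map storage, and ID format here are illustrative assumptions, not the actual implementation:

```go
package main

import "fmt"

// SiteSettings is a simplified in-memory stand-in for a
// webflow_site_settings row.
type SiteSettings struct {
	SchedulerID           string
	ScheduleIntervalHours int
	AutoPublishEnabled    bool
}

// enableScheduling sketches the four steps: create a scheduler, link it
// to the site, store the chosen interval, and flag webhook usage.
func enableScheduling(settings map[string]*SiteSettings, siteID string, intervalHours int, webhooks bool) string {
	schedulerID := "scheduler-" + siteID // real system persists a new scheduler row
	settings[siteID] = &SiteSettings{
		SchedulerID:           schedulerID,
		ScheduleIntervalHours: intervalHours,
		AutoPublishEnabled:    webhooks,
	}
	return schedulerID
}

func main() {
	settings := map[string]*SiteSettings{}
	id := enableScheduling(settings, "site-abc", 12, true)
	fmt.Println(id, settings["site-abc"].ScheduleIntervalHours) // scheduler-site-abc 12
}
```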

Combined Triggers

Webflow sites can have both:
  • Scheduled Crawls: Regular monitoring (e.g. every 12 hours)
  • Publish Webhooks: Immediate crawl on content update
This ensures:
  • Fresh cache after publishing
  • Regular monitoring between publishes
  • Automatic link checking

Configuration Best Practices

Choosing an Interval

Shorter intervals (6-12 hours) are recommended when you have:
  • Frequent content updates
  • Many concurrent users
  • Cache that expires quickly
  • A need for tight monitoring

Concurrency Settings

Balance speed against server load:
  • concurrency: 10 (conservative: large sites, shared hosting)
  • concurrency: 20 (balanced: most sites, dedicated hosting)
  • concurrency: 50 (aggressive: small sites, robust infrastructure)

Path Filtering

Optimise crawl scope:
{
  "include_paths": [
    "/blog/*",      // Include blog section
    "/products/*"   // Include product pages
  ],
  "exclude_paths": [
    "/admin/*",     // Skip admin pages
    "/search*",     // Skip search results
    "/checkout/*"   // Skip checkout flow
  ]
}

Monitoring Scheduler Health

Key Metrics

Track scheduler reliability:
  • Success Rate: % of jobs completed successfully
  • Average Duration: Time to complete crawls
  • Failure Patterns: Recurring issues to investigate
  • Schedule Drift: How closely jobs match schedule

Alerts & Notifications

Scheduled jobs trigger Supabase notifications:
  • Job Completion: Notify when scheduled job finishes
  • Failures: Alert on failed scheduled crawls
  • Broken Links: Report newly discovered issues
  • Performance: Warn if crawl duration increases

Use Cases

  • Detect broken links, slow pages, and errors before users do.
  • Keep CDN cache fresh with regular warming before expiration.
  • Track site performance trends over days, weeks, and months.
  • Identify when pages change or new issues appear.
  • Prove site availability and performance for service agreements.

Scheduler Limitations

One Scheduler Per Domain: Each domain can have only one scheduler per organisation. To change intervals, update the existing scheduler rather than creating a new one.
Concurrent Jobs: Schedulers will not create a new job if the previous job is still running. Consider increasing concurrency or lengthening the interval if jobs take longer than the schedule interval.
