Overview
Scheduled crawls enable automated, recurring site health monitoring and cache warming. Set up once and Adapt continuously monitors your site at your preferred interval.
How Scheduling Works
Scheduler Architecture
Each scheduler is a persistent configuration that automatically creates jobs:
type Scheduler struct {
    ID                    string
    DomainID              int
    OrganisationID        string
    ScheduleIntervalHours int       // 6, 12, 24, or 48 hours
    NextRunAt             time.Time // When the next job will be created
    IsEnabled             bool      // Can be paused/resumed

    // Job configuration
    Concurrency  int
    FindLinks    bool
    MaxPages     int
    IncludePaths []string
    ExcludePaths []string
}
Scheduling Intervals
Four predefined intervals optimised for different use cases:
6 Hours: High-frequency monitoring, 4 crawls per day. Ideal for: e-commerce sites, news sites, frequently updated content
12 Hours: Twice daily, 2 crawls per day. Ideal for: business sites, marketing pages, daily content updates
24 Hours: Daily monitoring, 1 crawl per day. Ideal for: corporate sites, documentation, weekly content updates
48 Hours: Every 2 days, 3-4 crawls per week. Ideal for: static sites, infrequently updated content
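The interval arithmetic above can be sketched in Go; the helper names here are illustrative, not part of the Adapt API:

```go
package main

import "fmt"

// validIntervals mirrors the four supported intervals (hypothetical helper).
var validIntervals = map[int]bool{6: true, 12: true, 24: true, 48: true}

// crawlsPerWeek converts a schedule interval in hours into an
// approximate number of crawls per week (168 hours in a week).
func crawlsPerWeek(intervalHours int) float64 {
	return 168 / float64(intervalHours)
}

func main() {
	for _, h := range []int{6, 12, 24, 48} {
		fmt.Printf("%d hours -> %.1f crawls/week\n", h, crawlsPerWeek(h))
	}
}
```

Note that a 48-hour interval works out to 3.5 crawls per week, which is why the table above says "3-4 crawls per week".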
Creating a Scheduler
Via API
curl -X POST "https://adapt.beehivebusiness.builders/v1/schedulers" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"schedule_interval_hours": 12,
"concurrency": 20,
"find_links": true,
"max_pages": 1000,
"include_paths": ["/blog/*"],
"exclude_paths": ["/admin/*"],
"is_enabled": true
}'
Response:
{
  "id": "scheduler-uuid-123",
  "domain": "example.com",
  "schedule_interval_hours": 12,
  "next_run_at": "2024-01-15T22:00:00Z",
  "is_enabled": true,
  "concurrency": 20,
  "find_links": true,
  "max_pages": 1000,
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/admin/*"],
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z"
}
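The same create call can be made from Go. This is a sketch assuming the request fields shown above; the `SchedulerRequest` type and helper are illustrative, not an official client:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// SchedulerRequest mirrors the JSON body shown above; the JSON field
// names come from the documented API, the Go types are an assumption.
type SchedulerRequest struct {
	Domain                string   `json:"domain"`
	ScheduleIntervalHours int      `json:"schedule_interval_hours"`
	Concurrency           int      `json:"concurrency"`
	FindLinks             bool     `json:"find_links"`
	MaxPages              int      `json:"max_pages"`
	IncludePaths          []string `json:"include_paths"`
	ExcludePaths          []string `json:"exclude_paths"`
	IsEnabled             bool     `json:"is_enabled"`
}

// newCreateSchedulerRequest builds the HTTP request without sending it,
// so the auth and content-type headers are easy to inspect.
func newCreateSchedulerRequest(token string, body SchedulerRequest) (*http.Request, error) {
	buf, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://adapt.beehivebusiness.builders/v1/schedulers", bytes.NewReader(buf))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateSchedulerRequest("YOUR_JWT_TOKEN", SchedulerRequest{
		Domain: "example.com", ScheduleIntervalHours: 12, Concurrency: 20,
		FindLinks: true, MaxPages: 1000,
		IncludePaths: []string{"/blog/*"}, ExcludePaths: []string{"/admin/*"},
		IsEnabled: true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// Sending is left to the caller:
	// resp, err := http.DefaultClient.Do(req)
}
```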
Scheduler Lifecycle
Automatic Job Creation
The scheduler daemon checks for ready schedulers every minute:
func (s *SchedulerService) checkSchedulers(ctx context.Context) {
	// Query schedulers ready to run
	schedulers, err := db.GetSchedulersReadyToRun(ctx, limit)
	if err != nil {
		return
	}
	for _, scheduler := range schedulers {
		// Create job with scheduler configuration
		_, err := jobManager.CreateJob(ctx, &JobOptions{
			Domain:       scheduler.Domain,
			Concurrency:  scheduler.Concurrency,
			FindLinks:    scheduler.FindLinks,
			MaxPages:     scheduler.MaxPages,
			IncludePaths: scheduler.IncludePaths,
			ExcludePaths: scheduler.ExcludePaths,
			SchedulerID:  &scheduler.ID, // Link job to scheduler
		})
		if err != nil {
			continue // leave next_run_at unchanged so the scheduler is retried
		}

		// Calculate next run time
		nextRun := time.Now().Add(
			time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
		)

		// Update scheduler for next cycle
		db.UpdateSchedulerNextRun(ctx, scheduler.ID, nextRun)
	}
}
Next Run Calculation
Schedulers calculate next run based on current time (not last job completion):
// Initial creation
scheduler.NextRunAt = now.Add(
	time.Duration(scheduleIntervalHours) * time.Hour,
)

// After job creation
scheduler.NextRunAt = time.Now().Add(
	time.Duration(scheduler.ScheduleIntervalHours) * time.Hour,
)
This ensures consistent intervals regardless of job duration.
Scheduler States
Enabled & Ready
is_enabled = true and next_run_at <= NOW() → Job will be created automatically
Enabled & Scheduled
is_enabled = true and next_run_at > NOW() → Waiting for next run time
Disabled
is_enabled = false → Scheduler paused, no jobs created
Managing Schedulers
List All Schedulers
Get schedulers for your organisation:
curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "scheduler-1",
    "domain": "example.com",
    "schedule_interval_hours": 12,
    "next_run_at": "2024-01-15T22:00:00Z",
    "is_enabled": true
  },
  {
    "id": "scheduler-2",
    "domain": "blog.example.com",
    "schedule_interval_hours": 24,
    "next_run_at": "2024-01-16T10:00:00Z",
    "is_enabled": false
  }
]
Update Scheduler
Modify interval or configuration:
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"schedule_interval_hours": 24,
"concurrency": 30,
"is_enabled": true
}'
Updating schedule_interval_hours recalculates next_run_at from the current time.
Pause/Resume Scheduler
# Pause
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"is_enabled": false}'
# Resume
curl -X PUT "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"is_enabled": true}'
Delete Scheduler
curl -X DELETE "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
Deleting a scheduler does not cancel existing jobs created by it.
Viewing Scheduler Jobs
Get Jobs for a Scheduler
curl -X GET "https://adapt.beehivebusiness.builders/v1/schedulers/{schedulerID}/jobs" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
[
  {
    "id": "job-123",
    "domain": "example.com",
    "status": "completed",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T10:00:00Z",
    "completed_at": "2024-01-15T10:05:32Z",
    "total_tasks": 250,
    "failed_tasks": 2
  },
  {
    "id": "job-124",
    "domain": "example.com",
    "status": "running",
    "scheduler_id": "scheduler-1",
    "created_at": "2024-01-15T22:00:00Z",
    "total_tasks": 250,
    "completed_tasks": 143
  }
]
Job History Analysis
Track scheduler performance over time:
SELECT
  scheduler_id,
  COUNT(*) AS total_runs,
  AVG(duration_seconds) AS avg_duration,
  AVG(completed_tasks) AS avg_pages_crawled,
  AVG(failed_tasks) AS avg_failures
FROM jobs
WHERE scheduler_id = $1
  AND status = 'completed'
GROUP BY scheduler_id;
Webflow Integration
Schedulers integrate with Webflow site settings:
Site-Specific Scheduling
-- Webflow site settings link to schedulers
CREATE TABLE webflow_site_settings (
organisation_id TEXT ,
webflow_site_id TEXT ,
schedule_interval_hours INTEGER , -- 6, 12, 24, or 48
scheduler_id TEXT , -- Links to schedulers table
auto_publish_enabled BOOLEAN ,
webhook_id TEXT ,
PRIMARY KEY (organisation_id, webflow_site_id)
);
Automatic Scheduler Creation
When enabling scheduling for a Webflow site:
Create Scheduler: New scheduler for the site domain
Link to Site: Store scheduler_id in site settings
Configure Interval: Set the user's preferred frequency
Enable Webhooks: Optional publish webhook for immediate crawls
Combined Triggers
Webflow sites can have both:
Scheduled Crawls: Regular monitoring (e.g. every 12 hours)
Publish Webhooks: Immediate crawl on content update
This ensures:
Fresh cache after publishing
Regular monitoring between publishes
Automatic link checking
Configuration Best Practices
Choosing Interval
High-Traffic Sites
Recommended: 6-12 hours
Frequent content updates
Many concurrent users
Cache expires quickly
Need tight monitoring
Business Sites
Recommended: 12-24 hours
Daily/weekly content updates
Moderate traffic
Stable cache configuration
Balance cost and coverage
Static Sites
Recommended: 24-48 hours
Infrequent updates
Low traffic
Stable infrastructure
Minimise resource usage
Concurrency Settings
Balance speed vs server load:
{
  "concurrency": 10,  // Conservative (large sites, shared hosting)
  "concurrency": 20,  // Balanced (most sites, dedicated hosting)
  "concurrency": 50   // Aggressive (small sites, robust infrastructure)
}
Path Filtering
Optimise crawl scope:
{
  "include_paths": [
    "/blog/*",     // Include blog section
    "/products/*"  // Include product pages
  ],
  "exclude_paths": [
    "/admin/*",    // Skip admin pages
    "/search*",    // Skip search results
    "/checkout/*"  // Skip checkout flow
  ]
}
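Assuming the patterns behave like shell-style globs, the include/exclude logic might look like the sketch below. `shouldCrawl` is hypothetical and Adapt's actual matching semantics may differ:

```go
package main

import (
	"fmt"
	"path"
)

// shouldCrawl applies exclude patterns first, then requires a match
// against the include list (if one is set). Note: path.Match's "*"
// does not cross "/" boundaries, so "/blog/*" matches "/blog/post-1"
// but not "/blog/2024/post-1".
func shouldCrawl(p string, include, exclude []string) bool {
	for _, pat := range exclude {
		if ok, _ := path.Match(pat, p); ok {
			return false
		}
	}
	if len(include) == 0 {
		return true // no include list means everything is in scope
	}
	for _, pat := range include {
		if ok, _ := path.Match(pat, p); ok {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"/blog/*", "/products/*"}
	exclude := []string{"/admin/*", "/search*", "/checkout/*"}
	fmt.Println(shouldCrawl("/blog/post-1", include, exclude)) // true
	fmt.Println(shouldCrawl("/admin/users", include, exclude)) // false
	fmt.Println(shouldCrawl("/about", include, exclude))       // false (not in include list)
}
```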
Monitoring Scheduler Health
Key Metrics
Track scheduler reliability:
Success Rate: % of jobs completed successfully
Average Duration: Time to complete crawls
Failure Patterns: Recurring issues to investigate
Schedule Drift: How closely jobs match their schedule
Alerts & Notifications
Scheduled jobs trigger Supabase notifications:
Job Completion: Notify when a scheduled job finishes
Failures: Alert on failed scheduled crawls
Broken Links: Report newly discovered issues
Performance: Warn if crawl duration increases
Use Cases
Detect broken links, slow pages, and errors before users do.
Keep CDN cache fresh with regular warming before expiration.
Track site performance trends over days, weeks, and months.
Identify when pages change or new issues appear.
Prove site availability and performance for service agreements.
Scheduler Limitations
One Scheduler Per Domain: Each domain can have only one scheduler per organisation. To change intervals, update the existing scheduler rather than creating a new one.
Concurrent Jobs: Schedulers will not create a new job while the previous job is still running. If jobs regularly take longer than the schedule interval, increase the concurrency (so crawls finish faster) or lengthen the interval.