Overview
Crawl jobs are the core of Adapt’s website monitoring system. Each job crawls your website to check for broken links, measure performance, and optionally warm your cache for faster page loads.
Creating Your First Job
Navigate to the Dashboard
Log in to your Adapt account and navigate to the Jobs page from the main dashboard.
Enter Your Domain
Click “Create New Job” and enter your website domain (e.g., example.com). Adapt automatically normalises domains and adds the https:// protocol.
Configure Job Options
Choose your crawl settings:
- Use Sitemap: Discover pages from your sitemap.xml (recommended)
- Find Links: Crawl links found on pages to discover additional URLs
- Max Pages: Limit the number of pages to crawl (0 = unlimited)
- Concurrency: Number of pages to check simultaneously (default: 20)
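The settings above translate directly into a job definition. A minimal sketch of the payload a job-creation request might carry (field names here are assumptions based on the options listed, not Adapt's documented schema):

```python
import json

# Hypothetical job payload mirroring the options above; the field names
# are illustrative assumptions, not Adapt's documented schema.
job = {
    "domain": "example.com",
    "use_sitemap": True,   # discover pages from sitemap.xml (recommended)
    "find_links": True,    # follow on-page links to discover extra URLs
    "max_pages": 0,        # 0 = unlimited
    "concurrency": 20,     # pages checked simultaneously (range 1-100)
}

print(json.dumps(job, indent=2))
```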
API Usage
Create jobs programmatically using the REST API.
Configuration Options
Sitemap Discovery
When enabled, Adapt fetches your sitemap.xml to discover pages. This is the fastest and most reliable method for finding all pages on your site.
Link Crawling
When enabled, Adapt follows links found on each page to discover additional URLs. This helps find pages not listed in your sitemap.
Cross-Subdomain Link Handling
By default, Adapt follows links to different subdomains of your main domain. For example, if crawling example.com, it will also crawl blog.example.com and shop.example.com. Set allow_cross_subdomain_links to false to restrict crawling to the exact subdomain.
Max Pages
Limit the number of pages to crawl:
- 0: Unlimited (crawls all discovered pages)
- > 0: Stops after reaching the specified limit
Max pages includes both successful and failed page checks.
Concurrency
Controls how many pages are checked simultaneously:
- Default: 20 concurrent requests
- Range: 1-100
- Recommendation: Start with 20 and adjust based on your server capacity
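Concurrency is a cap on in-flight page checks, not a fixed batch size. A sketch of how such a cap behaves, using a semaphore (illustrative only; this is not Adapt's internal implementation):

```python
import asyncio

CONCURRENCY = 20  # Adapt's default

async def check_page(url, sem, tracker):
    # The semaphore guarantees at most CONCURRENCY checks run at once.
    async with sem:
        tracker["in_flight"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
        await asyncio.sleep(0)  # stand-in for the real HTTP check
        tracker["in_flight"] -= 1

async def crawl(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    tracker = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(check_page(u, sem, tracker) for u in urls))
    return tracker["peak"]

# 100 URLs, but never more than 20 checked at the same time.
peak = asyncio.run(crawl([f"https://example.com/page/{i}" for i in range(100)]))
```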
Crawl Delay & Rate Limiting
Adapt automatically respects your robots.txt crawl-delay directive.
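Adapt's own handling isn't shown here, but the directive itself can be read with Python's standard library; a sketch of parsing a crawl-delay from a robots.txt body:

```python
from urllib.robotparser import RobotFileParser

# robots.txt body (normally fetched from https://example.com/robots.txt)
robots_txt = """\
User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
parser.modified()  # mark the rules as loaded so query methods answer

delay = parser.crawl_delay("*")  # seconds between requests, or None
```

A crawler honouring this value would sleep `delay` seconds between requests to the same host, effectively overriding its own concurrency setting for that site.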
Cache Warming
Adapt includes intelligent cache warming to improve page load times:
- First Request: Checks the page and records the cache status
- Second Request: If cache was MISS or EXPIRED, Adapt makes a second request to warm the cache
- Priority Order: Homepage and high-traffic pages are warmed first
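The warming decision above boils down to: re-request only when the first response was not served from cache. A sketch of that rule (the header values vary by CDN; this is illustrative, not Adapt's code):

```python
def should_warm(cache_status):
    """Return True when a second, cache-warming request is worthwhile."""
    # MISS / EXPIRED mean the origin served the page, so a follow-up
    # request should now be answered from a freshly populated cache.
    return cache_status.upper() in {"MISS", "EXPIRED"}

# cache_status would typically come from a response header such as
# X-Cache or CF-Cache-Status (header name is CDN-specific).
```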
Job Lifecycle
Jobs progress through these states:

| Status | Description |
|---|---|
| created | Job created, waiting to start |
| running | Actively crawling pages |
| completed | Finished successfully |
| failed | Encountered an error |
| cancelled | Manually cancelled by user |
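A client typically polls until the job reaches one of the terminal states (completed, failed, or cancelled). A sketch of that loop; the status fetcher is a stub standing in for a real API call, since Adapt's endpoint paths aren't reproduced here:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_job(get_status, job_id, poll_interval=1.0, timeout=60.0):
    """Poll get_status(job_id) until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Stub standing in for a status lookup; a real client would call the API.
statuses = iter(["created", "running", "running", "completed"])
final = wait_for_job(lambda job_id: next(statuses), "job-123", poll_interval=0)
```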
Monitoring Job Progress
Track your job in real time via the API.
Cancelling Jobs
Stop a running job at any time via the API.
Best Practices
Start Small, Scale Up
For your first crawl, set a max_pages limit (e.g., 50) to test configuration and performance. Once validated, remove the limit for full site coverage.
Schedule Regular Crawls
Instead of manually creating jobs, set up a scheduler to run crawls automatically every 6, 12, 24, or 48 hours. See Schedulers for details.
Monitor Usage Limits
Check your daily page quota before running large crawls:
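A pre-flight check can compare remaining quota against the planned crawl size before the job is created. A sketch with an assumed usage-summary shape (Adapt's actual usage endpoint and field names may differ):

```python
def pages_remaining(usage):
    """Pages left in today's quota, given a usage summary."""
    return max(usage["daily_limit"] - usage["pages_used_today"], 0)

def can_run(usage, planned_pages):
    # Remember: max_pages counts failed checks too, so budget for the
    # full planned figure, not just expected successes.
    return planned_pages <= pages_remaining(usage)

# Example usage summary, e.g. as returned by a usage endpoint (assumed shape).
usage = {"daily_limit": 1000, "pages_used_today": 850}
```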