
Overview

Crawl jobs are the core of Adapt’s website monitoring system. Each job crawls your website to check for broken links, measure performance, and optionally warm your cache for faster page loads.

Creating Your First Job

1. Navigate to the Dashboard

Log in to your Adapt account and navigate to the Jobs page from the main dashboard.

2. Enter Your Domain

Click “Create New Job” and enter your website domain (e.g., example.com).
Adapt automatically normalises domains and adds the https:// protocol.

3. Configure Job Options

Choose your crawl settings:
  • Use Sitemap: Discover pages from your sitemap.xml (recommended)
  • Find Links: Crawl links found on pages to discover additional URLs
  • Max Pages: Limit the number of pages to crawl (0 = unlimited)
  • Concurrency: Number of pages to check simultaneously (default: 20)

4. Start the Job

Click “Start Job” to begin crawling. You’ll see real-time progress updates as pages are processed.
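
Step 2 mentions that Adapt normalises domains and adds the https:// protocol. The exact rules are not documented here, so the sketch below is only one plausible reading: trim whitespace, lowercase, drop any scheme, port and path, then prefix https://. The helper name normalise_domain is hypothetical and is not part of Adapt.

```python
from urllib.parse import urlsplit


def normalise_domain(raw: str) -> str:
    """Return an https:// URL for a user-supplied domain (illustrative rules only)."""
    text = raw.strip().lower()
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    if "://" not in text:
        text = "https://" + text
    # .hostname drops the scheme, any port, and any path (an assumption about the rules).
    host = urlsplit(text).hostname
    if not host:
        raise ValueError(f"not a valid domain: {raw!r}")
    return "https://" + host
```

Under these assumed rules, both example.com and HTTP://Example.com/about/ resolve to the same job target.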

API Usage

Create jobs programmatically using the REST API:
curl -X POST https://adapt.app.goodnative.co/v1/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "options": {
      "use_sitemap": true,
      "find_links": true,
      "max_pages": 100,
      "concurrency": 20
    }
  }'
Response:
{
  "status": "success",
  "data": {
    "id": "job_123abc",
    "domain": "example.com",
    "status": "created",
    "organisation_id": "org_456def",
    "options": {
      "use_sitemap": true,
      "find_links": true,
      "max_pages": 100,
      "concurrency": 20
    },
    "created_at": "2023-05-18T12:34:56Z"
  },
  "meta": {
    "timestamp": "2023-05-18T12:34:56Z",
    "version": "1.0.0"
  }
}
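
The same request can be built from Python using only the standard library. This is a minimal sketch: the endpoint, headers and body mirror the curl example, while the helper names build_create_job_request and extract_job_id are hypothetical. The response handling assumes the shape shown in the sample response above.

```python
import json
import urllib.request

API_BASE = "https://adapt.app.goodnative.co/v1"  # base URL taken from the curl example


def build_create_job_request(domain: str, token: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST /jobs request from the curl example."""
    body = json.dumps({"domain": domain, "options": options}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def extract_job_id(response_body: str) -> str:
    """Pull the job id out of a response shaped like the sample response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"job creation failed: {payload}")
    return payload["data"]["id"]
```

To send the request, pass the returned object to urllib.request.urlopen and feed the decoded body to extract_job_id.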

Configuration Options

Sitemap Discovery

When enabled, Adapt fetches your sitemap.xml to discover pages. This is usually the fastest and most reliable way to find the pages on your site.
Enable sitemap discovery for the most comprehensive crawl coverage.

Find Links

When enabled, Adapt follows links found on each page to discover additional URLs. This helps find pages not listed in your sitemap.

Max Pages

Limit the number of pages to crawl:
  • 0: Unlimited (crawls all discovered pages)
  • > 0: Stops after reaching the specified limit
The Max Pages limit counts both successful and failed page checks.

Concurrency

Controls how many pages are checked simultaneously:
  • Default: 20 concurrent requests
  • Range: 1-100
  • Recommendation: Start with 20 and adjust based on your server capacity
Higher concurrency speeds up crawls but increases server load. Respect your hosting provider’s rate limits.
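
The limits described above can be checked client-side before a job is submitted. The sketch below is a hypothetical helper, not part of any Adapt SDK. The concurrency default and the 1-100 range come from this page; the defaults for the other three fields are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class JobOptions:
    use_sitemap: bool = True   # assumed default
    find_links: bool = True    # assumed default
    max_pages: int = 0         # 0 = unlimited (assumed default)
    concurrency: int = 20      # documented default

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the documented limits."""
        if self.max_pages < 0:
            raise ValueError("max_pages must be 0 (unlimited) or a positive integer")
        if not 1 <= self.concurrency <= 100:
            raise ValueError("concurrency must be between 1 and 100")

    def to_payload(self) -> dict:
        """Validate, then return the dict to send as the job's 'options' object."""
        self.validate()
        return asdict(self)
```

Validating locally gives a quicker and clearer error than waiting for the API to reject the request.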

Crawl Delay & Rate Limiting

Adapt automatically respects your robots.txt crawl-delay directive:
User-agent: *
Crawl-delay: 2
The example above asks crawlers to wait 2 seconds between requests, and Adapt honours it without any extra configuration. You can also set adaptive delays per domain in your organisation settings.
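
To check which crawl-delay a crawler would read from your robots.txt, Python's standard urllib.robotparser module can parse the file. This sketch only shows how the directive is read. It makes no claim about how Adapt schedules requests internally, and the user agent string Adapt sends is not stated on this page, so the wildcard is used.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
"""


def crawl_delay_for(robots_txt: str, user_agent: str = "*"):
    """Return the Crawl-delay that applies to user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_TXT))
```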

Cache Warming

Adapt includes intelligent cache warming to improve page load times:
  1. First Request: Checks the page and records the cache status
  2. Second Request: If the cache status was MISS or EXPIRED, Adapt makes a second request to warm the cache
  3. Priority Order: Homepage and high-traffic pages are warmed first
Connect Google Analytics to prioritise warming your most-visited pages first.
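
The warming steps above can be sketched as a small decision function plus an ordering rule. This is illustrative only: how Adapt reads the cache status and ranks traffic is not documented here, so fetch_status is a caller-supplied stand-in for the real HTTP check, and treating the path "/" as the homepage is an assumption.

```python
from typing import Callable, Dict, List, Optional

WARM_STATUSES = {"MISS", "EXPIRED"}  # statuses named in the steps above


def needs_warming(cache_status: Optional[str]) -> bool:
    """True when the first request reported a cold or stale cache."""
    return (cache_status or "").upper() in WARM_STATUSES


def check_and_warm(url: str, fetch_status: Callable[[str], Optional[str]]) -> List[Optional[str]]:
    """Return the cache statuses observed: one entry, or two if a warming request was made."""
    first = fetch_status(url)
    seen = [first]
    if needs_warming(first):
        seen.append(fetch_status(url))  # second request warms the cache
    return seen


def warming_order(page_views: Dict[str, int]) -> List[str]:
    """Homepage first, then the remaining paths by descending view count."""
    return sorted(page_views, key=lambda path: (path != "/", -page_views[path]))
```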

Job Lifecycle

Jobs progress through these states:
Status      Description
created     Job created, waiting to start
running     Actively crawling pages
completed   Finished successfully
failed      Encountered an error
cancelled   Manually cancelled by user

Monitoring Job Progress

Track your job in real time:
curl https://adapt.app.goodnative.co/v1/jobs/job_123abc \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
{
  "status": "success",
  "data": {
    "id": "job_123abc",
    "domain": "example.com",
    "status": "running",
    "progress": {
      "total_tasks": 150,
      "completed_tasks": 45,
      "failed_tasks": 2,
      "skipped_tasks": 0,
      "percentage": 31.33
    },
    "stats": {
      "avg_response_time": 234,
      "cache_hit_ratio": 0.85,
      "total_bytes": 2048576
    },
    "created_at": "2023-05-18T12:34:56Z",
    "started_at": "2023-05-18T12:35:01Z"
  }
}
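
A client can poll this endpoint until the job reaches a terminal state. The sketch below is a hypothetical helper built on two assumptions: completed, failed and cancelled are the only terminal states, and the percentage counts completed, failed and skipped tasks against the total. The second assumption is inferred from the sample response, where (45 + 2 + 0) / 150 gives 31.33, and is not documented behaviour. fetch_job is a caller-supplied function that returns the "data" object of the response.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}  # assumed terminal states


def progress_percentage(progress: dict) -> float:
    """Share of tasks already processed, as a percentage rounded to 2 decimals."""
    total = progress["total_tasks"]
    if total == 0:
        return 0.0
    processed = (
        progress["completed_tasks"]
        + progress["failed_tasks"]
        + progress["skipped_tasks"]
    )
    return round(100 * processed / total, 2)


def wait_for_job(fetch_job: Callable[[], dict], interval: float = 5.0, max_polls: int = 120) -> dict:
    """Call fetch_job until the job status is terminal, sleeping between polls."""
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL_STATES:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

Passing the fetch step in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.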

Cancelling Jobs

Stop a running job:
curl -X POST https://adapt.app.goodnative.co/v1/jobs/job_123abc/cancel \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Best Practices

For your first crawl, set a max_pages limit (e.g., 50) to test configuration and performance. Once validated, remove the limit for full site coverage.
Instead of manually creating jobs, set up a scheduler to run crawls automatically every 6, 12, 24, or 48 hours. See Schedulers for details.
Check your daily page quota before running large crawls:
curl https://adapt.app.goodnative.co/v1/usage \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
