Overview
Crawl jobs are the core of Adapt’s website monitoring system. Each job crawls your website to check for broken links, measure performance, and optionally warm your cache for faster page loads.
Creating Your First Job
Navigate to the Dashboard
Log in to your Adapt account and navigate to the Jobs page from the main dashboard.
Enter Your Domain
Click “Create New Job” and enter your website domain (e.g., example.com). Adapt automatically normalises domains and adds the https:// protocol.
Configure Job Options
Choose your crawl settings:
- Use Sitemap: Discover pages from your sitemap.xml (recommended)
- Find Links: Crawl links found on pages to discover additional URLs
- Max Pages: Limit the number of pages to crawl (0 = unlimited)
- Concurrency: Number of pages to check simultaneously (default: 20)
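The settings above translate directly into a job definition. A minimal sketch of the payload a job-creation request might carry (field names here are assumptions based on the options listed, not Adapt's documented schema):

```python
import json

# Hypothetical job payload mirroring the options above; the field names
# are illustrative assumptions, not Adapt's documented schema.
job = {
    "domain": "example.com",
    "use_sitemap": True,   # discover pages from sitemap.xml (recommended)
    "find_links": True,    # follow on-page links to discover extra URLs
    "max_pages": 0,        # 0 = unlimited
    "concurrency": 20,     # pages checked simultaneously (range 1-100)
}

print(json.dumps(job, indent=2))
```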
API Usage
Create jobs programmatically using the REST API.
Configuration Options
Sitemap Discovery
When enabled, Adapt fetches your sitemap.xml to discover pages. This is the fastest and most reliable method for finding all pages on your site.
Link Crawling
When enabled, Adapt follows links found on each page to discover additional URLs. This helps find pages not listed in your sitemap.
Cross-Subdomain Link Handling
By default, Adapt follows links to different subdomains of your main domain. For example, if crawling example.com, it will also crawl blog.example.com and shop.example.com. Set allow_cross_subdomain_links to false to restrict crawling to the exact subdomain.
Max Pages
Limit the number of pages to crawl:
- 0: Unlimited (crawls all discovered pages)
- > 0: Stops after reaching the specified limit
Max pages includes both successful and failed page checks.
Concurrency
Controls how many pages are checked simultaneously:
- Default: 20 concurrent requests
- Range: 1-100
- Recommendation: Start with 20 and adjust based on your server capacity
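Concurrency is a cap on in-flight page checks, not a fixed batch size. A sketch of how such a cap behaves, using a semaphore (illustrative only; this is not Adapt's internal implementation):

```python
import asyncio

CONCURRENCY = 20  # Adapt's default

async def check_page(url, sem, tracker):
    # The semaphore guarantees at most CONCURRENCY checks run at once.
    async with sem:
        tracker["in_flight"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
        await asyncio.sleep(0)  # stand-in for the real HTTP check
        tracker["in_flight"] -= 1

async def crawl(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    tracker = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(check_page(u, sem, tracker) for u in urls))
    return tracker["peak"]

# 100 URLs, but never more than 20 checked at the same time.
peak = asyncio.run(crawl([f"https://example.com/page/{i}" for i in range(100)]))
```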
Crawl Delay & Rate Limiting
Adapt automatically respects your robots.txt crawl-delay directive.
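Adapt's own handling isn't shown here, but the directive itself can be read with Python's standard library; a sketch of parsing a crawl-delay from a robots.txt body:

```python
from urllib.robotparser import RobotFileParser

# robots.txt body (normally fetched from https://example.com/robots.txt)
robots_txt = """\
User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
parser.modified()  # mark the rules as loaded so query methods answer

delay = parser.crawl_delay("*")  # seconds between requests, or None
```

A crawler honouring this value would sleep `delay` seconds between requests to the same host, effectively overriding its own concurrency setting for that site.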
Cache Warming
Adapt includes intelligent cache warming to improve page load times:
- First Request: Checks the page and records the cache status
- Second Request: If cache was MISS or EXPIRED, Adapt makes a second request to warm the cache
- Priority Order: Homepage and high-traffic pages are warmed first
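The warming decision above boils down to: re-request only when the first response was not served from cache. A sketch of that rule (the header values vary by CDN; this is illustrative, not Adapt's code):

```python
def should_warm(cache_status):
    """Return True when a second, cache-warming request is worthwhile."""
    # MISS / EXPIRED mean the origin served the page, so a follow-up
    # request should now be answered from a freshly populated cache.
    return cache_status.upper() in {"MISS", "EXPIRED"}

# cache_status would typically come from a response header such as
# X-Cache or CF-Cache-Status (header name is CDN-specific).
```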
Job Lifecycle
Jobs progress through these states:

| Status | Description |
|---|---|
| created | Job created, waiting to start |
| running | Actively crawling pages |
| completed | Finished successfully |
| failed | Encountered an error |
| cancelled | Manually cancelled by user |
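A client typically polls until the job reaches one of the terminal states (completed, failed, or cancelled). A sketch of that loop; the status fetcher is a stub standing in for a real API call, since Adapt's endpoint paths aren't reproduced here:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_job(get_status, job_id, poll_interval=1.0, timeout=60.0):
    """Poll get_status(job_id) until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Stub standing in for a status lookup; a real client would call the API.
statuses = iter(["created", "running", "running", "completed"])
final = wait_for_job(lambda job_id: next(statuses), "job-123", poll_interval=0)
```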
Monitoring Job Progress
Track your job in real time via the API.
Cancelling Jobs
Stop a running job at any time via the API.
Best Practices
Start Small, Scale Up
For your first crawl, set a max_pages limit (e.g., 50) to test configuration and performance. Once validated, remove the limit for full site coverage.
Schedule Regular Crawls
Instead of manually creating jobs, set up a scheduler to run crawls automatically every 6, 12, 24, or 48 hours. See Schedulers for details.
Monitor Usage Limits
Check your daily page quota before running large crawls:
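A pre-flight check can compare remaining quota against the planned crawl size before the job is created. A sketch with an assumed usage-summary shape (Adapt's actual usage endpoint and field names may differ):

```python
def pages_remaining(usage):
    """Pages left in today's quota, given a usage summary."""
    return max(usage["daily_limit"] - usage["pages_used_today"], 0)

def can_run(usage, planned_pages):
    # Remember: max_pages counts failed checks too, so budget for the
    # full planned figure, not just expected successes.
    return planned_pages <= pages_remaining(usage)

# Example usage summary, e.g. as returned by a usage endpoint (assumed shape).
usage = {"daily_limit": 1000, "pages_used_today": 850}
```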