
Overview

Tasks represent individual page crawls within a job. Each task tracks the crawl status, response time, cache performance, and any errors for a specific URL.

List Tasks for Job

GET /v1/jobs/{job_id}/tasks?limit=50&offset=0&status=failed
Authorization: Bearer <token>

Path Parameters

job_id
string
required
Unique job identifier

Query Parameters

limit
integer
default:50
Results per page (max 200)
offset
integer
default:0
Number of results to skip
status
string
Filter by task status: pending, running, completed, failed
cache
string
Filter by cache status: hit, miss
  • miss: Pages with MISS or EXPIRED cache status (could benefit from cache warming)
  • hit: Pages with HIT or DYNAMIC cache status (cache performing optimally)
path
string
Filter by path keyword (case-insensitive partial match)
sort
string
Sort field and direction. Add a - prefix for descending order. Available fields:
  • path: URL path
  • status: Task status
  • response_time: Response time in milliseconds
  • cache_status: Cache status
  • second_response_time: Second request response time
  • status_code: HTTP status code
  • page_views_7d: Page views (last 7 days)
  • page_views_28d: Page views (last 28 days)
  • page_views_180d: Page views (last 180 days)
  • created_at: Creation timestamp (default)
Examples: sort=response_time, sort=-page_views_7d
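The query parameters above can be combined into a single request URL. The following is a minimal sketch using only the Python standard library. The host `api.example.com`, the function name `build_tasks_url`, and the lower bound of 1 on `limit` are assumptions for illustration; only the path and parameter names come from this page.

```python
from urllib.parse import urlencode

# Hypothetical host: this page does not state the API's base URL.
BASE_URL = "https://api.example.com"

# Sort fields listed in the Query Parameters section.
SORT_FIELDS = {
    "path", "status", "response_time", "cache_status", "second_response_time",
    "status_code", "page_views_7d", "page_views_28d", "page_views_180d",
    "created_at",
}


def build_tasks_url(job_id, *, limit=50, offset=0, status=None,
                    cache=None, path=None, sort=None):
    """Build a List Tasks URL from the documented query parameters."""
    # The documented maximum is 200; the minimum of 1 is an assumption.
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if sort is not None:
        # A leading "-" requests descending order.
        field = sort[1:] if sort.startswith("-") else sort
        if field not in SORT_FIELDS:
            raise ValueError(f"unknown sort field: {sort}")

    params = {"limit": limit, "offset": offset}
    optional = (("status", status), ("cache", cache),
                ("path", path), ("sort", sort))
    for key, value in optional:
        if value is not None:
            params[key] = value
    return f"{BASE_URL}/v1/jobs/{job_id}/tasks?{urlencode(params)}"


print(build_tasks_url("job_123", status="failed", sort="-page_views_7d"))
```

Validating `limit` and `sort` on the client is optional; it simply surfaces mistakes before a request is sent.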

Response Fields

tasks
array
Array of task objects
tasks[].id
string
Unique task identifier
tasks[].job_id
string
Parent job identifier
tasks[].url
string
Full URL that was crawled
tasks[].path
string
URL path component
tasks[].host
string
Hostname (only included if different from job domain)
tasks[].status
string
Task status: pending, running, completed, failed
tasks[].status_code
integer
HTTP status code (e.g., 200, 404, 500)
tasks[].response_time
integer
Response time in milliseconds
tasks[].cache_status
string
Cache status: HIT, MISS, EXPIRED, DYNAMIC
tasks[].second_response_time
integer
Response time of second request (for cache validation)
tasks[].second_cache_status
string
Cache status of second request
tasks[].content_type
string
Content-Type header value
tasks[].error
string
Error message if task failed
tasks[].source_type
string
How the URL was discovered: sitemap, link_crawl, manual
tasks[].source_url
string
URL where this page was discovered (for link crawl)
tasks[].retry_count
integer
Number of retry attempts
tasks[].page_views_7d
integer
Page views in last 7 days (requires Google Analytics integration)
tasks[].page_views_28d
integer
Page views in last 28 days (requires Google Analytics integration)
tasks[].page_views_180d
integer
Page views in last 180 days (requires Google Analytics integration)
tasks[].created_at
string
ISO 8601 timestamp when task was created
tasks[].started_at
string
ISO 8601 timestamp when task started (null if not started)
tasks[].completed_at
string
ISO 8601 timestamp when task completed (null if not completed)
pagination
object
Pagination metadata
pagination.limit
integer
Results per page
pagination.offset
integer
Current offset
pagination.total
integer
Total number of tasks
pagination.has_next
boolean
Whether there are more results
pagination.has_prev
boolean
Whether there are previous results
summary
object
Summary statistics for all tasks in the job
summary.total_tasks
integer
Total number of tasks
summary.by_status
object
Task counts grouped by status
summary.by_status_code
object
Task counts grouped by HTTP status code
summary.performance
object
Performance statistics
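As one illustration of how the task fields might be used together, the sketch below compares the first and second request for each task. It assumes that a first request with `cache_status` of MISS or EXPIRED followed by a `second_cache_status` of HIT means the first request populated the cache; this page does not state that interpretation. The function name and the sample data are invented for the example.

```python
def warmed_by_second_request(tasks):
    """Return (path, first_ms, second_ms) for tasks whose first request
    missed the cache and whose second request hit it, largest saving first."""
    rows = []
    for task in tasks:
        first_ms = task.get("response_time")
        second_ms = task.get("second_response_time")
        # Skip tasks that have no timing for either request.
        if first_ms is None or second_ms is None:
            continue
        missed_first = task.get("cache_status") in ("MISS", "EXPIRED")
        hit_second = task.get("second_cache_status") == "HIT"
        if missed_first and hit_second:
            rows.append((task["path"], first_ms, second_ms))
    # Sort by milliseconds saved on the second request, descending.
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows


# Invented sample data shaped like the tasks[] objects described above.
sample_tasks = [
    {"path": "/a", "cache_status": "MISS", "second_cache_status": "HIT",
     "response_time": 900, "second_response_time": 100},
    {"path": "/b", "cache_status": "HIT", "second_cache_status": "HIT",
     "response_time": 80, "second_response_time": 70},
    {"path": "/c", "cache_status": "EXPIRED", "second_cache_status": "HIT",
     "response_time": 1500, "second_response_time": 200},
]

for path, first_ms, second_ms in warmed_by_second_request(sample_tasks):
    print(path, first_ms, second_ms)
```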

Export Task Results

GET /v1/jobs/{job_id}/export?type=broken-links
Authorization: Bearer <token>

Path Parameters

job_id
string
required
Unique job identifier

Query Parameters

type
string
default:"job"
Export type:
  • job: All tasks (default)
  • broken-links: Only failed tasks (404s, 500s, etc.)
  • slow-pages: Only pages with response time > 3000ms

Response Fields

export_type
string
Type of export performed
export_time
string
ISO 8601 timestamp when export was generated
total_tasks
integer
Number of tasks included in export (max 10,000)
columns
array
Column definitions for the exported data
columns[].key
string
Field key in task objects
columns[].label
string
Human-readable column label
tasks
array
Array of task objects (structure varies by export type)
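Because the export response pairs `columns[].key` with `columns[].label`, it can be turned into a spreadsheet-friendly file with little code. The sketch below renders an already-parsed export response as CSV text. The function name `export_to_csv` and the sample payload are hypothetical; a real response may contain different columns.

```python
import csv
import io


def export_to_csv(export):
    """Render a parsed export response as CSV text.

    Uses columns[].label for the header row and columns[].key to pick
    each value out of the task objects.
    """
    columns = export["columns"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column["label"] for column in columns])
    for task in export["tasks"]:
        # Missing keys become empty cells; csv also writes None as empty.
        writer.writerow([task.get(column["key"], "") for column in columns])
    return buffer.getvalue()


# Invented sample payload shaped like the Response Fields above.
sample_export = {
    "export_type": "broken-links",
    "total_tasks": 1,
    "columns": [
        {"key": "path", "label": "Path"},
        {"key": "status_code", "label": "Status Code"},
    ],
    "tasks": [{"path": "/missing", "status_code": 404}],
}

print(export_to_csv(sample_export))
```

Driving the output from `columns` rather than from hard-coded field names keeps the sketch usable across export types, since the task structure varies by type.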

Pagination Strategy

  • Default: 50 results per page (balance of data vs performance)
  • Maximum: 200 results per page (prevents overwhelming responses)
  • Large datasets: Use filtering to reduce total results before pagination
  • Export option: For bulk data access, use the export endpoint (up to 10,000 tasks per export)
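The offset-based pagination described above can be walked with a short loop that advances `offset` by `limit` until `pagination.has_next` is false. This is a sketch, not a client library: `fetch_page` is a hypothetical callable standing in for whatever performs the HTTP request and returns the parsed response, and `fake_fetch_page` is an in-memory stand-in so the example runs without network access.

```python
def iter_all_tasks(fetch_page, limit=200):
    """Yield every task, one page at a time.

    fetch_page(limit=..., offset=...) must return a dict with the
    "tasks" and "pagination" fields described in Response Fields.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["tasks"]
        if not page["pagination"]["has_next"]:
            break
        offset += limit


# In-memory stand-in for the API, holding five invented tasks.
_FAKE_TASKS = [{"id": str(i)} for i in range(5)]


def fake_fetch_page(limit, offset):
    chunk = _FAKE_TASKS[offset:offset + limit]
    return {
        "tasks": chunk,
        "pagination": {
            "limit": limit,
            "offset": offset,
            "total": len(_FAKE_TASKS),
            "has_next": offset + limit < len(_FAKE_TASKS),
            "has_prev": offset > 0,
        },
    }


print([task["id"] for task in iter_all_tasks(fake_fetch_page, limit=2)])
```

Requesting the maximum page size of 200 keeps the number of round trips low; applying filters first reduces the total further.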

Use Cases

Find Broken Links

GET /v1/jobs/{job_id}/tasks?status=failed&sort=-page_views_7d
Find failed tasks (404s, 500s) sorted by traffic impact.

Identify Cache Opportunities

GET /v1/jobs/{job_id}/tasks?cache=miss&sort=-response_time
Find pages with cache misses that could benefit from cache warming.

Monitor High-Traffic Pages

GET /v1/jobs/{job_id}/tasks?sort=-page_views_28d&limit=100
View most visited pages to prioritise performance optimisation.

Find Slow Pages

GET /v1/jobs/{job_id}/export?type=slow-pages
Export all pages with response times over 3 seconds.
