
Overview

Adapt continuously monitors your website for broken links and connectivity issues, helping you maintain a healthy site structure. The system tracks HTTP status codes, response times, and connection errors for every crawled URL.

How It Works

Automatic Status Code Tracking

Every crawled page is checked for:
  • 404 Errors: Missing pages that need to be fixed or redirected
  • 5xx Server Errors: Backend failures that affect user experience
  • Timeouts: Pages that take too long to respond (default 30s timeout)
  • Connection Failures: DNS errors, SSL issues, and network problems
  • Redirect Chains: Multiple redirects that slow down page loads

Status Code Classification

Adapt categorises responses into:
// Non-2xx status codes are flagged as errors
if statusCode < 200 || statusCode >= 300 {
    result.Error = fmt.Sprintf("non-success status code: %d", statusCode)
}
Successful: 2xx responses (200, 201, 204, etc.)
Errors: all other responses, including:
  • 3xx redirects (tracked separately for performance analysis)
  • 4xx client errors (404, 403, 401, etc.)
  • 5xx server errors (500, 502, 503, etc.)
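The grouping above can be expressed as a simple switch on the status code range. This is a sketch of the classification rule; the function name classifyStatus and its labels are illustrative, not Adapt's internal API.

```go
package main

import "fmt"

// classifyStatus groups an HTTP status code into the buckets
// described above. Illustrative sketch only.
func classifyStatus(code int) string {
	switch {
	case code >= 200 && code < 300:
		return "success"
	case code >= 300 && code < 400:
		return "redirect" // tracked separately for performance analysis
	case code >= 400 && code < 500:
		return "client_error"
	case code >= 500 && code < 600:
		return "server_error"
	default:
		return "invalid"
	}
}

func main() {
	for _, c := range []int{200, 301, 404, 503} {
		fmt.Println(c, classifyStatus(c))
	}
}
```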

Task-Level Error Tracking

Each crawl task stores detailed error information:
{
  "task_id": "abc-123",
  "url": "https://example.com/missing-page",
  "status_code": 404,
  "error": "non-success status code: 404",
  "response_time": 120,
  "retry_count": 0
}

Via Dashboard

The dashboard displays real-time broken link counts:
  • Failed Tasks Counter: Shows total broken links in a job
  • Task Status Filter: Filter to view only failed tasks
  • Error Details: Click any failed task to see the specific error

Via API

curl -X GET "https://adapt.beehivebusiness.builders/v1/jobs/{jobID}/tasks?status=failed" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
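The same request can be built with Go's standard net/http package. The jobID and token values below are placeholders, matching the curl example above.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildFailedTasksRequest constructs the same request as the curl
// command above: GET the failed tasks for a job, authenticated
// with a Bearer token.
func buildFailedTasksRequest(jobID, token string) (*http.Request, error) {
	url := fmt.Sprintf(
		"https://adapt.beehivebusiness.builders/v1/jobs/%s/tasks?status=failed",
		jobID,
	)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := buildFailedTasksRequest("abc-123", "YOUR_JWT_TOKEN")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
```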

Response Format

{
  "tasks": [
    {
      "id": "task-123",
      "url": "https://example.com/broken",
      "status": "failed",
      "status_code": 404,
      "error": "non-success status code: 404",
      "source_type": "sitemap",
      "source_url": "https://example.com/sitemap.xml",
      "completed_at": "2024-01-15T10:30:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "limit": 50
}

Common Error Types

404 Not Found

Cause: Page deleted or URL typo
Solution: Create 301 redirect to current page or remove link

Connection Timeout

Cause: Page takes longer than 30 seconds to respond
Solution: Optimise server response time or increase cache TTL

SSL Certificate Errors

Cause: Expired or invalid SSL certificate
Solution: Renew certificate and ensure proper chain configuration

DNS Resolution Failures

Cause: Domain doesn’t resolve or DNS propagation issues
Solution: Check DNS records and wait for propagation

Detection Sources

Broken links are discovered through multiple sources:

Sitemap URLs

All URLs listed in sitemap.xml are automatically crawled and checked.

Page Links

When find_links is enabled, Adapt extracts links from:
  • Header navigation: <header> elements
  • Footer navigation: <footer> elements
  • Body content: Main content area
Links are categorised by location for better analysis:
// Links are organised by page section
result.Links = map[string][]string{
  "header": [...],
  "footer": [...],
  "body":   [...]
}
Adapt automatically skips hidden elements to focus on user-facing links:
// Skip links that are hidden from users
if isElementHidden(element) {
    continue
}
Hidden link detection checks for:
  • display: none or visibility: hidden styles
  • aria-hidden="true" attributes
  • Common hiding classes: .hidden, .d-none, .sr-only
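The three checks above can be sketched as a single predicate. The element type below is a minimal stand-in for a parsed DOM node, so both the type and the matching logic are illustrative assumptions rather than Adapt's actual crawler code.

```go
package main

import (
	"fmt"
	"strings"
)

// element is a minimal stand-in for a parsed DOM node.
type element struct {
	Style      string // inline style attribute
	AriaHidden string // aria-hidden attribute value
	Class      string // class attribute
}

var hidingClasses = []string{"hidden", "d-none", "sr-only"}

// isElementHidden applies the three checks listed above:
// hiding styles, aria-hidden="true", and common hiding classes.
func isElementHidden(e element) bool {
	style := strings.ReplaceAll(e.Style, " ", "")
	if strings.Contains(style, "display:none") ||
		strings.Contains(style, "visibility:hidden") {
		return true
	}
	if e.AriaHidden == "true" {
		return true
	}
	for _, c := range strings.Fields(e.Class) {
		for _, h := range hidingClasses {
			if c == h {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isElementHidden(element{Style: "display: none"})) // true
	fmt.Println(isElementHidden(element{Class: "nav-link"}))      // false
}
```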

Retry Logic

Temporary failures are automatically retried:
  • Transient Errors: Network issues, timeouts
  • Retry Count: Up to 3 attempts per URL
  • Exponential Backoff: Increasing delays between retries
  • Status Tracking: Each retry is logged separately
Permanent errors like 404s are not retried to avoid wasting resources.

Performance Impact

Broken link detection has minimal overhead:
  • No Extra Requests: Uses existing crawl data
  • Efficient Storage: Only failed tasks store error details
  • Real-time Updates: Status tracked as pages are crawled

Best Practices

Regular Monitoring

Schedule recurring crawls to catch broken links before users do

Fix at Source

Update links in your CMS rather than relying on redirects

Monitor External Links

Check links to external sites that might change or break

Review After Migrations

Always run a crawl after moving or restructuring content
