
Overview

Adapt monitors performance for every crawled page, capturing detailed timing information from DNS lookup through content transfer. This helps identify slow pages and optimise your site’s speed.

Captured Metrics

High-Level Timing

Every task tracks:
  • Response Time: Total time from request to response (milliseconds)
  • Status Code: HTTP response code (200, 404, 500, etc.)
  • Content Length: Response body size in bytes
  • Cache Status: CDN cache hit/miss status
{
  "url": "https://example.com/page",
  "response_time": 1245,
  "status_code": 200,
  "content_length": 52480,
  "cache_status": "MISS"
}

Detailed Performance Breakdown

Using Go’s httptrace package, Adapt captures granular timing:
// Timing state captured by the trace callbacks (all durations in milliseconds)
var dnsStartTime, connectStartTime, tlsStartTime time.Time
requestStartTime := time.Now()

trace := &httptrace.ClientTrace{
    DNSStart: func(info httptrace.DNSStartInfo) {
        dnsStartTime = time.Now()
    },
    DNSDone: func(info httptrace.DNSDoneInfo) {
        metrics.DNSLookupTime = time.Since(dnsStartTime).Milliseconds()
    },
    ConnectStart: func(network, addr string) {
        connectStartTime = time.Now()
    },
    ConnectDone: func(network, addr string, err error) {
        metrics.TCPConnectionTime = time.Since(connectStartTime).Milliseconds()
    },
    TLSHandshakeStart: func() {
        tlsStartTime = time.Now()
    },
    TLSHandshakeDone: func(state tls.ConnectionState, err error) {
        metrics.TLSHandshakeTime = time.Since(tlsStartTime).Milliseconds()
    },
    GotFirstResponseByte: func() {
        metrics.TTFB = time.Since(requestStartTime).Milliseconds()
    },
}

Performance Metrics Object

{
  "performance": {
    "dns_lookup_time": 12,        // DNS resolution
    "tcp_connection_time": 35,    // TCP handshake
    "tls_handshake_time": 78,     // SSL/TLS negotiation
    "ttfb": 245,                  // Time to first byte (server processing)
    "content_transfer_time": 142  // Response body download
  }
}

Performance Analysis

Identifying Bottlenecks

Slow DNS (>100ms)

Cause: DNS resolver issues or geographic distance
Solution: Use a faster DNS provider or reduce DNS lookups
{
  "dns_lookup_time": 234,  // ⚠️ Slow
  "tcp_connection_time": 28,
  "tls_handshake_time": 45,
  "ttfb": 120
}

Slow Connection (>200ms)

Cause: Geographic distance or network latency
Solution: Use a CDN with edge locations closer to users
{
  "dns_lookup_time": 15,
  "tcp_connection_time": 312,  // ⚠️ Slow
  "tls_handshake_time": 156,
  "ttfb": 89
}

Slow TTFB (>800ms)

Cause: Server-side processing or database queries
Solution: Optimise backend code, add caching, scale servers
{
  "dns_lookup_time": 8,
  "tcp_connection_time": 24,
  "tls_handshake_time": 52,
  "ttfb": 1843  // ⚠️ Slow - origin overloaded
}

Large Content Transfer (>1000ms)

Cause: Large page size or slow bandwidth
Solution: Compress assets, optimise images, enable HTTP/2
{
  "dns_lookup_time": 5,
  "tcp_connection_time": 18,
  "tls_handshake_time": 39,
  "ttfb": 142,
  "content_transfer_time": 2145  // ⚠️ Slow - large page
}

Cache Performance Comparison

Measure cache impact with before/after metrics:
{
  "url": "https://example.com/page",
  
  // First request (cache MISS)
  "response_time": 1450,
  "cache_status": "MISS",
  "performance": {
    "ttfb": 1205,              // Origin server processing
    "content_transfer_time": 245
  },
  
  // Second request (cache HIT)
  "second_response_time": 78,
  "second_cache_status": "HIT",
  "second_performance": {
    "ttfb": 45,                // CDN edge response
    "content_transfer_time": 33
  }
}
Improvement: 95% faster with cache (1450ms → 78ms)

Accessing Performance Data

Via API

Retrieve task performance metrics:
curl -X GET "https://adapt.beehivebusiness.builders/v1/jobs/{jobID}/tasks" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Via Dashboard

Job Statistics:
  • Average response time across all tasks
  • Slowest pages highlighted
  • Performance distribution histogram
Task Details:
  • Click any task to view full timing breakdown
  • Compare cache MISS vs HIT performance
  • View historical trends

Job-Level Performance Stats

Aggregate metrics calculated for each job:
{
  "job_id": "abc-123",
  "domain": "example.com",
  "duration_seconds": 145,
  "avg_time_per_task_seconds": 0.58,
  "total_tasks": 250,
  "completed_tasks": 250,
  "failed_tasks": 3
}

Calculating Average Performance

-- Computed during job processing.
-- Note: within a single UPDATE, SET expressions read the row's pre-update
-- values, so the duration is recomputed rather than referencing the
-- just-assigned duration_seconds column.
UPDATE jobs
SET duration_seconds = EXTRACT(EPOCH FROM (completed_at - started_at)),
    avg_time_per_task_seconds = EXTRACT(EPOCH FROM (completed_at - started_at))
                                / NULLIF(completed_tasks, 0)
WHERE id = $1;

Slow Page Detection

Automatic Flagging

Pages exceeding thresholds are flagged:
// Example thresholds (configurable)
const (
    SlowPageThreshold = 2000    // 2 seconds
    CriticalPageThreshold = 5000 // 5 seconds
)

if responseTime > CriticalPageThreshold {
    task.Warning = "Critical: Page extremely slow"
} else if responseTime > SlowPageThreshold {
    task.Warning = "Warning: Page slower than expected"
}

Performance-Based Prioritisation

Jobs with slow pages are re-crawled with higher priority and extra worker capacity:
// Track recent task performance per job
type JobPerformance struct {
    RecentTasks  []int64   // Last 5 response times
    CurrentBoost int       // Extra worker threads
}

// Scale workers for slow jobs
if avgResponseTime > 1000 {
    allocateBoostWorkers(jobID)
}

Performance Optimisation Features

HTTP/2 Support

transport := &http.Transport{
    MaxIdleConnsPerHost: 25,
    MaxConnsPerHost:     50,
    IdleConnTimeout:     120 * time.Second,
    ForceAttemptHTTP2:   true,  // Use HTTP/2 when available
}

Connection Pooling

Reuse connections for faster subsequent requests:
  • Max Idle Connections: 25 per host
  • Max Connections: 50 per host
  • Idle Timeout: 120 seconds

Compression Support

request.Headers.Set("Accept-Encoding", "gzip, deflate, br")
Automatic decompression of gzip, deflate, and Brotli responses.

Monitoring Best Practices

Baseline Metrics

Establish performance baselines before optimisation to measure improvements

Regular Checks

Schedule recurring crawls to track performance trends over time

Compare Cache States

Analyse MISS vs HIT metrics to validate caching strategy

Alert on Regression

Set up notifications when average response time degrades

Performance Budgets

Define acceptable performance thresholds:
Metric             Target     Maximum
DNS Lookup         <50ms      100ms
Connection         <100ms     200ms
TTFB (cached)      <100ms     300ms
TTFB (uncached)    <800ms     2000ms
Total Response     <1000ms    3000ms

Real-Time Performance Tracking

Live Job Progress

WebSocket updates show performance as jobs run:
// Subscribe to job updates via Supabase Realtime
supabase
  .channel('job-progress')
  .on('postgres_changes', 
    { event: 'UPDATE', table: 'jobs', filter: `id=eq.${jobID}` },
    (payload) => {
      console.log('Avg response time:', payload.new.avg_time_per_task_seconds)
    }
  )
  .subscribe()

Task Completion Events

Track individual task performance in real-time:
supabase
  .channel('task-updates')
  .on('postgres_changes',
    { event: 'UPDATE', table: 'tasks', filter: `job_id=eq.${jobID}` },
    (payload) => {
      if (payload.new.status === 'completed') {
        console.log(`${payload.new.path}: ${payload.new.response_time}ms`)
      }
    }
  )
  .subscribe()

Performance Analysis Tips

Identify Patterns

  • Geographic Issues: All pages slow → Check CDN coverage
  • Specific Pages: Some pages slow → Profile those pages
  • Time-Based: Slow at specific times → Check server load
  • Cache-Related: MISS slow, HIT fast → Optimise cache warming

Prioritise Fixes

  1. High-Traffic Pages: Fix homepage and popular pages first
  2. Worst Offenders: Target pages >3 seconds
  3. Quick Wins: Enable caching for uncached pages
  4. Infrastructure: Upgrade if all pages are consistently slow
