
Usage

crawlith sites [options]
The sites command displays a summary of all sites tracked in the local Crawlith database, including crawl statistics and health scores. This is useful for managing multiple site audits and tracking crawl history.

Options

--format
string
default:"pretty"
Output format. Options: pretty, json.
crawlith sites --format json

What It Shows

For each tracked site, the command displays:
  • Domain: The site’s domain name
  • Snapshots: Total number of crawl snapshots stored
  • Last Crawl: Date of the most recent crawl
  • Pages: Number of pages in the latest snapshot
  • Health: Overall site health score (0-100)

Output Formats

Pretty (Default)

example.com
  Snapshots: 5
  Last Crawl: 2024-03-01
  Pages: 1,234
  Health: 85

blog.example.com
  Snapshots: 3
  Last Crawl: 2024-02-28
  Pages: 456
  Health: 72

shop.example.com
  Snapshots: 1
  Last Crawl: 2024-02-15
  Pages: 2,789
  Health: 91

JSON

[
  {
    "domain": "example.com",
    "snapshots": 5,
    "lastCrawl": "2024-03-01T10:30:00.000Z",
    "pages": 1234,
    "health": 85
  },
  {
    "domain": "blog.example.com",
    "snapshots": 3,
    "lastCrawl": "2024-02-28T14:20:00.000Z",
    "pages": 456,
    "health": 72
  },
  {
    "domain": "shop.example.com",
    "snapshots": 1,
    "lastCrawl": "2024-02-15T09:15:00.000Z",
    "pages": 2789,
    "health": 91
  }
]

Examples

List All Sites

crawlith sites

JSON Output for Scripting

crawlith sites --format json

Get Site Count

crawlith sites --format json | jq 'length'

Find Sites Needing Re-crawl

# Sites not crawled in last 7 days
crawlith sites --format json | jq -r '.[] | select(
  (.lastCrawl | fromdateiso8601) < (now - 604800)
) | .domain'

Filter by Health Score

# Sites with health score below 70
crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain'

Export Sites List

crawlith sites --format json > sites-inventory.json

Monitor Crawl Coverage

# Total pages across all sites
crawlith sites --format json | jq '[.[].pages] | add'

Automated Re-crawl Script

#!/bin/bash
# Re-crawl all sites with health below 80
crawlith sites --format json \
  | jq -r '.[] | select(.health < 80) | .domain' \
  | while read -r domain; do
      echo "Re-crawling $domain..."
      crawlith crawl "https://$domain" --incremental
    done

Health Score Interpretation

Health scores are color-coded in the pretty output:
  • Green (75-100): Good health, minor issues or none
  • Yellow (50-74): Fair health, several issues to address
  • Red (0-49): Poor health, significant problems detected
Health scores are calculated based on:
  • SEO optimization
  • Link structure quality
  • Duplicate content
  • Orphan pages
  • Broken links
  • Content quality
  • Technical SEO factors
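The bands above can also be applied in scripts. As a sketch (assuming `jq` is installed and the JSON fields shown earlier), this counts tracked sites per band:

```shell
# Count tracked sites per health band (thresholds from the list above)
crawlith sites --format json | jq -c '
  map(if .health >= 75 then "green"
      elif .health >= 50 then "yellow"
      else "red" end)
  | group_by(.)
  | map({band: .[0], count: length})'
```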

Use Cases

Multi-Site Management

Track all your client sites in one place:
crawlith sites

Reporting Dashboard

Generate JSON for custom dashboards:
crawlith sites --format json | curl -X POST https://dashboard.example.com/api/sites -H "Content-Type: application/json" -d @-

Crawl Scheduling

Identify sites that need fresh crawls:
# Sites with only 1 snapshot (initial crawl)
crawlith sites --format json | jq -r '.[] | select(.snapshots == 1) | .domain'

Historical Tracking

Monitor snapshot growth over time:
crawlith sites --format json | jq '.[] | {domain, snapshots}'

Quality Assurance

Find sites with low health scores:
crawlith sites --format json | jq -r '.[] | select(.health < 75) | "\(.domain): \(.health)"'

Integration Examples

Daily Report Email

#!/bin/bash
crawlith sites > daily-report.txt
mail -s "Daily Crawlith Report" [email protected] < daily-report.txt

Slack Notification

#!/bin/bash
low_health=$(crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain' | wc -l)
if [ "$low_health" -gt 0 ]; then
  curl -X POST "$SLACK_WEBHOOK" -H "Content-Type: application/json" \
    -d "{\"text\": \"⚠️ $low_health sites have health scores below 70\"}"
fi

Monitoring Dashboard

// Fetch sites data for dashboard
const { exec } = require('child_process');

exec('crawlith sites --format json', (error, stdout) => {
  if (error) throw error;
  const sites = JSON.parse(stdout);
  if (sites.length === 0) return console.log('No sites tracked yet.');
  const avgHealth = sites.reduce((sum, s) => sum + s.health, 0) / sites.length;
  console.log(`Average Health: ${avgHealth.toFixed(1)}`);
});

CSV Export for Spreadsheets

#!/bin/bash
echo "Domain,Snapshots,Last Crawl,Pages,Health" > sites.csv
crawlith sites --format json | jq -r '.[] | [.domain, .snapshots, .lastCrawl, .pages, .health] | @csv' >> sites.csv

Empty Database

If no sites have been crawled yet, the command prints:

No sites found. Run a crawl first to add sites.

To add your first site:

crawlith crawl https://example.com
crawlith sites

Snapshot Management

The sites list shows snapshot count, which indicates:
  • 1 snapshot: Initial crawl only
  • Multiple snapshots: Historical crawl data available
  • Many snapshots: Long-term tracking, good for trend analysis
To manage snapshots, use the clean command:
# Remove old snapshots for a site
crawlith clean https://example.com --snapshot 1
The sites command reads from the local SQLite database. It makes no network requests, so it returns quickly no matter how many sites are tracked.

Combine with jq for powerful filtering:
# Sites with more than 1000 pages
crawlith sites --format json | jq '.[] | select(.pages > 1000)'

The “Last Crawl” date shows when the snapshot was created, not when it was last viewed in the UI. Re-crawl sites periodically to keep data fresh.
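Periodic re-crawls can be automated by wrapping the filter from “Find Sites Needing Re-crawl” in a script and scheduling it (via cron, for example). A sketch; the script name and schedule are illustrative, not documented defaults:

```shell
#!/bin/bash
# refresh-stale.sh (illustrative name): re-crawl sites whose last crawl is
# older than 7 days; schedule e.g. with cron: 0 2 * * 1 /path/to/refresh-stale.sh
crawlith sites --format json \
  | jq -r '.[] | select((.lastCrawl | fromdateiso8601) < (now - 604800)) | .domain' \
  | while read -r domain; do
      echo "Re-crawling $domain..."
      crawlith crawl "https://$domain" --incremental
    done
```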
Related Commands

  • crawl - Crawl a new site or update an existing one
  • ui - View detailed data for a specific site
  • clean - Remove sites or snapshots from the database
