
Usage

crawlith sites [options]
The sites command displays a summary of all sites tracked in the local Crawlith database, including crawl statistics and health scores. This is useful for managing multiple site audits and tracking crawl history.

Options

--format
string
default:"pretty"
Output format. Options: pretty, json.
crawlith sites --format json

What It Shows

For each tracked site, the command displays:
  • Domain: The site’s domain name
  • Snapshots: Total number of crawl snapshots stored
  • Last Crawl: Date of the most recent crawl
  • Pages: Number of pages in the latest snapshot
  • Health: Overall site health score (0-100)

Output Formats

Pretty (Default)

example.com
  Snapshots: 5
  Last Crawl: 2024-03-01
  Pages: 1,234
  Health: 85

blog.example.com
  Snapshots: 3
  Last Crawl: 2024-02-28
  Pages: 456
  Health: 72

shop.example.com
  Snapshots: 1
  Last Crawl: 2024-02-15
  Pages: 2,789
  Health: 91

JSON

[
  {
    "domain": "example.com",
    "snapshots": 5,
    "lastCrawl": "2024-03-01T10:30:00.000Z",
    "pages": 1234,
    "health": 85
  },
  {
    "domain": "blog.example.com",
    "snapshots": 3,
    "lastCrawl": "2024-02-28T14:20:00.000Z",
    "pages": 456,
    "health": 72
  },
  {
    "domain": "shop.example.com",
    "snapshots": 1,
    "lastCrawl": "2024-02-15T09:15:00.000Z",
    "pages": 2789,
    "health": 91
  }
]

Examples

List All Sites

crawlith sites

JSON Output for Scripting

crawlith sites --format json

Get Site Count

crawlith sites --format json | jq 'length'

Find Sites Needing Re-crawl

# Sites not crawled in last 7 days
crawlith sites --format json | jq -r '.[] | select(
  (.lastCrawl | fromdateiso8601) < (now - 604800)
) | .domain'

Filter by Health Score

# Sites with health score below 70
crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain'

Export Sites List

crawlith sites --format json > sites-inventory.json

Monitor Crawl Coverage

# Total pages across all sites
crawlith sites --format json | jq '[.[].pages] | add'

Automated Re-crawl Script

#!/bin/bash
# Re-crawl all sites with health below 80
crawlith sites --format json \
  | jq -r '.[] | select(.health < 80) | .domain' \
  | while read -r domain; do
      echo "Re-crawling $domain..."
      crawlith crawl "https://$domain" --incremental
    done

Health Score Interpretation

Health scores are color-coded in the pretty output:
  • Green (75-100): Good health, minor issues or none
  • Yellow (50-74): Fair health, several issues to address
  • Red (0-49): Poor health, significant problems detected
Health scores are calculated based on:
  • SEO optimization
  • Link structure quality
  • Duplicate content
  • Orphan pages
  • Broken links
  • Content quality
  • Technical SEO factors
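The bands above can also be applied in scripts. As a sketch (assuming `jq` is installed and the JSON fields shown earlier), this counts tracked sites per band:

```shell
# Count tracked sites per health band (thresholds from the list above)
crawlith sites --format json | jq -c '
  map(if .health >= 75 then "green"
      elif .health >= 50 then "yellow"
      else "red" end)
  | group_by(.)
  | map({band: .[0], count: length})'
```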

Use Cases

Multi-Site Management

Track all your client sites in one place:
crawlith sites

Reporting Dashboard

Generate JSON for custom dashboards:
crawlith sites --format json | curl -X POST https://dashboard.example.com/api/sites -H "Content-Type: application/json" -d @-

Crawl Scheduling

Identify sites that need fresh crawls:
# Sites with only 1 snapshot (initial crawl)
crawlith sites --format json | jq -r '.[] | select(.snapshots == 1) | .domain'

Historical Tracking

Monitor snapshot growth over time:
crawlith sites --format json | jq '.[] | {domain, snapshots}'

Quality Assurance

Find sites with low health scores:
crawlith sites --format json | jq -r '.[] | select(.health < 75) | "\(.domain): \(.health)"'

Integration Examples

Daily Report Email

#!/bin/bash
crawlith sites > daily-report.txt
mail -s "Daily Crawlith Report" [email protected] < daily-report.txt

Slack Notification

#!/bin/bash
low_health=$(crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain' | wc -l)
if [ "$low_health" -gt 0 ]; then
  curl -X POST "$SLACK_WEBHOOK" -H "Content-Type: application/json" \
    -d "{\"text\": \"⚠️ $low_health sites have health scores below 70\"}"
fi

Monitoring Dashboard

// Fetch sites data for dashboard
const { exec } = require('child_process');

exec('crawlith sites --format json', (error, stdout) => {
  if (error) throw error;
  const sites = JSON.parse(stdout);
  if (sites.length === 0) return console.log('No sites tracked yet.');
  const avgHealth = sites.reduce((sum, s) => sum + s.health, 0) / sites.length;
  console.log(`Average Health: ${avgHealth.toFixed(1)}`);
});

CSV Export for Spreadsheets

#!/bin/bash
echo "Domain,Snapshots,Last Crawl,Pages,Health" > sites.csv
crawlith sites --format json | jq -r '.[] | [.domain, .snapshots, .lastCrawl, .pages, .health] | @csv' >> sites.csv

Empty Database

If no sites have been crawled yet, the command prints:

No sites found. Run a crawl first to add sites.

To add your first site:

crawlith crawl https://example.com
crawlith sites

Snapshot Management

The sites list shows snapshot count, which indicates:
  • 1 snapshot: Initial crawl only
  • Multiple snapshots: Historical crawl data available
  • Many snapshots: Long-term tracking, good for trend analysis
To manage snapshots, use the clean command:
# Remove old snapshots for a site
crawlith clean https://example.com --snapshot 1
The sites command reads from the local SQLite database. It makes no network requests, so it returns quickly no matter how many sites are tracked.

Combine with jq for powerful filtering:
# Sites with more than 1000 pages
crawlith sites --format json | jq '.[] | select(.pages > 1000)'

The “Last Crawl” date shows when the snapshot was created, not when it was last viewed in the UI. Re-crawl sites periodically to keep data fresh.
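Periodic re-crawls can be automated by wrapping the filter from “Find Sites Needing Re-crawl” in a script and scheduling it (via cron, for example). A sketch; the script name and schedule are illustrative, not documented defaults:

```shell
#!/bin/bash
# refresh-stale.sh (illustrative name): re-crawl sites whose last crawl is
# older than 7 days; schedule e.g. with cron: 0 2 * * 1 /path/to/refresh-stale.sh
crawlith sites --format json \
  | jq -r '.[] | select((.lastCrawl | fromdateiso8601) < (now - 604800)) | .domain' \
  | while read -r domain; do
      echo "Re-crawling $domain..."
      crawlith crawl "https://$domain" --incremental
    done
```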
Related Commands

  • crawl - Crawl a new site or update an existing one
  • ui - View detailed data for a specific site
  • clean - Remove sites or snapshots from the database
