Usage
crawlith sites [options]
The sites command displays a summary of all sites tracked in the local Crawlith database, including crawl statistics and health scores. This is useful for managing multiple site audits and tracking crawl history.
Options
--format <format>
Output format. Options: pretty (default), json.
crawlith sites --format json
What It Shows
For each tracked site, the command displays:
- Domain: The site’s domain name
- Snapshots: Total number of crawl snapshots stored
- Last Crawl: Date of the most recent crawl
- Pages: Number of pages in the latest snapshot
- Health: Overall site health score (0-100)
Pretty (Default)
example.com
Snapshots: 5
Last Crawl: 2024-03-01
Pages: 1,234
Health: 85
blog.example.com
Snapshots: 3
Last Crawl: 2024-02-28
Pages: 456
Health: 72
shop.example.com
Snapshots: 1
Last Crawl: 2024-02-15
Pages: 2,789
Health: 91
JSON
[
{
"domain": "example.com",
"snapshots": 5,
"lastCrawl": "2024-03-01T10:30:00.000Z",
"pages": 1234,
"health": 85
},
{
"domain": "blog.example.com",
"snapshots": 3,
"lastCrawl": "2024-02-28T14:20:00.000Z",
"pages": 456,
"health": 72
},
{
"domain": "shop.example.com",
"snapshots": 1,
"lastCrawl": "2024-02-15T09:15:00.000Z",
"pages": 2789,
"health": 91
}
]
Examples
List All Sites
crawlith sites
JSON Output for Scripting
crawlith sites --format json
Get Site Count
crawlith sites --format json | jq 'length'
Find Sites Needing Re-crawl
# Sites not crawled in the last 7 days.
# Strip fractional seconds first: some jq versions reject them in fromdateiso8601.
crawlith sites --format json | jq -r '.[] | select(
  (.lastCrawl | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601) < (now - 604800)
) | .domain'
Filter by Health Score
# Sites with health score below 70
crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain'
Export Sites List
crawlith sites --format json > sites-inventory.json
Monitor Crawl Coverage
# Total pages across all sites
crawlith sites --format json | jq '[.[].pages] | add'
Automated Re-crawl Script
#!/bin/bash
# Re-crawl all sites with health below 80
for domain in $(crawlith sites --format json | jq -r '.[] | select(.health < 80) | .domain'); do
echo "Re-crawling $domain..."
crawlith crawl "https://$domain" --incremental
done
Health Score Interpretation
Health scores are color-coded in the pretty output:
- Green (75-100): Good health, minor issues or none
- Yellow (50-74): Fair health, several issues to address
- Red (0-49): Poor health, significant problems detected
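The same thresholds are easy to apply in scripts. A minimal sketch that labels each site with its band using jq, shown here on inline sample records matching the JSON shape above (in practice, pipe in the output of crawlith sites --format json):

```shell
# Label each site with its health band (thresholds match the color ranges above).
# In practice, replace the echo with: crawlith sites --format json
echo '[{"domain":"site-a.test","health":85},
       {"domain":"site-b.test","health":72},
       {"domain":"site-c.test","health":45}]' |
jq -r '.[] | "\(.domain): " +
  (if .health >= 75 then "green"
   elif .health >= 50 then "yellow"
   else "red" end)'
```

The site-*.test domains and health values are sample data, not output from a real crawl.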
Health scores are calculated based on:
- SEO optimization
- Link structure quality
- Duplicate content
- Orphan pages
- Broken links
- Content quality
- Technical SEO factors
Use Cases
Multi-Site Management
Track all your client sites in one place:
crawlith sites
Reporting Dashboard
Generate JSON for custom dashboards:
crawlith sites --format json | curl -X POST -H "Content-Type: application/json" https://dashboard.example.com/api/sites -d @-
Crawl Scheduling
Identify sites that need fresh crawls:
# Sites with only 1 snapshot (initial crawl)
crawlith sites --format json | jq -r '.[] | select(.snapshots == 1) | .domain'
Historical Tracking
Monitor snapshot growth over time:
crawlith sites --format json | jq '.[] | {domain, snapshots}'
Quality Assurance
Find sites with declining health:
crawlith sites --format json | jq -r '.[] | select(.health < 75) | "\(.domain): \(.health)"'
Integration Examples
Daily Report Email
#!/bin/bash
crawlith sites > daily-report.txt
mail -s "Daily Crawlith Report" [email protected] < daily-report.txt
Slack Notification
#!/bin/bash
low_health=$(crawlith sites --format json | jq -r '.[] | select(.health < 70) | .domain' | wc -l)
if [ "$low_health" -gt 0 ]; then
  curl -X POST -H 'Content-type: application/json' "$SLACK_WEBHOOK" \
    -d "{\"text\": \"⚠️ $low_health sites have health scores below 70\"}"
fi
Monitoring Dashboard
// Fetch sites data for dashboard
const { exec } = require('child_process');
exec('crawlith sites --format json', (error, stdout) => {
  if (error) throw error;
  const sites = JSON.parse(stdout);
  if (sites.length === 0) return console.log('No sites tracked yet.');
  const avgHealth = sites.reduce((sum, s) => sum + s.health, 0) / sites.length;
  console.log(`Average Health: ${avgHealth.toFixed(1)}`);
});
CSV Export for Spreadsheets
#!/bin/bash
echo "Domain,Snapshots,Last Crawl,Pages,Health" > sites.csv
crawlith sites --format json | jq -r '.[] | [.domain, .snapshots, .lastCrawl, .pages, .health] | @csv' >> sites.csv
Empty Database
If no sites have been crawled yet:
No sites found. Run a crawl first to add sites.
To add your first site:
crawlith crawl https://example.com
crawlith sites
Snapshot Management
The sites list shows snapshot count, which indicates:
- 1 snapshot: Initial crawl only
- Multiple snapshots: Historical crawl data available
- Many snapshots: Long-term tracking, good for trend analysis
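To see at a glance which sites have the deepest history, the list can be ranked by snapshot count with jq's sort_by. A sketch on inline sample records (in practice, pipe in the output of crawlith sites --format json):

```shell
# Rank sites by snapshot count, deepest history first.
# In practice, replace the echo with: crawlith sites --format json
echo '[{"domain":"site-a.test","snapshots":1},
       {"domain":"site-b.test","snapshots":7},
       {"domain":"site-c.test","snapshots":3}]' |
jq -r 'sort_by(-.snapshots) | .[] | "\(.domain)\t\(.snapshots)"'
```

The site-*.test domains and counts are sample data, not output from a real crawl.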
To manage snapshots, use the clean command:
# Remove old snapshots for a site
crawlith clean https://example.com --snapshot 1
The sites command reads from the local SQLite database. It makes no network requests, so it returns quickly no matter how many sites are tracked.
Combine with jq for powerful filtering:
# Sites with more than 1000 pages
crawlith sites --format json | jq '.[] | select(.pages > 1000)'
The “Last Crawl” date shows when the snapshot was created, not when it was last viewed in the UI. Re-crawl sites periodically to keep data fresh.
Related Commands
crawl - Crawl a new site or update existing one
ui - View detailed data for a specific site
clean - Remove sites or snapshots from database