# Crawl Your First Site
Let’s crawl a website and generate a full report with link graphs, issue detection, and exports.
## Run a basic crawl
Start by crawling a small site with default settings:

```bash
crawlith crawl https://example.com
```

You’ll see real-time progress as Crawlith discovers and fetches pages:

```text
🚀 Starting Crawlith Site Crawler
Target: https://example.com
Limits: Pages: 500 | Depth: 5

🔍 Fetching robots.txt... Done
📄 Crawling pages... [50/500] [Depth: 2/5]
✅ Crawl complete.
🔍 Detecting duplicates... Done
🧩 Clustering content... Done
📊 Calculating final report metrics... Done
```
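The first phase above fetches and honors robots.txt before any page is requested. Python's standard `urllib.robotparser` can sketch how such rules are applied; the helper name below is illustrative, not Crawlith's actual internals:

```python
from urllib.robotparser import RobotFileParser

def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body into a reusable rule checker."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
print(rules.can_fetch("Crawlith", "https://example.com/about"))      # True: allowed
print(rules.can_fetch("Crawlith", "https://example.com/private/x"))  # False: blocked
```

Any URL matching a `Disallow` rule for the crawler's user agent is skipped before fetching.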
## Review the results
After the crawl completes, Crawlith displays a comprehensive report:

```text
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Crawl Report for https://example.com
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📈 Overview
  Pages Crawled: 247
  Total Links: 1,234
  Avg Links/Page: 5.0
  Max Depth Reached: 4

🏥 Health Score: 87.5

⚠️ Issues Detected
  Broken Links: 3
  Redirect Chains: 2
  Orphan Pages: 0
  Soft 404s: 1

💾 run `crawlith ui` to view the full report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
## Launch the Web Dashboard
For interactive visualization, launch the web UI:

```bash
crawlith ui
```

This opens a React-based dashboard in your browser where you can:
- Browse all crawl snapshots by site
- View interactive D3.js link graphs
- Drill down into individual pages and their connections
- Compare metrics across snapshots
## Common Crawl Options
Customize your crawl with these frequently used options:
### Limit Pages and Depth

```bash
crawlith crawl https://example.com --limit 100 --depth 3
```

- `--limit`: Maximum number of pages to crawl (default: 500)
- `--depth`: Maximum click depth from the starting URL (default: 5)
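Conceptually, a limit- and depth-bounded crawl is a breadth-first traversal of the link graph. Here is a minimal sketch over an in-memory link map; the function and data are illustrative, not Crawlith's actual implementation:

```python
from collections import deque

def crawl_order(links: dict, start: str, limit: int = 500, depth: int = 5) -> list:
    """Breadth-first visit order, honoring --limit and --depth."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < limit:
        url, d = queue.popleft()
        order.append(url)
        if d == depth:
            continue  # at max depth: record the page but don't follow its links
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return order

site = {"/": ["/a", "/b"], "/a": ["/deep"], "/deep": ["/deeper"]}
print(crawl_order(site, "/", limit=10, depth=2))  # ['/', '/a', '/b', '/deep']
```

With `depth=2`, `/deeper` (three clicks from the start URL) is never queued; with `limit=2`, the traversal stops after two pages regardless of depth.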
### Enable Issue Detection

```bash
crawlith crawl https://example.com --detect-soft404 --detect-traps --orphans
```

- `--detect-soft404`: Identify pages that return 200 but are actually error pages
- `--detect-traps`: Detect infinite URL parameter spaces (e.g., calendars)
- `--orphans`: Find pages with no internal links pointing to them
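Orphan detection, for instance, reduces to finding pages with zero inbound internal links. A sketch, assuming a page-to-outlinks map (names are illustrative, not Crawlith's internals):

```python
def find_orphans(pages: dict, root: str) -> set:
    """Pages never targeted by an internal link; the start URL is exempt."""
    linked = {target for targets in pages.values() for target in targets}
    return {page for page in pages if page not in linked and page != root}

site = {"/": ["/a"], "/a": ["/"], "/lonely": []}
print(find_orphans(site, "/"))  # {'/lonely'}
```

`/lonely` is known to the crawler (e.g., from the sitemap) but unreachable by clicking, so it is flagged.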
### Export Results

```bash
crawlith crawl https://example.com --export json,csv,html,visualize
```

Exports are saved to `./crawlith-reports/<domain>/`:

- JSON: Complete graph data with nodes and edges
- CSV: Tabular page data for spreadsheet analysis
- HTML: Standalone HTML report
- Visualize: Interactive D3.js link graph

Use `--output <path>` to customize the export directory:

```bash
crawlith crawl https://example.com --export json --output ./reports
```
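If you consume the JSON export programmatically, the node/edge graph can be reduced to per-page statistics in a few lines. This sketch assumes a `{"nodes": [...], "edges": [...]}` shape; the actual export schema may differ:

```python
import json

def link_counts(report_json: str) -> dict:
    """Outbound internal-link count per page from a JSON graph export."""
    graph = json.loads(report_json)
    counts = {node["url"]: 0 for node in graph["nodes"]}
    for edge in graph["edges"]:
        counts[edge["source"]] += 1
    return counts

sample = '{"nodes": [{"url": "/"}, {"url": "/a"}], "edges": [{"source": "/", "target": "/a"}]}'
print(link_counts(sample))  # {'/': 1, '/a': 0}
```

The same pattern (iterate edges, aggregate by node) gives in-degree, orphan candidates, or input for your own graph tooling.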
## Analyze a Single Page

For quick on-page SEO analysis without a full crawl:

```bash
crawlith page https://example.com/about
```

This analyzes the page from your local crawl database. Add `--live` to fetch fresh data:

```bash
crawlith page https://example.com/about --live
```

Output includes:
```text
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📄 Page Analysis: https://example.com/about
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🏥 Health Score: 92.3

📝 SEO Signals
  Title: About Us - Example Corp (25 chars) ✓
  Meta Description: Learn about our mission... (145 chars) ✓
  H1 Count: 1 ✓
  Canonical: https://example.com/about
  Indexable: Yes

📊 Content Analysis
  Word Count: 847
  Unique Sentences: 42
  Text/HTML Ratio: 0.65
  Thin Content Score: 15 (Good)

🔗 Link Analysis
  Internal Links: 12
  External Links: 3
  External Ratio: 20.0%

♿ Accessibility
  Total Images: 5
  Missing Alt Text: 0 ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
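Signals like the title text and H1 count can be pulled from raw HTML with nothing more than the standard library. A sketch of the idea, not Crawlith's actual analyzer:

```python
from html.parser import HTMLParser

class SeoSignals(HTMLParser):
    """Collect the <title> text and count of <h1> tags from raw HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

sig = SeoSignals()
sig.feed("<html><head><title>About Us</title></head><body><h1>Hi</h1></body></html>")
print(sig.title, sig.h1_count)  # About Us 1
```

From there, checks like "title between 10 and 60 characters" or "exactly one H1" are simple comparisons on the collected fields.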
## Incremental Crawling

For large sites, use incremental crawling to re-crawl efficiently:

```bash
crawlith crawl https://example.com --incremental
```

This uses ETag and Last-Modified headers from the previous snapshot to skip unchanged pages, dramatically reducing crawl time.
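Conditional revalidation works by echoing the previous response's validators back to the server; a `304 Not Modified` reply means the cached page can be reused as-is. A sketch of building those request headers (helper name is illustrative):

```python
def conditional_headers(previous: dict) -> dict:
    """Build revalidation headers from a prior snapshot's response headers."""
    headers = {}
    if previous.get("ETag"):
        headers["If-None-Match"] = previous["ETag"]
    if previous.get("Last-Modified"):
        headers["If-Modified-Since"] = previous["Last-Modified"]
    return headers

prev = {"ETag": '"abc123"', "Last-Modified": "Tue, 01 Apr 2025 00:00:00 GMT"}
print(conditional_headers(prev))
# {'If-None-Match': '"abc123"', 'If-Modified-Since': 'Tue, 01 Apr 2025 00:00:00 GMT'}
```

Sending both validators is safe: servers that support only one ignore the other.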
## Infrastructure Auditing

Run deep infrastructure checks:

```bash
crawlith audit https://example.com
```

This performs:
- TLS Analysis: Certificate validity, protocol versions, cipher suites
- DNS Checks: Resolution time, record types, DNSSEC
- Security Headers: HSTS, CSP, X-Frame-Options, etc.
- Transport Analysis: HTTP/2 support, compression, response times
Example output:

```text
🔒 TLS Certificate
  Valid: Yes ✓
  Issuer: Let's Encrypt
  Expires: 2026-05-15
  Protocol: TLSv1.3 ✓

🛡️ Security Headers
  Strict-Transport-Security: max-age=31536000 ✓
  Content-Security-Policy: Present ✓
  X-Frame-Options: DENY ✓
  X-Content-Type-Options: nosniff ✓

📡 DNS Performance
  Resolution Time: 23ms ✓
  IPv6 Support: Yes
```
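A security-header check like the one above boils down to a case-insensitive set difference between the headers you require and the headers the server sent. A sketch (the required set and helper are illustrative, not Crawlith's actual rules):

```python
REQUIRED = {
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
}

def missing_security_headers(headers: dict) -> set:
    """Return required headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return REQUIRED - present

resp = {"Strict-Transport-Security": "max-age=31536000", "X-Frame-Options": "DENY"}
print(sorted(missing_security_headers(resp)))
# ['content-security-policy', 'x-content-type-options']
```

Header names are case-insensitive per the HTTP spec, hence the lowercasing before comparison.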
## Next Steps
- CLI Reference: Explore all crawl options and advanced flags
- Page Analysis: Deep dive into on-page SEO analysis features
- Web Dashboard: Learn how to navigate the interactive UI
- Export Formats: Understand all available export options
All crawl data is persisted in `~/.crawlith/crawlith.db`. Use `crawlith sites` to list all tracked sites and `crawlith clean` to remove old snapshots.