
Crawl Your First Site

Let’s crawl a website and generate a full report with link graphs, issue detection, and exports.

Step 1: Run a basic crawl

Start by crawling a small site with default settings:
crawlith crawl https://example.com
You’ll see real-time progress as Crawlith discovers and fetches pages:
🚀 Starting Crawlith Site Crawler
Target: https://example.com
Limits: Pages: 500 | Depth: 5

🔍 Fetching robots.txt... Done
📄 Crawling pages... [50/500] [Depth: 2/5]
✅ Crawl complete.
🔍 Detecting duplicates... Done
🧩 Clustering content... Done
📊 Calculating final report metrics... Done
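The "Fetching robots.txt" step is a standard politeness check before any page is requested. Python's built-in urllib.robotparser shows what such a check involves (this is an illustration of the general mechanism, not Crawlith's internals):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt; a polite crawler parses this before fetching pages.
robots_txt = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/about"))   # True: allowed
print(parser.can_fetch("*", "https://example.com/admin/"))  # False: disallowed
```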

Step 2: Review the results

After the crawl completes, Crawlith displays a comprehensive report:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Crawl Report for https://example.com
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📈 Overview
  Pages Crawled:        247
  Total Links:          1,234
  Avg Links/Page:       5.0
  Max Depth Reached:    4

🏥 Health Score: 87.5

⚠️  Issues Detected
  Broken Links:         3
  Redirect Chains:      2
  Orphan Pages:         0
  Soft 404s:            1

💾 run `crawlith ui` to view the full report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
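The overview figures are simple aggregates over the link graph. For instance, Avg Links/Page is just total links divided by pages crawled, which you can verify from the report above:

```python
# Sanity-checking the overview arithmetic from the sample report.
pages_crawled = 247
total_links = 1234
avg_links_per_page = round(total_links / pages_crawled, 1)
print(avg_links_per_page)  # 5.0
```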

Step 3: Launch the Web Dashboard

For interactive visualization, launch the web UI:
crawlith ui
This opens a React-based dashboard in your browser where you can:
  • Browse all crawl snapshots by site
  • View interactive D3.js link graphs
  • Drill down into individual pages and their connections
  • Compare metrics across snapshots

Common Crawl Options

Customize your crawl with these frequently used options:

Limit Pages and Depth

crawlith crawl https://example.com --limit 100 --depth 3
  • --limit: Maximum number of pages to crawl (default: 500)
  • --depth: Maximum click depth from the starting URL (default: 5)
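The two limits bound a breadth-first traversal: --limit caps how many pages are visited in total, while --depth stops the crawler from following links more than that many clicks from the start URL. A minimal sketch of how the two interact, on an in-memory link graph (pure illustration, not Crawlith's crawler):

```python
from collections import deque

def crawl(graph: dict, start: str, limit: int, depth: int) -> list:
    """Breadth-first traversal that stops at `limit` pages or `depth` clicks."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < limit:
        url, d = queue.popleft()
        order.append(url)
        if d == depth:
            continue  # don't enqueue links beyond the depth limit
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, d + 1))
    return order

graph = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"]}
print(crawl(graph, "/", limit=100, depth=2))  # ['/', '/a', '/b', '/c'] — /d is 3 clicks deep
```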

Enable Issue Detection

crawlith crawl https://example.com --detect-soft404 --detect-traps --orphans
  • --detect-soft404: Identify pages that return 200 but are actually error pages
  • --detect-traps: Detect infinite URL parameter spaces (e.g., calendars)
  • --orphans: Find pages with no internal links pointing to them
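At its core, orphan detection is an in-degree check: a page known to exist (e.g. from a sitemap) that no crawled page links to. A sketch of that check, with hypothetical data (not Crawlith's implementation):

```python
def find_orphans(pages, links):
    """Pages with zero inbound links from other pages (self-links ignored)."""
    targets = {dst for src, dst in links if src != dst}
    return sorted(p for p in pages if p not in targets)

# Hypothetical site: /old-promo exists but nothing links to it.
pages = ["/", "/about", "/old-promo"]
links = [("/", "/about"), ("/about", "/")]
print(find_orphans(pages, links))  # ['/old-promo']
```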

Export Results

crawlith crawl https://example.com --export json,csv,html,visualize
Exports are saved to ./crawlith-reports/<domain>/:
  • JSON: Complete graph data with nodes and edges
  • CSV: Tabular page data for spreadsheet analysis
  • HTML: Standalone HTML report
  • Visualize: Interactive D3.js link graph
Use --output <path> to customize the export directory:
crawlith crawl https://example.com --export json --output ./reports
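Since the JSON export is documented as graph data with nodes and edges, it is straightforward to post-process. The snippet below assumes a shape like `{"nodes": [...], "edges": [...]}` with `url` and `status` fields; these names are illustrative, so inspect a real file in your export directory for the exact schema:

```python
import json

# Hypothetical export payload; field names are assumptions, not Crawlith's schema.
report = json.loads("""
{
  "nodes": [
    {"url": "https://example.com/", "status": 200},
    {"url": "https://example.com/missing", "status": 404}
  ],
  "edges": [
    {"source": "https://example.com/", "target": "https://example.com/missing"}
  ]
}
""")

broken = [n["url"] for n in report["nodes"] if n["status"] >= 400]
print(f"{len(report['nodes'])} pages, {len(report['edges'])} links")
print("Broken:", broken)
```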

Analyze a Single Page

For quick on-page SEO analysis without a full crawl:
crawlith page https://example.com/about
This analyzes the page from your local crawl database. Add --live to fetch fresh data:
crawlith page https://example.com/about --live
Output includes:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📄 Page Analysis: https://example.com/about
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🏥 Health Score: 92.3

📝 SEO Signals
  Title:              About Us - Example Corp (23 chars) ✓
  Meta Description:   Learn about our mission... (145 chars) ✓
  H1 Count:           1 ✓
  Canonical:          https://example.com/about
  Indexable:          Yes

📊 Content Analysis
  Word Count:         847
  Unique Sentences:   42
  Text/HTML Ratio:    0.65
  Thin Content Score: 15 (Good)

🔗 Link Analysis
  Internal Links:     12
  External Links:     3
  External Ratio:     20.0%

♿ Accessibility
  Total Images:       5
  Missing Alt Text:   0 ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
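Two of these metrics are easy to reason about from first principles. The external ratio above is external links over all links (3 of 15 = 20.0%), and one common definition of the text/HTML ratio is visible text length over total markup length. A sketch of both, using the standard library's HTMLParser (illustrative only, not necessarily how Crawlith computes them):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text nodes to estimate a text-to-HTML ratio."""
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, data):
        self.text.append(data)

html = "<html><body><h1>About Us</h1><p>We build crawlers.</p></body></html>"
extractor = TextExtractor()
extractor.feed(html)
visible = "".join(extractor.text)

text_html_ratio = len(visible) / len(html)
external_ratio = 3 / (12 + 3)  # numbers from the report above
print(f"Text/HTML: {text_html_ratio:.2f}, External: {external_ratio:.1%}")
```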

Incremental Crawling

For large sites, use incremental crawling to re-crawl efficiently:
crawlith crawl https://example.com --incremental
This uses ETag and Last-Modified headers from the previous snapshot to skip unchanged pages, dramatically reducing crawl time.
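The underlying mechanism is standard HTTP conditional requests: send back the validators from the last fetch, and a `304 Not Modified` response means the page can be skipped. A sketch of building those headers from a stored snapshot (the snapshot dict shape is hypothetical, not Crawlith's storage format):

```python
def conditional_headers(snapshot: dict) -> dict:
    """Build If-None-Match / If-Modified-Since headers from a prior fetch.
    A 304 Not Modified response lets the crawler skip re-parsing the page."""
    headers = {}
    if snapshot.get("etag"):
        headers["If-None-Match"] = snapshot["etag"]
    if snapshot.get("last_modified"):
        headers["If-Modified-Since"] = snapshot["last_modified"]
    return headers

previous = {"etag": '"abc123"', "last_modified": "Tue, 01 Apr 2025 10:00:00 GMT"}
print(conditional_headers(previous))
```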

Infrastructure Auditing

Run deep infrastructure checks:
crawlith audit https://example.com
This performs:
  • TLS Analysis: Certificate validity, protocol versions, cipher suites
  • DNS Checks: Resolution time, record types, DNSSEC
  • Security Headers: HSTS, CSP, X-Frame-Options, etc.
  • Transport Analysis: HTTP/2 support, compression, response times
Example output:
🔒 TLS Certificate
  Valid:              Yes ✓
  Issuer:             Let's Encrypt
  Expires:            2026-05-15
  Protocol:           TLSv1.3 ✓

🛡️  Security Headers
  Strict-Transport-Security: max-age=31536000 ✓
  Content-Security-Policy:   Present ✓
  X-Frame-Options:           DENY ✓
  X-Content-Type-Options:    nosniff ✓

📡 DNS Performance
  Resolution Time:    23ms ✓
  IPv6 Support:       Yes
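The security-headers portion of an audit like this boils down to checking a response's header map against a list of expected names (header names are case-insensitive). A minimal sketch of that check (illustrative, not Crawlith's audit code):

```python
# The four headers shown in the sample audit output above.
REQUIRED = {
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
}

def missing_security_headers(response_headers: dict) -> list:
    """Return which commonly audited security headers are absent."""
    present = {name.lower() for name in response_headers}
    return sorted(REQUIRED - present)

headers = {
    "Strict-Transport-Security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(headers))  # ['content-security-policy', 'x-frame-options']
```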

Next Steps

  • CLI Reference: Explore all crawl options and advanced flags
  • Page Analysis: Deep dive into on-page SEO analysis features
  • Web Dashboard: Learn how to navigate the interactive UI
  • Export Formats: Understand all available export options

All crawl data is persisted in ~/.crawlith/crawlith.db. Use crawlith sites to list all tracked sites and crawlith clean to remove old snapshots.
