
What is Crawlith?

Crawlith is a modular crawl intelligence engine built with Node.js and TypeScript. It combines high-performance website crawling with deep infrastructure auditing and interactive visualization to give you complete visibility into your site’s technical health. Whether you’re conducting SEO audits, mapping internal link structures, or analyzing TLS/DNS configurations, Crawlith provides production-grade tooling with persistent SQLite storage and snapshot-based history tracking.

Key Features

BFS Crawling Algorithm

Efficiently crawl websites using breadth-first search with configurable depth and page limits. Respects robots.txt and rate limits by default.
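The traversal can be pictured as a plain breadth-first search over a link graph. The sketch below is an illustrative, self-contained model, not Crawlith's internal code: a static `linkMap` stands in for fetched pages, and the `maxDepth`/`maxPages` names are assumptions rather than actual option names.

```typescript
// Illustrative BFS traversal over an in-memory link map.
// In a real crawl the frontier is fed by HTTP fetches; here a
// static map of page -> out-links stands in for fetched pages.
type LinkMap = Record<string, string[]>;

function bfsCrawl(
  linkMap: LinkMap,
  start: string,
  maxDepth: number,
  maxPages: number
): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const order: string[] = [];

  while (queue.length > 0 && order.length < maxPages) {
    const { url, depth } = queue.shift()!;
    order.push(url);
    if (depth >= maxDepth) continue; // depth limit: don't enqueue children
    for (const next of linkMap[url] ?? []) {
      if (!visited.has(next)) {      // skip already-seen pages
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return order;
}

// Example: a tiny four-page site, crawled to depth 2.
const site: LinkMap = {
  "/": ["/about", "/blog"],
  "/about": ["/"],
  "/blog": ["/blog/post-1"],
  "/blog/post-1": [],
};
const pages = bfsCrawl(site, "/", 2, 10);
// Visits "/" first, then its direct links, then grandchildren.
```

Breadth-first order means shallow pages are always discovered before deep ones, so a page-limit cutoff still yields the most important part of the site.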

Interactive Visualizations

Generate D3.js-powered link graphs and HTML reports to explore your site’s internal link structure visually.

Persistent SQLite Storage

All crawl data is stored locally in ~/.crawlith/crawlith.db with snapshot-based metrics and history tracking.

Unified Export System

Export crawl results in multiple formats: JSON, CSV, Markdown, HTML reports, and interactive visualizations.
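As a rough picture of what a CSV export involves, here is a minimal sketch; the `CrawlRow` shape and field names are assumptions for illustration, not Crawlith's actual export schema.

```typescript
// Illustrative CSV serialization of crawl rows.
// Field names are assumptions, not Crawlith's schema.
interface CrawlRow {
  url: string;
  status: number;
  title: string;
}

function toCsv(rows: CrawlRow[]): string {
  const escape = (v: string | number) => {
    const s = String(v);
    // Quote fields containing commas, quotes, or newlines (RFC 4180).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = "url,status,title";
  const lines = rows.map((r) => [r.url, r.status, r.title].map(escape).join(","));
  return [header, ...lines].join("\n");
}

const csv = toCsv([{ url: "/", status: 200, title: "Home, sweet home" }]);
// url,status,title
// /,200,"Home, sweet home"
```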

Critical Issue Detection

Automatically detect orphan pages, broken links, redirect chains, crawl traps, and soft 404s.
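Two of these checks are easy to picture as graph queries. The sketch below shows the idea for orphan pages (known pages nothing links to) and broken internal links (links pointing at 4xx/5xx pages); the `PageRecord` shape and function names are assumptions, not Crawlith's API.

```typescript
// Illustrative orphan-page and broken-link detection over crawl results.
// The record shape here is an assumption, not Crawlith's schema.
interface PageRecord {
  url: string;
  status: number;
  outLinks: string[];
}

function findOrphans(pages: PageRecord[], knownPages: string[]): string[] {
  const linkedTo = new Set(pages.flatMap((p) => p.outLinks));
  // A page is an orphan if it is known (e.g. from a sitemap)
  // but no crawled page links to it.
  return knownPages.filter((url) => !linkedTo.has(url));
}

function findBrokenLinks(pages: PageRecord[]): Array<{ from: string; to: string }> {
  const statusByUrl = new Map<string, number>();
  for (const p of pages) statusByUrl.set(p.url, p.status);
  const broken: Array<{ from: string; to: string }> = [];
  for (const page of pages) {
    for (const to of page.outLinks) {
      const status = statusByUrl.get(to);
      if (status !== undefined && status >= 400) {
        broken.push({ from: page.url, to });
      }
    }
  }
  return broken;
}

const crawled: PageRecord[] = [
  { url: "/", status: 200, outLinks: ["/a", "/missing"] },
  { url: "/a", status: 200, outLinks: ["/"] },
  { url: "/missing", status: 404, outLinks: [] },
];
// "/old" is in the sitemap but nothing links to it -> orphan.
const orphans = findOrphans(crawled, ["/", "/a", "/old"]);
const broken = findBrokenLinks(crawled);
```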

Web Dashboard

Launch an interactive React-based dashboard with crawlith ui to review snapshots and visualize crawl data.

Infrastructure Auditing

Run deep TLS, DNS, and security header checks with crawlith audit to identify infrastructure vulnerabilities.

Monorepo Architecture

Core library, CLI client, and web dashboard live in cleanly separated workspaces, so each can be used, extended, and released independently.

Use Cases

  • SEO Audits: Identify orphan pages, duplicate content clusters, and broken internal links
  • Site Migrations: Compare crawl snapshots before and after migrations to detect issues
  • Content Analysis: Analyze thin content, detect duplicate titles, and score on-page SEO
  • Infrastructure Security: Audit TLS certificates, DNS configurations, and security headers
  • Link Graph Analysis: Visualize internal link structures and calculate PageRank scores
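The PageRank scoring mentioned above can be illustrated with a minimal power-iteration sketch. This is the textbook algorithm, not Crawlith's internal implementation:

```typescript
// Minimal PageRank via power iteration over an adjacency list.
// Standard algorithm sketch; not Crawlith's internal code.
function pageRank(
  links: Record<string, string[]>,
  damping = 0.85,
  iterations = 50
): Record<string, number> {
  const urls = Object.keys(links);
  const n = urls.length;
  let rank: Record<string, number> = {};
  for (const u of urls) rank[u] = 1 / n; // start from a uniform distribution

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    for (const u of urls) next[u] = (1 - damping) / n; // teleportation term
    for (const u of urls) {
      const outs = links[u];
      if (outs.length === 0) {
        // Dangling page: spread its rank evenly across all pages.
        for (const v of urls) next[v] += (damping * rank[u]) / n;
      } else {
        for (const v of outs) next[v] += (damping * rank[u]) / outs.length;
      }
    }
    rank = next;
  }
  return rank;
}

// Pages that receive more internal links accumulate higher scores.
const scores = pageRank({
  "/": ["/popular"],
  "/popular": ["/"],
  "/lonely": ["/popular"],
});
```

Here `/popular` ends up with the highest score because two pages link to it, while `/lonely` receives no internal links at all, which is exactly the signal a link-graph audit surfaces.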

Architecture

Crawlith uses a workspace-based monorepo structure:
  • plugins/core: The heavy-lifting engine with database, graph algorithms, crawler, and security boundaries
  • plugins/cli: The terminal user interface with commands like crawl, page, ui, probe, and more
  • plugins/web: The React-based interactive dashboard frontend
All data is persisted locally in an SQLite database at ~/.crawlith/crawlith.db, enabling snapshot-based comparisons and incremental crawling.
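Snapshot comparison, as used for migration checks, boils down to a set difference between two crawls. A minimal sketch, assuming a simplified snapshot shape (not Crawlith's actual schema):

```typescript
// Compare two crawl snapshots by URL to surface migration issues.
// The snapshot shape here is an assumption, not Crawlith's schema.
interface Snapshot {
  takenAt: string;
  urls: Set<string>;
}

function diffSnapshots(before: Snapshot, after: Snapshot) {
  const added = [...after.urls].filter((u) => !before.urls.has(u));
  const removed = [...before.urls].filter((u) => !after.urls.has(u));
  return { added, removed };
}

const preMigration: Snapshot = {
  takenAt: "2024-01-01",
  urls: new Set(["/", "/about", "/blog/old-post"]),
};
const postMigration: Snapshot = {
  takenAt: "2024-02-01",
  urls: new Set(["/", "/about", "/blog/new-post"]),
};
const diff = diffSnapshots(preMigration, postMigration);
// Removed URLs are redirect candidates; added URLs need internal links.
```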

Next Steps

Installation

Get Crawlith installed and ready to use

Quickstart

Crawl your first site in under 2 minutes

CLI Reference

Explore all available commands and options

API Documentation

Use Crawlith programmatically in your projects
