
What is Crawlith?

Crawlith is a modular crawl intelligence engine built with Node.js and TypeScript. It combines high-performance website crawling with deep infrastructure auditing and interactive visualization to give you complete visibility into your site’s technical health. Whether you’re conducting SEO audits, mapping internal link structures, or analyzing TLS/DNS configurations, Crawlith provides production-grade tooling with persistent SQLite storage and snapshot-based history tracking.

Key Features

BFS Crawling Algorithm

Efficiently crawl websites using breadth-first search with configurable depth and page limits. Respects robots.txt and rate limits by default.
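The traversal can be pictured as a plain breadth-first search over a link graph. The sketch below is an illustrative, self-contained model, not Crawlith's internal code: a static `linkMap` stands in for fetched pages, and the `maxDepth`/`maxPages` names are assumptions rather than actual option names.

```typescript
// Illustrative BFS traversal over an in-memory link map.
// In a real crawl the frontier is fed by HTTP fetches; here a
// static map of page -> out-links stands in for fetched pages.
type LinkMap = Record<string, string[]>;

function bfsCrawl(
  linkMap: LinkMap,
  start: string,
  maxDepth: number,
  maxPages: number
): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const order: string[] = [];

  while (queue.length > 0 && order.length < maxPages) {
    const { url, depth } = queue.shift()!;
    order.push(url);
    if (depth >= maxDepth) continue; // depth limit: don't enqueue children
    for (const next of linkMap[url] ?? []) {
      if (!visited.has(next)) {      // skip already-seen pages
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return order;
}

// Example: a tiny four-page site, crawled to depth 2.
const site: LinkMap = {
  "/": ["/about", "/blog"],
  "/about": ["/"],
  "/blog": ["/blog/post-1"],
  "/blog/post-1": [],
};
const pages = bfsCrawl(site, "/", 2, 10);
// Visits "/" first, then its direct links, then grandchildren.
```

Breadth-first order means shallow pages are always discovered before deep ones, so a page-limit cutoff still yields the most important part of the site.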

Interactive Visualizations

Generate D3.js-powered link graphs and HTML reports to explore your site’s internal link structure visually.

Persistent SQLite Storage

All crawl data is stored locally in ~/.crawlith/crawlith.db with snapshot-based metrics and history tracking.

Unified Export System

Export crawl results in multiple formats: JSON, CSV, Markdown, HTML reports, and interactive visualizations.
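As a rough picture of what a CSV export involves, here is a minimal sketch; the `CrawlRow` shape and field names are assumptions for illustration, not Crawlith's actual export schema.

```typescript
// Illustrative CSV serialization of crawl rows.
// Field names are assumptions, not Crawlith's schema.
interface CrawlRow {
  url: string;
  status: number;
  title: string;
}

function toCsv(rows: CrawlRow[]): string {
  const escape = (v: string | number) => {
    const s = String(v);
    // Quote fields containing commas, quotes, or newlines (RFC 4180).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = "url,status,title";
  const lines = rows.map((r) => [r.url, r.status, r.title].map(escape).join(","));
  return [header, ...lines].join("\n");
}

const csv = toCsv([{ url: "/", status: 200, title: "Home, sweet home" }]);
// url,status,title
// /,200,"Home, sweet home"
```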

Critical Issue Detection

Automatically detect orphan pages, broken links, redirect chains, crawl traps, and soft 404s.
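Two of these checks are easy to picture as graph queries. The sketch below shows the idea for orphan pages (known pages nothing links to) and broken internal links (links pointing at 4xx/5xx pages); the `PageRecord` shape and function names are assumptions, not Crawlith's API.

```typescript
// Illustrative orphan-page and broken-link detection over crawl results.
// The record shape here is an assumption, not Crawlith's schema.
interface PageRecord {
  url: string;
  status: number;
  outLinks: string[];
}

function findOrphans(pages: PageRecord[], knownPages: string[]): string[] {
  const linkedTo = new Set(pages.flatMap((p) => p.outLinks));
  // A page is an orphan if it is known (e.g. from a sitemap)
  // but no crawled page links to it.
  return knownPages.filter((url) => !linkedTo.has(url));
}

function findBrokenLinks(pages: PageRecord[]): Array<{ from: string; to: string }> {
  const statusByUrl = new Map<string, number>();
  for (const p of pages) statusByUrl.set(p.url, p.status);
  const broken: Array<{ from: string; to: string }> = [];
  for (const page of pages) {
    for (const to of page.outLinks) {
      const status = statusByUrl.get(to);
      if (status !== undefined && status >= 400) {
        broken.push({ from: page.url, to });
      }
    }
  }
  return broken;
}

const crawled: PageRecord[] = [
  { url: "/", status: 200, outLinks: ["/a", "/missing"] },
  { url: "/a", status: 200, outLinks: ["/"] },
  { url: "/missing", status: 404, outLinks: [] },
];
// "/old" is in the sitemap but nothing links to it -> orphan.
const orphans = findOrphans(crawled, ["/", "/a", "/old"]);
const broken = findBrokenLinks(crawled);
```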

Web Dashboard

Launch an interactive React-based dashboard with crawlith ui to review snapshots and visualize crawl data.

Infrastructure Auditing

Run deep TLS, DNS, and security header checks with crawlith audit to identify infrastructure vulnerabilities.

Monorepo Architecture

Core library, CLI client, and web dashboard live in cleanly separated workspaces, so each can be used, extended, and released independently.

Use Cases

  • SEO Audits: Identify orphan pages, duplicate content clusters, and broken internal links
  • Site Migrations: Compare crawl snapshots before and after migrations to detect issues
  • Content Analysis: Analyze thin content, detect duplicate titles, and score on-page SEO
  • Infrastructure Security: Audit TLS certificates, DNS configurations, and security headers
  • Link Graph Analysis: Visualize internal link structures and calculate PageRank scores
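The PageRank scoring mentioned above can be illustrated with a minimal power-iteration sketch. This is the textbook algorithm, not Crawlith's internal implementation:

```typescript
// Minimal PageRank via power iteration over an adjacency list.
// Standard algorithm sketch; not Crawlith's internal code.
function pageRank(
  links: Record<string, string[]>,
  damping = 0.85,
  iterations = 50
): Record<string, number> {
  const urls = Object.keys(links);
  const n = urls.length;
  let rank: Record<string, number> = {};
  for (const u of urls) rank[u] = 1 / n; // start from a uniform distribution

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    for (const u of urls) next[u] = (1 - damping) / n; // teleportation term
    for (const u of urls) {
      const outs = links[u];
      if (outs.length === 0) {
        // Dangling page: spread its rank evenly across all pages.
        for (const v of urls) next[v] += (damping * rank[u]) / n;
      } else {
        for (const v of outs) next[v] += (damping * rank[u]) / outs.length;
      }
    }
    rank = next;
  }
  return rank;
}

// Pages that receive more internal links accumulate higher scores.
const scores = pageRank({
  "/": ["/popular"],
  "/popular": ["/"],
  "/lonely": ["/popular"],
});
```

Here `/popular` ends up with the highest score because two pages link to it, while `/lonely` receives no internal links at all, which is exactly the signal a link-graph audit surfaces.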

Architecture

Crawlith uses a workspace-based monorepo structure:
  • plugins/core: The heavy-lifting engine with database, graph algorithms, crawler, and security boundaries
  • plugins/cli: The terminal user interface with commands like crawl, page, ui, probe, and more
  • plugins/web: The React-based interactive dashboard frontend
All data is persisted locally in an SQLite database at ~/.crawlith/crawlith.db, enabling snapshot-based comparisons and incremental crawling.
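Snapshot comparison, as used for migration checks, boils down to a set difference between two crawls. A minimal sketch, assuming a simplified snapshot shape (not Crawlith's actual schema):

```typescript
// Compare two crawl snapshots by URL to surface migration issues.
// The snapshot shape here is an assumption, not Crawlith's schema.
interface Snapshot {
  takenAt: string;
  urls: Set<string>;
}

function diffSnapshots(before: Snapshot, after: Snapshot) {
  const added = [...after.urls].filter((u) => !before.urls.has(u));
  const removed = [...before.urls].filter((u) => !after.urls.has(u));
  return { added, removed };
}

const preMigration: Snapshot = {
  takenAt: "2024-01-01",
  urls: new Set(["/", "/about", "/blog/old-post"]),
};
const postMigration: Snapshot = {
  takenAt: "2024-02-01",
  urls: new Set(["/", "/about", "/blog/new-post"]),
};
const diff = diffSnapshots(preMigration, postMigration);
// Removed URLs are redirect candidates; added URLs need internal links.
```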

Next Steps

Installation

Get Crawlith installed and ready to use

Quickstart

Crawl your first site in under 2 minutes

CLI Reference

Explore all available commands and options

API Documentation

Use Crawlith programmatically in your projects
