What is Crawlith?
Crawlith is a modular crawl intelligence engine built with Node.js and TypeScript. It combines high-performance website crawling with deep infrastructure auditing and interactive visualization to give you complete visibility into your site's technical health. Whether you're conducting SEO audits, mapping internal link structures, or analyzing TLS/DNS configurations, Crawlith provides production-grade tooling with persistent SQLite storage and snapshot-based history tracking.
Key Features
BFS Crawling Algorithm
Efficiently crawl websites using breadth-first search with configurable depth and page limits. Respects robots.txt and rate limits by default.
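The breadth-first strategy can be sketched as follows. This is an illustrative model over an in-memory link map, not Crawlith's actual crawler (which fetches pages over HTTP and honors robots.txt and rate limits):

```typescript
// Illustrative BFS frontier with depth and page limits.
type LinkMap = Record<string, string[]>;

function bfsCrawl(links: LinkMap, start: string, maxDepth: number, maxPages: number): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const order: string[] = [];

  while (queue.length > 0 && order.length < maxPages) {
    const { url, depth } = queue.shift()!;
    order.push(url); // a real crawler would fetch and parse the page here
    if (depth >= maxDepth) continue;
    for (const next of links[url] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return order;
}

// Example: maxDepth 2 excludes the deeply nested comments page.
const site: LinkMap = {
  "/": ["/about", "/blog"],
  "/blog": ["/blog/post-1"],
  "/blog/post-1": ["/blog/post-1/comments"],
};
const pages = bfsCrawl(site, "/", 2, 10);
// pages: ["/", "/about", "/blog", "/blog/post-1"]
```

BFS visits pages level by level, so the shallowest (usually most important) pages are crawled first, and the depth limit bounds how far the frontier expands.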
Interactive Visualizations
Generate D3.js-powered link graphs and HTML reports to explore your site’s internal link structure visually.
Persistent SQLite Storage
All crawl data is stored locally in ~/.crawlith/crawlith.db with snapshot-based metrics and history tracking.
Unified Export System
Export crawl results in multiple formats: JSON, CSV, Markdown, HTML reports, and interactive visualizations.
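As a sketch of how a flat export might work, here is a minimal CSV serializer over a hypothetical page-record shape; the `PageRecord` fields are assumptions for illustration, not Crawlith's actual export schema:

```typescript
// Hypothetical crawl-result shape; Crawlith's real schema may differ.
interface PageRecord {
  url: string;
  status: number;
  title: string;
}

function toCsv(rows: PageRecord[]): string {
  const escape = (v: string | number) => {
    const s = String(v);
    // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = "url,status,title";
  const lines = rows.map((r) => [r.url, r.status, r.title].map(escape).join(","));
  return [header, ...lines].join("\n");
}

const csv = toCsv([
  { url: "https://example.com/", status: 200, title: "Home" },
  { url: "https://example.com/a", status: 404, title: 'Not "Found"' },
]);
```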
Critical Issue Detection
Automatically detect orphan pages, broken links, redirect chains, crawl traps, and soft 404s.
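For intuition, two of these checks (orphan pages and broken links) can be derived directly from the crawl graph. The following is a simplified sketch of the idea, not Crawlith's detection logic:

```typescript
// Sketch: derive orphan pages and broken links from a crawl graph.
type Graph = Record<string, string[]>;   // page -> outbound links
type StatusMap = Record<string, number>; // page -> HTTP status

// Orphans: crawled pages (other than the home page) with no inbound links.
function findOrphans(graph: Graph, home: string): string[] {
  const linkedTo = new Set<string>();
  for (const targets of Object.values(graph)) targets.forEach((t) => linkedTo.add(t));
  return Object.keys(graph).filter((p) => p !== home && !linkedTo.has(p));
}

// Broken links: (source, target) pairs where the target returned >= 400
// or was never resolved at all.
function findBrokenLinks(graph: Graph, status: StatusMap): Array<[string, string]> {
  const broken: Array<[string, string]> = [];
  for (const [page, targets] of Object.entries(graph))
    for (const t of targets)
      if ((status[t] ?? 404) >= 400) broken.push([page, t]);
  return broken;
}

const graph: Graph = { "/": ["/a", "/missing"], "/a": [], "/lonely": [] };
const status: StatusMap = { "/": 200, "/a": 200, "/lonely": 200 };

const orphans = findOrphans(graph, "/");       // ["/lonely"]
const broken = findBrokenLinks(graph, status); // [["/", "/missing"]]
```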
Web Dashboard
Launch an interactive React-based dashboard with crawlith ui to review snapshots and visualize crawl data.
Infrastructure Auditing
Run deep TLS, DNS, and security header checks with crawlith audit to identify infrastructure vulnerabilities.
Monorepo Architecture
Cleanly separated core library, CLI client, and web dashboard for maximum flexibility and extensibility.
Use Cases
- SEO Audits: Identify orphan pages, duplicate content clusters, and broken internal links
- Site Migrations: Compare crawl snapshots before and after migrations to detect issues
- Content Analysis: Analyze thin content, detect duplicate titles, and score on-page SEO
- Infrastructure Security: Audit TLS certificates, DNS configurations, and security headers
- Link Graph Analysis: Visualize internal link structures and calculate PageRank scores
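The PageRank scores mentioned above are typically computed by power iteration over the internal link graph. Here is a minimal, self-contained sketch with the conventional damping factor of 0.85; it illustrates the algorithm, not Crawlith's implementation:

```typescript
// Sketch: PageRank by power iteration over an internal link graph.
function pageRank(
  graph: Record<string, string[]>,
  iterations = 50,
  d = 0.85,
): Record<string, number> {
  const pages = Object.keys(graph);
  const n = pages.length;
  let rank: Record<string, number> = Object.fromEntries(pages.map((p) => [p, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    // Every page starts each round with the teleport share (1 - d) / n.
    const next: Record<string, number> = Object.fromEntries(pages.map((p) => [p, (1 - d) / n]));
    for (const p of pages) {
      const out = graph[p];
      if (out.length === 0) {
        // Dangling page: spread its rank evenly across all pages.
        for (const q of pages) next[q] += (d * rank[p]) / n;
      } else {
        for (const q of out) next[q] += (d * rank[p]) / out.length;
      }
    }
    rank = next;
  }
  return rank;
}

// "/" receives links from both other pages, so it ranks highest.
const ranks = pageRank({ "/": ["/a", "/b"], "/a": ["/"], "/b": ["/"] });
```

Ranks always sum to 1, so the scores are directly comparable across pages of the same crawl.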
Architecture
Crawlith uses a workspace-based monorepo structure:
- plugins/core: The heavy-lifting engine with database, graph algorithms, crawler, and security boundaries
- plugins/cli: The terminal user interface with commands like crawl, page, ui, probe, and more
- plugins/web: The React-based interactive dashboard frontend
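Assuming npm workspaces (the repository may use pnpm or another workspace tool), a root package.json wiring up this layout might look like:

```json
{
  "name": "crawlith",
  "private": true,
  "workspaces": [
    "plugins/core",
    "plugins/cli",
    "plugins/web"
  ]
}
```

This lets the CLI and web dashboard depend on the core library by package name while all three are developed and versioned in one repository.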
All crawl data persists in ~/.crawlith/crawlith.db, enabling snapshot-based comparisons and incremental crawling.
Next Steps
Installation
Get Crawlith installed and ready to use
Quickstart
Crawl your first site in under 2 minutes
CLI Reference
Explore all available commands and options
API Documentation
Use Crawlith programmatically in your projects