Introduction

The @crawlith/core package provides a programmatic API for crawling websites and analyzing their structure. Use it to build custom crawlers, integrate SEO analysis into your workflow, or perform automated audits.

Installation

Install the core library using your preferred package manager:
npm install @crawlith/core

Quick Start

Here’s a basic example of crawling a website and analyzing its structure:
import { crawl } from '@crawlith/core';

// Start a crawl
const snapshotId = await crawl('https://example.com', {
  limit: 100,
  depth: 3,
  concurrency: 5
});

console.log('Crawl complete! Snapshot ID:', snapshotId);

Core Concepts

Crawling

The crawler discovers pages by following links, respecting robots.txt, and building a graph of your site’s structure. Each crawl creates a snapshot stored in a SQLite database.
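The discovery process can be pictured as a breadth-first traversal bounded by a depth and page limit. The sketch below is a conceptual illustration of that behavior on an in-memory link list, not `@crawlith/core`'s actual implementation (which fetches pages over HTTP and persists to SQLite):

```typescript
// Conceptual breadth-first link discovery with a depth and page limit.
// This mirrors the crawler's high-level behavior, nothing more.
type Link = [from: string, to: string];

function discover(start: string, links: Link[], maxDepth: number, limit: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && visited.size < limit; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const [from, to] of links) {
        if (from === page && !visited.has(to) && visited.size < limit) {
          visited.add(to);
          next.push(to);
        }
      }
    }
    frontier = next;
  }
  return [...visited];
}

const links: Link[] = [
  ['/', '/about'],
  ['/', '/blog'],
  ['/blog', '/blog/post-1'],
];
console.log(discover('/', links, 2, 100)); // all four pages are reachable within depth 2
```

The `depth` and `limit` options in `crawl()` play the same roles as `maxDepth` and `limit` here: depth bounds how many link hops from the start URL are followed, and limit caps the total number of pages visited.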

Graph Model

Crawlith represents your website as a directed graph:
  • Nodes represent pages (URLs)
  • Edges represent links between pages
This model enables powerful analysis like PageRank, authority scoring, and orphan detection.
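To make the graph model concrete, here is a minimal sketch of orphan detection on that node/edge representation. The `Node` and `Edge` shapes below are illustrative only; the library exports its own `GraphNode` and `GraphEdge` types, which may differ:

```typescript
// Illustrative graph shapes -- check the exported GraphNode / GraphEdge
// types for the library's actual structures.
interface Node { url: string }
interface Edge { from: string; to: string }

// An orphan is a page with no inbound links (the start page is excluded,
// since it is reachable by definition).
function findOrphans(nodes: Node[], edges: Edge[], startUrl: string): string[] {
  const linkedTo = new Set(edges.map(e => e.to));
  return nodes
    .map(n => n.url)
    .filter(url => url !== startUrl && !linkedTo.has(url));
}

const nodes: Node[] = [{ url: '/' }, { url: '/about' }, { url: '/old-page' }];
const edges: Edge[] = [{ from: '/', to: '/about' }];
console.log(findOrphans(nodes, edges, '/')); // ['/old-page']
```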

Metrics

After crawling, run post-crawl metrics to calculate:
  • PageRank scores
  • HITS algorithm (authority/hub scores)
  • Orphan pages and near-orphans
  • Deep pages and crawl efficiency
  • Duplicate detection
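To show what a PageRank score represents, here is a minimal power-iteration implementation on a toy link graph. `@crawlith/core` computes this for you via `runPostCrawlMetrics`; this sketch only illustrates the idea that pages accumulate score from the pages linking to them:

```typescript
// Minimal PageRank via power iteration on an in-memory link graph.
function pageRank(
  pages: string[],
  links: [string, string][],
  damping = 0.85,
  iterations = 50
): Map<string, number> {
  const n = pages.length;

  // Outbound link targets per page.
  const out = new Map<string, string[]>();
  for (const p of pages) out.set(p, []);
  for (const [from, to] of links) out.get(from)!.push(to);

  // Start with a uniform distribution.
  let rank = new Map<string, number>();
  for (const p of pages) rank.set(p, 1 / n);

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const p of pages) next.set(p, (1 - damping) / n);
    for (const p of pages) {
      const targets = out.get(p)!;
      // Dangling pages (no out-links) share their rank with every page.
      const recipients = targets.length ? targets : pages;
      const share = rank.get(p)! / recipients.length;
      for (const t of recipients) {
        next.set(t, next.get(t)! + damping * share);
      }
    }
    rank = next;
  }
  return rank;
}

const ranks = pageRank(['/', '/a', '/b'], [['/', '/a'], ['/', '/b'], ['/a', '/b']]);
// '/b' collects the most inbound links, so it ends up with the highest score.
```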

Basic Usage Example

A complete workflow that crawls a site, runs post-crawl metrics, and analyzes the resulting graph:
import { 
  crawl, 
  runPostCrawlMetrics,
  loadGraphFromSnapshot,
  calculateMetrics 
} from '@crawlith/core';

// Crawl the site
const snapshotId = await crawl('https://example.com', {
  limit: 500,
  depth: 4,
  concurrency: 10,
  detectSoft404: true,
  detectTraps: true
});

// Calculate metrics
runPostCrawlMetrics(snapshotId, 4);

// Load and analyze the graph
const graph = loadGraphFromSnapshot(snapshotId);
const metrics = calculateMetrics(graph, 4);

console.log('Total pages:', metrics.totalPages);
console.log('Orphan pages:', metrics.orphanPages.length);
console.log('Top authority pages:', metrics.topAuthorityPages);

Event-Driven Crawling

Monitor crawl progress in real time by passing an event context:
import { crawl } from '@crawlith/core';

const context = {
  emit: (event: any) => {
    switch (event.type) {
      case 'crawl:start':
        console.log('Crawling:', event.url);
        break;
      case 'crawl:success':
        console.log(`✓ ${event.url} (${event.status}) - ${event.durationMs}ms`);
        break;
      case 'crawl:error':
        console.error(`✗ ${event.url}:`, event.error);
        break;
      case 'crawl:limit-reached':
        console.log('Crawl limit reached:', event.limit);
        break;
    }
  }
};

const snapshotId = await crawl('https://example.com', {
  limit: 100,
  depth: 3
}, context);
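The `event: any` signature above can be tightened with a discriminated union inferred from the fields the example uses. The type below is hypothetical, derived only from that example; the library may export its own event types, so check its type definitions first:

```typescript
// Hypothetical event union based on the fields used in the example above.
// The library's own exported event types, if any, take precedence.
type CrawlEvent =
  | { type: 'crawl:start'; url: string }
  | { type: 'crawl:success'; url: string; status: number; durationMs: number }
  | { type: 'crawl:error'; url: string; error: unknown }
  | { type: 'crawl:limit-reached'; limit: number };

// With a discriminated union, each case's fields are narrowed automatically.
function describe(event: CrawlEvent): string {
  switch (event.type) {
    case 'crawl:start':
      return `Crawling: ${event.url}`;
    case 'crawl:success':
      return `✓ ${event.url} (${event.status}) - ${event.durationMs}ms`;
    case 'crawl:error':
      return `✗ ${event.url}: ${String(event.error)}`;
    case 'crawl:limit-reached':
      return `Crawl limit reached: ${event.limit}`;
  }
}

console.log(describe({ type: 'crawl:success', url: '/about', status: 200, durationMs: 42 }));
// ✓ /about (200) - 42ms
```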

TypeScript Support

The library is written in TypeScript and includes full type definitions. All interfaces and types are exported for your use:
import type { 
  CrawlOptions, 
  Graph, 
  GraphNode, 
  GraphEdge,
  Metrics,
  AuditResult 
} from '@crawlith/core';

Next Steps

Crawler API

Learn about crawl options and the Crawler class

Graph API

Work with the graph structure and analysis

Metrics API

Calculate and analyze site metrics

Audit API

Perform security and performance audits