Usage

crawlith clean <url> [options]
The clean command removes crawl data from the database and/or exported files from disk. This is useful for managing disk space, removing outdated data, or completely resetting a site’s crawl history.
This is a destructive operation that permanently deletes data. Use with caution and ensure you have backups if needed.

Arguments

url
string
required
URL or domain of the site to clean.
crawlith clean https://example.com
# or just the domain
crawlith clean example.com

Options

--exports
boolean
Clean exported files only. Removes all exports from the output directory while preserving database records.
crawlith clean https://example.com --exports
--db
boolean
Clean database entries only (site, snapshots, pages). Preserves exported files on disk.
crawlith clean https://example.com --db
--snapshot
number
Clean a specific snapshot by ID. Removes only the specified snapshot, leaving other snapshots intact.
crawlith clean https://example.com --snapshot 5
--output
string
default:"./crawlith-reports"
Output directory where exports are stored. Used when cleaning exported files.
crawlith clean https://example.com --exports --output ./custom-reports
--yes
boolean
Skip confirmation prompt. Useful for automation and scripts.
crawlith clean https://example.com --yes

Behavior

Default (No Flags)

If none of --exports, --db, or --snapshot is specified, the command cleans both database records and exported files:
crawlith clean https://example.com
This removes:
  • All database records for the site
  • All snapshots and their page data
  • All exported files in the output directory

Exports Only

crawlith clean https://example.com --exports
Removes:
  • All exported files (JSON, HTML, CSV, etc.) from ./crawlith-reports/example.com/
Preserves:
  • Database records
  • Ability to re-export or view in UI

Database Only

crawlith clean https://example.com --db
Removes:
  • Site record from database
  • All snapshots for the site
  • All page data
Preserves:
  • Exported files on disk

Specific Snapshot

crawlith clean https://example.com --snapshot 5
Removes:
  • Only snapshot #5 and its page data
Preserves:
  • Other snapshots
  • Site record
  • Exported files

Confirmation Prompt

By default, the command shows a confirmation prompt:
⚠️  Destructive Action ⚠️
 - Delete ALL database records for example.com
 - Delete ALL exported reports for example.com

Are you sure you want to continue? (y/N)
Type y or yes to confirm, or n/N to cancel. Skip the prompt with --yes:
crawlith clean https://example.com --yes

Examples

Complete Cleanup

Remove all data (database + exports):
crawlith clean https://example.com

Clean Exports Only

Free up disk space while keeping database records:
crawlith clean https://example.com --exports
You can always re-export later:
crawlith export https://example.com --export json,html

Clean Database Only

Remove from tracking while preserving exported files:
crawlith clean https://example.com --db

Remove Old Snapshot

Delete a specific outdated snapshot:
# List snapshots to find ID
crawlith sites --format json | jq '.[] | select(.domain=="example.com")'

# Remove snapshot #3
crawlith clean https://example.com --snapshot 3

Automated Cleanup Script

#!/bin/bash
# Clean sites that haven't been crawled in 90 days
for domain in $(crawlith sites --format json | jq -r '.[] | select(
  (.lastCrawl | fromdateiso8601) < (now - (90 * 86400))  # 90 days in seconds
) | .domain'); do
  echo "Cleaning old data for $domain"
  crawlith clean "https://$domain" --yes
done

Clean Multiple Sites

#!/bin/bash
for site in old-site1.com old-site2.com old-site3.com; do
  crawlith clean "https://$site" --yes
done

Free Disk Space

# Remove all exports but keep database
for domain in $(crawlith sites --format json | jq -r '.[].domain'); do
  crawlith clean "https://$domain" --exports --yes
done
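Before bulk-cleaning, it can help to see which domains actually consume space. This is a plain shell sketch over the default ./crawlith-reports layout (adjust the path if you used --output):

```shell
# List per-domain export sizes, largest first
du -sh ./crawlith-reports/*/ 2>/dev/null | sort -rh | head -n 10
```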

Custom Export Directory

crawlith clean https://example.com --exports --output /var/crawlith-reports

What Gets Deleted

Database Records

When cleaning the database:
  • sites table: Site entry removed
  • snapshots table: All snapshots for the site removed
  • pages table: All page data for the site removed
  • edges table: All link data for the site removed
  • issues table: All detected issues removed
  • clusters table: All content clusters removed

Export Files

When cleaning exports, the entire domain directory is removed:
crawlith-reports/
└── example.com/        ← This entire directory is deleted
    ├── graph.json
    ├── report.md
    ├── report.html
    ├── pages.csv
    └── ...

Error Handling

Site Not Found

❌ Site not found in database: example.com
This occurs when:
  • The site has never been crawled
  • The site was already cleaned
  • The domain name is misspelled

Snapshot Not Found

❌ Snapshot #5 not found
The specified snapshot ID doesn’t exist. List available snapshots:
crawlith sites --format json | jq '.[] | select(.domain=="example.com")'

Snapshot Belongs to Different Site

❌ Snapshot #5 does not belong to example.com
The snapshot ID exists but is associated with a different site.

No Database Records

ℹ️  No database records found for example.com
The site doesn’t exist in the database, but exports may still be cleaned if --exports was specified.

Use Cases

Development Testing

Clean test crawls:
crawlith crawl https://test.example.com --limit 100
# ... review results ...
crawlith clean https://test.example.com --yes

Disk Space Management

Remove old exports periodically:
crawlith clean https://example.com --exports --yes

Snapshot Rotation

Keep only recent snapshots:
# Manually identify old snapshot IDs and remove them
crawlith clean https://example.com --snapshot 1
crawlith clean https://example.com --snapshot 2
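Rotation can be scripted if the sites JSON exposes per-site snapshot IDs. The jq paths below (a `snapshots` array ordered oldest-first, with numeric `id` fields) are assumptions — check your actual `crawlith sites --format json` output and adjust:

```shell
#!/bin/bash
# Keep only the 3 newest snapshots for one site; delete the rest.
# ASSUMPTION: each site object has a "snapshots" array ordered oldest-first
# with numeric "id" fields -- verify against your real JSON schema.
domain="example.com"
for id in $(crawlith sites --format json \
  | jq -r --arg d "$domain" '.[] | select(.domain==$d) | .snapshots[:-3][].id'); do
  crawlith clean "https://$domain" --snapshot "$id" --yes
done
```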

Client Offboarding

Remove all data for a client’s site:
crawlith clean https://client-site.com --yes

Reset and Re-crawl

Start fresh:
crawlith clean https://example.com --yes
crawlith crawl https://example.com

Bulk Cleanup

Remove multiple old sites:
cat old-sites.txt | xargs -I {} crawlith clean "https://{}" --yes

Safety Tips

Test First: Run the command without --yes to review what will be deleted before confirming.
Backup Exports: If you might need the data later, copy exports before cleaning:
cp -r crawlith-reports/example.com ~/backups/
crawlith clean https://example.com
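If you prefer a single dated archive over a directory copy, tar works too (paths assume the default output directory):

```shell
# Dated compressed backup of one domain's exports before cleaning
mkdir -p ~/backups
tar -czf ~/backups/example.com-$(date +%Y%m%d).tar.gz \
  -C crawlith-reports example.com
```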
Incremental Cleanup: Start with --exports only, then clean database if needed:
crawlith clean https://example.com --exports
# Later, if sure:
crawlith clean https://example.com --db
No Undo: Deleted database records cannot be recovered. You must re-crawl to restore data.
Concurrent Access: Don’t clean a site while it’s being crawled or viewed in the UI. Stop other Crawlith processes first.
Related Commands

  • sites - View all tracked sites before cleaning
  • crawl - Re-crawl a site after cleaning
  • ui - Review and export data before cleaning
