
Web Archiving Ecosystem

ArchiveBox is part of a larger ecosystem of web archiving tools, organizations, and communities working to preserve internet content.

Our Community Wiki

Our Community Wiki strives to be a comprehensive index of the web archiving industry.

Centralized Public Archives

Archive.org (Internet Archive)

The largest public web archive, providing free access to billions of archived web pages through the Wayback Machine. ArchiveBox can automatically save URLs to Archive.org for redundancy (configurable with SAVE_ARCHIVEDOTORG).
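As a rough illustration of toggling that option per run, the sketch below builds an environment for a subprocess call. It assumes SAVE_ARCHIVEDOTORG can be supplied as an environment variable and that `archivebox add <url>` is run from inside an existing data directory. The helper names `archivebox_env` and `add_url_command` are hypothetical and not part of ArchiveBox.

```python
import os


def archivebox_env(save_to_archive_org, base_env=None):
    """Return a copy of the environment with SAVE_ARCHIVEDOTORG set.

    Assumption: ArchiveBox reads this option from the environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SAVE_ARCHIVEDOTORG"] = "True" if save_to_archive_org else "False"
    return env


def add_url_command(url):
    """Build the argument list for adding a single URL."""
    return ["archivebox", "add", url]
```

A call such as `subprocess.run(add_url_command("https://example.com"), env=archivebox_env(False), check=True)` would then archive the page locally without submitting it to Archive.org, provided the assumptions above hold for your installed version.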

Self-Hosted Archiving Alternatives

ArchiveWeb.page & ReplayWeb.page

Use these for better fidelity with complex interactive pages, heavy JS, streams, and API requests.

Bookmark Management with Archiving

If you want more bookmark categorization and note-taking features:
  • Memex - Full-text search for browsing history and bookmarks
  • Hoarder - Self-hosted bookmark manager
  • LinkWarden - Collaborative bookmark manager
  • Archivy - Knowledge base with archiving
  • LinkAce - Bookmark archive manager

Advanced Crawling Tools

For more advanced recursive spider/crawling ability beyond --depth=1:
  • Browsertrix - High-fidelity browser-based crawler
  • Photon - Fast web crawler
  • Scrapy - Python web scraping framework
You can pipe URLs from these tools into ArchiveBox for archiving.
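
A minimal sketch of that hand-off, assuming the crawler writes one URL per line and that `archivebox add` is run inside an initialized data directory, where it accepts URLs on standard input. The function names `prepare_urls` and `archive_urls` are hypothetical helpers for illustration, not part of ArchiveBox.

```python
import subprocess
from urllib.parse import urlparse


def prepare_urls(raw_lines):
    """Keep unique http(s) URLs, preserving first-seen order."""
    seen = set()
    urls = []
    for line in raw_lines:
        url = line.strip()
        # Skip blank lines and anything that is not an http(s) URL.
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def archive_urls(urls, data_dir="."):
    """Feed URLs to `archivebox add` on stdin, one per line."""
    payload = "\n".join(urls) + "\n"
    return subprocess.run(
        ["archivebox", "add"],
        input=payload,
        text=True,
        cwd=data_dir,  # assumed to be an existing ArchiveBox data directory
        check=True,
    )
```

For example, with crawler output saved to a file, `archive_urls(prepare_urls(open("crawl-output.txt")))` would submit the filtered list in a single call. Filtering first keeps duplicates and non-web links out of the archive queue.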

ArchiveBox Integrations

Browser Extensions

Proxy Archiving

Desktop Apps

What Makes ArchiveBox Unique

ArchiveBox gained momentum in the internet archiving industry because it uniquely combines three things:
  1. Distributed: Users own their data instead of entrusting it to one big central provider
  2. Future-proof: Saving in multiple formats and extracting raw TXT, PNG, PDF, MP4, etc. files
  3. Extensible: Powerful APIs, flexible storage, and a big community adding new extractors regularly

vs. Centralized Public Archives

Not all content is suitable for centralized, publicly accessible platforms. Archive.org doesn’t archive content behind login walls. ArchiveBox fills this gap by:
  • Enabling individual archiving of private/authenticated content
  • Supporting decentralized archiving less susceptible to censorship or disasters
  • Allowing users to archive much larger portions of the internet than centralized services can handle

vs. Other Self-Hosted Tools

ArchiveBox differentiates itself through:
  • Comprehensive CLI for power users
  • Web UI that works independently or with the CLI
  • Simple on-disk data format usable without running ArchiveBox
  • Multiple output formats for long-term durability
  • Active development and community support
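
To illustrate the point about the on-disk format, here is a rough sketch that lists snapshots using only the Python standard library. It assumes a layout of one folder per snapshot under `archive/`, each holding an `index.json` with `url` and `title` fields. That layout is an assumption that may differ between ArchiveBox versions, and `list_snapshots` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_snapshots(data_dir):
    """Return (folder_name, url, title) for each snapshot folder found.

    Assumed layout: <data_dir>/archive/<snapshot>/index.json
    """
    results = []
    for index_file in sorted(Path(data_dir).glob("archive/*/index.json")):
        try:
            record = json.loads(index_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed indexes
        if not isinstance(record, dict):
            continue
        results.append((index_file.parent.name, record.get("url"), record.get("title")))
    return results
```

Because the sketch only reads plain JSON files, it keeps working if ArchiveBox itself is not installed, as long as the assumed layout matches your data directory.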

Industry Organizations

International Internet Preservation Consortium (IIPC)

A global community of organizations dedicated to preserving internet content for future generations.

Archive Team

A loose collective of rogue archivists, programmers, writers, and loudmouths dedicated to saving digital history.

Learning Resources

Essential Reading

Academic Resources

  • Research papers on web archiving techniques
  • Digital preservation best practices
  • Legal and ethical considerations for archiving

Community Engagement

Social Media

Discussions

Contributing

Join our community of contributors! See our Contributing Guide for details on:
  • Setting up your development environment
  • Finding issues to work on
  • Code style guidelines
  • Submitting pull requests

Professional Support

Need help with a custom archiving solution? See our Professional Support page for commercial options.

Stay Updated

Alternative Frontends

When archiving sites that block bots, you can rewrite URLs to use alternative frontends for the same content. This helps work around bot blocking while preserving content ethically.
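
One way to do the rewriting is a small hostname mapping applied before URLs are handed to the archiver. The sketch below is illustrative only: the hostnames in `FRONTEND_HOSTS` are placeholders, not real services, and `rewrite_to_frontend` is a hypothetical helper rather than an ArchiveBox feature.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder mapping from original host to alternative frontend host.
# Replace these example hostnames with frontends you actually trust.
FRONTEND_HOSTS = {
    "video.example.com": "video-frontend.example.org",
    "social.example.com": "social-frontend.example.org",
}


def rewrite_to_frontend(url, mapping=FRONTEND_HOSTS):
    """Swap the host for its mapped frontend; return other URLs unchanged."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    target = mapping.get(host)
    if target is None:
        return url
    # Path, query, and fragment are kept; any explicit port is dropped.
    return urlunsplit((parts.scheme, target, parts.path, parts.query, parts.fragment))
```

Keeping the mapping in one dictionary makes it easy to review which sites are being redirected and to remove a frontend that stops working.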

Join the Movement

Web archiving is a collective effort to preserve digital history. Whether you’re:
  • A researcher preserving academic resources
  • A journalist saving cited sources
  • An individual backing up personal bookmarks
  • An organization maintaining institutional knowledge
…you’re part of the solution to link rot and digital deterioration. Start archiving today and help preserve the internet for future generations!
