Plugin Development Guide
ArchiveBox’s plugin system allows you to extend its functionality by creating custom plugins. Plugins are self-contained modules that hook into the archiving lifecycle.
Plugin Structure
A minimal plugin consists of a plugin directory containing a config.json file and one or more hook files.
Directory Naming
Plugin directory names should be:
- Lowercase
- Underscore-separated (use underscores instead of spaces)
- Descriptive of the plugin’s purpose
Examples: screenshot, parse_html_urls, search_backend_sqlite
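The naming rules above can be checked mechanically. The following is a minimal sketch, not part of ArchiveBox: the regular expression and both helper names (is_valid_plugin_name, suggest_plugin_name) are assumptions made for illustration.

```python
import re

# Hypothetical rule derived from the guidelines above: lowercase letters and
# digits, with single underscores between words.
PLUGIN_NAME_RE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_plugin_name(name: str) -> bool:
    """Return True if `name` follows the lowercase/underscore convention."""
    return PLUGIN_NAME_RE.fullmatch(name) is not None


def suggest_plugin_name(title: str) -> str:
    """Turn a free-form title such as 'Parse HTML URLs' into a directory name."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "_".join(word.lower() for word in words)
```

A helper like this could run in a pre-commit check so that badly named plugin directories are caught before they are published.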
Hook File Naming
Hook files follow the pattern on_{Lifecycle}__{Priority}_{name}.{ext}:
- Lifecycle: When the hook runs (Binary, Crawl, Snapshot)
- Priority: Execution order (00-99, lower runs first)
- Name: Descriptive name (matches plugin directory)
- Extension: .py, .js, or .sh
Examples:
- on_Binary__10_npm_install.py - Install npm dependencies
- on_Crawl__00_chrome_launch.js - Start Chrome at crawl beginning
- on_Snapshot__51_screenshot.js - Take screenshot of snapshot
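To make the naming scheme concrete, here is a rough sketch of how hook filenames could be parsed and ordered by priority. The regular expression is inferred from the example filenames above, and HookFile, parse_hook_filename, and sort_hooks are hypothetical names, not ArchiveBox APIs.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern inferred from the example filenames above (an assumption).
HOOK_RE = re.compile(
    r"on_(?P<lifecycle>Binary|Crawl|Snapshot)"
    r"__(?P<priority>\d{2})"
    r"_(?P<name>[A-Za-z0-9_]+)"
    r"\.(?P<ext>py|js|sh)"
)


class HookFile(NamedTuple):
    lifecycle: str
    priority: int
    name: str
    ext: str


def parse_hook_filename(filename: str) -> Optional[HookFile]:
    """Split a hook filename into its parts, or return None if it does not match."""
    match = HOOK_RE.fullmatch(filename)
    if match is None:
        return None
    return HookFile(match["lifecycle"], int(match["priority"]), match["name"], match["ext"])


def sort_hooks(filenames: List[str]) -> List[str]:
    """Drop non-hook files and order the rest by priority (lower first)."""
    hooks = [(parse_hook_filename(f), f) for f in filenames]
    valid = [(hook, f) for hook, f in hooks if hook is not None]
    # Ties on priority are broken by filename so the order is deterministic.
    return [f for hook, f in sorted(valid, key=lambda pair: (pair[0].priority, pair[1]))]
```

Sorting on the parsed integer rather than the raw filename keeps the ordering independent of the lifecycle prefix and of any files in the directory that are not hooks.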
Configuration Schema
Every plugin must have a config.json with JSON Schema validation.
Configuration Features
Type Validation
Supported types: boolean, integer, number, string, array, object
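As an illustration of what type validation involves, the sketch below checks values against the six supported types using only the standard library. It is deliberately simplified (a full JSON Schema validator handles many more keywords), and the schema contents, option names, and function names are all hypothetical.

```python
from typing import Any, Dict, List

# JSON Schema type name -> accepted Python types.
JSON_TYPES = {
    "boolean": (bool,),
    "integer": (int,),
    "number": (int, float),
    "string": (str,),
    "array": (list,),
    "object": (dict,),
}


def matches_type(value: Any, json_type: str) -> bool:
    """Check one value against one JSON Schema type name."""
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    if isinstance(value, bool) and json_type != "boolean":
        return False
    return isinstance(value, JSON_TYPES[json_type])


def validate_config(values: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of readable type errors (empty when everything matches)."""
    errors = []
    for key, spec in schema.get("properties", {}).items():
        if key in values and not matches_type(values[key], spec["type"]):
            errors.append(f"{key}: expected {spec['type']}, got {type(values[key]).__name__}")
    return errors


# Hypothetical schema fragment; the option names are made up for this example.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "MY_PLUGIN_ENABLED": {"type": "boolean", "default": True},
        "MY_PLUGIN_TIMEOUT": {"type": "integer", "default": 60},
    },
}
```

Returning a list of messages instead of raising on the first problem lets a hook report every misconfigured option in one run.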
Aliases
Provide alternate names for backward compatibility.
Fallbacks
Inherit values from other config options.
Plugin Dependencies
Declare required plugins in the required_plugins array of config.json.
Hook Implementation
Python Hooks
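A Python hook might look roughly like the skeleton below. This is a sketch under stated assumptions rather than ArchiveBox's actual hook interface: it assumes configuration arrives through environment variables, that the target URL is passed as a --url argument, and that results are printed as one JSON object per line. The plugin name, the MY_PLUGIN_ENABLED variable, and the record fields are hypothetical.

```python
import argparse
import json
import os
import sys

PLUGIN_NAME = "my_plugin"  # hypothetical plugin name


def is_enabled(env):
    """Assumed convention: MY_PLUGIN_ENABLED env var, defaulting to on."""
    value = env.get("MY_PLUGIN_ENABLED", "true")
    return value.strip().lower() not in ("false", "0", "no", "off")


def extract(url):
    """Placeholder for the plugin's real work; returns one result record."""
    return {"plugin": PLUGIN_NAME, "url": url, "status": "succeeded"}


def main(argv, env):
    # Check the enabled flag first; a disabled plugin is a skip, not an error.
    if not is_enabled(env):
        return 0

    parser = argparse.ArgumentParser(description=f"{PLUGIN_NAME} hook (sketch)")
    parser.add_argument("--url", required=True)  # assumed argument name
    args = parser.parse_args(argv)

    try:
        record = extract(args.url)
    except Exception as exc:
        print(f"{PLUGIN_NAME}: {exc}", file=sys.stderr)  # errors go to stderr
        return 1

    print(json.dumps(record))  # results go to stdout, one JSON object per line
    return 0


# A real hook file would end with something like:
#     sys.exit(main(sys.argv[1:], os.environ))
```

Keeping the logic in main(argv, env) instead of reading sys.argv and os.environ directly makes the hook easy to call from a test without spawning a process.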
JavaScript Hooks
Chrome-Based Plugins
Plugins that use Chrome must follow the rules below.
Dependency Rules
Using chrome_utils.js
All Chrome operations must use the shared utilities in chrome_utils.js.
Connecting to Chrome
Plugins should connect to an existing Chrome session, not launch their own.
Plugin Lifecycle
Binary Hooks (on_Binary__*)
Run once to install dependencies.
Crawl Hooks (on_Crawl__*)
Run once per crawl to set up resources.
Snapshot Hooks (on_Snapshot__*)
Run for each snapshot to extract content.
Hook Execution Order
Hooks run in priority order (00-99), with lower numbers running first.
Plugin Testing
Tests must be completely isolated from ArchiveBox.
Testing Chrome Plugins
Chrome plugins must test both execution paths, plus the logic they share:
- Connect to existing session (~50% of code)
- Launch own browser (~30% of code)
- Shared logic (~20% of code)
Use archivebox/plugins/chrome/tests/chrome_test_utils.py for Chrome setup.
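For plugins in general, one way to meet the isolation requirement is to run the hook as a subprocess in a throwaway directory with a minimal environment. The sketch below assumes a POSIX-like system and a Python hook; run_hook_isolated and the stand-in hook are hypothetical and are not taken from ArchiveBox's test utilities.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_hook_isolated(hook_path, args=(), extra_env=None, timeout=30):
    """Run a Python hook in a throwaway directory with a minimal environment.

    Returns (exit_code, stdout, stderr, names_of_files_created).
    """
    hook_path = Path(hook_path).resolve()  # cwd changes below, so use an absolute path
    with tempfile.TemporaryDirectory() as workdir:
        # Only PATH and HOME are passed through, so nothing from a real
        # ArchiveBox install (data dir, config) leaks into the test.
        env = {"PATH": os.environ.get("PATH", ""), "HOME": workdir}
        env.update(extra_env or {})
        proc = subprocess.run(
            [sys.executable, str(hook_path), *args],
            cwd=workdir, env=env, capture_output=True, text=True, timeout=timeout,
        )
        created = sorted(p.name for p in Path(workdir).iterdir())
    return proc.returncode, proc.stdout, proc.stderr, created


def test_demo_hook_writes_output():
    # A stand-in hook written on the fly; a real test would point at the
    # plugin's own hook file instead.
    hook_source = (
        "import pathlib\n"
        "pathlib.Path('demo.txt').write_text('ok')\n"
        "print('{\"status\": \"succeeded\"}')\n"
    )
    with tempfile.TemporaryDirectory() as src:
        hook = Path(src) / "on_Snapshot__50_demo.py"
        hook.write_text(hook_source)
        code, out, err, created = run_hook_isolated(hook)
    assert code == 0, err
    assert created == ["demo.txt"]
    assert '"status": "succeeded"' in out
```

Because the working directory and HOME both point at a temporary directory, any file the hook writes is visible in the returned list and is removed when the test finishes.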
Plugin Templates
Plugins can provide UI templates:
Icon Template
Card Template
Full Template
Best Practices
Configuration
- Always check enabled flag at the start of your hook
- Use fallbacks for common settings (TIMEOUT, USER_AGENT)
- Provide sensible defaults in config.json
- Validate configuration with JSON Schema constraints
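The lookup order implied by aliases, fallbacks, and defaults can be sketched as follows. The schema keys x-aliases and x-fallback, the option names, and resolve_option are assumptions made for this example; check the built-in plugins' config.json files for the exact keys ArchiveBox uses.

```python
from typing import Any, Dict

# Hypothetical schema; "x-aliases" and "x-fallback" are assumed key names.
SCHEMA = {
    "properties": {
        "MY_PLUGIN_ENABLED": {
            "type": "boolean",
            "default": True,
            "x-aliases": ["USE_MY_PLUGIN"],
        },
        "MY_PLUGIN_TIMEOUT": {
            "type": "integer",
            "default": 60,
            "x-fallback": "TIMEOUT",
        },
        "MY_PLUGIN_USER_AGENT": {
            "type": "string",
            "default": "",
            "x-fallback": "USER_AGENT",
        },
    }
}


def resolve_option(name: str, schema: Dict[str, Any], values: Dict[str, Any]) -> Any:
    """Resolve one option: own name, then aliases, then fallback option, then default."""
    spec = schema["properties"][name]
    for key in (name, *spec.get("x-aliases", [])):
        if key in values:
            return values[key]
    fallback = spec.get("x-fallback")
    if fallback is not None and fallback in values:
        return values[fallback]
    return spec.get("default")


def plugin_enabled(values: Dict[str, Any]) -> bool:
    """Convenience check a hook could call before doing any work."""
    return bool(resolve_option("MY_PLUGIN_ENABLED", SCHEMA, values))
```

With this order, a plugin-specific setting always wins, an old alias keeps working, and a global setting such as TIMEOUT applies only when nothing more specific is set.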
Error Handling
- Exit codes:
  - 0 = Success or skipped (plugin disabled)
  - 1 = Error
- Print errors to stderr: console.error() or sys.stderr
- Print results to stdout: JSONL output
- Handle timeouts gracefully
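These conventions could be applied to an external command along the following lines. This is a minimal sketch using the standard subprocess module; run_with_timeout is a hypothetical helper, and the choice to treat a timeout as an ordinary error (exit code 1) is an assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd` and map the outcome to a hook exit code (0 = success, 1 = error)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        print(f"timed out after {timeout_seconds}s: {cmd[0]}", file=sys.stderr)
        return 1
    except FileNotFoundError:
        print(f"binary not found: {cmd[0]}", file=sys.stderr)
        return 1

    if proc.returncode != 0:
        message = proc.stderr.strip() or f"{cmd[0]} exited with {proc.returncode}"
        print(message, file=sys.stderr)
        return 1
    return 0
```

A hook would typically pass the resolved timeout setting as timeout_seconds and hand the return value to sys.exit().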
Performance
- Reuse resources: Don’t launch new Chrome sessions
- Run in parallel: Use same priority number
- Minimize dependencies: Keep plugins lightweight
- Cache expensive operations
Output
- Create plugin subdirectory: my_plugin/output.txt
- Use descriptive filenames: Not output.txt, but metadata.json
- Emit JSONL for indexing: {"url": ..., "status": ...}
- Handle existing output: Overwrite or skip as appropriate
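The output guidelines above might be combined as in this sketch. The helper names, the "output" field, and the status strings are hypothetical; only the url and status fields come from the guideline itself.

```python
import json
import tempfile
from pathlib import Path


def write_output(snapshot_dir, plugin_name, filename, content, overwrite=False):
    """Write content to <snapshot_dir>/<plugin_name>/<filename>.

    Returns (path, written); written is False when an existing file was kept.
    """
    out_dir = Path(snapshot_dir) / plugin_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    if out_path.exists() and not overwrite:
        return out_path, False
    out_path.write_text(content, encoding="utf-8")
    return out_path, True


def jsonl_record(url, status, output):
    """Serialize one result as a single JSON line."""
    return json.dumps({"url": url, "status": status, "output": str(output)})


# Usage in a throwaway directory standing in for a snapshot directory.
with tempfile.TemporaryDirectory() as snapshot_dir:
    path, written = write_output(snapshot_dir, "my_plugin", "metadata.json", "{}")
    status = "succeeded" if written else "skipped"
    print(jsonl_record("https://example.com", status, path.relative_to(snapshot_dir)))
```

Returning whether the file was written lets the caller decide how to report a skipped run instead of silently overwriting earlier output.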
Dependencies
- Declare in config.json: required_plugins array
- Check binary exists: Before running commands
- Provide installation hooks: on_Binary__* or on_Crawl__*
- Document requirements: In docstring
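A dependency check along these lines could run at the top of a hook. It is a sketch that assumes required_plugins is a top-level array in the parsed config.json and that the caller already knows which plugins are available; all function names are hypothetical.

```python
import shutil
import sys


def missing_binaries(binaries):
    """Return the binary names that cannot be found on PATH."""
    return [name for name in binaries if shutil.which(name) is None]


def missing_plugins(config, available_plugins):
    """Return entries of required_plugins that are not available."""
    required = config.get("required_plugins", [])  # assumed top-level key
    return [name for name in required if name not in available_plugins]


def check_dependencies(config, available_plugins, binaries):
    """Return 0 if everything is present, else report to stderr and return 1."""
    problems = [f"missing plugin: {p}" for p in missing_plugins(config, available_plugins)]
    problems += [f"missing binary: {b}" for b in missing_binaries(binaries)]
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Checking with shutil.which before running a command gives a clear error message instead of a failure deep inside a subprocess call.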
Plugin Discovery
ArchiveBox automatically discovers plugins in:
- Built-in plugins: archivebox/plugins/
- User plugins: ~/.archivebox/plugins/ (if supported)
- Data dir plugins: DATA_DIR/plugins/ (if supported)
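Discovery over several plugin directories can be pictured with the sketch below. It is not ArchiveBox's discovery code: it assumes a directory counts as a plugin when it contains a config.json, and that directories listed later override earlier ones with the same plugin name. Both rules are assumptions made for illustration.

```python
from pathlib import Path


def discover_plugins(search_dirs):
    """Map plugin name -> plugin directory across several search directories.

    Assumption: later entries in search_dirs override earlier ones.
    """
    found = {}
    for base in search_dirs:
        base = Path(base).expanduser()
        if not base.is_dir():
            continue  # a missing location is skipped, not treated as an error
        for child in sorted(base.iterdir()):
            # Assumption: a plugin is any subdirectory that holds a config.json.
            if child.is_dir() and (child / "config.json").is_file():
                found[child.name] = child
    return found
```

Called with the three locations listed above in that order, this would let a user or data-dir plugin shadow a built-in plugin of the same name.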
Example: Simple Extractor
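The core of an image-URL extractor can be sketched with the standard library alone. How a hook obtains the page HTML is not covered here, so the function below simply takes the HTML and the page URL as arguments; ImageURLParser and extract_image_urls are hypothetical names, and only img src attributes are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLParser(HTMLParser):
    """Collect absolute URLs from <img src=...> tags, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)  # resolve relative paths against the page URL
        if url not in self.urls:  # skip duplicates while keeping first-seen order
            self.urls.append(url)


def extract_image_urls(html, base_url):
    """Return the de-duplicated list of image URLs found in `html`."""
    parser = ImageURLParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

A snapshot hook built around this function would print one JSON line per URL to stdout and exit with 0, following the conventions in the Best Practices section.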
This example plugin extracts all image URLs.
Publishing Plugins
To share your plugin:
- Create a Git repository with your plugin code
- Document usage in README.md
- Include examples of output
- Publish on GitHub or other hosting
- Share with community on ArchiveBox forums/Discord
Related Resources
- Plugin Overview: Learn about plugin architecture and types
- Chrome Plugins: Deep dive into Chrome-based plugin development