Supported domains
The handler supports Hentai2Read domains:
~/workspace/source/core/sites/h2r.py:17-19
This matches any domain containing "hentai2read" (e.g., hentai2read.com, hentai2read.org).
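A minimal sketch of that domain check (the function name `supports_url` is illustrative, not necessarily the handler's actual API):

```python
from urllib.parse import urlparse

def supports_url(url: str) -> bool:
    """Return True if the URL's host contains 'hentai2read'."""
    host = urlparse(url).netloc.lower()
    return "hentai2read" in host
```

Matching on the host substring rather than an exact domain keeps the check working across mirrors such as hentai2read.com and hentai2read.org.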
Supported URLs
Extraction technology
JSON parsing
Hentai2Read embeds chapter metadata directly in the page HTML as a JavaScript variable:
~/workspace/source/core/sites/h2r.py:40-49
This approach:
- Requires no JavaScript execution
- No browser automation needed
- No AI processing required
- Extremely fast and reliable
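The extraction pattern can be sketched like this (the variable name `gData` comes from the page; the exact regex and helper name are assumptions, not the handler's verbatim code):

```python
import json
import re

def extract_gdata(html: str) -> dict:
    """Pull the gData JavaScript object out of the raw page HTML."""
    match = re.search(r"var\s+gData\s*=\s*(\{.*?\});", html, re.DOTALL)
    if not match:
        raise ValueError("Chapter data not found")
    # The object literal is valid JSON, so no JS engine is needed.
    return json.loads(match.group(1))
```

Because the object literal is valid JSON, a single regex plus `json.loads` replaces an entire browser-automation stack.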
CDN URL construction
Images are hosted on CDN servers. The handler constructs full URLs:
~/workspace/source/core/sites/h2r.py:51-56
This ensures images are downloaded from the correct CDN server.
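The URL construction amounts to joining a CDN base with each relative path from the JSON (a sketch; the real base URL and helper name are not shown here):

```python
def build_image_urls(base_cdn: str, images: list[str]) -> list[str]:
    """Join the CDN base URL with each relative image path."""
    base = base_cdn.rstrip("/")
    # Normalize slashes so "/001/1.jpg" and "001/1.jpg" both join cleanly.
    return [f"{base}/{path.lstrip('/')}" for path in images]
```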
Title extraction
The title is also embedded in the gData variable:
~/workspace/source/core/sites/h2r.py:59-63
The clean_filename utility removes invalid characters for file systems.
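A typical implementation of such a utility looks like the sketch below (the real `clean_filename` may differ, e.g. by replacing characters with underscores instead of dropping them):

```python
import re

def clean_filename(name: str) -> str:
    """Remove characters that are invalid on Windows/macOS/Linux file systems."""
    return re.sub(r'[<>:"/\\|?*]', "", name).strip()
```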
Implementation details
Class structure
~/workspace/source/core/sites/h2r.py:14-28
Crawl4AI usage
Despite being JSON-based, H2R still uses Crawl4AI for initial page fetching:
~/workspace/source/core/sites/h2r.py:31-37
This could be optimized to use aiohttp directly, but Crawl4AI provides:
- Consistent interface with other handlers
- Built-in caching
- Error handling
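The suggested optimization could be sketched with the standard library alone (aiohttp would be the natural production choice; this stdlib version only illustrates that no browser is required):

```python
import asyncio
import urllib.request

async def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Fetch raw page HTML without any browser automation."""
    def _get() -> str:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    # Run the blocking request in a worker thread to stay async-friendly.
    return await asyncio.to_thread(_get)
```

The trade-off is losing Crawl4AI's built-in caching and its uniform interface across handlers.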
Error handling
The handler includes comprehensive error handling:
~/workspace/source/core/sites/h2r.py:43-79
Headers and configuration
H2R requires minimal headers; image downloads are handled by the shared download_and_make_pdf utility.
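A minimal header set might look like this (the exact values the handler sends are an assumption; a Referer header is commonly required by image CDNs to avoid 403 responses):

```python
# Hypothetical minimal headers for H2R requests.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://hentai2read.com/",
}
```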
Usage examples
Single chapter download
The finished chapter is saved as PDF/My Doujin - Chapter 1.pdf.
Via web interface
- Start: START_WEB_VERSION.bat
- Navigate to: http://localhost:3000
- Paste URL: https://hentai2read.com/my_doujin/1
- Download completes in seconds
Via Discord bot
Performance characteristics
Speed comparison
| Handler | Time for 50 images | Reason |
|---|---|---|
| H2R | ~5 seconds | Pure JSON parsing |
| M440 | ~8 seconds | Regex extraction |
| TMO-H | ~15 seconds | AI processing |
| Hitomi | ~60 seconds | Page-by-page browser |
Times are estimates and depend on network speed and image sizes.
Resource usage
- CPU: Minimal (no AI or JS execution)
- Memory: Low (~50MB per chapter)
- Network: Only downloads actual images, no extra requests
- API calls: None required
Known limitations
Single chapter only
The H2R handler does not support automatic series detection or bulk downloads; you must provide individual chapter URLs.
No lazy loading
Because the full image list is embedded server-side in the page JSON, there is no need for lazy-loading scripts or browser automation.
Escape sequences
The handler handles escaped forward slashes in JSON:
~/workspace/source/core/sites/h2r.py:49
This ensures URLs are properly formatted.
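JSON emitted by JavaScript frequently escapes `/` as `\/`; the cleanup is a single replace (a sketch, note that `json.loads` also unescapes these automatically when parsing a full object):

```python
def unescape_json_url(url: str) -> str:
    r"""Undo the common JSON escape of '/' as '\/'."""
    return url.replace("\\/", "/")
```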
Troubleshooting
"Chapter data not found"
This error means the gData variable wasn't found in the HTML. Possible causes:
- Site structure changed
- URL is invalid (404 page)
- Page requires authentication
- JavaScript variable name changed
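A quick diagnostic that distinguishes these causes (a hypothetical helper; adjust the markers to whatever the live page actually serves):

```python
def diagnose_gdata(html: str) -> str:
    """Return a short hint about why gData extraction failed."""
    if "gData" not in html:
        return "gData variable missing; site structure may have changed or the URL is a 404/login page"
    return "gData present; check the extraction regex / JSON structure instead"
```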
"Could not extract image list"
The gData variable was found, but the images array is missing:
~/workspace/source/core/sites/h2r.py:45-75
Solution: Check if the JSON structure has changed.
CDN errors (404 on images)
If the base CDN URL is wrong, image downloads will fail:
~/workspace/source/core/sites/h2r.py:52-54
Solution: Update the fallback CDN URL in the code.
Advantages over other methods
vs. Browser automation (Hitomi, NHentai)
- 10x faster
- No Playwright dependencies
- No headless browser overhead
- More reliable (no Cloudflare issues)
vs. AI extraction (ZonaTMO, TMO-H)
- No API key required
- No LLM costs
- No parsing errors
- Deterministic results
vs. Regex-only (M440)
- Cleaner extraction (structured JSON)
- More maintainable
- Less prone to breaking on HTML changes
Code walkthrough
Here's the complete extraction flow:
- Fetch page HTML
- Extract gData variable
- Parse images array
- Construct URLs
- Generate PDF
~/workspace/source/core/sites/h2r.py:31-73
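The five steps above (minus PDF generation) can be sketched end-to-end; every name here is illustrative, and the CDN base is a placeholder rather than the handler's real fallback:

```python
import json
import re

def extract_chapter(html: str, base_cdn: str = "https://cdn.example.com") -> tuple[str, list[str]]:
    """Parse gData out of page HTML and return (title, full image URLs)."""
    match = re.search(r"var\s+gData\s*=\s*(\{.*?\});", html, re.DOTALL)
    if not match:
        raise ValueError("Chapter data not found")
    data = json.loads(match.group(1))  # json.loads also unescapes \/ in paths
    images = data.get("images")
    if not images:
        raise ValueError("Could not extract image list")
    title = data.get("title", "Untitled")
    urls = [f"{base_cdn.rstrip('/')}/{p.lstrip('/')}" for p in images]
    return title, urls
```

The returned title and URL list would then be handed to download_and_make_pdf for the final step.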
Next steps
M440
Compare with regex-based extraction
Hitomi.la
See why some sites need browser automation
Utils
Explore download_and_make_pdf
Configuration
Configure headers and output paths