Supported URLs
The handler recognizes TMOHentai chapter URLs and automatically converts them to cascade view for more reliable extraction.
Extraction technology
Crawl4AI with Gemini AI
TMO-H uses Crawl4AI with Google Gemini 1.5 Flash to detect and extract chapter image URLs:
~/workspace/source/core/sites/tmo.py:51-53
The model analyzes the page content and extracts image URLs, even when they are obfuscated or loaded dynamically.
URL transformation
The handler converts different URL formats to cascade view:
~/workspace/source/core/sites/tmo.py:40-44
Cascade view loads all chapter images on a single page, making extraction more reliable.
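A minimal sketch of this kind of rewrite in Python is shown below. It is not the handler's code: the URL shapes it recognizes (`/reader/<id>/paginated/<n>` and `/contents/<id>`) and the name `to_cascade_url` are assumptions for illustration, since the real patterns live in tmo.py lines 40-44.

```python
import re

# ASSUMED URL shapes (hypothetical, for illustration only):
#   https://tmohentai.com/reader/<id>/paginated/3 -> https://tmohentai.com/reader/<id>/cascade
#   https://tmohentai.com/contents/<id>           -> https://tmohentai.com/reader/<id>/cascade
_PAGINATED = re.compile(r"/paginated(?:/\d+)?/?$")
_CONTENTS = re.compile(r"/contents/([^/?#]+)")


def to_cascade_url(url: str) -> str:
    """Rewrite a chapter URL so that all images render on one page."""
    if "/cascade" in url:
        # Already in cascade view; nothing to do.
        return url
    if _PAGINATED.search(url):
        # Swap the trailing paginated segment (and page number) for /cascade.
        return _PAGINATED.sub("/cascade", url)
    match = _CONTENTS.search(url)
    if match:
        # Keep the scheme and host, then build the reader URL from the id.
        return url[: match.start()] + f"/reader/{match.group(1)}/cascade"
    # Unknown shape: leave it untouched.
    return url
```

Returning unknown URLs unchanged means an unrecognized format degrades to whatever view the page already serves instead of producing a broken link.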
Lazy loading script
TMO-H pages use lazy loading with `data-original` attributes. The handler executes JavaScript to trigger image loading:
~/workspace/source/core/sites/tmo.py:56-65
This script:
- Scrolls down in 500px increments to trigger lazy loading
- Waits 100ms between scrolls
- Scrolls back to top
- Manually triggers the `data-original` to `src` conversion
- Waits 1 second for rendering
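The steps above can be sketched as a Python helper that renders such a script with configurable scroll distance and delay. The helper name `build_lazy_load_js` and the exact JavaScript are illustrative assumptions, not the script in tmo.py. The defaults follow the values listed above.

```python
from string import Template

# HYPOTHETICAL script that follows the documented steps; not the handler's actual JS.
_LAZY_LOAD_JS = Template("""
(async () => {
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  // Scroll down in fixed increments so lazy loaders fire.
  for (let y = 0; y < document.body.scrollHeight; y += $step) {
    window.scrollTo(0, y);
    await sleep($delay);
  }
  // Return to the top of the page.
  window.scrollTo(0, 0);
  // Copy data-original into src for every lazy image.
  document.querySelectorAll("img[data-original]").forEach((img) => {
    img.src = img.getAttribute("data-original");
  });
  // Give the browser time to render the swapped images.
  await sleep($settle);
})();
""")


def build_lazy_load_js(step_px: int = 500, delay_ms: int = 100, settle_ms: int = 1000) -> str:
    """Render the lazy-loading script with the given scroll step and delays."""
    return _LAZY_LOAD_JS.substitute(step=step_px, delay=delay_ms, settle=settle_ms)
```

Calling `build_lazy_load_js(1000, 200)` would produce the ZonaTMO values from the comparison table. Raising `delay_ms` is one way to act on the "increase scroll delays" advice under Troubleshooting.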
AI extraction process
The extraction happens in two phases.
Phase 1: AI parsing
~/workspace/source/core/sites/tmo.py:67-86
Phase 2: Regex fallback
If AI extraction fails or returns no results, the regex fallback runs:
~/workspace/source/core/sites/tmo.py:91-94
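A regex fallback of this kind could look like the sketch below. The pattern is an assumption: it collects absolute image URLs from `src` or `data-original` attributes and is not the expression used in tmo.py. Placeholder filtering (see Image filtering) is a separate step, so `blank.gif` can still appear in its output.

```python
import re

# ASSUMED pattern: absolute image URLs in src / data-original attributes,
# optionally followed by a query string.
_IMG_ATTR = re.compile(
    r'(?:data-original|src)\s*=\s*["\'](https?://[^"\']+?\.(?:jpe?g|png|webp|gif)(?:\?[^"\']*)?)["\']',
    re.IGNORECASE,
)


def extract_image_urls_regex(page_html: str) -> list[str]:
    """Return image URLs in page order, without duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for url in _IMG_ATTR.findall(page_html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Keeping page order matters because the images become PDF pages in that order.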
Image filtering
The handler filters out placeholder images:
~/workspace/source/core/sites/tmo.py:96
blank.gif is commonly used as a placeholder before lazy loading occurs.
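A minimal sketch of placeholder filtering follows. The source only states that `blank.gif` is filtered. Matching on the URL's file name (case-insensitive, ignoring query strings) and the `PLACEHOLDERS` set are assumptions.

```python
import posixpath
from urllib.parse import urlparse

# ASSUMED set of placeholder file names; the docs only mention blank.gif.
PLACEHOLDERS = {"blank.gif"}


def filter_placeholders(urls: list[str]) -> list[str]:
    """Drop URLs whose file name is a known lazy-loading placeholder."""
    kept = []
    for url in urls:
        # Compare only the last path segment, lowercased, ignoring ?query.
        name = posixpath.basename(urlparse(url).path).lower()
        if name not in PLACEHOLDERS:
            kept.append(url)
    return kept
```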
Title extraction
The handler attempts to extract the chapter title from the page:
~/workspace/source/core/sites/tmo.py:102-106
If no title is found, the file name defaults to "manga_tmo.pdf".
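The title-to-filename step with its default can be sketched as follows. Reading the title from the `<title>` element and replacing characters that are invalid in file names are assumptions, and `pdf_filename_from_html` is a hypothetical name. Only the `manga_tmo.pdf` default comes from the handler.

```python
import html
import re

DEFAULT_FILENAME = "manga_tmo.pdf"  # documented fallback
# ASSUMPTION: the title comes from the <title> element.
_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Characters that are not allowed in Windows file names.
_UNSAFE = re.compile(r'[\\/:*?"<>|]+')


def pdf_filename_from_html(page_html: str) -> str:
    """Build a PDF file name from the page title, or fall back to the default."""
    match = _TITLE_RE.search(page_html)
    if not match:
        return DEFAULT_FILENAME
    title = html.unescape(match.group(1)).strip()
    title = _UNSAFE.sub("_", title)
    title = " ".join(title.split())  # collapse runs of whitespace
    return f"{title}.pdf" if title else DEFAULT_FILENAME
```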
Headers and configuration
TMO-H requires specific headers for image downloads (see `download_and_make_pdf`).
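The exact headers are not listed here. As an illustration only, a helper could send a `Referer` that matches the chapter's origin plus a browser-like `User-Agent`. Image hosts commonly require both, but that requirement is an assumption for TMO-H, and `build_image_headers` is a hypothetical name.

```python
from urllib.parse import urlparse


def build_image_headers(chapter_url: str) -> dict[str, str]:
    """HYPOTHETICAL header set for image requests; the real headers may differ."""
    parts = urlparse(chapter_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    return {
        # ASSUMPTION: the image host checks that requests come from the site.
        "Referer": origin + "/",
        # ASSUMPTION: a browser-like user agent avoids basic bot blocking.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    }
```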
Usage examples
Single chapter download
Example output file: `PDF/My Manga - Chapter 1.pdf`
Via web interface
- Start the web server: `START_WEB_VERSION.bat`
- Open http://localhost:3000
- Paste the TMOHentai URL
- Monitor real-time extraction progress
Via Discord bot
Implementation details
Class structure
~/workspace/source/core/sites/tmo.py:18-32
Domain matching
The handler matches any domain containing "tmohentai", allowing for different TLDs:
- tmohentai.com
- tmohentai.org
- tmohentai.net (if mirrors exist)
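A host-based version of this check is sketched below. Restricting the match to the host (rather than the full URL string) is an assumption, and `is_tmohentai_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def is_tmohentai_url(url: str) -> bool:
    """True if the URL's host contains 'tmohentai', regardless of TLD."""
    host = urlparse(url).netloc.lower()
    return "tmohentai" in host
```

Checking only the host avoids matching a different site whose path happens to contain "tmohentai". The substring test still accepts lookalike hosts such as `nottmohentai.com`, which is the trade-off for supporting unknown mirrors.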
Wait conditions
The crawler waits for images to load before extraction:
~/workspace/source/core/sites/tmo.py:67-74
Known limitations
Single chapter only
Unlike ZonaTMO, TMO-H does not support automatic series detection. You must provide individual chapter URLs.
Error handling
~/workspace/source/core/sites/tmo.py:79-88
If JSON parsing fails, the handler gracefully falls back to regex.
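The parse-or-fall-back pattern can be sketched like this. The shape of the model's JSON (either a bare list of URLs or an object with an `images` list) is an assumption, and the fallback is passed in as a callable so the sketch does not depend on a particular regex.

```python
import json
from typing import Callable


def parse_ai_images(raw: str, fallback: Callable[[], list[str]]) -> list[str]:
    """Parse the model's JSON output; use the fallback on bad or empty results."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON or no output at all: use the regex path.
        return fallback()
    # ASSUMED output shapes: ["url", ...] or {"images": ["url", ...]}.
    if isinstance(data, dict):
        data = data.get("images", [])
    urls = [u for u in data if isinstance(u, str)] if isinstance(data, list) else []
    # An empty AI result also triggers the fallback.
    return urls or fallback()
```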
Performance characteristics
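- Speed: Fast, since Gemini 1.5 Flash is a lightweight model and the AI step adds little latency
- Reliability: High, since the regex fallback still runs when AI extraction fails or returns nothing
- Resource usage: Medium, since extraction makes LLM API calls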
Comparison with ZonaTMO
| Feature | TMO-H | ZonaTMO |
|---|---|---|
| Extraction method | AI + Regex | AI + Regex |
| Series support | ✗ | ✓ |
| Cascade view | ✓ | ✓ |
| Lazy loading | ✓ | ✓ |
| Scroll distance | 500px | 1000px |
| Scroll delay | 100ms | 200ms |
Troubleshooting
No images found
If extraction fails:
- Verify `GOOGLE_API_KEY` is set correctly
- Check that the URL format is correct
- Try visiting the URL manually to confirm it loads images
- Check logs for AI extraction errors
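A quick check for the first item is sketched below. It assumes the key is read from the `GOOGLE_API_KEY` environment variable, and `require_google_api_key` is a hypothetical helper, not part of the handler.

```python
import os
from collections.abc import Mapping


def require_google_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail early with a clear message if it is missing."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; AI extraction cannot run")
    return key
```

Failing before the crawl starts gives a clearer error than an empty result that quietly triggers the regex fallback.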
Incomplete downloads
If some images are missing:
- The lazy loading script may need adjustment
- Try increasing scroll delays in the JS code
- Some images may be blocked by site protection
Next steps
- ZonaTMO: compare with ZonaTMO's implementation
- Configuration: configure the Google API key
- M440: see a simpler crawler approach
- Utils: explore the PDF generation process