The TMO-H (TMOHentai) handler uses AI-powered extraction to detect and download manga images from complex, JavaScript-heavy pages.

Supported URLs

The handler recognizes TMOHentai chapter URLs:
https://tmohentai.com/contents/[manga-name]/[chapter-id]
https://tmohentai.com/reader/[manga-name]/[chapter-id]
https://tmohentai.com/reader/[manga-name]/[chapter-id]/paginated/1
The handler automatically converts URLs to cascade view for optimal extraction.

Extraction technology

Crawl4AI with Gemini AI

TMO-H uses Crawl4AI with Google Gemini 1.5 Flash for intelligent image detection:
llm_config = LLMConfig(
    provider="gemini/gemini-1.5-flash", 
    api_token=config.GOOGLE_API_KEY
)

instruction = """Extract all image URLs. Look for 'data-original' 
and 'src'. Prioritize 'data-original'. Return JSON {'images': ['url1'...]}."""

llm_strategy = LLMExtractionStrategy(
    llm_config=llm_config, 
    instruction=instruction
)
Location: ~/workspace/source/core/sites/tmo.py:51-53

The AI model analyzes the page structure and extracts image URLs even when they’re obfuscated or dynamically loaded.

URL transformation

The handler converts different URL formats to cascade view:
target_url = url
if "/contents/" in url:
    target_url = url.replace("/contents/", "/reader/") + "/cascade"
elif "/paginated/" in url:
    target_url = re.sub(r'/paginated/\d+', '/cascade', url)
Location: ~/workspace/source/core/sites/tmo.py:40-44

Cascade view loads all chapter images on a single page, making extraction more reliable.
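As a standalone sketch, the same transformation can be wrapped in a helper (the to_cascade name is illustrative, not from the source):

```python
import re

def to_cascade(url: str) -> str:
    """Convert a TMOHentai URL to cascade view, mirroring the handler logic."""
    if "/contents/" in url:
        # /contents/ pages redirect to the reader; append the cascade suffix
        return url.replace("/contents/", "/reader/") + "/cascade"
    if "/paginated/" in url:
        # Replace the page-by-page suffix with the single-page cascade view
        return re.sub(r'/paginated/\d+', '/cascade', url)
    return url

print(to_cascade("https://tmohentai.com/contents/my-manga/42"))
# → https://tmohentai.com/reader/my-manga/42/cascade
```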

Lazy loading script

TMO-H pages use lazy loading with data-original attributes. The handler executes JavaScript to trigger image loading:
(async () => {
    const sleep = (ms) => new Promise(r => setTimeout(r, ms));
    let totalHeight = 0; 
    let distance = 500;
    while(totalHeight < document.body.scrollHeight) { 
        window.scrollBy(0, distance); 
        totalHeight += distance; 
        await sleep(100); 
    }
    window.scrollTo(0, 0);
    document.querySelectorAll('img[data-original]').forEach(img => { 
        img.src = img.getAttribute('data-original'); 
    });
    await sleep(1000);
})();
Location: ~/workspace/source/core/sites/tmo.py:56-65

This script:
  1. Scrolls down in 500px increments to trigger lazy loading
  2. Waits 100ms between scrolls
  3. Scrolls back to top
  4. Manually triggers data-original to src conversion
  5. Waits 1 second for rendering

AI extraction process

The extraction happens in two phases:

Phase 1: AI parsing

result = await crawler.arun(
    target_url,
    extraction_strategy=llm_strategy,
    bypass_cache=True,
    js_code=js_lazy_load,
    wait_for="css:img.content-image"
)

if result.success:
    if result.extracted_content:
        clean = result.extracted_content
        # Remove markdown code blocks
        if "```json" in clean: 
            clean = clean.split("```json")[1].split("```")[0].strip()
        elif "```" in clean: 
            clean = clean.split("```")[1].split("```")[0].strip()
        image_urls = json.loads(clean).get("images", [])
Location: ~/workspace/source/core/sites/tmo.py:67-86
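The fence-stripping logic above can be isolated into a small, testable helper (a sketch; parse_llm_images is not a function in the source):

```python
import json

FENCE = "`" * 3  # literal triple backticks, written indirectly to keep this example well-formed

def parse_llm_images(raw: str) -> list:
    """Strip optional Markdown code fences from an LLM response, then parse the JSON image list."""
    clean = raw
    if FENCE + "json" in clean:
        clean = clean.split(FENCE + "json")[1].split(FENCE)[0].strip()
    elif FENCE in clean:
        clean = clean.split(FENCE)[1].split(FENCE)[0].strip()
    return json.loads(clean).get("images", [])

raw = FENCE + 'json\n{"images": ["https://a/1.webp"]}\n' + FENCE
print(parse_llm_images(raw))
# → ['https://a/1.webp']
```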

Phase 2: Regex fallback

If AI extraction fails or returns no results, a regex fallback kicks in:
if not image_urls and result.html:
    matches = re.findall(
        r'data-original=["\'](https://[^"\']+\.(?:webp|jpg|png))["\']', 
        result.html
    )
    if matches: 
        image_urls = sorted(set(matches))
Location: ~/workspace/source/core/sites/tmo.py:91-94
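Run against a minimal HTML fragment, the fallback behaves like this (a standalone sketch; the capture group keeps only the URL rather than the whole attribute, and the sample URLs are invented):

```python
import re

html = (
    '<img class="content-image" data-original="https://img.example/b/002.webp">'
    '<img class="content-image" data-original="https://img.example/b/001.webp">'
)

# Capture only the URL, not the surrounding attribute syntax.
matches = re.findall(
    r'data-original=["\'](https://[^"\']+\.(?:webp|jpg|png))["\']',
    html
)
# Deduplicate and restore page order (filenames are zero-padded, so sorting works).
image_urls = sorted(set(matches))
print(image_urls)
# → ['https://img.example/b/001.webp', 'https://img.example/b/002.webp']
```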

Image filtering

The handler filters out placeholder images:
image_urls = [u for u in image_urls if "blank.gif" not in u]
Location: ~/workspace/source/core/sites/tmo.py:96

blank.gif is commonly used as a placeholder before lazy loading occurs.

Title extraction

The handler attempts to extract the chapter title from the page:
if result.html:
    match = re.search(
        r'<h1[^>]*class=["\'].*?reader-title.*?["\'][^>]*>(.*?)</h1>', 
        result.html, 
        re.IGNORECASE | re.DOTALL
    )
    if match:
        safe = clean_filename(match.group(1).strip()).replace("\n", " ")
        if safe: 
            pdf_name = f"{safe}.pdf"
Location: ~/workspace/source/core/sites/tmo.py:102-106

If no title is found, it defaults to "manga_tmo.pdf".
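A standalone sketch of the title logic (the clean_filename stand-in below only approximates the project's real helper; the sample HTML is invented):

```python
import re

def clean_filename(name: str) -> str:
    """Stand-in for the project's clean_filename helper.
    Assumed behaviour: strip characters invalid in filenames."""
    return re.sub(r'[\\/:*?"<>|]', "", name)

html = '<h1 class="reader-title">My Manga: Chapter 1</h1>'
match = re.search(
    r'<h1[^>]*class=["\'].*?reader-title.*?["\'][^>]*>(.*?)</h1>',
    html,
    re.IGNORECASE | re.DOTALL
)
pdf_name = "manga_tmo.pdf"  # default when no title is found
if match:
    safe = clean_filename(match.group(1).strip()).replace("\n", " ")
    if safe:
        pdf_name = f"{safe}.pdf"
print(pdf_name)
# → My Manga Chapter 1.pdf
```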

Headers and configuration

TMO-H requires specific headers for image downloads:
HEADERS_TMO = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://tmohentai.com/"
}
These headers are passed to download_and_make_pdf.
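For illustration, here is how those headers would be attached to an image request (a sketch using the standard library; download_and_make_pdf's internals are not shown on this page, and fetch_image is an illustrative name):

```python
import urllib.request

HEADERS_TMO = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://tmohentai.com/",
}

def fetch_image(url: str) -> bytes:
    """Fetch one image with the Referer and User-Agent the site expects.
    Without the Referer, the CDN typically rejects hotlinked requests."""
    req = urllib.request.Request(url, headers=HEADERS_TMO)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```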

Usage examples

Single chapter download

from core.handler import process_url

await process_url(
    "https://tmohentai.com/contents/my-manga/chapter-1",
    log_callback=print,
    check_cancel=lambda: False,
    progress_callback=lambda current, total: print(f"{current}/{total}")
)
Output: PDF/My Manga - Chapter 1.pdf

Via web interface

  1. Start the web server: START_WEB_VERSION.bat
  2. Open http://localhost:3000
  3. Paste the TMOHentai URL
  4. Monitor real-time extraction progress

Via Discord bot

!descargar https://tmohentai.com/contents/my-manga/chapter-1
The bot will download and upload the PDF (or GoFile link if >8MB).

Implementation details

Class structure

class TMOHandler(BaseSiteHandler):
    @staticmethod
    def get_supported_domains() -> list:
        return ["tmohentai"]
    
    async def process(
        self,
        url: str,
        log_callback: Callable[[str], None],
        check_cancel: Callable[[], bool],
        progress_callback: Optional[Callable[[int, int], None]] = None
    ) -> None:
        """Process TMOHentai URL using Gemini AI for extraction."""
        ...
Location: ~/workspace/source/core/sites/tmo.py:18-32

Domain matching

The handler matches any domain containing "tmohentai", allowing for different TLDs:
  • tmohentai.com
  • tmohentai.org
  • tmohentai.net (if mirrors exist)
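Substring matching of this kind is typically implemented along these lines (a sketch; the handles helper is illustrative and the real dispatch code may differ):

```python
from urllib.parse import urlparse

SUPPORTED_DOMAINS = ["tmohentai"]  # from TMOHandler.get_supported_domains()

def handles(url: str) -> bool:
    """True if any supported domain substring appears in the URL's host."""
    host = urlparse(url).netloc.lower()
    return any(domain in host for domain in SUPPORTED_DOMAINS)

print(handles("https://tmohentai.com/contents/x/1"))   # → True
print(handles("https://tmohentai.org/reader/x/1"))     # → True
print(handles("https://example.com/tmohentai"))        # → False (path, not host)
```

Matching on the host rather than the full URL avoids false positives when "tmohentai" merely appears in a path or query string.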

Wait conditions

The crawler waits for images to load before extraction:
await crawler.arun(
    target_url,
    extraction_strategy=llm_strategy,
    bypass_cache=True,
    js_code=js_lazy_load,
    wait_for="css:img.content-image"  # Wait for content images
)
Location: ~/workspace/source/core/sites/tmo.py:67-74

Known limitations

Requires Google API key

AI extraction requires GOOGLE_API_KEY in your .env file. Without it, only the regex fallback will work.
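A minimal startup check for the key might look like this (a sketch, assuming the project loads .env into the process environment, e.g. via python-dotenv):

```python
import os

# GOOGLE_API_KEY is expected in the environment once .env has been loaded.
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

if not GOOGLE_API_KEY:
    print("[WARN] GOOGLE_API_KEY not set; only the regex fallback will run.")
```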

Single chapter only

Unlike ZonaTMO, TMO-H does not support automatic series detection. You must provide individual chapter URLs.

Error handling

try:
    if result.extracted_content:
        clean = result.extracted_content
        if "```json" in clean: 
            clean = clean.split("```json")[1].split("```")[0].strip()
        image_urls = json.loads(clean).get("images", [])
except Exception as e:
    log_callback(f"[WARN] Error parsing AI response: {e}")
Location: ~/workspace/source/core/sites/tmo.py:79-88

If JSON parsing fails, the handler gracefully falls back to regex.

Performance characteristics

  • Speed: Fast (AI extraction is quick with Gemini Flash)
  • Reliability: Very high (AI + regex fallback)
  • Resource usage: Medium (LLM API calls)

Comparison with ZonaTMO

Feature             TMO-H        ZonaTMO
Extraction method   AI + Regex   AI + Regex
Series support      No           Yes
Cascade view        Yes          Yes
Lazy loading        Yes          Yes
Scroll distance     500px        1000px
Scroll delay        100ms        200ms

Troubleshooting

No images found

If extraction fails:
  1. Verify GOOGLE_API_KEY is set correctly
  2. Check if the URL format is correct
  3. Try visiting the URL manually to confirm it loads images
  4. Check logs for AI extraction errors

Incomplete downloads

If some images are missing:
  1. The lazy loading script may need adjustment
  2. Try increasing scroll delays in the JS code
  3. Some images may be blocked by site protection

Next steps

  • ZonaTMO: Compare with ZonaTMO’s implementation
  • Configuration: Configure the Google API key
  • M440: See a simpler crawler approach
  • Utils: Explore the PDF generation process
