ZonaTMO

The ZonaTMO handler supports both single-chapter and full-series downloads. It extracts image URLs with an LLM (Crawl4AI plus Google Gemini) and falls back to a regex scan of the page HTML when the LLM returns nothing.

Supported URLs

The handler recognizes two kinds of URL:

Single chapter

https://zonatmo.com/view_uploads/[chapter-id]
https://zonatmo.com/viewer/[chapter-id]
https://zonatmo.com/viewer/[chapter-id]/paginated

Full series (cover page)

https://zonatmo.com/library/manga/[series-name]

The handler automatically detects which type of URL you provide and adjusts its extraction strategy accordingly.
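
The detection step can be sketched as a small classifier over the URL path. This is an illustrative sketch only: `classify_zonatmo_url` is a hypothetical name, and the path prefixes are taken from the URL patterns listed above rather than from the handler's source.

```python
from urllib.parse import urlparse


def classify_zonatmo_url(url: str) -> str:
    """Return 'chapter', 'series', or 'unknown' for a ZonaTMO URL.

    Hypothetical helper: the prefixes mirror the documented URL patterns.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not (host == "zonatmo.com" or host.endswith(".zonatmo.com")):
        return "unknown"

    path = parsed.path
    # Single chapters live under /view_uploads/ or /viewer/
    if path.startswith(("/view_uploads/", "/viewer/")):
        return "chapter"
    # Series cover pages live under /library/manga/
    if path.startswith("/library/manga/"):
        return "series"
    return "unknown"
```

A caller could branch on the returned string to choose between chapter extraction and the full-series flow.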

Extraction technology

Crawl4AI with Gemini LLM

ZonaTMO uses Crawl4AI with Google Gemini 1.5 Flash for intelligent image extraction:

llm_config = LLMConfig(
    provider="gemini/gemini-1.5-flash", 
    api_token=config.GOOGLE_API_KEY
)

instruction = """Extract all image URLs. Look for 'data-original' 
and 'src'. Return JSON {'images': ['url1'...]}."""

llm_strategy = LLMExtractionStrategy(
    llm_config=llm_config, 
    instruction=instruction
)

Location: ~/workspace/source/core/sites/zonatmo.py:153-155
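
The instruction asks the model for JSON of the form {'images': [...]}, so the response still has to be parsed before the URLs can be used. The sketch below shows one defensive way to do that. It assumes the extraction result arrives as a JSON string holding either a single object or a list of objects; `parse_image_urls` is a hypothetical helper, not a function from the project.

```python
import json


def parse_image_urls(extracted_content) -> list[str]:
    """Collect image URLs from an LLM response shaped like {'images': [...]}.

    Hypothetical helper. Accepts a single JSON object or a list of them,
    and returns an empty list for anything it cannot parse.
    """
    try:
        data = json.loads(extracted_content)
    except (TypeError, json.JSONDecodeError):
        return []

    blocks = data if isinstance(data, list) else [data]
    urls = []
    for block in blocks:
        if not isinstance(block, dict):
            continue
        images = block.get("images", [])
        if isinstance(images, list):
            # Keep only string entries; models occasionally emit nulls
            urls.extend(u for u in images if isinstance(u, str))
    return urls
```

Returning an empty list on malformed output lets the caller fall through to the regex fallback instead of raising.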

Cascade view optimization

The handler converts paginated URLs to cascade view for more efficient extraction:

if "/paginated" in final_url:
    target_url = final_url.replace("/paginated", "/cascade")
elif "/viewer/" in final_url:
    if not final_url.endswith("/cascade"):
        target_url = final_url + "/cascade"

Location: ~/workspace/source/core/sites/zonatmo.py:139-142

Cascade view loads all images on a single scrollable page, making extraction faster and more reliable.
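
The same rewrite can be expressed as a standalone function, which makes the untouched case explicit: a URL that matches neither branch is returned unchanged. `to_cascade_url` is a hypothetical name used for illustration; the branching mirrors the excerpt above.

```python
def to_cascade_url(final_url: str) -> str:
    """Rewrite a viewer URL so that it points at the cascade view.

    Hypothetical helper mirroring the excerpt above.
    """
    if "/paginated" in final_url:
        return final_url.replace("/paginated", "/cascade")
    if "/viewer/" in final_url and not final_url.endswith("/cascade"):
        return final_url + "/cascade"
    # Already cascade, or not a viewer URL: leave it alone
    return final_url
```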

Lazy loading handling

ZonaTMO pages use lazy loading for images. The handler executes JavaScript to trigger loading:

(async () => {
    const sleep = (ms) => new Promise(r => setTimeout(r, ms));
    window.scrollTo(0, 0);
    let totalHeight = 0; 
    let distance = 1000;
    while(totalHeight < document.body.scrollHeight) { 
        window.scrollBy(0, distance); 
        totalHeight += distance; 
        await sleep(200); 
    }
    await sleep(1000);
})();

Location: ~/workspace/source/core/sites/zonatmo.py:157-165

This script:

Scrolls to the top of the page
Gradually scrolls down in 1000px increments
Waits 200ms between scrolls to trigger lazy loading
Waits 1 second after reaching the bottom
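
For a rough sense of how long the scroll takes, the timing can be worked out from the constants in the script. The sketch below is back-of-the-envelope arithmetic, not project code: `estimate_scroll_time_ms` is a hypothetical name, and the result is a lower bound because the script re-reads document.body.scrollHeight on every iteration and the page may grow as images load.

```python
import math

SCROLL_STEP_PX = 1000   # distance scrolled per iteration
STEP_DELAY_MS = 200     # pause after each scroll
FINAL_DELAY_MS = 1000   # pause after reaching the bottom


def estimate_scroll_time_ms(page_height_px: int) -> int:
    """Lower-bound estimate of the script's sleep time for a fixed page height."""
    steps = math.ceil(page_height_px / SCROLL_STEP_PX) if page_height_px > 0 else 0
    return steps * STEP_DELAY_MS + FINAL_DELAY_MS
```

A page that is 12,000px tall needs 12 scroll steps, so the script sleeps for at least 12 × 200 + 1000 = 3400 ms.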

Full series extraction

When you provide a cover/library URL, the handler:

Crawls the series page to find all chapter links
Extracts the manga title from the <h1> tag
Reverses the chapter list so it runs oldest to newest
Creates a folder named after the series
Downloads each chapter as a separate PDF

links = re.findall(
    r'href=["\'](https://zonatmo\.com/view_uploads/[^"\']+)["\']', 
    result.html
)

# Remove duplicates while preserving order
clean_links = []
seen = set()
for l in links:
    if l not in seen:
        clean_links.append(l)
        seen.add(l)

clean_links.reverse()  # Oldest first

Location: ~/workspace/source/core/sites/zonatmo.py:49-56,90
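
To see the link extraction end to end, the sketch below runs the same pattern over a small hand-written HTML sample. The sample markup and the `extract_chapter_links` name are made up for illustration, and the sample assumes the series page lists chapters newest first. It uses `dict.fromkeys` as a compact equivalent of the seen-set loop.

```python
import re

CHAPTER_LINK = re.compile(
    r'href=["\'](https://zonatmo\.com/view_uploads/[^"\']+)["\']'
)


def extract_chapter_links(html: str) -> list[str]:
    """Return unique chapter links, oldest first (hypothetical helper)."""
    links = CHAPTER_LINK.findall(html)
    # dict.fromkeys drops duplicates while preserving first-seen order
    unique = list(dict.fromkeys(links))
    unique.reverse()  # page order is assumed newest first
    return unique


# Made-up sample markup with one duplicated link
sample_html = """
<a href="https://zonatmo.com/view_uploads/303">Chapter 3</a>
<a href="https://zonatmo.com/view_uploads/303">Chapter 3 (alt)</a>
<a href='https://zonatmo.com/view_uploads/202'>Chapter 2</a>
<a href="https://zonatmo.com/view_uploads/101">Chapter 1</a>
"""

chapter_links = extract_chapter_links(sample_html)
```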

Title extraction

The handler tries multiple methods to extract the manga title:

H1 element with class element-title (preferred)
Page title tag as fallback
Default: "Manga_ZonaTMO" if nothing found

h1_match = re.search(
    r'<h1[^>]*class=["\'].*?element-title.*?["\'][^>]*>(.*?)</h1>', 
    result.html, 
    re.IGNORECASE | re.DOTALL
)

if h1_match: 
    raw_html = h1_match.group(1)
    # Remove <small> tags
    raw_html = re.sub(r'<small[^>]*>.*?</small>', '', raw_html, flags=re.IGNORECASE | re.DOTALL)
    manga_title = clean_filename(raw_html)

Location: ~/workspace/source/core/sites/zonatmo.py:66-74
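
The three-step fallback chain can be sketched as a single function. This is an illustration under assumptions: `extract_title` is a hypothetical name, the `<title>` regex is a guess at how the fallback might be written, and `clean_filename` here is a simplified stand-in for the project's helper of the same name, whose real behavior is not shown in this page.

```python
import re


def clean_filename(text: str) -> str:
    """Simplified stand-in for the project's helper (assumption)."""
    text = re.sub(r"<[^>]+>", "", text)          # drop any leftover tags
    text = re.sub(r'[\\/:*?"<>|]', "", text)     # drop characters unsafe in filenames
    return re.sub(r"\s+", " ", text).strip()


def extract_title(html: str, default: str = "Manga_ZonaTMO") -> str:
    """Try the element-title <h1>, then <title>, then the default."""
    h1 = re.search(
        r'<h1[^>]*class=["\'].*?element-title.*?["\'][^>]*>(.*?)</h1>',
        html,
        re.IGNORECASE | re.DOTALL,
    )
    if h1:
        inner = re.sub(
            r"<small[^>]*>.*?</small>", "", h1.group(1),
            flags=re.IGNORECASE | re.DOTALL,
        )
        title = clean_filename(inner)
        if title:
            return title

    page_title = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if page_title:
        title = clean_filename(page_title.group(1))
        if title:
            return title

    return default
```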

Fallback extraction

If LLM extraction returns no image URLs, the handler falls back to a regex scan of the page HTML:

if not image_urls and result.html:
    matches = re.findall(
        r'(https?://(?:img1?\.?tmo\.com|otakuteca\.com|img1tmo\.com)[^"\'\s]+\.(?:webp|jpg|png))', 
        result.html
    )
    if matches: 
        image_urls = sorted(list(set(matches)))

Location: ~/workspace/source/core/sites/zonatmo.py:185-187
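
The sketch below applies the same pattern to a made-up HTML sample to show what the fallback keeps and how the result is ordered. The sample markup and the `regex_fallback` name are illustrative. Note that `sorted` compares strings lexicographically, so a page named 10.webp sorts before 2.webp unless filenames are zero-padded.

```python
import re

IMAGE_URL = re.compile(
    r'(https?://(?:img1?\.?tmo\.com|otakuteca\.com|img1tmo\.com)[^"\'\s]+\.(?:webp|jpg|png))'
)


def regex_fallback(html: str) -> list[str]:
    """Return unique image URLs from known hosts, sorted as strings."""
    return sorted(set(IMAGE_URL.findall(html)))


# Made-up sample: one duplicate and one image from an unrelated host
sample_html = '''
<img data-original="https://img1tmo.com/uploads/ch1/2.webp">
<img data-original="https://img1tmo.com/uploads/ch1/10.webp">
<img src="https://img1tmo.com/uploads/ch1/2.webp">
<img src="https://example.com/ads/banner.png">
'''

fallback_urls = regex_fallback(sample_html)
```

Here `fallback_urls` contains two entries, with the 10.webp URL ahead of the 2.webp URL, which is one reason the fallback may produce pages out of reading order.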

Image filtering

The handler filters out non-content images:

image_urls = [
    u for u in image_urls 
    if "cover" not in u 
    and "avatar" not in u 
    and "banner" not in u
]

Location: ~/workspace/source/core/sites/zonatmo.py:189
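
The filter is a plain substring check against the whole URL, which is worth keeping in mind when reading its results. The sketch below restates it as a function over made-up URLs; `filter_content_images` and `EXCLUDED_KEYWORDS` are hypothetical names.

```python
EXCLUDED_KEYWORDS = ("cover", "avatar", "banner")


def filter_content_images(urls):
    """Drop URLs containing any excluded keyword anywhere in the string."""
    return [u for u in urls if not any(k in u for k in EXCLUDED_KEYWORDS)]


# Made-up URLs for illustration
candidate_urls = [
    "https://img1tmo.com/uploads/ch1/1.webp",
    "https://img1tmo.com/covers/series.jpg",
    "https://img1tmo.com/avatar/user.png",
    "https://img1tmo.com/discover/3.webp",
]
content_urls = filter_content_images(candidate_urls)
```

Only the first URL survives. The last one is dropped because "discover" contains "cover", so a content image whose path happens to include one of the keywords would be filtered out as well.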

Headers and configuration

ZonaTMO requires specific headers defined in config.py:

HEADERS_ZONATMO = {
    "User-Agent": "Mozilla/5.0 ...",
    "Referer": "https://zonatmo.com/"
}

These headers are passed to download_and_make_pdf for image downloads.

Usage examples

Single chapter

import asyncio

from core.handler import process_url

async def main():
    await process_url(
        "https://zonatmo.com/view_uploads/123456",
        log_callback=print,
        check_cancel=lambda: False,
        progress_callback=lambda current, total: print(f"{current}/{total}"),
    )

asyncio.run(main())

Output: PDF/zonatmo_chapter.pdf (or auto-detected title)

Full series

# Call from inside an async function (or an async REPL)
await process_url(
    "https://zonatmo.com/library/manga/my-favorite-manga",
    log_callback=print,
    check_cancel=lambda: False,
    progress_callback=lambda current, total: print(f"Chapter {current}/{total}")
)

Output:

PDF/
└── My Favorite Manga/
    ├── My Favorite Manga - 001.pdf
    ├── My Favorite Manga - 002.pdf
    └── My Favorite Manga - 003.pdf
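
The layout above suggests a zero-padded, three-digit chapter index inside a folder named after the series. A minimal sketch of that naming scheme follows; `chapter_pdf_path` is a hypothetical helper, and the exact format string used by the handler is an assumption inferred from the example tree.

```python
from pathlib import Path


def chapter_pdf_path(series_title: str, index: int, root: str = "PDF") -> Path:
    """Build a path like PDF/<title>/<title> - 001.pdf (assumed naming scheme)."""
    return Path(root) / series_title / f"{series_title} - {index:03d}.pdf"
```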

Web interface

You can also use the web version:

Start the server: START_WEB_VERSION.bat
Navigate to http://localhost:3000
Paste any ZonaTMO URL
Watch real-time logs and progress bars

Rate limiting

The handler includes a 1-second delay between chapters to avoid rate limiting:

for i, chap_url in enumerate(clean_links):
    await self._process_chapter(chap_url, ...)
    await asyncio.sleep(1)  # Polite delay

Location: ~/workspace/source/core/sites/zonatmo.py:105-106
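
A self-contained version of this loop is sketched below, with a stub in place of the real chapter download. It makes two assumptions that go beyond the excerpt: it checks `check_cancel` before each chapter, and it skips the delay after the last chapter. `download_series` and `fake_process_chapter` are hypothetical names.

```python
import asyncio


async def fake_process_chapter(url: str, index: int) -> None:
    """Stub standing in for the handler's chapter download."""
    await asyncio.sleep(0)


async def download_series(chapter_urls, process_chapter,
                          check_cancel=lambda: False, delay=1.0):
    """Process chapters in order, pausing between them; return the URLs completed."""
    done = []
    for index, url in enumerate(chapter_urls, start=1):
        if check_cancel():
            break  # stop before starting another chapter
        await process_chapter(url, index)
        done.append(url)
        if index < len(chapter_urls):
            await asyncio.sleep(delay)  # polite delay between chapters
    return done
```

With the default 1-second delay, a series of N chapters adds roughly N - 1 seconds of waiting on top of the download time.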

Known limitations

Requires Google API key: The LLM extraction will fail without GOOGLE_API_KEY in your .env file. Regex fallback may not catch all images.

URL resolution

The handler attempts to resolve redirects before extraction:

async with aiohttp.ClientSession() as session:
    async with session.get(url, headers=config.HEADERS_ZONATMO) as resp:
        if resp.status == 200:
            final_url = str(resp.url)

Location: ~/workspace/source/core/sites/zonatmo.py:134-137

If this fails, it uses the original URL with a warning.

Duplicate detection

Chapter lists may contain duplicate URLs. The handler deduplicates using a set:

clean_links = []
seen = set()
for l in links:
    if l not in seen:
        clean_links.append(l)
        seen.add(l)

Location: ~/workspace/source/core/sites/zonatmo.py:51-56

Implementation details

Class structure

class ZonaTMOHandler(BaseSiteHandler):
    @staticmethod
    def get_supported_domains() -> list:
        return ["zonatmo.com"]
    
    async def process(self, url, log_callback, check_cancel, progress_callback):
        # Main processing logic
        ...
    
    async def _process_chapter(self, url, output_name, ...):
        # Chapter-specific extraction
        ...

Location: ~/workspace/source/core/sites/zonatmo.py:20-214

Domain matching

The handler is automatically selected by the routing logic in handler.py when the URL contains "zonatmo.com".
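
One way such routing might look is sketched below. It is not the contents of handler.py: `HANDLERS` and `select_handler` are hypothetical names, and the stub classes only reproduce the `get_supported_domains` method shown in the class structure above. The substring check follows the description that a handler is chosen when the URL contains its domain.

```python
class BaseSiteHandler:
    """Minimal stub of the base class (assumption)."""

    @staticmethod
    def get_supported_domains() -> list:
        return []


class ZonaTMOHandler(BaseSiteHandler):
    @staticmethod
    def get_supported_domains() -> list:
        return ["zonatmo.com"]


# Hypothetical registry of available handler classes
HANDLERS = [ZonaTMOHandler]


def select_handler(url: str):
    """Return the first handler class whose domain appears in the URL, else None."""
    for handler_cls in HANDLERS:
        if any(domain in url for domain in handler_cls.get_supported_domains()):
            return handler_cls
    return None
```

Supporting another site under this scheme would only require adding its handler class to the registry.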

Next steps

TMO-H

Similar AI-powered extraction for TMO-H

Configuration

Set up your Google API key

Architecture

Learn about the Strategy Pattern

Utils

Explore download_and_make_pdf function
