Site handler for zonatmo.com using Crawl4AI with cascade view and LLM extraction
The ZonaTMO handler is one of the most sophisticated extractors in the project, supporting both single chapters and full series downloads using AI-powered image detection.
The handler converts paginated URLs to cascade view for more efficient extraction:
    if "/paginated" in final_url:
        target_url = final_url.replace("/paginated", "/cascade")
    elif "/viewer/" in final_url:
        if not final_url.endswith("/cascade"):
            target_url = final_url + "/cascade"

Location: ~/workspace/source/core/sites/zonatmo.py:139-142
Cascade view loads all images on a single scrollable page, making extraction faster and more reliable.
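The URL rewrite can be isolated into a small pure function, which makes it easy to check on its own. The sketch below is illustrative rather than the handler's actual code: the name `to_cascade_url` and the example URLs are hypothetical, and the trailing-slash handling is an added assumption that the excerpt above does not include.

```python
def to_cascade_url(final_url: str) -> str:
    """Return the cascade-view variant of a viewer URL, or the URL unchanged.

    Hypothetical helper mirroring the branch logic of the excerpt above.
    """
    if "/paginated" in final_url:
        # Paginated viewer URL: swap the view segment.
        return final_url.replace("/paginated", "/cascade")
    if "/viewer/" in final_url and not final_url.endswith("/cascade"):
        # Bare viewer URL: append the view segment.
        # Assumption (not in the original): strip a trailing slash first
        # so the result never contains "//cascade".
        return final_url.rstrip("/") + "/cascade"
    # Anything else (for example a library page) is left as-is.
    return final_url


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    print(to_cascade_url("https://zonatmo.com/viewer/abc123/paginated"))
```

Returning the input unchanged in the final branch means the caller always gets a usable URL, whichever branch was taken.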
When you provide a cover/library URL, the handler:
Crawls the series page to find all chapter links
Extracts the manga title from <h1> tags
Creates a folder named after the series
Reverses the chapter list so that it runs oldest to newest
Downloads each chapter as a separate PDF
    links = re.findall(
        r'href=["\'](https://zonatmo.com/view_uploads/[^"\']+)["\']',
        result.html,
    )
    # Remove duplicates while preserving order
    clean_links = []
    seen = set()
    for link in links:
        if link not in seen:
            clean_links.append(link)
            seen.add(link)
    clean_links.reverse()  # Oldest first
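The title and folder steps from the list above are not shown in the excerpt, so here is a minimal sketch of how they might look using only the standard library. The function names `extract_title` and `safe_folder_name`, the `"untitled"` default, and the set of characters treated as unsafe are all assumptions for illustration; the handler's real implementation may differ.

```python
import html
import re


def extract_title(page_html: str, default: str = "untitled") -> str:
    """Return the text of the first <h1> element, or `default` if none exists."""
    match = re.search(r"<h1[^>]*>(.*?)</h1>", page_html, re.IGNORECASE | re.DOTALL)
    if not match:
        return default
    text = re.sub(r"<[^>]+>", " ", match.group(1))  # drop nested tags
    text = html.unescape(text)                      # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text).strip()        # collapse whitespace
    return text or default


def safe_folder_name(title: str) -> str:
    """Replace characters that are commonly invalid in directory names."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Leading or trailing dots and spaces cause trouble on some filesystems.
    return cleaned.strip(" .") or "untitled"


if __name__ == "__main__":
    sample = '<h1 class="title">  Tom &amp; Jerry <small>2020</small></h1>'
    print(safe_folder_name(extract_title(sample)))
```

A regex is enough for a single, well-formed heading; pages with unusual markup would be better served by a real HTML parser.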
If LLM extraction fails, the handler uses regex fallback:
    if not image_urls and result.html:
        matches = re.findall(
            r'(https?://(?:img1?\.?tmo\.com|otakuteca\.com|img1tmo\.com)[^"\'\s]+\.(?:webp|jpg|png))',
            result.html,
        )
        if matches:
            image_urls = sorted(set(matches))
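To see what the fallback pattern accepts and rejects, it can be run against a small HTML fragment. The sketch below reuses the regex from the excerpt verbatim; the wrapper name `fallback_image_urls` and the sample markup are hypothetical.

```python
import re

# Same pattern as in the excerpt above: known image hosts plus an image extension.
IMAGE_URL_RE = re.compile(
    r'(https?://(?:img1?\.?tmo\.com|otakuteca\.com|img1tmo\.com)[^"\'\s]+\.(?:webp|jpg|png))'
)


def fallback_image_urls(page_html: str) -> list[str]:
    """Return the unique matching image URLs in lexicographic order."""
    return sorted(set(IMAGE_URL_RE.findall(page_html)))


if __name__ == "__main__":
    # Hypothetical markup: one duplicate and one URL on an unrelated host.
    sample = (
        '<img src="https://otakuteca.com/a/2.webp">'
        '<img src="https://img1tmo.com/u/1.jpg">'
        '<img src="https://otakuteca.com/a/2.webp">'
        '<img src="https://example.com/x.png">'
    )
    for url in fallback_image_urls(sample):
        print(url)
```

Because the result is sorted as plain strings, a file named `10.jpg` sorts before `2.jpg`. If page order matters and file names are not zero-padded, a numeric sort key may be needed.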
The handler attempts to resolve redirects before extraction:
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=config.HEADERS_ZONATMO) as resp:
            if resp.status == 200:
                final_url = str(resp.url)

Location: ~/workspace/source/core/sites/zonatmo.py:134-137
If this fails, it uses the original URL with a warning.