The M440 handler provides a lightweight extraction method for M440.in and Mangas.in using simple HTML parsing without AI assistance.

Supported domains

The handler supports two domains:
@staticmethod
def get_supported_domains() -> list:
    return ["m440.in", "mangas.in"]
Location: ~/workspace/source/core/sites/m440.py:18-20

URL patterns

Single chapter:
https://m440.in/manga/[series-name]/[chapter-id]
https://mangas.in/manga/[series-name]/[chapter-id]
Series cover:
https://m440.in/manga/[series-name]
https://mangas.in/manga/[series-name]

Extraction technology

Simple Crawl4AI

M440 uses Crawl4AI without AI assistance, relying on straightforward regex extraction:
async with AsyncWebCrawler(verbose=True) as crawler:
    result = await crawler.arun(url=url, bypass_cache=True)
    if not result.success:
        log_callback(f"[ERROR] Page load failed: {result.error_message}")
        return
    
    html = result.html
    matches = re.findall(r'data-src=["\'](https://[^"\']+)["\']', html)
Location: ~/workspace/source/core/sites/m440.py:32-36,131

This approach is:
  • Faster than AI extraction
  • More reliable for simple HTML structures
  • Requires no API keys
  • Uses less CPU and memory
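To see the pattern in isolation, here is a self-contained sketch with invented HTML (the real page markup may differ):

```python
import re

# Hypothetical page fragment, invented for illustration only.
sample_html = """
<img data-src="https://img.m440.in/demo/chapter-1/01.jpg">
<img data-src="https://img.m440.in/demo/chapter-1/02.jpg">
<img src="placeholder.gif" data-src="https://img.m440.in/demo/chapter-1/03.jpg">
"""

# Same shape of pattern the handler uses: capture the URL inside data-src="..."
matches = re.findall(r'data-src=["\'](https://[^"\']+)["\']', sample_html)
print(matches)
```

The capture group returns just the URLs, in document order, ready for deduplication.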

Cover detection

The handler detects whether a URL is a cover page or a single chapter using two heuristics:
clean_url = url.split("?")[0].rstrip("/")
is_cover_page = bool(re.search(r'/manga/[^/]+$', clean_url))

if not is_cover_page:
    # Check if there are many chapter links
    potential_chapters = re.findall(
        r'href=["\'](https://m440.in/manga/[^/]+/[^"\']+)["\']',
        html
    )
    if len(set(potential_chapters)) > 3: 
        is_cover_page = True
Location: ~/workspace/source/core/sites/m440.py:39-44

A page is considered a cover if:
  1. URL ends with /manga/[series-name] (no chapter path)
  2. OR it contains more than 3 unique chapter links
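The first heuristic can be exercised on its own; this sketch wraps the URL check in a hypothetical helper:

```python
import re

def looks_like_cover(url: str) -> bool:
    # First heuristic only: strip the query string and trailing slash,
    # then test for /manga/<series> with no chapter segment after it.
    clean = url.split("?")[0].rstrip("/")
    return bool(re.search(r'/manga/[^/]+$', clean))

print(looks_like_cover("https://m440.in/manga/one-piece"))               # True
print(looks_like_cover("https://m440.in/manga/one-piece/"))              # True (trailing slash stripped)
print(looks_like_cover("https://m440.in/manga/one-piece/chapter-1000"))  # False
```

The second heuristic (counting unique chapter links) only runs when this check says "not a cover".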

Series download

When a cover page is detected, the handler downloads all chapters:

Title extraction

manga_title = "Manga_M440"
title_match = re.search(
    r'<h2[^>]*class=["\']widget-title["\'][^>]*>(.*?)</h2>', 
    html
)
if title_match: 
    manga_title = clean_filename(title_match.group(1).strip())
Location: ~/workspace/source/core/sites/m440.py:48-51

Chapter extraction

links = re.findall(
    r'href=["\'](https://m440.in/manga/[^/]+/[^"\']+)["\']',
    html
)

seen = set()
clean_links = []
for l in links:
    if l not in seen and "/manga/" in l and l != url:
        seen.add(l)
        clean_links.append(l)

clean_links.reverse()  # Oldest to newest
Location: ~/workspace/source/core/sites/m440.py:53-60

The handler:
  1. Extracts all chapter URLs
  2. Removes duplicates while preserving order
  3. Filters out the current URL
  4. Reverses to start with chapter 1
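The four steps above can be sketched as one hypothetical helper and exercised on invented links:

```python
def clean_chapter_links(links, current_url):
    # Mirror of the steps above: dedupe keeping first-seen order,
    # drop the cover URL itself, reverse so chapter 1 comes first.
    # The helper name is invented for this sketch.
    seen = set()
    out = []
    for link in links:
        if link not in seen and "/manga/" in link and link != current_url:
            seen.add(link)
            out.append(link)
    out.reverse()
    return out

cover = "https://m440.in/manga/demo"
links = [
    f"{cover}/chapter-3",
    f"{cover}/chapter-2",
    f"{cover}/chapter-2",  # duplicate, dropped
    cover,                 # the cover page itself, dropped
    f"{cover}/chapter-1",
]
print(clean_chapter_links(links, cover))
```

Reversing matters because the site lists the newest chapter first, while downloads should start at chapter 1.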

Batch processing

pdf_dir = os.path.join(os.getcwd(), config.PDF_FOLDER_NAME, manga_title)
os.makedirs(pdf_dir, exist_ok=True)

for i, chap_url in enumerate(clean_links):
    if check_cancel and check_cancel(): 
        break
    if progress_callback: 
        progress_callback(i + 1, len(clean_links))
    
    log_callback(f"Processing Cap {i+1}/{len(clean_links)}")
    
    pdf_name = f"{manga_title} - {chap_url.split('/')[-1]}.pdf"
    full_pdf_path = os.path.join(pdf_dir, pdf_name)
    
    if os.path.exists(full_pdf_path): 
        continue
    
    await self._process_chapter(
        chap_url, 
        full_pdf_path, 
        crawler, 
        log_callback, 
        check_cancel, 
        None
    )
Location: ~/workspace/source/core/sites/m440.py:68-81

Features:
  • Creates series-specific folder
  • Skips already-downloaded chapters
  • Respects cancellation requests
  • Reports progress per chapter

Image extraction

M440 uses data-src attributes for lazy-loaded images:
async def _process_chapter(
    self, 
    url: str, 
    output_pdf_path: str, 
    crawler: AsyncWebCrawler, 
    log_callback: Callable[[str], None], 
    check_cancel: Callable[[], bool], 
    progress_callback: Optional[Callable[[int, int], None]] = None
) -> None:
    result = await crawler.arun(url=url, bypass_cache=True)
    if not result.success: 
        return
    
    html = result.html
    matches = re.findall(r'data-src=["\'](https://[^"\']+)["\']', html)
    
    if matches:
        images = list(dict.fromkeys(matches))  # Remove duplicates
        log_callback(f"[INFO] Downloading {len(images)} images...")
        
        await download_and_make_pdf(
            images, 
            output_pdf_path, 
            config.HEADERS_M440, 
            log_callback, 
            check_cancel, 
            progress_callback, 
            is_path=True,
            open_result=config.OPEN_RESULT_ON_FINISH
        )
Location: ~/workspace/source/core/sites/m440.py:111-144

Deduplication

The handler uses dict.fromkeys() to remove duplicate URLs while preserving order:
images = list(dict.fromkeys(matches))
Unlike the set-based alternative, this keeps the images in first-seen page order:
images = list(set(matches))  # Order not preserved
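A quick demonstration of the difference, with an invented input list:

```python
# Invented match list with duplicates out of order
matches = ["02.jpg", "01.jpg", "02.jpg", "03.jpg", "01.jpg"]

# dict keys are insertion-ordered (Python 3.7+), so first-seen order survives
images = list(dict.fromkeys(matches))
print(images)  # ['02.jpg', '01.jpg', '03.jpg']
```

Page order matters here because the image order determines the page order of the resulting PDF.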

Headers and configuration

M440 requires specific headers for image downloads:
HEADERS_M440 = {
    "User-Agent": "Mozilla/5.0 ...",
    "Referer": "https://m440.in/"
}
Defined in config.py and passed to the download utility.
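As a sketch of how such headers travel with a request (urllib is used here purely for illustration, no network call is made, and the User-Agent value is a placeholder):

```python
import urllib.request

HEADERS_M440 = {
    "User-Agent": "Mozilla/5.0",   # placeholder; config.py holds the real string
    "Referer": "https://m440.in/",
}

# The Referer is what the image host checks; build a request carrying it.
req = urllib.request.Request("https://img.m440.in/demo/01.jpg", headers=HEADERS_M440)
print(req.get_header("Referer"))
```

Without the Referer header, image hosts that check it typically respond with 403.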

Usage examples

Single chapter

from core.handler import process_url

await process_url(
    "https://m440.in/manga/one-piece/chapter-1000",
    log_callback=print,
    check_cancel=lambda: False
)
Output: PDF/m440_chapter.pdf

Full series

await process_url(
    "https://m440.in/manga/one-piece",
    log_callback=print,
    check_cancel=lambda: False,
    progress_callback=lambda current, total: print(f"Chapter {current}/{total}")
)
Output:
PDF/
└── One Piece/
    ├── One Piece - chapter-1.pdf
    ├── One Piece - chapter-2.pdf
    └── One Piece - chapter-3.pdf

Via web interface

  1. Launch: START_WEB_VERSION.bat
  2. Open: http://localhost:3000
  3. Paste URL: https://m440.in/manga/my-manga
  4. Watch progress in real-time

Implementation details

Class structure

class M440Handler(BaseSiteHandler):
    @staticmethod
    def get_supported_domains() -> list:
        return ["m440.in", "mangas.in"]
    
    async def process(self, url, log_callback, check_cancel, progress_callback):
        """Process M440.in URL."""
        ...
    
    async def _process_chapter(self, url, output_pdf_path, crawler, ...):
        """Helper to process a single chapter."""
        ...
Location: ~/workspace/source/core/sites/m440.py:15-145

Reusing crawler instance

For series downloads, the handler reuses the same AsyncWebCrawler instance:
async with AsyncWebCrawler(verbose=True) as crawler:
    result = await crawler.arun(url=url, bypass_cache=True)
    # ... cover page processing ...
    
    for i, chap_url in enumerate(clean_links):
        await self._process_chapter(
            chap_url, 
            full_pdf_path, 
            crawler,  # Reuse instance
            ...
        )
Location: ~/workspace/source/core/sites/m440.py:32,81

This improves performance by avoiding repeated browser initialization.

Known limitations

No lazy loading script

Unlike ZonaTMO and TMO-H, M440 doesn’t execute JavaScript to trigger lazy loading. This works because M440’s data-src attributes are present in the initial HTML.

Chapter naming

Chapters are named using the URL slug:
pdf_name = f"{manga_title} - {chap_url.split('/')[-1]}.pdf"
Location: ~/workspace/source/core/sites/m440.py:76

For example:
  • URL: https://m440.in/manga/one-piece/chapter-1000
  • Output: One Piece - chapter-1000.pdf
This preserves the site’s chapter naming convention.
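The naming rule in isolation (title and URL invented):

```python
manga_title = "One Piece"
chap_url = "https://m440.in/manga/one-piece/chapter-1000"

# The last path segment of the chapter URL becomes the per-chapter suffix
pdf_name = f"{manga_title} - {chap_url.split('/')[-1]}.pdf"
print(pdf_name)  # One Piece - chapter-1000.pdf
```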

Skip existing files

The handler skips already-downloaded chapters:
if os.path.exists(full_pdf_path): 
    continue
Location: ~/workspace/source/core/sites/m440.py:79
This allows you to resume interrupted series downloads without re-downloading existing chapters.

Performance characteristics

  • Speed: Very fast (no AI processing)
  • Reliability: High (simple regex)
  • Resource usage: Low (minimal CPU/memory)
  • Best for: Large series downloads

Comparison with other handlers

Handler    Extraction    Speed      API Required
M440       Regex         Fast       No
ZonaTMO    AI + Regex    Medium     Yes
TMO-H      AI + Regex    Medium     Yes
H2R        JSON          Fastest    No

Troubleshooting

No chapters found

If cover detection fails:
  1. Verify the URL is correct
  2. Check if the site structure has changed
  3. Manually inspect the page HTML
  4. Try a different manga series

Missing images

If some images don’t download:
  1. Check if headers are correct in config.py
  2. Verify the regex pattern matches data-src format
  3. Some images may require additional authentication

Auto-open not working

On Linux/Mac, os.startfile() is not available:
if config.OPEN_RESULT_ON_FINISH:
    try: 
        os.startfile(pdf_dir)
    except: 
        pass
Location: ~/workspace/source/core/sites/m440.py:83-85

You’ll need to manually open the output folder.

Next steps

Hentai2Read

Even faster JSON-based extraction

ZonaTMO

Compare with AI-powered approach

Configuration

Configure headers and paths

Utils

Learn about download_and_make_pdf
