The Hitomi.la handler uses advanced browser automation with Playwright to bypass anti-bot protections and download high-quality images page-by-page.

Supported domains

@staticmethod
def get_supported_domains() -> list:
    return ["hitomi.la"]
Location: ~/workspace/source/core/sites/hitomi.py:19-21

Supported URLs

Hitomi URLs come in two formats:
https://hitomi.la/reader/[gallery-id].html
https://hitomi.la/[type]/[title-slug]-[gallery-id].html
Types: doujinshi, manga, galleries, cg, etc.

The handler extracts the numeric ID from either format:
id_match = re.search(r'[-/](\d+)\.html', url)
if not id_match:
    log_callback("[ERROR] Could not extract ID from URL.")
    return
gallery_id = int(id_match.group(1))
Location: ~/workspace/source/core/sites/hitomi.py:34-37
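The pattern can be sanity-checked against both URL shapes with a quick standalone sketch (extract_gallery_id is an illustrative wrapper, not a function from the handler):

```python
import re

def extract_gallery_id(url: str):
    """Pull the numeric gallery ID out of either Hitomi URL format."""
    id_match = re.search(r'[-/](\d+)\.html', url)
    return int(id_match.group(1)) if id_match else None

# Both formats resolve to the same ID.
print(extract_gallery_id("https://hitomi.la/reader/1234567.html"))                        # 1234567
print(extract_gallery_id("https://hitomi.la/doujinshi/my-favorite-doujin-1234567.html"))  # 1234567
```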

Why browser automation?

Hitomi.la implements aggressive anti-bot protection:
  • 403 Forbidden on direct image requests without proper headers
  • 404 Not Found if referrer is missing or incorrect
  • Dynamic image URLs that change based on browser state
  • JavaScript-required reader interface
Standard HTTP requests and Crawl4AI fail on Hitomi. Only full browser automation works reliably.

Extraction technology

Playwright stealth mode

The handler uses Playwright with stealth techniques:
from playwright.async_api import async_playwright

args = [
    "--no-sandbox", 
    "--disable-setuid-sandbox", 
    "--start-maximized"
]

browser = await p.chromium.launch(
    headless=is_headless, 
    args=args
)

context = await browser.new_context(
    user_agent=config.USER_AGENT,
    viewport={'width': 1280, 'height': 720}
)
Location: ~/workspace/source/core/sites/hitomi.py:57-64

Headless detection

The handler automatically determines if it should run headless:
is_headless = os.getenv("HEADLESS", "false").lower() == "true" or not os.getenv("DISPLAY")
if os.name == 'nt':  # Windows
    is_headless = False
Location: ~/workspace/source/core/sites/hitomi.py:53-54
  • Linux servers: Defaults to headless
  • Windows: Always visible (better success rate)
  • Override: Set HEADLESS=true in .env
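The decision above can be written as a pure function for illustration (a sketch; resolve_headless is not a name from the handler, which reads the environment directly):

```python
def resolve_headless(env: dict, os_name: str, has_display: bool) -> bool:
    # Mirrors the handler's logic: headless on servers without a display
    # or when HEADLESS=true, but never on Windows.
    headless = env.get("HEADLESS", "false").lower() == "true" or not has_display
    if os_name == "nt":  # Windows
        headless = False
    return headless
```

Factoring it out this way also makes the precedence explicit: the Windows check overrides everything, including an explicit HEADLESS=true.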

Page-by-page extraction

Unlike other handlers that extract all URLs at once, the Hitomi handler downloads images one by one:
reader_url = f"https://hitomi.la/reader/{gallery_id}.html#1"
await page.goto(reader_url, wait_until="domcontentloaded")

# Get total images from JavaScript variable
total_images = await page.evaluate(
    "() => window.galleryinfo ? window.galleryinfo.files.length : 0"
)

for i in range(1, total_images + 1):
    # Update hash to go to next image
    await page.evaluate(f"location.hash = '#{i}'")
    
    # Wait for image to update
    await page.wait_for_function(
        """(selector) => {
            const img = document.querySelector(selector);
            return img && img.src && img.src.indexOf('http') === 0;
        }""", 
        arg="div#comicImages img", 
        timeout=10000
    )
    
    # Extract image info
    img_info = await page.evaluate("""(selector) => {
        const img = document.querySelector(selector);
        return {
            src: img.src, 
            width: img.naturalWidth, 
            height: img.naturalHeight
        };
    }""", "div#comicImages img")
Location: ~/workspace/source/core/sites/hitomi.py:69-122

Why page-by-page?

  1. Dynamic URLs: Image URLs are generated per page via JavaScript
  2. Protection: Bulk requests trigger rate limiting
  3. Quality: Ensures high-quality images (not thumbnails)
  4. Reliability: Handles navigation errors per-page

Downloading with proper referrer

Hitomi requires the reader URL as referrer:
img_src = img_info['src']
headers = {"Referer": f"https://hitomi.la/reader/{gallery_id}.html"}

response = await page.request.get(img_src, headers=headers)

if response.status == 200:
    data = await response.body()
    ext = img_src.split('.')[-1]
    if '?' in ext: 
        ext = ext.split('?')[0]
    
    filename = f"{i:03d}.{ext}"
    filepath = os.path.join(temp_dir, filename)
    
    with open(filepath, 'wb') as f:
        f.write(data)
    
    download_targets.append(filepath)
Location: ~/workspace/source/core/sites/hitomi.py:124-141

Using page.request.get() instead of aiohttp ensures:
  • Cookies are maintained
  • Browser context is used
  • Referrer is properly set
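One fragile spot above is deriving the file extension from the URL. The split('.') approach works, but a hypothetical helper (not part of the handler) using urllib.parse sidesteps query strings entirely:

```python
import os
from urllib.parse import urlsplit

def guess_extension(img_src: str, default: str = "jpg") -> str:
    # Take the extension from the URL path only, so query strings like
    # "?token=abc" can never leak into the saved filename.
    path = urlsplit(img_src).path
    ext = os.path.splitext(path)[1].lstrip(".")
    return ext or default
```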

Temporary file handling

Since images are downloaded sequentially, they’re stored in a temp folder first:
current_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
temp_dir = os.path.join(current_dir, config.TEMP_FOLDER_NAME)

if os.path.exists(temp_dir):
    shutil.rmtree(temp_dir)
os.makedirs(temp_dir, exist_ok=True)

download_targets = []  # Will store file paths
Location: ~/workspace/source/core/sites/hitomi.py:43-49

After all downloads complete:
if download_targets:
    pdf_name = f"{clean_filename(title)}.pdf"
    finalize_pdf_flow(
        download_targets, 
        pdf_name, 
        log_callback, 
        temp_dir,
        open_result=config.OPEN_RESULT_ON_FINISH
    )
Location: ~/workspace/source/core/sites/hitomi.py:164-172

The finalize_pdf_flow utility:
  1. Generates the PDF from downloaded files
  2. Cleans up the temp directory
  3. Opens the result if configured
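The zero-padded filenames written earlier (f"{i:03d}.{ext}") matter at this step: a plain lexicographic sort keeps pages in reading order when the PDF is assembled.

```python
# Without padding, "10.jpg" would sort before "2.jpg"; with {i:03d} it doesn't.
names = [f"{i:03d}.jpg" for i in (10, 2, 1)]
print(sorted(names))  # ['001.jpg', '002.jpg', '010.jpg']
```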

Title extraction

The handler extracts the gallery title from the page:
title = f"Hitomi_{gallery_id}"
page_title = await page.title()

if page_title:
    clean_title = re.sub(r'[\\/*?:"<>|]', '', page_title).strip()
    title = clean_title if clean_title else title

log_callback(f"[INFO] Title detected: {title}")
Location: ~/workspace/source/core/sites/hitomi.py:77-81

If title extraction fails, it defaults to Hitomi_{gallery_id}.

Fallback for missing galleryinfo

Sometimes window.galleryinfo is not immediately available:
total_images = await page.evaluate(
    "() => window.galleryinfo ? window.galleryinfo.files.length : 0"
)

if total_images == 0:
    log_callback("[INFO] 'galleryinfo' not detected, trying fallback...")
    try:
        await page.wait_for_function(
            "() => window.galleryinfo && window.galleryinfo.files.length > 0", 
            timeout=5000
        )
        total_images = await page.evaluate("() => window.galleryinfo.files.length")
    except Exception:
        log_callback("[WARN] Could not determine total images. Estimating...")
        total_images = 9999  # Arbitrary limit
Location: ~/workspace/source/core/sites/hitomi.py:84-94

If the fallback also fails, the handler assumes up to 9999 pages and stops when errors occur:
except Exception as e:
    log_callback(f"[ERROR] Error on page {i}: {e}")
    if total_images == 9999 and i > 5:
        log_callback("[INFO] Possible end of gallery.")
        break
Location: ~/workspace/source/core/sites/hitomi.py:150-154
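The pattern boils down to: iterate toward a large cap and treat an error past the first few pages as the end of the gallery. A standalone sketch (walk_pages is illustrative, not a handler function):

```python
def walk_pages(fetch, cap=9999, grace=5):
    # Try each page up to the cap; an error after the grace window is
    # treated as the end of the gallery rather than a fatal failure.
    pages = []
    for i in range(1, cap + 1):
        try:
            pages.append(fetch(i))
        except Exception:
            if i > grace:
                break  # probable end of gallery
    return pages
```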

Rate limiting and delays

The handler includes a 500ms delay between pages:
await page.wait_for_timeout(500)
Location: ~/workspace/source/core/sites/hitomi.py:148

This prevents:
  • Rate limiting by the server
  • Browser detection as bot
  • Connection errors
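A possible refinement (not in the handler) is adding jitter, so the inter-page timing looks less mechanical:

```python
import random

def next_delay_ms(base: int = 500, jitter: int = 250) -> int:
    # Fixed base plus a random component, e.g. 500-750 ms between pages.
    return base + random.randint(0, jitter)
```

The fixed await page.wait_for_timeout(500) would then become page.wait_for_timeout(next_delay_ms()).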

Usage examples

from core.handler import process_url

await process_url(
    "https://hitomi.la/reader/1234567.html",
    log_callback=print,
    check_cancel=lambda: False,
    progress_callback=lambda current, total: print(f"Page {current}/{total}")
)
Output: PDF/Gallery Title.pdf
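The check_cancel callable can be anything that returns a bool; one simple wiring (an assumption, not code from the project) uses threading.Event so another thread or a UI button can stop the run:

```python
import threading

cancel_event = threading.Event()

# Pass cancel_event.is_set as check_cancel; setting the event from another
# thread makes subsequent checks return True, stopping the download loop.
check_cancel = cancel_event.is_set
```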

From info page

await process_url(
    "https://hitomi.la/doujinshi/my-favorite-doujin-1234567.html",
    log_callback=print,
    check_cancel=lambda: False
)
The handler extracts the ID 1234567 and navigates to the reader.

Via web interface

  1. Launch: START_WEB_VERSION.bat
  2. Open: http://localhost:3000
  3. Paste URL: https://hitomi.la/reader/1234567.html
  4. Watch real-time page-by-page progress
Hitomi downloads are slower due to page-by-page navigation, but this ensures maximum quality and bypasses protections.

Performance characteristics

  • Speed: Slow (~30-60 seconds for 50 images)
  • Reliability: Very high (bypasses all protections)
  • Resource usage: High (full browser instance)
  • Quality: Maximum (original resolution)

Comparison

| Metric         | Hitomi | H2R   | M440  |
|----------------|--------|-------|-------|
| Time (50 imgs) | ~60s   | ~5s   | ~8s   |
| Memory         | ~500MB | ~50MB | ~80MB |
| CPU            | High   | Low   | Low   |
| Reliability    | 99%    | 95%   | 90%   |

Known limitations

No series support

Hitomi handler only supports single galleries, not bulk series downloads.

Visible browser on Windows

For best reliability on Windows, the browser runs in visible mode:
if os.name == 'nt':  # Windows
    is_headless = False
This can be distracting but significantly improves success rate.

Slow for large galleries

Galleries with 200+ images can take 5-10 minutes. Consider using progress callbacks:
progress_callback=lambda c, t: print(f"Progress: {c}/{t} ({c/t*100:.1f}%)", end="\r")

Troubleshooting

Browser launch fails

If Playwright fails to launch:
playwright install chromium
Ensure Chromium is installed.

Timeout on wait_for_function

If image loading times out:
await page.wait_for_function(..., timeout=10000)  # Increase timeout
Adjust the timeout in the code (currently 10 seconds).

403/404 errors

If downloads fail with 403/404:
  1. Verify referrer is set correctly
  2. Check if User-Agent is up-to-date
  3. Try visible browser mode (HEADLESS=false)
  4. Increase delays between requests

Images not loading

If window.galleryinfo is undefined:
await page.wait_for_timeout(2000)  # Wait longer
Increase initial wait time after page load.

Advanced configuration

Custom viewport

You can modify the viewport size:
context = await browser.new_context(
    user_agent=config.USER_AGENT,
    viewport={'width': 1920, 'height': 1080}  # Larger viewport
)
Location: ~/workspace/source/core/sites/hitomi.py:61-64

Additional stealth options

For extra stealth, add more browser args:
args = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--start-maximized",
    "--disable-blink-features=AutomationControlled",
    "--disable-dev-shm-usage"
]

Next steps

NHentai

Similar browser-based approach for NHentai

Configuration

Configure headless mode and paths

Utils

Learn about finalize_pdf_flow

Architecture

Understand the handler system
