NHentai - Universal Manga Downloader

The NHentai handler uses Playwright browser automation to fetch gallery metadata via API, bypassing Cloudflare protection, then constructs image URLs for fast downloads.

Supported domains

@staticmethod
def get_supported_domains() -> list:
    return ["nhentai.net"]

Location: ~/workspace/source/core/sites/nhentai.py:20-22

Supported URLs

NHentai gallery URLs:

https://nhentai.net/g/[gallery-id]
https://nhentai.net/g/[gallery-id]/[page-number]  (page ignored)

Example:

https://nhentai.net/g/123456
https://nhentai.net/g/123456/1/

The handler extracts the numeric gallery ID:

id_match = re.search(r'nhentai\.net/g/(\d+)', url)
if not id_match:
    log_callback("[ERROR] Could not extract ID from URL.")
    return
gallery_id = id_match.group(1)

Location: ~/workspace/source/core/sites/nhentai.py:32-36
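As a self-contained sketch of the same extraction, the regex can be wrapped in a small helper. The name `extract_gallery_id` is hypothetical and not part of the handler; it returns `None` in the case where the handler logs an error and returns.

```python
import re
from typing import Optional

# Same pattern the handler uses: capture the digits that follow /g/.
GALLERY_ID_RE = re.compile(r"nhentai\.net/g/(\d+)")


def extract_gallery_id(url: str) -> Optional[str]:
    """Return the numeric gallery ID as a string, or None if the URL has none."""
    match = GALLERY_ID_RE.search(url)
    return match.group(1) if match else None


# A trailing page number is simply not captured by the pattern.
print(extract_gallery_id("https://nhentai.net/g/123456/1/"))  # 123456
print(extract_gallery_id("https://nhentai.net/"))  # None
```

Because the pattern uses `re.search` rather than a full match, anything after the ID (page number, trailing slash) is ignored.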

Why browser automation?

Cloudflare protection

NHentai’s API is protected by Cloudflare:

Challenge pages for direct requests
Browser fingerprinting to detect bots
Cookie requirements for API access
JavaScript challenges that must be solved

Standard HTTP requests to nhentai.net/api/ return 403 Forbidden or Cloudflare challenge pages.

Extraction technology

Hybrid approach: Browser + API

Unlike Hitomi (which downloads page-by-page), NHentai uses a hybrid approach:

Browser: Fetches API JSON (bypasses Cloudflare)
AsyncIO: Downloads images directly via constructed URLs

This combines:

Reliability of browser automation
Speed of direct HTTP downloads

API structure

NHentai exposes a JSON API for gallery metadata:

https://nhentai.net/api/gallery/[gallery-id]

Response structure:

{
  "id": 123456,
  "media_id": "1234567",
  "title": {
    "english": "Gallery Title",
    "pretty": "Gallery Title (Cleaned)"
  },
  "images": {
    "pages": [
      {"t": "j", "w": 1200, "h": 1800},
      {"t": "p", "w": 1200, "h": 1800},
      ...
    ]
  }
}

Location: API usage at ~/workspace/source/core/sites/nhentai.py:40-96

Metadata extraction

Browser-based API fetch

api_url = f"https://nhentai.net/api/gallery/{gallery_id}"

async with async_playwright() as p:
    is_headless = os.getenv("HEADLESS", "false").lower() == "true" or not os.getenv("DISPLAY")
    if os.name == 'nt': 
        is_headless = False
    
    args = [
        "--no-sandbox", 
        "--disable-setuid-sandbox",
        "--disable-blink-features=AutomationControlled" 
    ]
    
    browser = await p.chromium.launch(headless=is_headless, args=args)
    context = await browser.new_context(user_agent=config.USER_AGENT)
    page = await context.new_page()
    
    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    
    # Browser might wrap JSON in PRE tag or just text
    content = await page.inner_text("body")
    
    data = json.loads(content)

Location: ~/workspace/source/core/sites/nhentai.py:53-77
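The headless decision in the snippet above combines three conditions, which can be easier to follow as a pure function. This is only an illustrative sketch: `resolve_headless` is a hypothetical name, and the environment is passed in as a mapping so the logic can be checked without touching the real environment.

```python
import os
from typing import Mapping


def resolve_headless(env: Mapping[str, str], os_name: str) -> bool:
    """Mirror the handler's headless decision as a pure function."""
    # Headless if explicitly requested, or if no display server is available.
    headless = env.get("HEADLESS", "false").lower() == "true" or not env.get("DISPLAY")
    # On Windows the handler always shows the browser window.
    if os_name == "nt":
        headless = False
    return headless


# Evaluate the decision for the current process.
print(resolve_headless(os.environ, os.name))
```

Note that on a non-Windows system without `DISPLAY`, the result is headless even when `HEADLESS=false`.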

Title extraction

if "title" in data:
    title = data["title"].get(
        "pretty", 
        data["title"].get("english", title)
    )

Location: ~/workspace/source/core/sites/nhentai.py:78-79

Preference order:

title.pretty (cleaned version)
title.english (fallback)
f"nhentai_{gallery_id}" (default)
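The preference order can be expressed as one small function. This is a minimal sketch; `pick_title` is a hypothetical helper name, and the input is assumed to be the parsed API response described earlier.

```python
from typing import Any, Dict


def pick_title(data: Dict[str, Any], gallery_id: str) -> str:
    """Apply the preference order: pretty, then english, then a default."""
    default = f"nhentai_{gallery_id}"
    titles = data.get("title") or {}
    # Nested .get() calls fall through to the next candidate when a key is missing.
    return titles.get("pretty", titles.get("english", default))


print(pick_title({"title": {"english": "Gallery Title"}}, "123456"))  # Gallery Title
print(pick_title({}, "123456"))  # nhentai_123456
```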

Image URL construction

Extension mapping

ext_map = {'j': 'jpg', 'p': 'png', 'w': 'webp'}

for idx, img in enumerate(images_list):
    t = img.get('t')
    ext = ext_map.get(t, 'jpg')
    img_url = f"https://i.nhentai.net/galleries/{media_id}/{idx+1}.{ext}"
    images_data.append(img_url)

Location: ~/workspace/source/core/sites/nhentai.py:84-91

NHentai uses:

t: "j" → .jpg
t: "p" → .png
t: "w" → .webp

CDN structure

Images follow a predictable pattern:

https://i.nhentai.net/galleries/[media_id]/[page_number].[extension]

Example:

https://i.nhentai.net/galleries/1234567/1.jpg
https://i.nhentai.net/galleries/1234567/2.png
https://i.nhentai.net/galleries/1234567/3.jpg

This allows direct downloads without additional API calls.
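Putting the extension mapping and the CDN pattern together, URL construction can be sketched as a standalone function. `build_image_urls` is a hypothetical name; the input shape follows the sample API response shown earlier.

```python
from typing import Dict, List

EXT_MAP = {"j": "jpg", "p": "png", "w": "webp"}


def build_image_urls(media_id: str, pages: List[Dict[str, object]]) -> List[str]:
    """Build one CDN URL per page; page numbers start at 1."""
    urls = []
    for page_number, page in enumerate(pages, start=1):
        # Unknown or missing type codes fall back to jpg, as in the handler.
        ext = EXT_MAP.get(page.get("t"), "jpg")
        urls.append(f"https://i.nhentai.net/galleries/{media_id}/{page_number}.{ext}")
    return urls


pages = [{"t": "j", "w": 1200, "h": 1800}, {"t": "p", "w": 1200, "h": 1800}]
for url in build_image_urls("1234567", pages):
    print(url)
```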

Direct image download

After constructing URLs, images are downloaded via AsyncIO:

if images_data:
    log_callback(f"[INFO] Gallery: {title} ({len(images_data)} imgs)")
    
    headers = {"User-Agent": config.USER_AGENT}
    
    pdf_name = f"{clean_filename(title)}.pdf"
    await download_and_make_pdf(
        images_data, 
        pdf_name, 
        headers, 
        log_callback, 
        check_cancel, 
        progress_callback,
        open_result=config.OPEN_RESULT_ON_FINISH
    )

Location: ~/workspace/source/core/sites/nhentai.py:104-119

Note: Unlike Hitomi, NHentai images don’t require a Referer header.
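The internals of `download_and_make_pdf` are not shown on this page. As a rough illustration of how a list of URLs could be fetched concurrently with asyncio while keeping page order, here is a sketch. Everything in it is an assumption: `fetch_all`, `fake_fetch`, and the `max_concurrency` parameter are hypothetical, and the fetcher is a stand-in that performs no network I/O.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 8,
) -> List[bytes]:
    """Fetch every URL concurrently, returning results in page order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> bytes:
        async with semaphore:  # cap the number of in-flight requests
            return await fetch(url)

    # gather() preserves the order of its arguments, so pages stay sorted.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP GET; returns the URL as bytes."""
    await asyncio.sleep(0)
    return url.encode()


urls = [f"https://i.nhentai.net/galleries/1234567/{n}.jpg" for n in (1, 2, 3)]
results = asyncio.run(fetch_all(urls, fake_fetch))
print(len(results))  # 3
```

Injecting the fetcher keeps the ordering and concurrency logic testable without a network connection.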

Error handling

JSON parsing

try:
    data = json.loads(content)
    if "title" in data:
        title = data["title"].get("pretty", data["title"].get("english", title))
    
    media_id = data.get("media_id")
    images_list = data.get("images", {}).get("pages", [])
    # ... construct URLs ...
    
except json.JSONDecodeError:
    preview = content[:200] if content else "Empty content"
    log_callback(f"[ERROR] Invalid JSON. Response: {preview}")
    return

Location: ~/workspace/source/core/sites/nhentai.py:76-96

If the API returns HTML (for example, a Cloudflare challenge page) instead of JSON, the handler logs the first 200 characters of the response.
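The parse-or-preview behaviour can be isolated and exercised without a browser. This is a minimal sketch, assuming the body text has already been read from the page; `parse_api_body` is a hypothetical helper that returns the error message instead of logging it.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_api_body(content: str) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (data, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(content), None
    except json.JSONDecodeError:
        # Show at most 200 characters so a full HTML page does not flood the log.
        preview = content[:200] if content else "Empty content"
        return None, f"[ERROR] Invalid JSON. Response: {preview}"


data, error = parse_api_body("<html>challenge</html>")
print(error)  # [ERROR] Invalid JSON. Response: <html>challenge</html>
```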

Browser errors

try:
    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    content = await page.inner_text("body")
    # ... parse JSON ...
    
except Exception as e:
    log_callback(f"[ERROR] Error fetching metadata: {e}")
    return
finally:
    await browser.close()

Location: ~/workspace/source/core/sites/nhentai.py:70-102

Usage examples

Single gallery download

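import asyncio

from core.handler import process_url


async def main():
    await process_url(
        "https://nhentai.net/g/123456",
        log_callback=print,
        check_cancel=lambda: False,
        progress_callback=lambda current, total: print(f"{current}/{total}")
    )

# process_url is a coroutine, so it must run inside an event loop
asyncio.run(main())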

Output: PDF/Gallery Title.pdf

Via web interface

Start the server: START_WEB_VERSION.bat
Open: http://localhost:3000
Paste: https://nhentai.net/g/123456
Download completes in seconds

Via Discord bot

!descargar https://nhentai.net/g/123456

The bot uploads the PDF directly, or posts a GoFile link if the file is larger than 8 MB.

Performance characteristics

Speed: Fast (~10-15 seconds for 50 images)
Reliability: High (bypasses Cloudflare)
Resource usage: Medium (browser for API only)
Quality: Maximum (original CDN images)

Comparison with Hitomi

Metric	NHentai	Hitomi
Browser usage	API only	Every page
Time (50 imgs)	~15s	~60s
Memory	~200MB	~500MB
API calls	1	0
HTTP requests	50	50

By these estimates NHentai is roughly 4x faster (~15s vs ~60s for 50 images), because it uses the browser only for the initial API call.

Known limitations

Cloudflare updates

If Cloudflare updates its protection, the handler may need adjustments:

args = [
    "--no-sandbox", 
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Anti-detection
]

Location: ~/workspace/source/core/sites/nhentai.py:58-62

Single gallery only

NHentai handler doesn’t support bulk downloads or series detection.

Requires browser

Unlike H2R or M440, NHentai requires Playwright:

playwright install chromium

Troubleshooting

"Invalid JSON" error

If you see:

[ERROR] Invalid JSON. Response: <html>...

The API returned HTML (likely Cloudflare challenge). Solutions:

Try visible browser mode: HEADLESS=false (outside Windows, DISPLAY must also be set, otherwise the handler still runs headless)
Update User-Agent in config.py
Add more anti-detection args
Wait and retry (Cloudflare may be rate limiting)

Empty content

If content is empty:

content = await page.inner_text("body")
if not content:
    log_callback("[ERROR] Empty response from API")

The page may not have finished loading. Wait for network idle and raise the timeout above Playwright's 30-second default:

await page.goto(api_url, wait_until="networkidle", timeout=60000)

404 on images

If image URLs return 404:

Verify media_id is correct
Check extension mapping
Some galleries may be deleted
CDN structure may have changed

Advanced configuration

Custom headers

You can add more headers for image downloads:

headers = {
    "User-Agent": config.USER_AGENT,
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9"
}

Proxy support

To use a proxy with Playwright:

context = await browser.new_context(
    user_agent=config.USER_AGENT,
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)

Implementation details

Class structure

class NHentaiHandler(BaseSiteHandler):
    """Handler for nhentai.net website."""
    
    @staticmethod
    def get_supported_domains() -> list:
        return ["nhentai.net"]
    
    async def process(
        self,
        url: str,
        log_callback: Callable[[str], None],
        check_cancel: Callable[[], bool],
        progress_callback: Optional[Callable[[int, int], None]] = None
    ) -> None:
        """Process nhentai.net URL."""
        ...

Location: ~/workspace/source/core/sites/nhentai.py:17-30

Why not use requests?

Direct HTTP requests fail:

import aiohttp

async def fetch_direct(api_url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as resp:
            # Returns Cloudflare challenge HTML, not JSON
            return await resp.text()

Playwright is required to bypass Cloudflare’s JavaScript challenges.

Security considerations

Anti-detection measures

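Of the launch arguments below, only the last is an anti-detection measure. The two sandbox flags disable Chromium's sandbox, which reduces browser isolation: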

args = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Hide automation
]

Location: ~/workspace/source/core/sites/nhentai.py:58-62

Disabling the AutomationControlled Blink feature prevents detection via checks such as:

if (navigator.webdriver === true) {
    // Block automated browsers
}

Next steps

Hitomi.la

Compare page-by-page vs API approach

Configuration

Configure User-Agent and headless mode

Hentai2Read

See a faster JSON-only approach

Architecture

Learn about the handler system
