The NHentai handler uses Playwright browser automation to fetch gallery metadata via API, bypassing Cloudflare protection, then constructs image URLs for fast downloads.

Supported domains

@staticmethod
def get_supported_domains() -> list:
    return ["nhentai.net"]
Location: ~/workspace/source/core/sites/nhentai.py:20-22
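
How the project routes a URL to a handler is not shown on this page. As a rough sketch, assuming a dispatcher compares the URL's hostname against each handler's get_supported_domains() list, the lookup might resemble the following. The find_handler function and the stub classes are hypothetical stand-ins, not the project's actual code.

```python
from typing import List, Optional, Type
from urllib.parse import urlparse


class BaseSiteHandler:
    """Stub base class (hypothetical stand-in for the project's own)."""

    @staticmethod
    def get_supported_domains() -> list:
        return []


class NHentaiHandler(BaseSiteHandler):
    @staticmethod
    def get_supported_domains() -> list:
        return ["nhentai.net"]


def find_handler(
    url: str, handlers: List[Type[BaseSiteHandler]]
) -> Optional[Type[BaseSiteHandler]]:
    """Return the first handler whose domain matches the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    for handler in handlers:
        for domain in handler.get_supported_domains():
            # Accept the bare domain and any subdomain such as www.
            if host == domain or host.endswith("." + domain):
                return handler
    return None


print(find_handler("https://nhentai.net/g/123456", [NHentaiHandler]))
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives when a domain name appears in a path or query string.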

Supported URLs

NHentai gallery URLs:
https://nhentai.net/g/[gallery-id]
https://nhentai.net/g/[gallery-id]/[page-number]  (page ignored)
Example:
https://nhentai.net/g/123456
https://nhentai.net/g/123456/1/
The handler extracts the numeric gallery ID:
id_match = re.search(r'nhentai\.net/g/(\d+)', url)
if not id_match:
    log_callback("[ERROR] Could not extract ID from URL.")
    return
gallery_id = id_match.group(1)
Location: ~/workspace/source/core/sites/nhentai.py:32-36
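
To see how the pattern behaves on the URL forms listed above, here is a small standalone sketch. It reuses the same regular expression; the extract_gallery_id wrapper is a name introduced here for illustration only.

```python
import re
from typing import Optional

# Same pattern as the handler: the numeric ID follows /g/.
GALLERY_ID_RE = re.compile(r"nhentai\.net/g/(\d+)")


def extract_gallery_id(url: str) -> Optional[str]:
    """Return the gallery ID as a string, or None if the URL has none."""
    match = GALLERY_ID_RE.search(url)
    return match.group(1) if match else None


for url in (
    "https://nhentai.net/g/123456",
    "https://nhentai.net/g/123456/1/",  # trailing page number is ignored
    "https://nhentai.net/tag/example/",  # not a gallery URL
):
    print(url, "->", extract_gallery_id(url))
```

Because the pattern stops at the first run of digits after /g/, a trailing page number never leaks into the ID.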

Why browser automation?

Cloudflare protection

NHentai’s API is protected by Cloudflare:
  • Challenge pages for direct requests
  • Browser fingerprinting to detect bots
  • Cookie requirements for API access
  • JavaScript challenges that must be solved
Standard HTTP requests to nhentai.net/api/ return 403 Forbidden or Cloudflare challenge pages.

Extraction technology

Hybrid approach: Browser + API

Unlike Hitomi (which downloads page-by-page), NHentai uses a hybrid approach:
  1. Browser: Fetches API JSON (bypasses Cloudflare)
  2. AsyncIO: Downloads images directly via constructed URLs
This combines:
  • Reliability of browser automation
  • Speed of direct HTTP downloads

API structure

NHentai exposes a JSON API for gallery metadata:
https://nhentai.net/api/gallery/[gallery-id]
Response structure:
{
  "id": 123456,
  "media_id": "1234567",
  "title": {
    "english": "Gallery Title",
    "pretty": "Gallery Title (Cleaned)"
  },
  "images": {
    "pages": [
      {"t": "j", "w": 1200, "h": 1800},
      {"t": "p", "w": 1200, "h": 1800},
      ...
    ]
  }
}
Location: API usage at ~/workspace/source/core/sites/nhentai.py:40-96
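
As a quick illustration of how this structure is navigated, the sketch below parses a trimmed copy of the sample response and summarizes it. The summarize_gallery helper is hypothetical and exists only to show where each field lives.

```python
import json
from collections import Counter

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "id": 123456,
  "media_id": "1234567",
  "title": {"english": "Gallery Title", "pretty": "Gallery Title (Cleaned)"},
  "images": {"pages": [{"t": "j", "w": 1200, "h": 1800},
                       {"t": "p", "w": 1200, "h": 1800}]}
}
"""


def summarize_gallery(raw: str) -> dict:
    """Pull out the fields needed later for URL construction."""
    data = json.loads(raw)
    pages = data.get("images", {}).get("pages", [])
    return {
        "media_id": data.get("media_id"),
        "page_count": len(pages),
        # Count pages per type code ("j", "p", "w").
        "types": dict(Counter(page.get("t") for page in pages)),
    }


print(summarize_gallery(SAMPLE))
```

Note that media_id, not id, is the value used in image URLs.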

Metadata extraction

Browser-based API fetch

api_url = f"https://nhentai.net/api/gallery/{gallery_id}"

async with async_playwright() as p:
    is_headless = os.getenv("HEADLESS", "false").lower() == "true" or not os.getenv("DISPLAY")
    if os.name == 'nt': 
        is_headless = False
    
    args = [
        "--no-sandbox", 
        "--disable-setuid-sandbox",
        "--disable-blink-features=AutomationControlled" 
    ]
    
    browser = await p.chromium.launch(headless=is_headless, args=args)
    context = await browser.new_context(user_agent=config.USER_AGENT)
    page = await context.new_page()
    
    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    
    # Browser might wrap JSON in PRE tag or just text
    content = await page.inner_text("body")
    
    data = json.loads(content)
Location: ~/workspace/source/core/sites/nhentai.py:53-77
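
The headless decision in the snippet above is easy to misread, so here it is isolated as a pure function. This is a sketch that mirrors the same logic; resolve_headless and its parameters are names introduced here, not part of the handler.

```python
import os
from typing import Mapping


def resolve_headless(env: Mapping[str, str], os_name: str) -> bool:
    """Mirror the handler's rule for choosing headless mode."""
    # Headless if explicitly requested, or if no display is available.
    is_headless = env.get("HEADLESS", "false").lower() == "true" or not env.get("DISPLAY")
    # On Windows the browser is always shown, regardless of HEADLESS.
    if os_name == "nt":
        is_headless = False
    return is_headless


print(resolve_headless(os.environ, os.name))
```

Two consequences follow from this rule: on a Linux machine without DISPLAY the browser is headless even when HEADLESS=false, and on Windows the HEADLESS variable has no effect.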

Title extraction

if "title" in data:
    title = data["title"].get(
        "pretty", 
        data["title"].get("english", title)
    )
Location: ~/workspace/source/core/sites/nhentai.py:78-79
Preference order:
  1. title.pretty (cleaned version)
  2. title.english (fallback)
  3. f"nhentai_{gallery_id}" (default)
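
The preference order can be sketched as a standalone function. The pick_title name is hypothetical, and this version deliberately differs from the handler in one respect: it uses `or` chaining, so an empty or null title also falls through to the next option.

```python
def pick_title(data: dict, gallery_id: str) -> str:
    """Choose pretty, then english, then a generated default."""
    default = f"nhentai_{gallery_id}"
    titles = data.get("title") or {}
    # `or` skips missing keys as well as empty strings and None.
    return titles.get("pretty") or titles.get("english") or default


print(pick_title({"title": {"english": "Gallery Title", "pretty": "Gallery Title (Cleaned)"}}, "123456"))
print(pick_title({"title": {"english": "Gallery Title"}}, "123456"))
print(pick_title({}, "123456"))
```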

Image URL construction

Extension mapping

ext_map = {'j': 'jpg', 'p': 'png', 'w': 'webp'}

for idx, img in enumerate(images_list):
    t = img.get('t')
    ext = ext_map.get(t, 'jpg')
    img_url = f"https://i.nhentai.net/galleries/{media_id}/{idx+1}.{ext}"
    images_data.append(img_url)
Location: ~/workspace/source/core/sites/nhentai.py:84-91
NHentai uses these type codes:
  • t: "j" maps to .jpg
  • t: "p" maps to .png
  • t: "w" maps to .webp

CDN structure

Images follow a predictable pattern:
https://i.nhentai.net/galleries/[media_id]/[page_number].[extension]
Example:
https://i.nhentai.net/galleries/1234567/1.jpg
https://i.nhentai.net/galleries/1234567/2.png
https://i.nhentai.net/galleries/1234567/3.jpg
This allows direct downloads without additional API calls.
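
Putting the extension mapping and the CDN pattern together, a self-contained sketch looks like this. The build_image_urls function name is introduced here for illustration; the mapping and URL format are the ones shown above.

```python
from typing import Dict, List

EXT_MAP = {"j": "jpg", "p": "png", "w": "webp"}


def build_image_urls(media_id: str, pages: List[Dict]) -> List[str]:
    """Build one CDN URL per page; page numbers start at 1."""
    urls = []
    for number, page in enumerate(pages, start=1):
        # Unknown or missing type codes fall back to jpg, as in the handler.
        ext = EXT_MAP.get(page.get("t"), "jpg")
        urls.append(f"https://i.nhentai.net/galleries/{media_id}/{number}.{ext}")
    return urls


pages = [{"t": "j"}, {"t": "p"}, {"t": "j"}]
for url in build_image_urls("1234567", pages):
    print(url)
```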

Direct image download

After constructing URLs, images are downloaded via AsyncIO:
if images_data:
    log_callback(f"[INFO] Gallery: {title} ({len(images_data)} imgs)")
    
    headers = {"User-Agent": config.USER_AGENT}
    
    pdf_name = f"{clean_filename(title)}.pdf"
    await download_and_make_pdf(
        images_data, 
        pdf_name, 
        headers, 
        log_callback, 
        check_cancel, 
        progress_callback,
        open_result=config.OPEN_RESULT_ON_FINISH
    )
Location: ~/workspace/source/core/sites/nhentai.py:104-119
Note: Unlike Hitomi, NHentai images don’t require a Referer header.

Error handling

JSON parsing

try:
    data = json.loads(content)
    if "title" in data:
        title = data["title"].get("pretty", data["title"].get("english", title))
    
    media_id = data.get("media_id")
    images_list = data.get("images", {}).get("pages", [])
    # ... construct URLs ...
    
except json.JSONDecodeError:
    preview = content[:200] if content else "Empty content"
    log_callback(f"[ERROR] Invalid JSON. Response: {preview}")
    return
Location: ~/workspace/source/core/sites/nhentai.py:76-96
If the API returns HTML (a Cloudflare challenge) instead of JSON, the handler logs the first 200 characters of the response.
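
A slightly more descriptive variant is sketched below: it separates the empty, HTML, and other non-JSON cases before reporting. The parse_api_body function and its messages are hypothetical and are not the handler's actual code; the leading "<" test is only a heuristic for spotting an HTML page.

```python
import json
from typing import Optional, Tuple


def parse_api_body(content: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, error_message) on failure."""
    if not content or not content.strip():
        return None, "Empty response from API"
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Heuristic: a body starting with "<" is almost certainly HTML.
        if content.lstrip().startswith("<"):
            kind = "HTML page (possibly a Cloudflare challenge)"
        else:
            kind = "non-JSON text"
        return None, f"Invalid JSON, got {kind}: {content[:200]}"
    if not isinstance(data, dict):
        return None, "Unexpected JSON shape: expected an object"
    return data, None


print(parse_api_body('{"media_id": "1234567"}'))
print(parse_api_body("<html><body>Just a moment...</body></html>"))
```

Returning the error as a value keeps the function free of logging, so the caller decides whether to pass the message to log_callback.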

Browser errors

try:
    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    content = await page.inner_text("body")
    # ... parse JSON ...
    
except Exception as e:
    log_callback(f"[ERROR] Error fetching metadata: {e}")
    return
finally:
    await browser.close()
Location: ~/workspace/source/core/sites/nhentai.py:70-102

Usage examples

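import asyncio

from core.handler import process_url


async def main() -> None:
    # process_url is a coroutine, so it must run inside an event loop.
    await process_url(
        "https://nhentai.net/g/123456",
        log_callback=print,
        check_cancel=lambda: False,
        progress_callback=lambda current, total: print(f"{current}/{total}")
    )

asyncio.run(main())
Output: PDF/Gallery Title.pdf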

Via web interface

  1. Start the server: START_WEB_VERSION.bat
  2. Open: http://localhost:3000
  3. Paste: https://nhentai.net/g/123456
  4. Download completes in seconds

Via Discord bot

!descargar https://nhentai.net/g/123456
The bot uploads the PDF directly, or posts a GoFile link if the file is larger than 8MB.

Performance characteristics

  • Speed: Fast (~10-15 seconds for 50 images)
  • Reliability: High (bypasses Cloudflare)
  • Resource usage: Medium (browser for API only)
  • Quality: Maximum (original CDN images)

Comparison with Hitomi

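| Metric | NHentai | Hitomi |
|---|---|---|
| Browser usage | API only | Every page |
| Time (50 imgs) | ~15s | ~60s |
| Memory | ~200MB | ~500MB |
| API calls | 1 | 0 |
| HTTP requests | 50 | 50 |

NHentai is about 4x faster (~15s vs ~60s for 50 images) because it uses the browser only for the initial API call.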

Known limitations

Cloudflare updates

If Cloudflare updates its protection, the handler may need adjustments:
args = [
    "--no-sandbox", 
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Anti-detection
]
Location: ~/workspace/source/core/sites/nhentai.py:58-62
The NHentai handler doesn’t support bulk downloads or series detection.

Requires browser

Unlike H2R or M440, NHentai requires Playwright:
playwright install chromium

Troubleshooting

“Invalid JSON” error

If you see:
[ERROR] Invalid JSON. Response: <html>...
The API returned HTML (likely Cloudflare challenge). Solutions:
  1. Try visible browser mode: HEADLESS=false
  2. Update User-Agent in config.py
  3. Add more anti-detection args
  4. Wait and retry (Cloudflare may be rate limiting)
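
For the wait-and-retry option, one possible shape is an exponential backoff wrapper around whatever coroutine fetches the metadata. This is a sketch under assumptions: retry_with_backoff is a hypothetical helper, and the fetch callable is assumed to return None when it receives a challenge page instead of JSON.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[Optional[dict]]],
    attempts: int = 3,
    base_delay: float = 2.0,
) -> Optional[dict]:
    """Call fetch until it returns data, doubling the wait each time."""
    for attempt in range(attempts):
        result = await fetch()
        if result is not None:
            return result
        # Sleep only if another attempt will follow.
        if attempt < attempts - 1:
            await asyncio.sleep(base_delay * (2 ** attempt))
    return None
```

With the defaults, the waits between attempts are 2 and 4 seconds. Keeping the attempt count small avoids hammering the site while it is rate limiting.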

Empty content

If content is empty:
content = await page.inner_text("body")
if not content:
    log_callback("[ERROR] Empty response from API")
The page may not have finished loading. Wait for network idle and raise the timeout above Playwright’s 30-second default:
await page.goto(api_url, wait_until="networkidle", timeout=60000)

404 on images

If image URLs return 404:
  1. Verify media_id is correct
  2. Check extension mapping
  3. Some galleries may be deleted
  4. CDN structure may have changed
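
When checking the extension mapping, it can help to list the same page under the other known extensions and test those URLs by hand. The helper below is a hypothetical debugging aid, not something the handler does; it only rewrites the URL string and makes no network requests.

```python
from typing import List

KNOWN_EXTENSIONS = ("jpg", "png", "webp")


def alternative_urls(failed_url: str) -> List[str]:
    """Return the same page URL with each other known extension."""
    base, _, failed_ext = failed_url.rpartition(".")
    return [f"{base}.{ext}" for ext in KNOWN_EXTENSIONS if ext != failed_ext]


for url in alternative_urls("https://i.nhentai.net/galleries/1234567/2.jpg"):
    print(url)
```

If one of the alternatives loads, the type code in the API response and the ext_map entry for it are the first things to compare.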

Advanced configuration

Custom headers

You can add more headers for image downloads:
headers = {
    "User-Agent": config.USER_AGENT,
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9"
}

Proxy support

To use a proxy with Playwright:
context = await browser.new_context(
    user_agent=config.USER_AGENT,
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)

Implementation details

Class structure

class NHentaiHandler(BaseSiteHandler):
    """Handler for nhentai.net website."""
    
    @staticmethod
    def get_supported_domains() -> list:
        return ["nhentai.net"]
    
    async def process(
        self,
        url: str,
        log_callback: Callable[[str], None],
        check_cancel: Callable[[], bool],
        progress_callback: Optional[Callable[[int, int], None]] = None
    ) -> None:
        """Process nhentai.net URL."""
        ...
Location: ~/workspace/source/core/sites/nhentai.py:17-30

Why not use requests?

Direct HTTP requests fail:
import aiohttp


async def fetch_direct(api_url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as resp:
            # Typically a 403 or Cloudflare challenge HTML, not JSON
            return await resp.text()
Playwright is required because a real browser can pass Cloudflare’s JavaScript challenges.

Security considerations

Anti-detection measures

The handler launches Chromium with one anti-detection flag, --disable-blink-features=AutomationControlled, alongside two sandbox flags:
args = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Hide automation
]
Location: ~/workspace/source/core/sites/nhentai.py:58-62
Disabling the AutomationControlled Blink feature stops navigator.webdriver from reporting true, which defeats checks such as:
if (navigator.webdriver === true) {
    // Block automated browsers
}

Next steps

Hitomi.la

Compare page-by-page vs API approach

Configuration

Configure User-Agent and headless mode

Hentai2Read

See a faster JSON-only approach

Architecture

Learn about the handler system
