The NHentai handler uses Playwright browser automation to fetch gallery metadata from the nhentai JSON API despite Cloudflare protection. It then builds the image URLs itself, so the images can be downloaded directly and quickly.
Supported domains
```python
@staticmethod
def get_supported_domains() -> list:
    return ["nhentai.net"]
```
Location: ~/workspace/source/core/sites/nhentai.py:20-22
Supported URLs
NHentai gallery URLs:
https://nhentai.net/g/[gallery-id]
https://nhentai.net/g/[gallery-id]/[page-number] (the page number is ignored)
Examples:
https://nhentai.net/g/123456
https://nhentai.net/g/123456/1/
The handler extracts the numeric gallery ID:
```python
id_match = re.search(r'nhentai\.net/g/(\d+)', url)
if not id_match:
    log_callback("[ERROR] Could not extract ID from URL.")
    return
gallery_id = id_match.group(1)
```
Location: ~/workspace/source/core/sites/nhentai.py:32-36
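To check the pattern on its own, here is a minimal standalone sketch. `extract_gallery_id` is a hypothetical helper name, and unlike the handler it returns `None` instead of logging an error and returning.

```python
import re
from typing import Optional

# Same pattern as the handler: the digits right after "nhentai.net/g/".
GALLERY_ID_RE = re.compile(r'nhentai\.net/g/(\d+)')


def extract_gallery_id(url: str) -> Optional[str]:
    """Return the numeric gallery ID as a string, or None if the URL doesn't match."""
    match = GALLERY_ID_RE.search(url)
    return match.group(1) if match else None
```

For example, both `https://nhentai.net/g/123456` and `https://nhentai.net/g/123456/1/` yield `"123456"`, while a tag page such as `https://nhentai.net/tag/example/` yields `None`.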
Why browser automation?
Cloudflare protection
NHentai’s API is protected by Cloudflare:
Challenge pages for direct requests
Browser fingerprinting to detect bots
Cookie requirements for API access
JavaScript challenges that must be solved
Standard HTTP requests to nhentai.net/api/ return 403 Forbidden or Cloudflare challenge pages.
Hybrid approach: Browser + API
Unlike Hitomi (which downloads page-by-page), NHentai uses a hybrid approach:
Browser: fetches the API JSON (getting past Cloudflare)
AsyncIO: downloads the images directly from the constructed URLs
This combines:
Reliability of browser automation
Speed of direct HTTP downloads
API structure
NHentai exposes a JSON API for gallery metadata:
https://nhentai.net/api/gallery/[gallery-id]
Response structure:
```json
{
  "id": 123456,
  "media_id": "1234567",
  "title": {
    "english": "Gallery Title",
    "pretty": "Gallery Title (Cleaned)"
  },
  "images": {
    "pages": [
      { "t": "j", "w": 1200, "h": 1800 },
      { "t": "p", "w": 1200, "h": 1800 },
      ...
    ]
  }
}
```
Location: API usage at ~/workspace/source/core/sites/nhentai.py:40-96
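To make explicit which fields the handler relies on, here is a sketch that extracts them from a decoded response. The `GalleryMeta` dataclass and `parse_gallery` function are hypothetical, and the field names come from the example response above rather than from an official schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GalleryMeta:
    gallery_id: Optional[int]
    media_id: Optional[str]  # used in image URLs; usually a different number from gallery_id
    title_pretty: Optional[str]
    title_english: Optional[str]
    # One type code per page, in page order ("j", "p", "w").
    page_types: List[str] = field(default_factory=list)


def parse_gallery(data: Dict[str, Any]) -> GalleryMeta:
    """Pull out the fields the handler uses from a decoded API response."""
    title = data.get("title") or {}
    pages = (data.get("images") or {}).get("pages") or []
    return GalleryMeta(
        gallery_id=data.get("id"),
        media_id=data.get("media_id"),
        title_pretty=title.get("pretty"),
        title_english=title.get("english"),
        page_types=[page.get("t", "") for page in pages],
    )
```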
Browser-based API fetch
```python
api_url = f"https://nhentai.net/api/gallery/{gallery_id}"

async with async_playwright() as p:
    is_headless = os.getenv("HEADLESS", "false").lower() == "true" or not os.getenv("DISPLAY")
    if os.name == 'nt':
        is_headless = False

    args = [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-blink-features=AutomationControlled"
    ]
    browser = await p.chromium.launch(headless=is_headless, args=args)
    context = await browser.new_context(user_agent=config.USER_AGENT)
    page = await context.new_page()

    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    # Browser might wrap JSON in PRE tag or just text
    content = await page.inner_text("body")
    data = json.loads(content)
```
Location: ~/workspace/source/core/sites/nhentai.py:53-77
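The headless-mode rule in that block is easy to misread, so here it is as a pure function. `resolve_headless` is a hypothetical name; it takes the environment and the OS name as parameters (instead of reading `os.environ` and `os.name`) so each branch can be checked directly.

```python
from typing import Mapping


def resolve_headless(env: Mapping[str, str], os_name: str) -> bool:
    """Mirror the handler's rule for choosing headless mode.

    - On Windows (os.name == "nt") the browser is always visible.
    - Elsewhere it is headless if HEADLESS=true, or if DISPLAY is unset
      (for example on a server without X11).
    """
    if os_name == "nt":
        return False
    return env.get("HEADLESS", "false").lower() == "true" or not env.get("DISPLAY")
```

For example, `resolve_headless({}, "posix")` is `True`: with no `DISPLAY`, the browser runs headless whatever `HEADLESS` says.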
```python
if "title" in data:
    title = data["title"].get(
        "pretty",
        data["title"].get("english", title)
    )
```
Location: ~/workspace/source/core/sites/nhentai.py:78-79
Preference order:
title.pretty (cleaned version)
title.english (fallback)
f"nhentai_{gallery_id}" (default)
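The preference order can be written as a small function. This sketch (`pick_title` is a hypothetical name) is slightly stricter than the handler: it also skips keys that are present but empty or `null`, whereas the handler's nested `.get()` falls back only when a key is missing entirely.

```python
from typing import Any, Dict


def pick_title(data: Dict[str, Any], gallery_id: str) -> str:
    """Return title.pretty, then title.english, then "nhentai_<gallery_id>"."""
    titles = data.get("title") or {}
    for key in ("pretty", "english"):
        value = titles.get(key)
        # Unlike the handler's nested .get(), also skip empty strings and None.
        if value:
            return value
    return f"nhentai_{gallery_id}"
```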
Image URL construction
Extension mapping
```python
ext_map = {'j': 'jpg', 'p': 'png', 'w': 'webp'}
for idx, img in enumerate(images_list):
    t = img.get('t')
    ext = ext_map.get(t, 'jpg')
    img_url = f"https://i.nhentai.net/galleries/{media_id}/{idx + 1}.{ext}"
    images_data.append(img_url)
```
Location: ~/workspace/source/core/sites/nhentai.py:84-91
The t field of each page maps to a file extension; any other code falls back to .jpg:
t: "j" → .jpg
t: "p" → .png
t: "w" → .webp
CDN structure
Images follow a predictable pattern:
https://i.nhentai.net/galleries/[media_id]/[page_number].[extension]
Example:
https://i.nhentai.net/galleries/1234567/1.jpg
https://i.nhentai.net/galleries/1234567/2.png
https://i.nhentai.net/galleries/1234567/3.jpg
This allows direct downloads without additional API calls.
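Putting the extension mapping and the CDN pattern together, a standalone version of the URL construction might look like this. `build_image_urls` is a hypothetical helper; it follows the handler in numbering pages from 1 and falling back to `jpg` for unknown type codes.

```python
from typing import Any, Dict, List

EXT_MAP = {"j": "jpg", "p": "png", "w": "webp"}


def build_image_urls(media_id: str, pages: List[Dict[str, Any]]) -> List[str]:
    """Build one CDN URL per page; page numbers on the CDN start at 1."""
    urls = []
    for page_number, page in enumerate(pages, start=1):
        # Unknown or missing type codes default to jpg, as in the handler.
        ext = EXT_MAP.get(page.get("t"), "jpg")
        urls.append(f"https://i.nhentai.net/galleries/{media_id}/{page_number}.{ext}")
    return urls
```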
Direct image download
After constructing URLs, images are downloaded via AsyncIO:
```python
if images_data:
    log_callback(f"[INFO] Gallery: {title} ({len(images_data)} imgs)")
    headers = {"User-Agent": config.USER_AGENT}
    pdf_name = f"{clean_filename(title)}.pdf"
    await download_and_make_pdf(
        images_data,
        pdf_name,
        headers,
        log_callback,
        check_cancel,
        progress_callback,
        open_result=config.OPEN_RESULT_ON_FINISH
    )
```
Location: ~/workspace/source/core/sites/nhentai.py:104-119
Note: Unlike Hitomi, NHentai images don't require a Referer header; only the User-Agent is sent.
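The internals of `download_and_make_pdf` aren't shown here. As a rough sketch of concurrent downloading with AsyncIO in general, the following bounds concurrency with `asyncio.Semaphore` and keeps results in page order. `download_all`, the injected `fetch` callable, and the default limit of 4 are assumptions for illustration, not the project's code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def download_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[bytes]],
    limit: int = 4,
) -> List[bytes]:
    """Fetch every URL with at most `limit` requests in flight.

    asyncio.gather returns results in the order of its arguments,
    so page order is preserved even if downloads finish out of order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> bytes:
        async with semaphore:
            return await fetch(url)

    return list(await asyncio.gather(*(fetch_one(url) for url in urls)))
```

In a real downloader, `fetch` could wrap an HTTP client session that sends the same User-Agent header as the handler.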
Error handling
JSON parsing
```python
try:
    data = json.loads(content)
    if "title" in data:
        title = data["title"].get("pretty", data["title"].get("english", title))
    media_id = data.get("media_id")
    images_list = data.get("images", {}).get("pages", [])
    # ... construct URLs ...
except json.JSONDecodeError:
    preview = content[:200] if content else "Empty content"
    log_callback(f"[ERROR] Invalid JSON. Response: {preview}")
    return
```
Location: ~/workspace/source/core/sites/nhentai.py:76-96
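If the browser receives a non-JSON page instead of the API response (for example, a Cloudflare challenge), json.loads raises JSONDecodeError and the handler logs the first 200 characters of the page content, or "Empty content" if there is none.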
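The parse-or-preview logic can be isolated into a helper that is testable without a browser. `parse_api_content` is a hypothetical name; it returns the decoded object on success and, on failure, the same message the handler logs.

```python
import json
from typing import Any, Optional, Tuple

PREVIEW_CHARS = 200  # matches content[:200] in the handler


def parse_api_content(content: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(content), None
    except json.JSONDecodeError:
        preview = content[:PREVIEW_CHARS] if content else "Empty content"
        return None, f"[ERROR] Invalid JSON. Response: {preview}"
```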
Browser errors
```python
try:
    log_callback("[INFO] Fetching metadata...")
    await page.goto(api_url, wait_until="domcontentloaded")
    content = await page.inner_text("body")
    # ... parse JSON ...
except Exception as e:
    log_callback(f"[ERROR] Error fetching metadata: {e}")
    return
finally:
    await browser.close()
```
Location: ~/workspace/source/core/sites/nhentai.py:70-102
Usage examples
Single gallery download
```python
import asyncio
from core.handler import process_url

async def main() -> None:
    await process_url(
        "https://nhentai.net/g/123456",
        log_callback=print,
        check_cancel=lambda: False,
        progress_callback=lambda current, total: print(f"{current}/{total}"),
    )

asyncio.run(main())
```
Output: PDF/Gallery Title.pdf
Via web interface
Start the server: START_WEB_VERSION.bat
Open: http://localhost:3000
Paste: https://nhentai.net/g/123456
Download completes in seconds
Via Discord bot
!descargar https://nhentai.net/g/123456
The bot uploads the PDF directly, or a GoFile link if the file is larger than 8 MB.
Performance
Speed: fast (~10-15 seconds for 50 images)
Reliability: high (gets past Cloudflare)
Resource usage: medium (browser used only for the API call)
Quality: maximum (original CDN images)
Comparison with Hitomi
| Metric | NHentai | Hitomi |
|---|---|---|
| Browser usage | API call only | Every page |
| Time (50 images) | ~15 s | ~60 s |
| Memory | ~200 MB | ~500 MB |
| API calls | 1 | 0 |
| HTTP requests | 50 | 50 |
In this comparison NHentai is about 4x faster (~15 s vs ~60 s) because it uses the browser only for the initial API call.
Known limitations
Cloudflare updates
If Cloudflare updates its protection, the handler may need adjustments:
```python
args = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Anti-detection
]
```
Location: ~/workspace/source/core/sites/nhentai.py:58-62
Single gallery only
The NHentai handler doesn't support bulk downloads or series detection; it processes one gallery per URL.
Requires browser
Unlike the H2R and M440 handlers, the NHentai handler requires the Playwright package and its Chromium build:
```bash
pip install playwright
playwright install chromium
```
Troubleshooting
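"Invalid JSON" error
If you see:
```
[ERROR] Invalid JSON. Response: <html>...
```
The browser received a non-JSON page (most likely a Cloudflare challenge) instead of the API response. Things to try:
Run with a visible browser: HEADLESS=false (on Linux this only takes effect when DISPLAY is set, because the handler forces headless mode without a display)
Update USER_AGENT in config.py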
Add more anti-detection args
Wait and retry (Cloudflare may be rate limiting)
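For the "wait and retry" option, a generic retry wrapper with exponential backoff is one way to space out attempts. This is not part of the handler; `retry_async`, the attempt count, and the delays are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 2.0,
) -> T:
    """Await `operation()`, retrying on any exception.

    Waits base_delay seconds after the first failure and doubles the wait
    after each further failure; re-raises the last error when out of attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In the handler, `operation` could wrap the `page.goto`, `inner_text`, and `json.loads` steps together, so a challenge page that fails to parse would trigger another attempt after a pause.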
Empty content
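If content is empty, the handler currently falls through to json.loads and logs "Invalid JSON. Response: Empty content". An explicit check gives a clearer message:
```python
content = await page.inner_text("body")
if not content:
    log_callback("[ERROR] Empty response from API")
    return
```
The page may not have finished loading. Wait for network activity to settle and allow more time; Playwright's default navigation timeout is already 30 seconds (30000 ms), so pass a larger value:
```python
await page.goto(api_url, wait_until="networkidle", timeout=60000)
```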
404 on images
If image URLs return 404:
Verify that media_id, not the gallery ID, is used in the image URL (the two usually differ)
Check the extension mapping
The gallery may have been deleted
CDN structure may have changed
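If the extension mapping is the suspect, one option is to try the same page with the other known extensions. The handler does not do this; `candidate_urls` and `KNOWN_EXTS` are hypothetical names, and this sketch only builds the list of URLs to try in order.

```python
from typing import List

KNOWN_EXTS = ("jpg", "png", "webp")


def candidate_urls(media_id: str, page_number: int, primary_ext: str) -> List[str]:
    """Return the mapped URL first, then the same page with the other known extensions."""
    exts = [primary_ext] + [e for e in KNOWN_EXTS if e != primary_ext]
    return [f"https://i.nhentai.net/galleries/{media_id}/{page_number}.{e}" for e in exts]
```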
Advanced configuration
You can add more headers for image downloads:
```python
headers = {
    "User-Agent": config.USER_AGENT,
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9"
}
```
Proxy support
To use a proxy with Playwright:
```python
context = await browser.new_context(
    user_agent=config.USER_AGENT,
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)
```
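To avoid hard-coding credentials, one option is to build Playwright's proxy dictionary from environment variables. The variable names `NHENTAI_PROXY_SERVER`, `NHENTAI_PROXY_USER`, and `NHENTAI_PROXY_PASS` and the function `proxy_from_env` are hypothetical; the dictionary keys (`server`, `username`, `password`) are the ones Playwright's `proxy` option accepts.

```python
import os
from typing import Dict, Mapping, Optional


def proxy_from_env(env: Mapping[str, str] = os.environ) -> Optional[Dict[str, str]]:
    """Return a Playwright proxy dict, or None to connect directly."""
    server = env.get("NHENTAI_PROXY_SERVER")
    if not server:
        return None
    proxy = {"server": server}
    # Credentials are optional; include them only if both are set.
    user, password = env.get("NHENTAI_PROXY_USER"), env.get("NHENTAI_PROXY_PASS")
    if user and password:
        proxy["username"] = user
        proxy["password"] = password
    return proxy
```

Usage: `await browser.new_context(user_agent=config.USER_AGENT, proxy=proxy_from_env())`. This only affects the browser's API request; the image downloads use a separate HTTP client and would not go through this proxy unless it is configured there as well.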
Implementation details
Class structure
```python
class NHentaiHandler(BaseSiteHandler):
    """Handler for nhentai.net website."""

    @staticmethod
    def get_supported_domains() -> list:
        return ["nhentai.net"]

    async def process(
        self,
        url: str,
        log_callback: Callable[[str], None],
        check_cancel: Callable[[], bool],
        progress_callback: Optional[Callable[[int, int], None]] = None
    ) -> None:
        """Process nhentai.net URL."""
        ...
```
Location: ~/workspace/source/core/sites/nhentai.py:17-30
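Why not use plain HTTP requests?
Direct HTTP requests fail. For example, with aiohttp:
```python
import aiohttp

async def fetch_api_directly(api_url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as resp:
            # Returns Cloudflare challenge HTML (typically with a 403), not JSON
            return await resp.text()
```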
Playwright is required to bypass Cloudflare’s JavaScript challenges.
Security considerations
Anti-detection measures
The handler includes several anti-detection features:
```python
args = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled"  # Hide automation
]
```
Location: ~/workspace/source/core/sites/nhentai.py:58-62
Disabling the AutomationControlled Blink feature keeps Chromium from setting navigator.webdriver to true, which defeats simple checks such as:
```javascript
if (navigator.webdriver === true) {
  // Block automated browsers
}
```
Next steps
Hitomi.la: compare the page-by-page and API approaches
Configuration: configure the User-Agent and headless mode
Hentai2Read: see a faster JSON-only approach
Architecture: learn about the handler system