The Hitomi.la handler uses advanced browser automation with Playwright to bypass anti-bot protections and download high-quality images page-by-page.
Supported domains
@ staticmethod
def get_supported_domains () -> list :
return [ "hitomi.la" ]
Location: ~/workspace/source/core/sites/hitomi.py:19-21
Supported URLs
Hitomi URLs come in two formats:
Gallery reader (preferred)
https://hitomi.la/reader/[gallery-id].html
Gallery info page
https://hitomi.la/[type]/[title-slug]-[gallery-id].html
Types: doujinshi, manga, galleries, cg, etc.
The handler extracts the numeric ID from either format:
id_match = re.search( r ' [ -/ ]( \d + ) \. html' , url)
if not id_match:
log_callback( "[ERROR] Could not extract ID from URL." )
return
gallery_id = int (id_match.group( 1 ))
Location: ~/workspace/source/core/sites/hitomi.py:34-37
Why browser automation?
Hitomi.la implements aggressive anti-bot protection:
403 Forbidden on direct image requests without proper headers
404 Not Found if referrer is missing or incorrect
Dynamic image URLs that change based on browser state
JavaScript-required reader interface
Standard HTTP requests and Crawl4AI fail on Hitomi. Only full browser automation works reliably.
Playwright stealth mode
The handler uses Playwright with stealth techniques:
from playwright.async_api import async_playwright
args = [
"--no-sandbox" ,
"--disable-setuid-sandbox" ,
"--start-maximized"
]
browser = await p.chromium.launch(
headless = is_headless,
args = args
)
context = await browser.new_context(
user_agent = config. USER_AGENT ,
viewport = { 'width' : 1280 , 'height' : 720 }
)
Location: ~/workspace/source/core/sites/hitomi.py:57-64
Headless detection
The handler automatically determines if it should run headless:
is_headless = os.getenv( "HEADLESS" , "false" ).lower() == "true" or not os.getenv( "DISPLAY" )
if os.name == 'nt' : # Windows
is_headless = False
Location: ~/workspace/source/core/sites/hitomi.py:53-54
Linux servers : Defaults to headless
Windows : Always visible (better success rate)
Override : Set HEADLESS=true in .env
Unlike other handlers that extract all URLs at once, Hitomi downloads images one by one:
reader_url = f "https://hitomi.la/reader/ { gallery_id } .html#1"
await page.goto(reader_url, wait_until = "domcontentloaded" )
# Get total images from JavaScript variable
total_images = await page.evaluate(
"() => window.galleryinfo ? window.galleryinfo.files.length : 0"
)
for i in range ( 1 , total_images + 1 ):
# Update hash to go to next image
await page.evaluate( f "location.hash = '# { i } '" )
# Wait for image to update
await page.wait_for_function(
"""(selector) => {
const img = document.querySelector(selector);
return img && img.src && img.src.indexOf('http') === 0;
}""" ,
arg = "div#comicImages img" ,
timeout = 10000
)
# Extract image info
img_info = await page.evaluate( """(selector) => {
const img = document.querySelector(selector);
return {
src: img.src,
width: img.naturalWidth,
height: img.naturalHeight
};
}""" , "div#comicImages img" )
Location: ~/workspace/source/core/sites/hitomi.py:69-122
Why page-by-page?
Dynamic URLs : Image URLs are generated per page via JavaScript
Protection : Bulk requests trigger rate limiting
Quality : Ensures high-quality images (not thumbnails)
Reliability : Handles navigation errors per-page
Downloading with proper referrer
Hitomi requires the reader URL as referrer:
img_src = img_info[ 'src' ]
headers = { "Referer" : f "https://hitomi.la/reader/ { gallery_id } .html" }
response = await page.request.get(img_src, headers = headers)
if response.status == 200 :
data = await response.body()
ext = img_src.split( '.' )[ - 1 ]
if '?' in ext:
ext = ext.split( '?' )[ 0 ]
filename = f " { i :03d} . { ext } "
filepath = os.path.join(temp_dir, filename)
with open (filepath, 'wb' ) as f:
f.write(data)
download_targets.append(filepath)
Location: ~/workspace/source/core/sites/hitomi.py:124-141
Using page.request.get() instead of aiohttp ensures:
Cookies are maintained
Browser context is used
Referrer is properly set
Temporary file handling
Since images are downloaded sequentially, they’re stored in a temp folder first:
current_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath( __file__ ))))
temp_dir = os.path.join(current_dir, config. TEMP_FOLDER_NAME )
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir)
os.makedirs(temp_dir, exist_ok = True )
download_targets = [] # Will store file paths
Location: ~/workspace/source/core/sites/hitomi.py:43-49
After all downloads complete:
if download_targets:
pdf_name = f " { clean_filename(title) } .pdf"
finalize_pdf_flow(
download_targets,
pdf_name,
log_callback,
temp_dir,
open_result = config. OPEN_RESULT_ON_FINISH
)
Location: ~/workspace/source/core/sites/hitomi.py:164-172
The finalize_pdf_flow utility:
Generates the PDF from downloaded files
Cleans up the temp directory
Opens the result if configured
The handler extracts the gallery title from the page:
title = f "Hitomi_ { gallery_id } "
page_title = await page.title()
if page_title:
clean_title = re.sub( r ' [ \\ /*?:"<>| ] ' , '' , page_title).strip()
title = clean_title if clean_title else title
log_callback( f "[INFO] Title detected: { title } " )
Location: ~/workspace/source/core/sites/hitomi.py:77-81
If title extraction fails, it defaults to Hitomi_{gallery_id}.
Fallback for unknown gallery size
Sometimes window.galleryinfo is not immediately available:
total_images = await page.evaluate(
"() => window.galleryinfo ? window.galleryinfo.files.length : 0"
)
if total_images == 0 :
log_callback( "[INFO] 'galleryinfo' not detected, trying fallback..." )
try :
await page.wait_for_function(
"() => window.galleryinfo && window.galleryinfo.files.length > 0" ,
timeout = 5000
)
total_images = await page.evaluate( "() => window.galleryinfo.files.length" )
except :
log_callback( "[WARN] Could not determine total images. Estimating..." )
total_images = 9999 # Arbitrary limit
Location: ~/workspace/source/core/sites/hitomi.py:84-94
If fallback also fails, the handler estimates 9999 pages and stops when errors occur:
except Exception as e:
log_callback( f "[ERROR] Error on page { i } : { e } " )
if total_images == 9999 and i > 5 :
log_callback( "[INFO] Possible end of gallery." )
break
Location: ~/workspace/source/core/sites/hitomi.py:150-154
Rate limiting and delays
The handler includes a 500ms delay between pages:
await page.wait_for_timeout( 500 )
Location: ~/workspace/source/core/sites/hitomi.py:148
This prevents:
Rate limiting by the server
Browser detection as bot
Connection errors
Usage examples
Single gallery download
from core.handler import process_url
await process_url(
"https://hitomi.la/reader/1234567.html" ,
log_callback = print ,
check_cancel = lambda : False ,
progress_callback = lambda current , total : print ( f "Page { current } / { total } " )
)
Output: PDF/Gallery Title.pdf
From info page
await process_url(
"https://hitomi.la/doujinshi/my-favorite-doujin-1234567.html" ,
log_callback = print ,
check_cancel = lambda : False
)
The handler extracts the ID 1234567 and navigates to the reader.
Via web interface
Launch: START_WEB_VERSION.bat
Open: http://localhost:3000
Paste URL: https://hitomi.la/reader/1234567.html
Watch real-time page-by-page progress
Hitomi downloads are slower due to page-by-page navigation, but this ensures maximum quality and bypasses protections.
Speed : Slow (~30-60 seconds for 50 images)
Reliability : Very high (bypasses all protections)
Resource usage : High (full browser instance)
Quality : Maximum (original resolution)
Comparison
Metric Hitomi H2R M440 Time (50 imgs) ~60s ~5s ~8s Memory ~500MB ~50MB ~80MB CPU High Low Low Reliability 99% 95% 90%
Known limitations
No series support
Hitomi handler only supports single galleries, not bulk series downloads.
Visible browser on Windows
For best reliability on Windows, the browser runs in visible mode:
if os.name == 'nt' : # Windows
is_headless = False
This can be distracting but significantly improves success rate.
Slow for large galleries
Galleries with 200+ images can take 5-10 minutes. Consider using progress callbacks:
progress_callback = lambda c , t : print ( f "Progress: { c } / { t } ( { c / t * 100 :.1f} %)" , end = " \r " )
Troubleshooting
Browser launch fails
If Playwright fails to launch:
playwright install chromium
Ensure Chromium is installed.
Timeout on wait_for_function
If image loading times out:
await page.wait_for_function( ... , timeout = 10000 ) # Increase timeout
Adjust the timeout in the code (currently 10 seconds).
403/404 errors
If downloads fail with 403/404:
Verify referrer is set correctly
Check if User-Agent is up-to-date
Try visible browser mode (HEADLESS=false)
Increase delays between requests
Images not loading
If window.galleryinfo is undefined:
await page.wait_for_timeout( 2000 ) # Wait longer
Increase initial wait time after page load.
Advanced configuration
Custom viewport
You can modify the viewport size:
context = await browser.new_context(
user_agent = config. USER_AGENT ,
viewport = { 'width' : 1920 , 'height' : 1080 } # Larger viewport
)
Location: ~/workspace/source/core/sites/hitomi.py:61-64
Additional stealth options
For extra stealth, add more browser args:
args = [
"--no-sandbox" ,
"--disable-setuid-sandbox" ,
"--start-maximized" ,
"--disable-blink-features=AutomationControlled" ,
"--disable-dev-shm-usage"
]
Next steps
NHentai Similar browser-based approach for NHentai
Configuration Configure headless mode and paths
Utils Learn about finalize_pdf_flow
Architecture Understand the handler system