Overview
ScrapeAccraProperties uses Playwright with Chromium to scrape JavaScript-rendered pages. The configuration prioritizes performance through resource blocking and optimized browser settings.
Browser Type
Launch Options
Launch Arguments
- headless: Runs browser without GUI (background mode)
- --disable-gpu: Disables GPU hardware acceleration
- --disable-dev-shm-usage: Prevents shared memory issues in Docker/limited environments
- --no-sandbox: Disables Chrome sandboxing (required in containerized environments)
- --disable-setuid-sandbox: Additional sandbox bypass for compatibility
- --disable-extensions: Prevents Chrome extensions from loading
- --blink-settings=imagesEnabled=false: Disables image loading at the Blink engine level
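The launch options above can be sketched as a settings fragment. This is a minimal sketch that assumes the project configures Playwright through scrapy-playwright's `PLAYWRIGHT_BROWSER_TYPE` and `PLAYWRIGHT_LAUNCH_OPTIONS` settings. The values are taken from the list above, not copied from the project's settings file.

```python
# settings.py (sketch): assumes scrapy-playwright's setting names.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Keyword arguments forwarded to Playwright's BrowserType.launch().
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,  # no GUI, runs in the background
    "args": [
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-extensions",
        "--blink-settings=imagesEnabled=false",
    ],
}
```

Note that `headless` is a launch keyword, while the remaining entries are Chromium command-line switches passed through `args`.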
Context Settings
Context Parameters
- user_agent: Mimics Chrome 122 on macOS to appear as a regular browser
- viewport: Sets window size to 1280x720 for consistent rendering
- ignore_https_errors: Bypasses SSL certificate validation errors
- bypass_csp: Disables Content Security Policy restrictions
- java_script_enabled: Enables JavaScript execution (required for target sites)
- accept_downloads: Disabled, so file downloads are blocked to prevent unwanted data transfer
Named Contexts
Four separate contexts are configured for isolation between spiders:
- jiji_urls: Jiji URL collection spider
- jiji_listings: Jiji listing detail spider
- meqasa_urls: Meqasa URL collection spider
- meqasa_listings: Meqasa listing detail spider
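One way the named contexts and their parameters could be declared is shown below. This is a sketch under several assumptions: that the project uses scrapy-playwright's `PLAYWRIGHT_CONTEXTS` setting, that all four contexts share the same parameters, and that the user agent string looks like the illustrative one here. `BASE_CONTEXT`, `CONTEXT_NAMES`, and `playwright_meta` are hypothetical names introduced for this example.

```python
# Illustrative Chrome 122 / macOS user agent; the project's exact string may differ.
CHROME_122_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/122.0.0.0 Safari/537.36"
)

# Keyword arguments for Playwright's Browser.new_context().
BASE_CONTEXT = {
    "user_agent": CHROME_122_MAC_UA,
    "viewport": {"width": 1280, "height": 720},
    "ignore_https_errors": True,
    "bypass_csp": True,
    "java_script_enabled": True,
    "accept_downloads": False,
}

CONTEXT_NAMES = ["jiji_urls", "jiji_listings", "meqasa_urls", "meqasa_listings"]

# One entry per spider; each gets a shallow copy of the shared parameters.
PLAYWRIGHT_CONTEXTS = {name: dict(BASE_CONTEXT) for name in CONTEXT_NAMES}


def playwright_meta(context_name):
    """Build Request.meta that routes a request to one of the named contexts."""
    if context_name not in PLAYWRIGHT_CONTEXTS:
        raise KeyError(f"unknown Playwright context: {context_name}")
    return {"playwright": True, "playwright_context": context_name}
```

A spider would then pass something like `meta=playwright_meta("jiji_urls")` when building its requests, so that its pages open in its own isolated context.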
Resource Blocking
- image: All images (PNG, JPG, GIF, WebP, etc.)
- media: Video and audio files
- font: Web fonts (WOFF, TTF, etc.)
- stylesheet: CSS files
- other: Miscellaneous resources
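Blocking by resource type can be sketched with a request predicate. The example assumes the project uses scrapy-playwright's `PLAYWRIGHT_ABORT_REQUEST` setting, which accepts a callable that receives a Playwright request and returns `True` to abort it. `BLOCKED_RESOURCE_TYPES` and `should_abort_request` are hypothetical names.

```python
# Resource types to drop, matching the list above.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet", "other"}


def should_abort_request(request):
    """Return True for requests whose resource type is on the block list."""
    return request.resource_type in BLOCKED_RESOURCE_TYPES


# settings.py (sketch): scrapy-playwright calls this for every browser request.
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
```

Documents, scripts, and XHR/fetch calls are not on the list, so the JavaScript that renders the listings still loads.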
Max Contexts and Pages
- PLAYWRIGHT_MAX_CONTEXTS: Maximum of 10 browser contexts (isolated sessions)
- PLAYWRIGHT_MAX_PAGES_PER_CONTEXT: Up to 12 pages (tabs) per context
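As a quick sanity check on what these limits allow, the two values can be multiplied to get an upper bound on concurrently open pages. The setting names are the ones listed above; `MAX_OPEN_PAGES` and `NAMED_CONTEXT_PAGES` are hypothetical names used only for the arithmetic.

```python
# settings.py (sketch): limits as listed above.
PLAYWRIGHT_MAX_CONTEXTS = 10
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 12

# Upper bound on pages open at once across all contexts.
MAX_OPEN_PAGES = PLAYWRIGHT_MAX_CONTEXTS * PLAYWRIGHT_MAX_PAGES_PER_CONTEXT

# If only the four named contexts are in use, the practical bound is lower.
NAMED_CONTEXT_PAGES = 4 * PLAYWRIGHT_MAX_PAGES_PER_CONTEXT

print(MAX_OPEN_PAGES, NAMED_CONTEXT_PAGES)  # 120 48
```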
Navigation Timeout
Platform-Specific Settings
On Windows, the event loop policy is set to WindowsSelectorEventLoopPolicy for compatibility with Playwright’s async operations.
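A platform-guarded way to apply this policy might look like the sketch below. `configure_event_loop_policy` is a hypothetical helper, and where the project actually makes this call is not shown in this section. The guard matters because `asyncio.WindowsSelectorEventLoopPolicy` only exists on Windows.

```python
import asyncio
import sys


def configure_event_loop_policy():
    """Switch to the selector event loop policy on Windows; no-op elsewhere.

    Returns True if the policy was changed.
    """
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
        return True
    return False


# Call once at startup, before any event loop is created.
configure_event_loop_policy()
```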