Multi-Layer Architecture
The network stack provides redundancy through sequential fallback mechanisms:
Layer 1: VPN Setup
Docker Configuration
The scraper runs behind a ProtonVPN container using the qmcgaw/gluetun image:
docker-compose.yml:34
Network Isolation
The scraper container connects to both the VPN network and the application network:
docker-compose.yml:49
Layer 2: Proxy Rotation
ProxyProvider Implementation
The ProxyProvider class manages premium proxy connections:
infrastructure/network/proxy_provider.py:24
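The original listing is not reproduced here, so the following is only a minimal sketch of what a proxy provider of this kind might look like. The field names (`host`, `port`, `username`, `password`) and the methods `proxy_url` and `as_proxies` are hypothetical and are not taken from `proxy_provider.py`.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class ProxyProvider:
    """Hypothetical sketch: holds proxy credentials and builds a proxies mapping."""

    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None

    def proxy_url(self) -> str:
        # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
        if self.username and self.password:
            user = quote(self.username, safe="")
            secret = quote(self.password, safe="")
            auth = f"{user}:{secret}@"
        else:
            auth = ""
        return f"http://{auth}{self.host}:{self.port}"

    def as_proxies(self) -> Dict[str, str]:
        # Mapping shape commonly accepted by the `proxies` argument of HTTP clients.
        url = self.proxy_url()
        return {"http": url, "https": url}
```

Keeping URL construction in one place means credentials are encoded consistently wherever the proxy is used.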
Proxy Configuration
Proxies are configured via environment variables:
shared/config/config.py:31
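As an illustration of environment-driven configuration, the sketch below reads proxy settings from a mapping that defaults to `os.environ`. Only `USE_CUSTOM_PROXY` appears in this page. The other variable names (`PROXY_HOST`, `PROXY_PORT`, `PROXY_USERNAME`, `PROXY_PASSWORD`) are assumptions made for the example.

```python
import os
from typing import Any, Dict, Mapping, Optional


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    # Treat common truthy spellings as True; anything else is False.
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_proxy_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    # All names except USE_CUSTOM_PROXY are hypothetical placeholders.
    return {
        "use_custom_proxy": parse_bool(env.get("USE_CUSTOM_PROXY")),
        "host": env.get("PROXY_HOST", ""),
        "port": int(env.get("PROXY_PORT", "0")),
        "username": env.get("PROXY_USERNAME"),
        "password": env.get("PROXY_PASSWORD"),
    }
```

Passing the mapping explicitly makes the loader easy to exercise without touching the real process environment.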
IP Validation
The proxy provider validates IP changes using ipinfo.io:
infrastructure/network/proxy_provider.py:60
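One way such a check could work is to compare the public IP seen through the proxy against a baseline recorded without it. The sketch below uses only the standard library. The function names `fetch_public_ip` and `ip_changed` are hypothetical, and the project's actual validation may differ.

```python
import json
from typing import Callable, Optional
from urllib.request import ProxyHandler, build_opener

IPINFO_URL = "https://ipinfo.io/json"


def fetch_public_ip(proxy_url: Optional[str] = None, timeout: float = 10.0) -> str:
    # Route the lookup through the proxy when one is given, otherwise go direct.
    handlers = [ProxyHandler({"http": proxy_url, "https": proxy_url})] if proxy_url else []
    opener = build_opener(*handlers)
    with opener.open(IPINFO_URL, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["ip"]


def ip_changed(baseline_ip: str, fetch_ip: Callable[[], str]) -> bool:
    # A failed lookup counts as "not validated" rather than raising.
    try:
        current_ip = fetch_ip()
    except (OSError, ValueError, KeyError):
        return False
    return current_ip != baseline_ip
```

A caller might record `fetch_public_ip()` once as the baseline and then pass `lambda: fetch_public_ip(proxy_url)` as `fetch_ip`.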
Layer 3: TOR Network
TOR Docker Service
The TOR proxy runs as a dedicated service:
docker-compose.yml:22
TOR Rotator Implementation
The TorRotator class manages IP rotation using the stem library:
infrastructure/network/tor_rotator.py:42
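A rough sketch of a rotator built on stem is shown below. The stem calls (`Controller.from_port`, `authenticate`, `signal(Signal.NEWNYM)`) are part of stem's public API. The class layout, the control port 9051, the 10-second minimum interval, and the injectable `send_newnym` callable are assumptions for illustration, not the contents of `tor_rotator.py`.

```python
import time
from typing import Callable, Optional


def send_newnym_via_stem(host: str = "127.0.0.1", port: int = 9051,
                         password: Optional[str] = None) -> None:
    # Imported lazily so this module still loads where stem is not installed.
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(address=host, port=port) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)  # ask Tor to build fresh circuits


class TorRotator:
    """Hypothetical sketch: throttles NEWNYM requests to a minimum interval."""

    def __init__(self, send_newnym: Callable[[], None] = send_newnym_via_stem,
                 min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_newnym = send_newnym
        self._min_interval = min_interval  # assumed value; Tor rate-limits NEWNYM
        self._clock = clock
        self._last_rotation: Optional[float] = None

    def rotate(self) -> bool:
        # Skip the request if the previous rotation was too recent.
        now = self._clock()
        if self._last_rotation is not None and now - self._last_rotation < self._min_interval:
            return False
        self._send_newnym()
        self._last_rotation = now
        return True
```

Throttling on the client side avoids sending signals that Tor would likely ignore anyway.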
IP Rotation with Validation
infrastructure/network/tor_rotator.py:61
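Requesting a new circuit does not guarantee a new exit IP, so a rotation step is usually paired with a check. The following sketch shows one possible loop. `rotate_until_ip_changes`, its parameters, and the default settle time are hypothetical, and the `rotate` and `fetch_ip` callables stand in for whatever the project actually uses.

```python
import time
from typing import Callable


def rotate_until_ip_changes(rotate: Callable[[], object],
                            fetch_ip: Callable[[], str],
                            max_attempts: int = 3,
                            settle_seconds: float = 5.0,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    # Record the current exit IP so each rotation can be compared against it.
    old_ip = fetch_ip()
    for _ in range(max_attempts):
        rotate()
        sleep(settle_seconds)  # give Tor time to establish the new circuit
        new_ip = fetch_ip()
        if new_ip != old_ip:
            return new_ip
    raise RuntimeError(f"exit IP still {old_ip} after {max_attempts} rotations")
```

Raising after a bounded number of attempts lets the caller decide whether to fall back to another layer.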
TOR Configuration
shared/config/config.py:36
Exponential Backoff Retry Logic
Request Utility with Fallback
The make_request function implements retry logic with exponential backoff and fallback:
infrastructure/scraper/utils.py:21
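To make the idea concrete, here is a minimal sketch of exponential backoff combined with fallback across strategies. It is not the project's `make_request`. The name `make_request_with_fallback`, its signature, and the `(name, fetch)` pairs are assumptions chosen for the example.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Fetcher = Callable[[str], str]


def make_request_with_fallback(url: str,
                               fetchers: Sequence[Tuple[str, Fetcher]],
                               retries: int = 3,
                               base_delay: float = 1.0,
                               sleep: Callable[[float], None] = time.sleep) -> str:
    last_error: Optional[Exception] = None
    # Try each strategy in order; move on only after its retries are exhausted.
    for _name, fetch in fetchers:
        for attempt in range(retries):
            try:
                return fetch(url)
            except Exception as exc:  # broad on purpose in this sketch
                last_error = exc
                if attempt < retries - 1:
                    # Delay doubles each time: base, 2*base, 4*base, ...
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all strategies failed for {url}") from last_error
```

With `retries=3` and `base_delay=1.0`, a strategy that keeps failing waits 1 s and then 2 s before the next strategy is tried.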
User-Agent Rotation
Random User-Agent Selection
infrastructure/scraper/utils.py:14
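Random selection from a pool can be sketched in a few lines. The pool entries below are placeholder strings and not the project's configured list, and the helper names `random_user_agent` and `build_headers` are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence

# Placeholder pool for illustration; the real list lives in the project's config.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_user_agent(pool: Sequence[str] = USER_AGENTS,
                      rng: Optional[random.Random] = None) -> str:
    if not pool:
        raise ValueError("user-agent pool is empty")
    # Use the supplied generator when given, otherwise the module-level one.
    chooser = rng if rng is not None else random
    return chooser.choice(pool)


def build_headers(pool: Sequence[str] = USER_AGENTS) -> Dict[str, str]:
    # A fresh User-Agent is picked each time headers are built.
    return {"User-Agent": random_user_agent(pool)}
```

Accepting an optional `random.Random` instance makes the choice reproducible when a seed is needed.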
User-Agent Pool
shared/config/config.py:23
Configuration Options
Network Settings
Environment Variables
Required variables in .env:
Strategy Selection Logic
The scraper automatically selects the best strategy:
- If USE_CUSTOM_PROXY=True: Uses premium proxy only
- If USE_CUSTOM_PROXY=False: Uses TOR network
- On proxy failure: Automatically falls back to TOR
- On all failures: Uses direct connection through VPN
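The selection rules above can be read as an ordered list of strategies to try. The sketch below encodes that reading. The function name `strategy_order` and the labels `"proxy"`, `"tor"`, and `"vpn_direct"` are hypothetical, and the actual implementation may structure this differently.

```python
from typing import List


def strategy_order(use_custom_proxy: bool) -> List[str]:
    # Premium proxy comes first when enabled, with TOR as its fallback.
    if use_custom_proxy:
        return ["proxy", "tor", "vpn_direct"]
    # Otherwise start with TOR; the VPN-routed direct connection is the last resort.
    return ["tor", "vpn_direct"]
```

A request helper could iterate over this list and stop at the first strategy that succeeds.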
Next Steps
Scraping Engine
Learn how the scraping engine works
Concurrency
Explore parallel processing implementation