Architecture Overview
The Docker environment consists of four main services:
- postgres: PostgreSQL 15 database for structured data storage
- tor: TOR proxy for IP rotation and anonymity
- vpn: Gluetun VPN container for geolocation changes
- scraper: The main application container
Docker Compose Configuration
The complete orchestration is defined in docker-compose.yml:
Dockerfile Breakdown
The scraper container is built from this Dockerfile:
Key Components
Base Image
Uses python:3.11-slim for a lightweight Python environment with minimal footprint.
System Dependencies
- gcc: Required for compiling Python packages with C extensions
- libpq-dev: PostgreSQL development libraries for psycopg2
- tor: TOR network client
- netcat-openbsd: Network utility for health checks
- postgresql-client: CLI tools for database operations
Python Dependencies
Installed from requirements.txt without cache to reduce image size.
Service Dependencies
The scraper container has explicit dependencies:
- Waits for PostgreSQL on port 5432
- Waits for TOR SOCKS proxy on port 9050
- Only starts scraping when both services are responsive
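The wait-then-start behaviour listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual startup logic (which may well be a shell loop built on netcat): the helper names `wait_for_port` and `dependencies_ready`, the timeout values, and the hostnames `postgres` and `tor` (taken from the Compose service names) are all hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def dependencies_ready(timeout: float = 60.0) -> bool:
    """Check PostgreSQL (5432) and the TOR SOCKS proxy (9050) before scraping."""
    # Hostnames are assumed to match the Compose service names.
    return wait_for_port("postgres", 5432, timeout) and wait_for_port("tor", 9050, timeout)
```

An open port only shows that a process is listening. For PostgreSQL, a stricter check such as `pg_isready` from the postgresql-client package would also confirm that the server accepts connections.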
Volume Mounts
Database Persistence
pgdata:/var/lib/postgresql/data ensures PostgreSQL data survives container restarts.
SQL Initialization
./sql:/docker-entrypoint-initdb.d auto-runs SQL scripts on first startup, when the data directory is still empty.
Application Code
.:/app mounts the entire project for development (can be removed in production).
Build and Run
Initial Build
Start All Services
View Logs
Stop Services
Rebuild After Changes
Port Mappings
| Service | Internal Port | External Port | Purpose |
|---|---|---|---|
| postgres | 5432 | $ | Database connections |
| tor | 9050 | 9050 | SOCKS proxy traffic |
| tor | 9051 | 9051 | Control port (IP rotation) |
| vpn | 8888 | 8888 | VPN HTTP proxy |
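As an illustration of how the control port on 9051 could be used for IP rotation, the sketch below sends Tor's `SIGNAL NEWNYM` command over a plain socket. It assumes the control port uses password authentication (cookie authentication would need a different `AUTHENTICATE` step) and that the password contains no quote characters. The function name `request_new_tor_identity` and its defaults are hypothetical, and the project may use a different mechanism.

```python
import socket


def request_new_tor_identity(host: str = "localhost", port: int = 9051,
                             password: str = "", timeout: float = 5.0) -> bool:
    """Send SIGNAL NEWNYM over the Tor control port; True if Tor replies 250."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="ascii") as replies:
            # Assumes password authentication; cookie auth would differ.
            conn.sendall(f'AUTHENTICATE "{password}"\r\n'.encode("ascii"))
            if not replies.readline().startswith("250"):
                return False
            # NEWNYM asks Tor to use fresh circuits for new connections.
            conn.sendall(b"SIGNAL NEWNYM\r\n")
            return replies.readline().startswith("250")
```

Tor rate-limits NEWNYM requests, so calling this in a tight loop will not yield a new exit IP on every call.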
Accessing Services
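One way to point Python code at the proxies from the port table is to build proxy mappings from the published ports. This is a sketch under assumptions: the helper name `proxy_settings` is hypothetical, the defaults assume access from the host through the external ports, and inside the Compose network the service names `tor` and `vpn` would be passed as hosts instead.

```python
def proxy_settings(tor_host: str = "localhost", vpn_host: str = "localhost") -> dict:
    """Build per-route proxy mappings from the ports in the table above."""
    # socks5h (rather than socks5) resolves DNS through the proxy as well.
    tor = f"socks5h://{tor_host}:9050"
    vpn = f"http://{vpn_host}:8888"
    return {
        "tor": {"http": tor, "https": tor},
        "vpn": {"http": vpn, "https": vpn},
    }
```

If the scraper uses the requests library (an assumption), a mapping can be passed as `proxies=proxy_settings()["tor"]`. SOCKS support in requests requires the optional `requests[socks]` extra.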
Troubleshooting
Common Issues
Database Connection Failed
Production Considerations
Remove Volume Mount
Change the - .:/app volume entry to mount only the necessary files, not the entire codebase.
Use Secrets
Replace the .env file with Docker secrets or external secret management.
Health Checks
Add explicit healthcheck directives to docker-compose.yml.
Resource Limits
Set memory and CPU limits for each service.
Next Steps
- Environment Variables: Configure database, proxy, and VPN credentials
- Network Configuration: Understand Docker networks and proxy setup