IMDb Scraper

A distributed web scraping system with advanced anti-blocking techniques, clean architecture, and hybrid persistence for extracting IMDb Top 250 movies.


Quick Start

Get up and running with the IMDb Scraper in just a few steps.

Clone the repository

Clone the IMDb Scraper project to your local machine:

git clone https://github.com/FrankDevg/imbd_scrapper_project.git
cd imbd_scrapper_project

Configure environment variables

Create a .env file with your configuration, for example the PostgreSQL settings below. See the environment variables guide for the full list:

POSTGRES_DB=imdb_scraper
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password
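
As a rough illustration of how these values might be consumed, the sketch below builds a PostgreSQL connection URL from the three variables. The function name `build_dsn` is hypothetical, and the host and port defaults are assumptions taken from the localhost:5432 address mentioned in the last step; the project's own configuration code may look different.

```python
import os
from typing import Mapping
from urllib.parse import quote

REQUIRED = ("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD")


def build_dsn(env: Mapping[str, str] = os.environ,
              host: str = "localhost", port: int = 5432) -> str:
    """Build a connection URL from the variables defined in .env (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or ':' stay valid in a URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['POSTGRES_DB']}"
```

Failing early on a missing variable tends to be easier to debug than a connection error raised later inside a container.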

Launch with Docker

Start all services, including PostgreSQL, the Tor proxy, and the scraper:

docker-compose up

The scraper automatically starts collecting movie data from the IMDb Top 250 and persists it to both CSV files and PostgreSQL.

View the results

Check the collected data in the data/ directory or query the PostgreSQL database directly on localhost:5432.

Key Features

Built on Clean Architecture principles, with layered network evasion, dual persistence, and concurrent scraping.

Clean Architecture

Domain-Driven Design with clear separation of concerns, testable business logic, and dependency inversion.
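
To make dependency inversion concrete, here is a minimal sketch, not the project's actual code. The names `Movie`, `MovieRepository`, `InMemoryMovieRepository`, and `SaveMovieUseCase` are hypothetical, and the entity fields are assumptions. The use case depends only on an abstract port, so the business rule can be tested without a database.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Movie:
    # Hypothetical fields; the real domain entity may differ.
    title: str
    year: int
    rating: float


class MovieRepository(Protocol):
    """Port defined by the domain layer (hypothetical name)."""

    def save(self, movie: Movie) -> None: ...


class InMemoryMovieRepository:
    """Stand-in adapter, used here only to keep the sketch runnable."""

    def __init__(self) -> None:
        self.saved: List[Movie] = []

    def save(self, movie: Movie) -> None:
        self.saved.append(movie)


class SaveMovieUseCase:
    """Business logic that knows the port, never a concrete store."""

    def __init__(self, repository: MovieRepository) -> None:
        self._repository = repository

    def execute(self, movie: Movie) -> None:
        # Example business rule: reject ratings outside the 0-10 scale.
        if not 0.0 <= movie.rating <= 10.0:
            raise ValueError("rating must be between 0 and 10")
        self._repository.save(movie)


repo = InMemoryMovieRepository()
SaveMovieUseCase(repo).execute(Movie("Sample Movie A", 1994, 9.3))
```

Swapping `InMemoryMovieRepository` for a PostgreSQL or CSV adapter would not require touching `SaveMovieUseCase`.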

Network Evasion

Layered evasion that combines a VPN, rotating proxies, the Tor network, and retries with exponential backoff.
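
The retry part of this strategy can be sketched as follows. This is an illustrative helper under stated assumptions, not the project's implementation: the name `retry_with_backoff`, its default values, and the 10% jitter are all hypothetical choices.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, doubling the wait after each failure (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay grows as base_delay * 2^attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the delays can be recorded in tests instead of actually waiting.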

Hybrid Persistence

Dual storage strategy with PostgreSQL for relational queries and CSV exports for data portability.
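
The dual-write idea might look roughly like the sketch below. To stay runnable without a database server it uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table layout, column names, and sample rows are assumptions made for illustration only.

```python
import csv
import io
import sqlite3

# Illustrative sample rows: (imdb_rank, title, year, rating).
MOVIES = [
    (1, "Sample Movie A", 1994, 9.3),
    (2, "Sample Movie B", 1972, 9.2),
]


def persist(rows, csv_file, conn):
    """Write the same rows to a CSV stream and to a relational table."""
    writer = csv.writer(csv_file)
    writer.writerow(["imdb_rank", "title", "year", "rating"])
    writer.writerows(rows)

    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "imdb_rank INTEGER PRIMARY KEY, title TEXT, year INTEGER, rating REAL)"
    )
    # Replace-on-conflict keeps repeated runs from duplicating rows (SQLite syntax).
    conn.executemany("INSERT OR REPLACE INTO movies VALUES (?, ?, ?, ?)", rows)
    conn.commit()


csv_buffer = io.StringIO()          # stands in for a file under data/
conn = sqlite3.connect(":memory:")  # stands in for the PostgreSQL connection
persist(MOVIES, csv_buffer, conn)
```

With PostgreSQL the equivalent of `INSERT OR REPLACE` would typically be `INSERT ... ON CONFLICT ... DO UPDATE`.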

Concurrent Scraping

ThreadPoolExecutor-based parallel execution for efficient data collection at scale.
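
A minimal sketch of this pattern with the standard library is shown below. The functions `fetch_movie` and `scrape_all`, the worker count, and the placeholder IDs are hypothetical; a real fetcher would perform the HTTP request and parsing.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_movie(movie_id: str) -> dict:
    # Placeholder for the real network call and HTML parsing.
    return {"id": movie_id, "title": f"Title for {movie_id}"}


def scrape_all(movie_ids, fetch=fetch_movie, max_workers=8):
    """Fetch movies in parallel, collecting results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its movie ID so failures can be attributed.
        futures = {pool.submit(fetch, mid): mid for mid in movie_ids}
        for future in as_completed(futures):
            mid = futures[future]
            try:
                results[mid] = future.result()
            except Exception as exc:
                # One failed page should not abort the whole batch.
                errors[mid] = exc
    return results, errors


results, errors = scrape_all(["tt0000001", "tt0000002"])
```

Threads suit this workload because scraping is dominated by network waits rather than CPU work.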

Advanced SQL Analytics

Window functions, CTEs, and complex queries for deep insights into movie data.
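
As an illustration of the kind of query this refers to, the sketch below combines a CTE with a window function to rank movies within each decade. It runs against an in-memory SQLite database (version 3.25 or later, for window function support) rather than PostgreSQL, and the schema and sample rows are assumptions, not the project's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        # Illustrative sample rows only.
        ("Sample A", 1994, 9.3),
        ("Sample B", 1999, 8.8),
        ("Sample C", 2008, 9.0),
        ("Sample D", 2003, 8.9),
    ],
)

QUERY = """
WITH by_decade AS (                      -- CTE: derive the decade once
    SELECT title, rating, (year / 10) * 10 AS decade
    FROM movies
)
SELECT decade, title, rating,
       RANK() OVER (PARTITION BY decade ORDER BY rating DESC) AS rank_in_decade
FROM by_decade
ORDER BY decade, rank_in_decade
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The same statement should run on PostgreSQL unchanged, since integer division and `RANK() OVER (PARTITION BY ...)` behave the same way there.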

Docker Orchestration

Fully containerized environment with orchestrated services for reproducible deployments.

Explore by Topic

Dive deeper into specific areas of the system.

Architecture

Learn about the Clean Architecture and DDD patterns used throughout the project.

Core Features

Explore the scraping engine, network evasion, persistence, and concurrency features.

Data & SQL

Understand the database schema, analytical queries, and CSV export functionality.

Deployment

Deploy the scraper with Docker, configure environment variables, and set up networking.

Domain Models

Reference documentation for Movie, Actor, and other domain entities.

API Reference

Complete API documentation for all interfaces, use cases, and implementations.

Ready to Get Started?

Follow our quickstart guide to have the IMDb Scraper running in minutes, or explore the architecture to understand how it works under the hood.
