Introduction to Web Scrapping Hub
Web Scrapping Hub is a full-stack streaming application that aggregates movies, series, and anime from external sources through web scraping. Built with Flask (Python) and React, it provides a modern interface for browsing and streaming content, and it uses cloudscraper to get past Cloudflare protection on the source sites.
What is Web Scrapping Hub?
Web Scrapping Hub transforms unstructured HTML from external streaming sites into a structured, browsable catalog. The application extracts metadata, episode listings, and streaming links and presents them through a responsive React interface, all served from a single containerized application on port 1234.
Single-Port Architecture: Both the React SPA and the Flask API are served on port 1234, so no separate web server or reverse proxy is needed.
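Serving an SPA and an API from one port usually comes down to one routing rule: paths under /api/ go to Flask handlers, real files from the Vite build are served as static assets, and every other path falls back to index.html so React Router can resolve URLs such as /page/2. The sketch below illustrates that rule under stated assumptions: the build directory (frontend/dist) and the names resolve_request and create_app are hypothetical, not taken from the project's source. Flask is imported lazily inside create_app so the routing helper can be used on its own.

```python
import os

API_PREFIX = "/api/"


def resolve_request(path: str, dist_dir: str) -> tuple[str, str]:
    """Decide how a single-port server should answer a request path.

    Returns ("api", path) for API routes, ("static", relative_path) for files
    that exist in the built SPA directory, and ("spa", "index.html") for
    everything else, so client-side routes like /page/2 reach React Router.
    """
    if path == "/api" or path.startswith(API_PREFIX):
        return ("api", path)

    rel = path.lstrip("/")
    root = os.path.realpath(dist_dir)
    full = os.path.realpath(os.path.join(root, rel))
    # Only serve real files that stay inside the build directory
    # (rejects "../" traversal out of dist_dir).
    if rel and full.startswith(root + os.sep) and os.path.isfile(full):
        return ("static", rel)
    return ("spa", "index.html")


def create_app(dist_dir: str = "frontend/dist"):
    """Hypothetical Flask app serving the API and the SPA from one port."""
    from flask import Flask, abort, send_from_directory  # lazy third-party import

    dist_dir = os.path.abspath(dist_dir)  # same base for resolve_request and Flask
    app = Flask(__name__, static_folder=None)

    # API routes (e.g. /api/listado, /api/serie/<slug>) would be registered here;
    # Werkzeug matches these more specific rules before the catch-all below.

    @app.get("/", defaults={"path": ""})
    @app.get("/<path:path>")
    def frontend(path: str):
        kind, target = resolve_request("/" + path, dist_dir)
        if kind == "api":
            abort(404)  # unknown API route: do not fall back to the SPA
        return send_from_directory(dist_dir, target)

    return app
```

Calling create_app().run(host="0.0.0.0", port=1234) would then serve both layers on the documented port. Returning a 404 for unknown /api/ paths, rather than index.html, keeps API clients from receiving HTML where they expect JSON.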
Core Architecture
The application implements a three-tier architecture within a single Docker container:
Presentation Layer
React 18 SPA with the Vite build system, TailwindCSS styling, and React Router for clean URLs (/page/1, /page/2)
Application Logic Layer
Flask RESTful API with modular endpoints for catalog browsing, search, series episodes, and video player extraction
Architecture Diagram
The system processes all user requests through a single, unified Flask application.
Key Features
Content Aggregation
Multiple Content Types
- Movies (Latino)
- TV Series with episode tracking
- Anime series and films
- K-Dramas and cartoons
- Content from Netflix, HBO, Disney+, Amazon, and more
Rich Metadata
- Titles, posters, and thumbnails
- Release dates and genres
- Episode listings with season organization
- Synopsis and descriptions
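Organizing episode listings by season is mostly a grouping step once each episode's season and episode numbers are known. A minimal sketch, assuming scraped episode labels look like "1x03", "1 - 3", or "S01E03" (a hypothetical format; the source sites' real markup is not documented here). Episode, parse_label, and group_by_season are illustrative names, not the interface of serie_extractor.py.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Matches hypothetical labels such as "1x03", "1 - 3" or "S01E03" (case-insensitive).
LABEL_RE = re.compile(r"(?:s)?(\d+)\s*(?:x|-|e)\s*(\d+)", re.IGNORECASE)


@dataclass(frozen=True)
class Episode:
    season: int
    number: int
    title: str
    url: str


def parse_label(label: str) -> tuple[int, int]:
    """Return (season, episode) from a label like "1x03"."""
    match = LABEL_RE.search(label)
    if match is None:
        raise ValueError(f"unrecognized episode label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def group_by_season(raw: list[dict[str, str]]) -> dict[int, list[Episode]]:
    """Group scraped episode dicts (label/title/url keys) by season."""
    seasons: dict[int, list[Episode]] = defaultdict(list)
    for item in raw:
        season, number = parse_label(item["label"])
        seasons[season].append(Episode(season, number, item["title"], item["url"]))
    # Seasons in ascending order, episodes sorted by number within each season.
    return {s: sorted(eps, key=lambda e: e.number) for s, eps in sorted(seasons.items())}
```

Sorting after grouping means the output order does not depend on the order in which episodes appear in the scraped page.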
Web Scraping Engine
Cloudflare Bypass: Web Scrapping Hub uses cloudscraper to get past Cloudflare protection, so content can still be extracted from protected sites.
- Generic Extractor (generic_extractor.py): Parses catalog listings and movie metadata
- Serie Extractor (serie_extractor.py): Extracts episode listings organized by season
- Iframe Extractor (iframe_extractor.py): Locates and extracts video player URLs
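The fetch-then-parse flow behind the extractors can be sketched as follows. cloudscraper.create_scraper() returns a requests-compatible session that attempts to solve Cloudflare's anti-bot challenge; the iframe lookup uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup. The names get_scraper, fetch_html, and extract_iframe_src are hypothetical and do not reflect iframe_extractor.py's actual interface, and checking data-src for lazy-loaded players is an assumption about the source markup. The session parameter lets the parsing be exercised without network access.

```python
from functools import lru_cache
from html.parser import HTMLParser
from typing import Optional


@lru_cache(maxsize=1)
def get_scraper():
    """Create one shared cloudscraper session (it keeps cookies between requests)."""
    import cloudscraper  # third-party: pip install cloudscraper

    return cloudscraper.create_scraper()


def fetch_html(url: str, session=None, timeout: float = 15.0) -> str:
    """Download a page, defaulting to the shared cloudscraper session."""
    if session is None:
        session = get_scraper()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # surface 403/5xx instead of parsing an error page
    return response.text


class _IframeFinder(HTMLParser):
    """Collects iframe URLs (src, or data-src for lazy-loaded players) in order."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":  # HTMLParser lowercases tag and attribute names
            attributes = dict(attrs)
            src = attributes.get("src") or attributes.get("data-src")
            if src:
                self.sources.append(src)


def extract_iframe_src(html: str) -> Optional[str]:
    """Return the first iframe URL in the page, or None if there is none."""
    finder = _IframeFinder()
    finder.feed(html)
    finder.close()
    return finder.sources[0] if finder.sources else None
```

Caching a single scraper session is a design choice: it avoids repeating Cloudflare's challenge on every request, at the cost of sharing state across callers.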
User Experience Features
Clean Pagination
Navigate catalogs with clean URLs (/page/2) instead of query parameters
Advanced Search
- Quick search via API endpoint
- Deep search functionality
- Filter by section (Movies, Series, Anime, etc.)
Integrated Video Player
- Modal playback with episode navigation
- Footer controls with synopsis display
- Rounded buttons with hover effects
Responsive Design
TailwindCSS-powered responsive interface that works on desktop and mobile
Technology Stack
Backend Technologies
| Technology | Purpose | Version |
|---|---|---|
| Flask | RESTful API server | Latest |
| Flask-CORS | Cross-origin resource sharing | Latest |
| cloudscraper | HTTP client with Cloudflare bypass | Latest |
| BeautifulSoup4 | HTML parsing and extraction | Latest |
| Python | Runtime environment | 3.11+ |
Frontend Technologies
| Technology | Purpose | Version |
|---|---|---|
| React | UI framework | 18.3.1 |
| Vite | Build system and dev server | 5.4.2 |
| React Router | Client-side routing | 7.6.3 |
| TanStack Query | Data fetching and caching | 5.83.0 |
| TailwindCSS | Utility-first CSS framework | 3.4.1 |
| Lucide React | Icon library | 0.344.0 |
Infrastructure
Deployment Models
Web Scrapping Hub supports multiple deployment scenarios:
- Docker (Recommended)
- CasaOS
- Local Development
Docker (Recommended): Single-container deployment optimized for home servers and CasaOS.
- Multi-architecture support (AMD64, ARM64, ARMv7)
- Self-contained with frontend and backend
- Exposes port 1234 for all services
API Endpoints Overview
The Flask backend exposes several RESTful endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/version` | GET | Check the app version and available updates |
| `/api/secciones` | GET | Get available content sections |
| `/api/listado` | GET | Get catalog listings with pagination |
| `/api/deep-search` | GET | Deep search functionality |
| `/api/serie/<slug>` | GET | Get series episodes by season |
| `/api/pelicula/<slug>` | GET | Get movie details and player |
| `/api/anime/<slug>` | GET | Get anime episodes |
| `/api/iframe_player` | GET | Extract the iframe player URL |
All endpoints return JSON responses. See the API Reference for detailed documentation.
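Because every endpoint returns JSON, a client needs little more than URL construction and JSON decoding. A minimal standard-library sketch, assuming the app is reachable at http://localhost:1234; build_url, endpoint_for_slug, and get_json are hypothetical helper names, and query parameters for endpoints such as /api/listado are not documented in this overview, so check the API Reference before relying on any parameter name.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:1234"  # assumption: default single-port deployment


def build_url(path: str, params: Optional[dict[str, Any]] = None, base: str = BASE_URL) -> str:
    """Join the base URL and an API path, appending URL-encoded query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


def endpoint_for_slug(kind: str, slug: str) -> str:
    """Build /api/serie/<slug>, /api/pelicula/<slug> or /api/anime/<slug>."""
    if kind not in {"serie", "pelicula", "anime"}:
        raise ValueError(f"unknown content kind: {kind!r}")
    return f"/api/{kind}/{quote(slug, safe='')}"  # escape "/" and spaces in slugs


def get_json(path: str, params: Optional[dict[str, Any]] = None, timeout: float = 10.0) -> Any:
    """GET an API endpoint and decode its JSON body."""
    with urlopen(build_url(path, params), timeout=timeout) as response:
        return json.load(response)
```

For example, get_json("/api/secciones") would fetch the section list, and get_json(endpoint_for_slug("serie", slug)) a series' episodes grouped by season.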
Content Sections
The application aggregates content from multiple categories:
Movies
Latino movies from sololatino.net
Series
TV series with episode tracking
Anime
Anime series and movies
K-Drama
Korean drama series
Cartoons
Animated series and shows
Streaming Platforms
Netflix, HBO Max, Disney+, Amazon Prime, Apple TV, Hulu
Modular Extractor System
Web Scrapping Hub uses a modular extractor architecture for maintainability and extensibility:
Generic Extractor
Extracts catalog listings from HTML.
Serie Extractor
Extracts episode listings organized by season.
Iframe Extractor
Locates video player iframes for streaming.
Next Steps
Now that you understand the architecture, here’s how to proceed:
Quick Start
Get Web Scrapping Hub running in minutes
Local Setup
Install for local development
Docker Deployment
Deploy with Docker
API Reference
Explore the REST API