Introduction to Web Scrapping Hub

Web Scrapping Hub is a full-stack streaming application that aggregates movies, series, and anime content from external sources through robust web scraping. Built with Flask (Python) and React, it provides a modern interface for browsing and streaming content with seamless Cloudflare protection bypass.

What is Web Scrapping Hub?

Web Scrapping Hub transforms unstructured HTML from external streaming sites into a structured, browsable catalog. The application extracts metadata, episode listings, and streaming links, presenting them through a responsive React interface—all served from a single containerized application on port 1234.
Single-Port Architecture: Both the React SPA and Flask API are served on port 1234, eliminating the need for separate web servers or reverse proxies.
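
The single-port pattern can be sketched with a minimal Flask app that serves both the built React bundle and the API (a hedged illustration only; the project's actual entry point is backend/app.py, and the route and version value here are placeholders):

```python
from flask import Flask, jsonify, send_from_directory

# Illustrative sketch: serve the built React bundle from frontend/dist
# alongside the REST API, all on one Flask app
app = Flask(__name__, static_folder="frontend/dist", static_url_path="")

@app.route("/api/version")
def version():
    # Example API endpoint living next to the SPA
    return jsonify({"version": "1.0.0"})

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def spa(path):
    # Any non-API route falls through to index.html so React Router
    # can handle clean URLs like /page/2 client-side
    return send_from_directory(app.static_folder, "index.html")

# To run everything on the single port:
# app.run(host="0.0.0.0", port=1234)
```

Because the SPA and API share one origin, no reverse proxy or CORS juggling is needed in production.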

Core Architecture

The application implements a three-tier architecture within a single Docker container:
1. Presentation Layer: React 18 SPA with Vite build system, TailwindCSS styling, and React Router for clean URLs (/page/1, /page/2)
2. Application Logic Layer: Flask RESTful API with modular endpoints for catalog browsing, search, series episodes, and video player extraction
3. Data Extraction Layer: Modular extractor system using BeautifulSoup4 and cloudscraper to bypass Cloudflare and parse HTML into structured JSON

Architecture Diagram

The system processes user requests through a unified Flask application:
User Request → Flask App (Port 1234)
                    ├── React SPA (Frontend)
                    └── REST API (Backend)
                            └── Extractors (cloudscraper + BeautifulSoup)
                                    └── External Sites

Key Features

Content Aggregation

Multiple Content Types

  • Movies (Latino)
  • TV Series with episode tracking
  • Anime series and films
  • K-Dramas and cartoons
  • Content from Netflix, HBO, Disney+, Amazon, and more

Rich Metadata

  • Titles, posters, and thumbnails
  • Release dates and genres
  • Episode listings with season organization
  • Synopsis and descriptions

Web Scraping Engine

Cloudflare Bypass: Web Scrapping Hub uses cloudscraper to overcome Cloudflare protection, ensuring reliable content extraction even from protected sites.
The scraping engine consists of three specialized extractors:
  • Generic Extractor (generic_extractor.py): Parses catalog listings and movie metadata
  • Serie Extractor (serie_extractor.py): Extracts episode listings organized by season
  • Iframe Extractor (iframe_extractor.py): Locates and extracts video player URLs

User Experience Features

Clean Pagination

Navigate catalogs with clean URLs (/page/2) instead of query parameters

Advanced Search

  • Quick search via API endpoint
  • Deep search functionality
  • Filter by section (Movies, Series, Anime, etc.)

Integrated Video Player

  • Modal playback with episode navigation
  • Footer controls with synopsis display
  • Rounded buttons with hover effects

Responsive Design

TailwindCSS-powered responsive interface that works on desktop and mobile

Technology Stack

Backend Technologies

| Technology | Purpose | Version |
| --- | --- | --- |
| Flask | RESTful API server | Latest |
| Flask-CORS | Cross-origin resource sharing | Latest |
| cloudscraper | HTTP client with Cloudflare bypass | Latest |
| BeautifulSoup4 | HTML parsing and extraction | Latest |
| Python | Runtime environment | 3.11+ |

Frontend Technologies

| Technology | Purpose | Version |
| --- | --- | --- |
| React | UI framework | 18.3.1 |
| Vite | Build system and dev server | 5.4.2 |
| React Router | Client-side routing | 7.6.3 |
| TanStack Query | Data fetching and caching | 5.83.0 |
| TailwindCSS | Utility-first CSS framework | 3.4.1 |
| Lucide React | Icon library | 0.344.0 |

Infrastructure

# Stage 1: Build frontend
FROM node:20-alpine AS frontend-build
WORKDIR /app/frontend/project
COPY frontend/project/ ./
RUN npm install && npm run build

# Stage 2: Backend + serve frontend
FROM python:3.11-alpine
WORKDIR /app
COPY backend/ ./backend/
COPY --from=frontend-build /app/frontend/project/dist ./frontend/dist
RUN pip install -r ./backend/requirements.txt
EXPOSE 1234
CMD ["python", "-m", "backend.app"]

Deployment Models

Web Scrapping Hub supports multiple deployment scenarios, from local development to a single Docker container (both are covered under Next Steps below).

API Endpoints Overview

The Flask backend exposes several RESTful endpoints:
| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/version | GET | Check app version and updates |
| /api/secciones | GET | Get available content sections |
| /api/listado | GET | Get catalog listings with pagination |
| /api/deep-search | GET | Deep search functionality |
| /api/serie/&lt;slug&gt; | GET | Get series episodes by season |
| /api/pelicula/&lt;slug&gt; | GET | Get movie details and player |
| /api/anime/&lt;slug&gt; | GET | Get anime episodes |
| /api/iframe_player | GET | Extract iframe player URL |
All endpoints return JSON responses. See the API Reference for detailed documentation.
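
A hypothetical client session against these endpoints (the base URL, helper names, and the page query parameter are assumptions for illustration, not documented API details):

```python
import requests

BASE = "http://localhost:1234"

def serie_url(slug):
    # Build the series endpoint URL for a given slug
    return f"{BASE}/api/serie/{slug}"

def get_sections():
    # List available content sections (Movies, Series, Anime, ...)
    return requests.get(f"{BASE}/api/secciones", timeout=10).json()

def get_listing(page=1):
    # Catalog listings are paginated; the parameter name here is illustrative
    return requests.get(f"{BASE}/api/listado", params={"page": page}, timeout=10).json()

def get_series(slug):
    # Episodes grouped by season for a given series slug
    return requests.get(serie_url(slug), timeout=10).json()
```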

Content Sections

The application aggregates content from multiple categories:

Movies

Latino movies from sololatino.net

Series

TV series with episode tracking

Anime

Anime series and movies

K-Drama

Korean drama series

Cartoons

Animated series and shows

Streaming Platforms

Netflix, HBO Max, Disney+, Amazon Prime, Apple TV, Hulu

Modular Extractor System

Web Scrapping Hub uses a modular extractor architecture for maintainability and extensibility:

Generic Extractor

Extracts catalog listings from HTML:
from bs4 import BeautifulSoup

def extraer_listado(html):
    soup = BeautifulSoup(html, 'html.parser')
    articulos = soup.select('article.item')
    datos = []
    for articulo in articulos:
        # Extract poster, title, image, year, genres, etc.
        # (selectors below are illustrative; the real extractor targets
        # the source site's specific markup)
        enlace = articulo.select_one('a')
        imagen_tag = articulo.select_one('img')
        datos.append({
            "id": articulo.get('id'),
            "slug": enlace['href'].rstrip('/').split('/')[-1] if enlace else None,
            "titulo": imagen_tag.get('alt') if imagen_tag else None,
            "imagen": imagen_tag.get('src') if imagen_tag else None,
            "year": None,      # parsed from the listing's year element
            "generos": [],     # parsed from the genre links
            "tipo": None       # movie / series / anime
        })
    return datos

Serie Extractor

Extracts episode listings organized by season:
def extraer_episodios_serie(url):
    html = fetch_html(url)  # fetch_html wraps a cloudscraper session
    soup = BeautifulSoup(html, 'html.parser')
    temporadas_divs = soup.select('#seasons .se-c')
    episodios_data = []
    for temporada_div in temporadas_divs:
        num_temporada = int(temporada_div.get('data-season', 0))
        for episodio in temporada_div.select('li'):
            # Extract per-episode details (title, number, link)
            episodios_data.append({
                "temporada": num_temporada,
                "titulo": episodio.get_text(strip=True)
            })
    info = {"url": url, "temporadas": len(temporadas_divs)}  # series-level metadata
    return {"info": info, "episodios": episodios_data}

Iframe Extractor

Locates video player iframes for streaming:
def extraer_iframe_reproductor(url):
    # Fetch the movie/episode page and locate the embedded player
    # (simplified sketch of the real extractor)
    html = fetch_html(url)
    soup = BeautifulSoup(html, 'html.parser')
    iframe = soup.select_one('iframe')
    return {
        "player_url": iframe['src'] if iframe else None,
        "fuente": url,        # page the player was extracted from
        "formato": "iframe"
    }
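
Wiring an extractor into an endpoint is then a thin Flask route. A hedged sketch with illustrative names (the stub stands in for the real extractor, and the url query parameter name is an assumption):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def extraer_iframe_reproductor(url):
    # Stub standing in for the real iframe extractor
    return {"player_url": url, "fuente": "ejemplo", "formato": "iframe"}

@app.route("/api/iframe_player")
def iframe_player():
    # The page URL to scrape arrives as a query parameter (name assumed)
    url = request.args.get("url")
    if not url:
        return jsonify({"error": "missing url parameter"}), 400
    return jsonify(extraer_iframe_reproductor(url))
```

Keeping the extractor a plain function means it can be tested and swapped independently of the HTTP layer.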

Next Steps

Now that you understand the architecture, here’s how to proceed:

Quick Start

Get Web Scrapping Hub running in minutes

Local Setup

Install for local development

Docker Deployment

Deploy with Docker

API Reference

Explore the REST API
Legal Notice: Web Scrapping Hub is for educational purposes. Ensure compliance with terms of service of any sites you scrape.