Introduction to Web Scrapping Hub

Web Scrapping Hub is a full-stack streaming application that aggregates movies, series, and anime content from external sources through robust web scraping. Built with Flask (Python) and React, it provides a modern interface for browsing and streaming content with seamless Cloudflare protection bypass.

What is Web Scrapping Hub?

Web Scrapping Hub transforms unstructured HTML from external streaming sites into a structured, browsable catalog. The application extracts metadata, episode listings, and streaming links, presenting them through a responsive React interface—all served from a single containerized application on port 1234.
Single-Port Architecture: Both the React SPA and Flask API are served on port 1234, eliminating the need for separate web servers or reverse proxies.
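
The single-port pattern can be sketched with a minimal Flask app that serves both the built React bundle and the API (a hedged illustration only; the project's actual entry point is backend/app.py, and the route and version value here are placeholders):

```python
from flask import Flask, jsonify, send_from_directory

# Illustrative sketch: serve the built React bundle from frontend/dist
# alongside the REST API, all on one Flask app
app = Flask(__name__, static_folder="frontend/dist", static_url_path="")

@app.route("/api/version")
def version():
    # Example API endpoint living next to the SPA
    return jsonify({"version": "1.0.0"})

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def spa(path):
    # Any non-API route falls through to index.html so React Router
    # can handle clean URLs like /page/2 client-side
    return send_from_directory(app.static_folder, "index.html")

# To run everything on the single port:
# app.run(host="0.0.0.0", port=1234)
```

Because the SPA and API share one origin, no reverse proxy or CORS juggling is needed in production.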

Core Architecture

The application implements a three-tier architecture within a single Docker container:
1. Presentation Layer: React 18 SPA with Vite build system, TailwindCSS styling, and React Router for clean URLs (/page/1, /page/2)
2. Application Logic Layer: Flask RESTful API with modular endpoints for catalog browsing, search, series episodes, and video player extraction
3. Data Extraction Layer: Modular extractor system using BeautifulSoup4 and cloudscraper to bypass Cloudflare and parse HTML into structured JSON

Architecture Diagram

The system processes user requests through a unified Flask application:
User Request → Flask App (Port 1234)
                    ├── React SPA (Frontend)
                    └── REST API (Backend)
                            └── Extractors (cloudscraper + BeautifulSoup)
                                    └── External Sites

Key Features

Content Aggregation

Multiple Content Types

  • Movies (Latino)
  • TV Series with episode tracking
  • Anime series and films
  • K-Dramas and cartoons
  • Content from Netflix, HBO, Disney+, Amazon, and more

Rich Metadata

  • Titles, posters, and thumbnails
  • Release dates and genres
  • Episode listings with season organization
  • Synopsis and descriptions

Web Scraping Engine

Cloudflare Bypass: Web Scrapping Hub uses cloudscraper to overcome Cloudflare protection, ensuring reliable content extraction even from protected sites.
The scraping engine consists of three specialized extractors:
  • Generic Extractor (generic_extractor.py): Parses catalog listings and movie metadata
  • Serie Extractor (serie_extractor.py): Extracts episode listings organized by season
  • Iframe Extractor (iframe_extractor.py): Locates and extracts video player URLs

User Experience Features

Clean Pagination

Navigate catalogs with clean URLs (/page/2) instead of query parameters

Advanced Search

  • Quick search via API endpoint
  • Deep search functionality
  • Filter by section (Movies, Series, Anime, etc.)

Integrated Video Player

  • Modal playback with episode navigation
  • Footer controls with synopsis display
  • Rounded buttons with hover effects

Responsive Design

TailwindCSS-powered responsive interface that works on desktop and mobile

Technology Stack

Backend Technologies

| Technology | Purpose | Version |
| --- | --- | --- |
| Flask | RESTful API server | Latest |
| Flask-CORS | Cross-origin resource sharing | Latest |
| cloudscraper | HTTP client with Cloudflare bypass | Latest |
| BeautifulSoup4 | HTML parsing and extraction | Latest |
| Python | Runtime environment | 3.11+ |

Frontend Technologies

| Technology | Purpose | Version |
| --- | --- | --- |
| React | UI framework | 18.3.1 |
| Vite | Build system and dev server | 5.4.2 |
| React Router | Client-side routing | 7.6.3 |
| TanStack Query | Data fetching and caching | 5.83.0 |
| TailwindCSS | Utility-first CSS framework | 3.4.1 |
| Lucide React | Icon library | 0.344.0 |

Infrastructure

# Stage 1: Build frontend
FROM node:20-alpine AS frontend-build
WORKDIR /app/frontend/project
COPY frontend/project/ ./
RUN npm install && npm run build

# Stage 2: Backend + serve frontend
FROM python:3.11-alpine
WORKDIR /app
COPY backend/ ./backend/
COPY --from=frontend-build /app/frontend/project/dist ./frontend/dist
RUN pip install -r ./backend/requirements.txt
EXPOSE 1234
CMD ["python", "-m", "backend.app"]

Deployment Models

Web Scrapping Hub supports multiple deployment scenarios, from local development to a single Docker container (both are covered under Next Steps below).

API Endpoints Overview

The Flask backend exposes several RESTful endpoints:
| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/version | GET | Check app version and updates |
| /api/secciones | GET | Get available content sections |
| /api/listado | GET | Get catalog listings with pagination |
| /api/deep-search | GET | Deep search functionality |
| /api/serie/&lt;slug&gt; | GET | Get series episodes by season |
| /api/pelicula/&lt;slug&gt; | GET | Get movie details and player |
| /api/anime/&lt;slug&gt; | GET | Get anime episodes |
| /api/iframe_player | GET | Extract iframe player URL |
All endpoints return JSON responses. See the API Reference for detailed documentation.
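
A hypothetical client session against these endpoints (the base URL, helper names, and the page query parameter are assumptions for illustration, not documented API details):

```python
import requests

BASE = "http://localhost:1234"

def serie_url(slug):
    # Build the series endpoint URL for a given slug
    return f"{BASE}/api/serie/{slug}"

def get_sections():
    # List available content sections (Movies, Series, Anime, ...)
    return requests.get(f"{BASE}/api/secciones", timeout=10).json()

def get_listing(page=1):
    # Catalog listings are paginated; the parameter name here is illustrative
    return requests.get(f"{BASE}/api/listado", params={"page": page}, timeout=10).json()

def get_series(slug):
    # Episodes grouped by season for a given series slug
    return requests.get(serie_url(slug), timeout=10).json()
```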

Content Sections

The application aggregates content from multiple categories:

Movies

Latino movies from sololatino.net

Series

TV series with episode tracking

Anime

Anime series and movies

K-Drama

Korean drama series

Cartoons

Animated series and shows

Streaming Platforms

Netflix, HBO Max, Disney+, Amazon Prime, Apple TV, Hulu

Modular Extractor System

Web Scrapping Hub uses a modular extractor architecture for maintainability and extensibility:

Generic Extractor

Extracts catalog listings from HTML:
from bs4 import BeautifulSoup

def extraer_listado(html):
    soup = BeautifulSoup(html, 'html.parser')
    articulos = soup.select('article.item')
    datos = []
    for articulo in articulos:
        # Extract poster, title, image, year, genres, etc.
        # (selectors below are illustrative; the real extractor targets
        # the source site's specific markup)
        enlace = articulo.select_one('a')
        imagen_tag = articulo.select_one('img')
        datos.append({
            "id": articulo.get('id'),
            "slug": enlace['href'].rstrip('/').split('/')[-1] if enlace else None,
            "titulo": imagen_tag.get('alt') if imagen_tag else None,
            "imagen": imagen_tag.get('src') if imagen_tag else None,
            "year": None,      # parsed from the listing's year element
            "generos": [],     # parsed from the genre links
            "tipo": None       # movie / series / anime
        })
    return datos

Serie Extractor

Extracts episode listings organized by season:
def extraer_episodios_serie(url):
    html = fetch_html(url)  # fetch_html wraps a cloudscraper session
    soup = BeautifulSoup(html, 'html.parser')
    temporadas_divs = soup.select('#seasons .se-c')
    episodios_data = []
    for temporada_div in temporadas_divs:
        num_temporada = int(temporada_div.get('data-season', 0))
        for episodio in temporada_div.select('li'):
            # Extract per-episode details (title, number, link)
            episodios_data.append({
                "temporada": num_temporada,
                "titulo": episodio.get_text(strip=True)
            })
    info = {"url": url, "temporadas": len(temporadas_divs)}  # series-level metadata
    return {"info": info, "episodios": episodios_data}

Iframe Extractor

Locates video player iframes for streaming:
def extraer_iframe_reproductor(url):
    # Fetch the movie/episode page and locate the embedded player
    # (simplified sketch of the real extractor)
    html = fetch_html(url)
    soup = BeautifulSoup(html, 'html.parser')
    iframe = soup.select_one('iframe')
    return {
        "player_url": iframe['src'] if iframe else None,
        "fuente": url,        # page the player was extracted from
        "formato": "iframe"
    }
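
Wiring an extractor into an endpoint is then a thin Flask route. A hedged sketch with illustrative names (the stub stands in for the real extractor, and the url query parameter name is an assumption):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def extraer_iframe_reproductor(url):
    # Stub standing in for the real iframe extractor
    return {"player_url": url, "fuente": "ejemplo", "formato": "iframe"}

@app.route("/api/iframe_player")
def iframe_player():
    # The page URL to scrape arrives as a query parameter (name assumed)
    url = request.args.get("url")
    if not url:
        return jsonify({"error": "missing url parameter"}), 400
    return jsonify(extraer_iframe_reproductor(url))
```

Keeping the extractor a plain function means it can be tested and swapped independently of the HTTP layer.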

Next Steps

Now that you understand the architecture, here’s how to proceed:

Quick Start

Get Web Scrapping Hub running in minutes

Local Setup

Install for local development

Docker Deployment

Deploy with Docker

API Reference

Explore the REST API
Legal Notice: Web Scrapping Hub is for educational purposes. Ensure compliance with terms of service of any sites you scrape.