System Architecture Overview

System Architecture

Web Scraping Hub is a full-stack web application that provides a browsable catalog and streaming playback for movies, series, and anime. It follows a client-server architecture: a React frontend talks to a Flask backend over HTTP, and the backend scrapes the content from an external site on demand.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  React Frontend (Vite + TypeScript)                  │  │
│  │  - React Router for navigation                       │  │
│  │  - TanStack Query for state management               │  │
│  │  - TailwindCSS for styling                          │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕ HTTP/REST
┌─────────────────────────────────────────────────────────────┐
│                        Server Layer                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Flask Backend (Python)                              │  │
│  │  - RESTful API endpoints                            │  │
│  │  - CORS enabled                                      │  │
│  │  - Static file serving                              │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│                      Processing Layer                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Extractor System                                    │  │
│  │  - Generic Extractor (listings & info)              │  │
│  │  - Series Extractor (episodes & metadata)           │  │
│  │  - IFrame Extractor (player URLs)                   │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            ↕
┌─────────────────────────────────────────────────────────────┐
│                       External Layer                         │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  HTTP Client (CloudScraper)                         │  │
│  │  - Cloudflare bypass                                │  │
│  │  - Ad blocking                                      │  │
│  │  - HTML/JSON fetching                               │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Tech Stack

Backend Technologies

Flask: Python web framework for building the REST API
CloudScraper: HTTP client with Cloudflare bypass capabilities
BeautifulSoup4: HTML parsing and web scraping library
AdblockParser: Ad blocking rules for clean HTML extraction

Frontend Technologies

React 18: Modern UI library with hooks and Suspense
TypeScript: Type-safe JavaScript for better developer experience
TanStack Query: Data fetching and caching solution
React Router: Client-side routing with lazy loading
TailwindCSS: Utility-first CSS framework
Vite: Fast build tool and dev server

Data Flow

Request Flow

User Interaction: User navigates to a catalog page or searches for content
Frontend Request: React component triggers API call via TanStack Query hook
Backend Processing: Flask receives request and routes to appropriate handler
Data Extraction: Extractor modules fetch and parse external content
Response: Processed data returned as JSON to frontend
UI Update: React components re-render with new data

Example: Movie Catalog Flow
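
A minimal sketch of how a catalog request might move through the layers listed in the request flow. The endpoint path `/api/catalog/<section>`, the `extract_listing` helper, and the injected `fetch_html` callable are assumptions made for illustration; the project's real route and extractor names may differ.

```python
from flask import Flask, jsonify

BASE_URL = "https://sololatino.net"  # value shown in backend/config.py


def extract_listing(url, fetch_html):
    """Hypothetical extractor: fetch a page and return catalog items.

    A real extractor would parse the HTML (for example with BeautifulSoup);
    this stand-in only reports what was fetched.
    """
    html = fetch_html(url)
    return [{"source": url, "html_length": len(html)}]


def create_app(fetch_html):
    """Build the Flask app; fetch_html is injected so it can be swapped in tests."""
    app = Flask(__name__)

    @app.route("/api/catalog/<section>")  # hypothetical endpoint path
    def catalog(section):
        # Backend processing, data extraction, and JSON response in one place.
        items = extract_listing(f"{BASE_URL}/{section}", fetch_html)
        return jsonify({"section": section, "items": items})

    return app


if __name__ == "__main__":
    # Offline demo: a fake fetcher stands in for the CloudScraper-based client.
    demo_app = create_app(lambda url: "<html></html>")
    response = demo_app.test_client().get("/api/catalog/peliculas")
    print(response.get_json())
```

Injecting the fetcher keeps the route independent of the HTTP client, which mirrors the layer boundaries in the diagram and lets the flow be exercised without network access.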

Key Design Principles

Separation of Concerns

The architecture maintains clear boundaries between layers:

Presentation Layer: React components handle UI rendering
Business Logic Layer: Flask routes and extractors process data
Data Access Layer: HTTP client handles external requests

Modularity

Each component is self-contained and can be modified independently:

Extractors are modular and can be extended for new sources
Frontend hooks are organized by domain (API, UI, utils)
Backend utilities are separated by function
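
One common way to keep extractors extensible is a shared interface plus a registry, sketched below. This is an assumption about how such modularity could be structured, not a description of the project's code: `BaseExtractor`, `register`, `EXTRACTORS`, and both example classes are hypothetical, and the parsing is deliberately trivial.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a source name to its extractor class.
EXTRACTORS = {}


def register(name):
    """Class decorator that adds an extractor to the registry under `name`."""
    def decorator(cls):
        EXTRACTORS[name] = cls
        return cls
    return decorator


class BaseExtractor(ABC):
    """Common interface every extractor in this sketch implements."""

    @abstractmethod
    def extract(self, html):
        """Turn raw HTML into a list of plain dictionaries."""


@register("generic")
class GenericExtractor(BaseExtractor):
    def extract(self, html):
        # Placeholder parsing: one item per non-empty line of input.
        return [{"title": line.strip()} for line in html.splitlines() if line.strip()]


@register("upper")
class UpperCaseExtractor(BaseExtractor):
    """Example of adding a new source without touching existing classes."""

    def extract(self, html):
        return [{"title": line.strip().upper()} for line in html.splitlines() if line.strip()]


def run_extractor(name, html):
    """Look up an extractor by name and run it."""
    return EXTRACTORS[name]().extract(html)
```

With this shape, supporting a new source means adding one decorated class; callers keep using `run_extractor` unchanged.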

Performance Optimization

Lazy Loading: Frontend pages load on demand
Caching: TanStack Query caches API responses (5 min stale time)
Image Optimization: Lazy loading for catalog images
Preloading: First catalog image preloaded for better LCP

Error Handling

Backend returns consistent error responses
Frontend uses Error Boundaries for graceful degradation
Retry logic built into TanStack Query (3 retries)
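
As a sketch of what "consistent error responses" could look like on the Flask side, the handlers below return every error in one JSON shape. The payload fields (`error`, `message`, `status`) and the `/api/item/<int:item_id>` route are assumptions; the source does not specify the actual format.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException, NotFound

app = Flask(__name__)


@app.errorhandler(HTTPException)
def handle_http_error(error):
    """Return every HTTP error (404, 405, ...) in one JSON shape."""
    body = {"error": error.name, "message": error.description, "status": error.code}
    return jsonify(body), error.code


@app.errorhandler(Exception)
def handle_unexpected_error(error):
    """Hide internals of unexpected failures behind a generic 500 payload."""
    body = {"error": "Internal Server Error", "message": "Unexpected failure", "status": 500}
    return jsonify(body), 500


@app.route("/api/item/<int:item_id>")  # hypothetical endpoint
def get_item(item_id):
    if item_id != 1:
        raise NotFound(f"Item {item_id} does not exist")
    return jsonify({"id": item_id})
```

A uniform shape lets the frontend map any failed request to the same error UI, and gives TanStack Query's retry logic a predictable status code to act on.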

Deployment Architecture

The application supports multiple deployment modes:

Development Mode

Backend runs on port 1234
Frontend dev server on port 5173
CORS enabled for cross-origin requests

Production Mode

Backend serves both API and static frontend files
Single port (1234) for entire application
Docker support for containerized deployment
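
The single-port setup can be approximated with a Flask app that serves API routes first and falls back to the built frontend for everything else. This is a sketch under assumptions: the `/api/version` endpoint and the idea that the Vite build output sits in one directory containing `index.html` are illustrative, not taken from the project.

```python
from pathlib import Path

from flask import Flask, jsonify, send_from_directory


def create_app(static_dir):
    """Serve the API and a built frontend from one Flask app.

    static_dir is assumed to hold the frontend build output
    (index.html plus assets).
    """
    static_root = Path(static_dir).resolve()
    # Disable Flask's default /static route so the catch-all below owns all files.
    app = Flask(__name__, static_folder=None)

    @app.route("/api/version")  # hypothetical API endpoint
    def version():
        return jsonify({"version": "1.4.8"})  # APP_VERSION from backend/config.py

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def frontend(path):
        # Serve a real file if it exists; otherwise fall back to index.html
        # so React Router can resolve the route on the client.
        if path and (static_root / path).is_file():
            return send_from_directory(static_root, path)
        return send_from_directory(static_root, "index.html")

    return app
```

Calling `create_app(<build directory>).run(host="0.0.0.0", port=1234)` would then expose both the API and the UI on port 1234, matching the production mode described above.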

Docker Deployment

Multi-architecture support:
- AMD64
- ARM64
- ARMv7

The system is optimized for deployment on CasaOS but works on any Docker-compatible platform.

Configuration

Configuration is centralized in backend/config.py:

APP_VERSION = "1.4.8"
BASE_URL = "https://sololatino.net"

TARGET_URLS = [
    {"nombre": "Películas", "url": f"{BASE_URL}/peliculas"},
    {"nombre": "Series", "url": f"{BASE_URL}/series"},
    {"nombre": "Anime", "url": f"{BASE_URL}/animes"},
    # ... more sections
]
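
To illustrate how code elsewhere in the backend might consume this configuration, the sketch below looks up a section URL by name. The `section_url` helper is hypothetical, and only the three sections shown above are reproduced.

```python
BASE_URL = "https://sololatino.net"

# Only the three sections shown above; the real list has more entries.
TARGET_URLS = [
    {"nombre": "Películas", "url": f"{BASE_URL}/peliculas"},
    {"nombre": "Series", "url": f"{BASE_URL}/series"},
    {"nombre": "Anime", "url": f"{BASE_URL}/animes"},
]


def section_url(nombre):
    """Return the URL for a section name, or None if it is not configured."""
    for section in TARGET_URLS:
        # casefold() makes the lookup case-insensitive.
        if section["nombre"].casefold() == nombre.casefold():
            return section["url"]
    return None


print(section_url("series"))  # https://sololatino.net/series
```

Because every URL is derived from `BASE_URL`, pointing the application at a mirror of the source site only requires changing that one value.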

Security Considerations

CORS: Configured to allow frontend communication
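Ad Blocking: EasyList rules filter ad-related content out of scraped pages, which reduces exposure to malicious scripts
Anti-Bot Handling: CloudScraper handles the source site's anti-bot protections on outbound requests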
Input Validation: URL parameters sanitized before processing
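
One plausible form of the input validation mentioned above is an allow-list check on URL parameters before they reach an extractor. The `is_allowed_url` function and the exact-host policy are assumptions for illustration; the source does not describe the actual sanitization rules.

```python
from urllib.parse import urlparse

BASE_URL = "https://sololatino.net"  # value shown in backend/config.py
ALLOWED_HOST = urlparse(BASE_URL).hostname


def is_allowed_url(candidate):
    """Accept only http(s) URLs whose host is the configured source site."""
    try:
        parsed = urlparse(candidate)
        host = parsed.hostname  # lowercased; excludes userinfo and port
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. broken IPv6 literals.
        return False
    if parsed.scheme not in ("http", "https"):
        return False
    return host == ALLOWED_HOST
```

Comparing the parsed hostname, rather than checking whether the string starts with or contains the base URL, avoids look-alike inputs such as `https://sololatino.net.evil.example/` or `https://sololatino.net@evil.example/`.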

Scalability

The current architecture supports horizontal scaling and leaves room for further additions:

Horizontal scaling via Docker containers
Optional caching layer (Redis/Memcached)
Optional database integration for user data
CDN integration for static assets
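
Before introducing Redis or Memcached, the caching idea can be prototyped in-process. The `TTLCache` class below is a hypothetical sketch, not part of the project; the 300-second lifetime in the usage note simply mirrors the 5-minute stale time used on the frontend.

```python
import time


class TTLCache:
    """Tiny in-process cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A backend handler could wrap an extractor call as `cache.get_or_compute(url, lambda: extract(url))` with `cache = TTLCache(300)`. An in-process cache is not shared between containers, which is the gap a shared store such as Redis or Memcached would fill once the application is scaled horizontally.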
