
Overview

Web Scraping Hub provides a catalog browsing system for exploring thousands of movies, series, and anime titles. The catalog is divided into sections, paginated, and served through clean URL routes.

Catalog Sections

The platform organizes content into distinct sections for easy navigation:

  • Movies: Browse the latest movies available in Latino (Latin American Spanish), each with its metadata
  • Series: Explore TV series with episodes organized by season
  • Anime: Discover anime series and movies with their own categorization

Available Sections

The backend supports multiple content sections configured in config.py. Each section has:
  • Unique URL endpoint for content scraping
  • Section identifier for filtering and routing
  • Content type classification (movie, series, anime)
app.py:20-21
SECCIONES = {s['nombre']: s['url'] for s in TARGET_URLS}
SECCIONES_LIST = list(SECCIONES.keys())
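As an illustration, the two lines above imply that TARGET_URLS is a list of dictionaries with at least the keys nombre and url. The sketch below shows that shape with placeholder values: the section names, the example.com URLs, and the tipo key are assumptions, not the project's real configuration.

```python
# Hypothetical stand-in for config.TARGET_URLS; every value is a placeholder.
TARGET_URLS = [
    {"nombre": "Peliculas", "url": "https://example.com/peliculas", "tipo": "movie"},
    {"nombre": "Series", "url": "https://example.com/series", "tipo": "series"},
    {"nombre": "Anime", "url": "https://example.com/anime", "tipo": "anime"},
]

# Same construction as in app.py: map each section name to its URL.
SECCIONES = {s["nombre"]: s["url"] for s in TARGET_URLS}

# Dicts preserve insertion order, so the first configured section stays first.
SECCIONES_LIST = list(SECCIONES.keys())

print(SECCIONES_LIST)
```

Because the list keeps configuration order, SECCIONES_LIST[0] can serve as a default section when a request does not name one.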

Pagination System

Clean URL Routing

The platform uses clean, SEO-friendly URL paths for page navigation instead of query parameters (the backend API itself still reads the page number from a pagina query parameter):
http://localhost:1234/page/1
http://localhost:1234/page/2
http://localhost:1234/page/3

Backend Pagination Logic

The API handles page-based content retrieval with automatic URL construction:
app.py:114-130
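# Page number comes from the query string and defaults to the first page.
pagina = int(request.args.get('pagina', 1))
# Fall back to the first configured section when none is requested.
if not seccion:
    seccion = SECCIONES_LIST[0]

# seccion_real is not defined in this excerpt; it is presumably resolved
# from seccion in the omitted lines.
url = SECCIONES[seccion_real]
if pagina > 1:
    if seccion_real != 'K-Drama':
        url = f"{url}/page/{pagina}"
    else:
        # K-Drama: the page segment is appended without a leading slash.
        url = f"{url}page/{pagina}"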
Page 1 uses the section URL unchanged. For later pages, every section except K-Drama gets /page/N appended; K-Drama gets page/N without a leading slash, presumably because its configured URL already ends in one.
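The URL construction can be isolated into a small helper for testing. This is a minimal sketch, not the code in app.py: the function name build_page_url and the example.com URLs are hypothetical, and it assumes the K-Drama base URL ends with a trailing slash.

```python
def build_page_url(base_url: str, section: str, page: int) -> str:
    """Return the URL to scrape for a section and a 1-based page number."""
    if page <= 1:
        # The first page is the section URL itself.
        return base_url
    if section != "K-Drama":
        return f"{base_url}/page/{page}"
    # Assumption: the K-Drama base URL already ends in "/".
    return f"{base_url}page/{page}"


def build_page_url_normalized(base_url: str, page: int) -> str:
    """Alternative sketch: strip any trailing slash so no special case is needed."""
    if page <= 1:
        return base_url
    return f"{base_url.rstrip('/')}/page/{page}"


print(build_page_url("https://example.com/series", "Series", 3))
print(build_page_url("https://example.com/k-drama/", "K-Drama", 2))
```

The normalized variant is one possible design choice: it produces the same URLs for both kinds of base URL without naming any section.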

Frontend Pagination Component

The Pagination component provides an intuitive navigation experience:
  • Smart page range display: Shows up to 5 page numbers at a time
  • Current page highlighting: Active page displayed with neon cyan styling
  • Previous/Next navigation: Arrow buttons with disabled state handling
  • Responsive design: Hides labels on mobile, shows icons only
  • Loading state support: Disables navigation during data fetching
Pagination.tsx:17-43
const getPageNumbers = () => {
  const pages = [];
  const maxVisible = 5;
  let start = 1;
  let end = totalPages;
  
  if (totalPages <= maxVisible) {
    start = 1;
    end = totalPages;
  } else {
    if (currentPage <= 3) {
      start = 1;
      end = 5;
    } else if (currentPage >= totalPages - 2) {
      start = totalPages - 4;
      end = totalPages;
    } else {
      start = currentPage - 2;
      end = currentPage + 2;
    }
  }
  // ...
};
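To check the windowing rules quickly outside the browser, the same branches can be ported to Python. This sketch assumes the elided part of getPageNumbers simply collects every page from start to end; the function name page_numbers is hypothetical.

```python
MAX_VISIBLE = 5  # same window size as maxVisible in the component


def page_numbers(current_page: int, total_pages: int) -> list[int]:
    """Return the page numbers to display around current_page."""
    if total_pages <= MAX_VISIBLE:
        # Few pages: show them all.
        start, end = 1, total_pages
    elif current_page <= 3:
        # Near the beginning: pin the window to the first five pages.
        start, end = 1, 5
    elif current_page >= total_pages - 2:
        # Near the end: pin the window to the last five pages.
        start, end = total_pages - 4, total_pages
    else:
        # Otherwise center the window on the current page.
        start, end = current_page - 2, current_page + 2
    return list(range(start, end + 1))


print(page_numbers(5, 10))
```

The window slides only in the middle of the range, so the current page is centered whenever there are at least two pages on each side of it.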

Catalog Grid Display

Responsive Grid Layout

The catalog uses a fully responsive grid that adapts to different screen sizes:
Screen Size | Columns | Breakpoint
Mobile      | 2       | Default
Small       | 2       | sm:
Medium      | 3       | md:
Large       | 4       | lg:
XL          | 5       | xl:
2XL         | 6       | 2xl:
CatalogGrid.tsx:27
<div className="grid grid-cols-2 sm:grid-cols-2 md:grid-cols-3 lg:grid-cols-4 xl:grid-cols-5 2xl:grid-cols-6 gap-4">
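The mapping from viewport width to column count can be sketched as a lookup. The pixel values below are Tailwind's default min-width breakpoints, so this assumes the project has not customized its Tailwind theme; the function name columns_for_width is hypothetical.

```python
# (min width in px, columns), widest first. Widths are Tailwind defaults:
# sm=640, md=768, lg=1024, xl=1280, 2xl=1536.
BREAKPOINTS = [(1536, 6), (1280, 5), (1024, 4), (768, 3), (640, 2)]


def columns_for_width(width_px: int) -> int:
    """Return how many grid columns apply at the given viewport width."""
    for min_width, columns in BREAKPOINTS:
        if width_px >= min_width:
            return columns
    # Below the sm breakpoint the base grid-cols-2 class applies.
    return 2


for width in (375, 800, 1300, 1920):
    print(width, columns_for_width(width))
```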

Content Cards

Each catalog item displays as an interactive card with:
  • Poster image with 3:4 aspect ratio
  • Type badge (Movie/Series/Anime) with color coding
  • Hover overlay with play icon and metadata
  • Neon border effect on hover

Type-Based Color Coding

The platform uses distinct color themes for different content types:
CatalogGrid.tsx:48-56
${item.type === 'series'
  ? 'bg-electric-sky text-space-black border-electric-sky'
  : item.type === 'anime'
    ? 'bg-magenta-pink text-space-black border-magenta-pink'
    : 'bg-fuchsia-pink text-space-black border-fuchsia-pink'
}

  • Movies: Fuchsia Pink theme with matching glows
  • Series: Electric Sky (cyan) with neon effects
  • Anime: Magenta Pink for anime content

Loading States

The catalog shows skeleton placeholders while data loads, for a smooth user experience:
CatalogGrid.tsx:12-24
if (loading) {
  return (
    <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-4 xl:grid-cols-5 gap-4">
      {Array.from({ length: 20 }).map((_, index) => (
        <div key={index} className="animate-pulse">
          <div className="bg-gray-700 rounded-lg aspect-[3/4] mb-2"></div>
          <div className="h-4 bg-gray-700 rounded mb-2"></div>
          <div className="h-3 bg-gray-700 rounded w-3/4"></div>
        </div>
      ))}
    </div>
  );
}
The loading state displays 20 skeleton cards with pulsing animations, matching the typical catalog page size.

Data Extraction

Catalog data is extracted from web sources using the generic_extractor module:

Extraction Process

  1. HTML Fetching: HTTP requests that bypass Cloudflare protection
  2. DOM Parsing: BeautifulSoup selects article elements
  3. Data Extraction: Metadata pulled from structured HTML
  4. Image Handling: Smart lazy-loading detection and fallbacks
  5. Type Classification: Automatic content type identification
generic_extractor.py:4-51
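from bs4 import BeautifulSoup

def extraer_listado(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Each catalog entry is an <article class="item"> element.
    articulos = soup.select('article.item')
    datos = []
    for articulo in articulos:
        # Extract poster, title, image, year, genres, language, type
        # Handle lazy-loading images with fallbacks
        # Classify content type from CSS classes
        ...  # per-item extraction is elided in this excerpt
    # Presumably the collected items are returned; the excerpt ends before this.
    return datos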

Image Optimization

The extractor resolves each image URL in three steps:
  1. Check Data Attributes: Looks for data-srcset, data-src, and data-lazy-src first
  2. Fallback to Noscript: If a placeholder is detected, searches for a noscript tag containing the actual image
  3. Clean Srcset: Removes responsive image parameters and extracts the primary URL
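The three steps can be sketched with plain dictionaries standing in for parsed tag attributes. This is an illustration, not the code in generic_extractor.py: the function names are hypothetical, the placeholder test (an empty value or an inline data: URI) is an assumed heuristic, and "primary URL" is read here as the first srcset candidate.

```python
from typing import Mapping, Optional

LAZY_ATTRS = ("data-srcset", "data-src", "data-lazy-src")


def first_srcset_url(value: str) -> str:
    """Return the first candidate URL of a srcset-style value, without its descriptor."""
    # Simplification: assumes URLs contain no commas.
    first_candidate = value.split(",")[0].strip()
    return first_candidate.split()[0] if first_candidate else ""


def looks_like_placeholder(url: str) -> bool:
    # Assumed heuristic: lazy-load placeholders are empty or inline data URIs.
    return not url or url.startswith("data:")


def pick_image_url(img_attrs: Mapping[str, str],
                   noscript_src: Optional[str] = None) -> Optional[str]:
    # Step 1: prefer the lazy-loading data attributes (step 3 cleans the value).
    for name in LAZY_ATTRS:
        url = first_srcset_url(img_attrs.get(name, ""))
        if not looks_like_placeholder(url):
            return url
    # Otherwise use src when it holds a real URL.
    src = img_attrs.get("src", "").strip()
    if not looks_like_placeholder(src):
        return src
    # Step 2: fall back to the image found inside a <noscript> tag, if any.
    if noscript_src and noscript_src.strip():
        return noscript_src.strip()
    return None


attrs = {
    "src": "data:image/gif;base64,AAAA",
    "data-srcset": "https://cdn.example.com/a-300.jpg 300w, https://cdn.example.com/a-600.jpg 600w",
}
print(pick_image_url(attrs))
```

Returning None when nothing usable is found leaves the choice of a default poster to the caller.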

Performance Optimizations

Lazy Loading Strategy

Only the first image loads eagerly; all others use native lazy loading:
CatalogGrid.tsx:45
{...(idx === 0 ? {} : { loading: 'lazy' })}

Query Caching

The frontend uses React Query for intelligent data caching:
  • Stale time: 2 minutes
  • Automatic refetching: On window focus
  • Background updates: Seamless data refresh

User Experience Features

  • Cards scale up 5% and display a neon border with a play icon overlay
  • All interactive elements use CSS transitions (300ms duration)
  • Pagination supports keyboard navigation for accessibility
  • Graceful fallbacks for missing images and failed extractions

API Integration

The catalog connects to the backend via the /api/listado endpoint:
app.py:79-141
@app.route('/api/listado', methods=['GET'])
def api_listado():
    busqueda = request.args.get('busqueda')
    seccion = request.args.get('seccion')
    pagina = int(request.args.get('pagina', 1))
    
    # Handle search or section browsing
    # Return catalog items with metadata
The endpoint supports both section browsing and search functionality through the same interface.
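A client only needs to vary the query string to switch between the two modes. The sketch below builds request URLs using the three parameters read by the endpoint (busqueda, seccion, pagina); the base address and the helper name listado_url are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "http://localhost:5000"  # hypothetical backend address


def listado_url(seccion: Optional[str] = None,
                busqueda: Optional[str] = None,
                pagina: int = 1) -> str:
    """Build a /api/listado URL for section browsing or search."""
    params = {}
    if busqueda:
        params["busqueda"] = busqueda
    if seccion:
        params["seccion"] = seccion
    params["pagina"] = pagina
    # urlencode escapes spaces and other special characters in the values.
    return f"{API_BASE}/api/listado?{urlencode(params)}"


print(listado_url(seccion="Series", pagina=2))
print(listado_url(busqueda="one piece"))
```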

Best Practices

When working with the catalog system:
  1. Always handle loading states - Show skeletons during data fetch
  2. Implement error boundaries - Catch and display extraction failures
  3. Use type-based routing - Direct users to correct player pages
  4. Optimize images - Lazy load every image except the first
  5. Cache responses - Reduce server load with client-side caching

Search

Learn about search and deep search functionality

Video Player

Explore the video player and streaming features
