Catalog Browsing

Overview

Web Scraping Hub provides a catalog browsing system for exploring thousands of movies, series, and anime titles. The catalog is organized into sections, with page-based navigation and clean URL routing.

Catalog Sections

The platform organizes content into distinct sections for easy navigation:

Movies

Browse the latest movies with Latino (Latin American Spanish) audio, along with their metadata

Series

Explore TV series with episodic content organized by seasons

Anime

Discover anime titles and movies with specialized categorization

Available Sections

The backend supports multiple content sections configured in config.py. Each section has:

Unique URL endpoint for content scraping
Section identifier for filtering and routing
Content type classification (movie, series, anime)

app.py:20-21

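# Map each section name to its source URL
SECCIONES = {s['nombre']: s['url'] for s in TARGET_URLS}
# Section names in configuration order; the first one is used as the default
SECCIONES_LIST = list(SECCIONES.keys())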
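
To make the lookup concrete, here is a minimal sketch of how the section map could be built and queried. The 'nombre' and 'url' keys come from the excerpt above; the section names, the example.com URLs, and the section_url helper are assumptions for illustration only, not the project's actual configuration.

```python
# Hypothetical stand-in for the TARGET_URLS list in config.py.
# Only the 'nombre' and 'url' keys are taken from the original excerpt.
TARGET_URLS = [
    {"nombre": "Peliculas", "url": "https://example.com/peliculas"},
    {"nombre": "Series", "url": "https://example.com/series"},
    {"nombre": "Anime", "url": "https://example.com/anime"},
]

# Same two lines as in app.py: name -> URL map, plus the ordered name list.
SECCIONES = {s["nombre"]: s["url"] for s in TARGET_URLS}
SECCIONES_LIST = list(SECCIONES.keys())


def section_url(nombre=None):
    """Return the source URL for a section (hypothetical helper).

    Falls back to the first configured section when the name is missing
    or unknown. The unknown-name fallback is an assumption of this sketch.
    """
    if not nombre or nombre not in SECCIONES:
        nombre = SECCIONES_LIST[0]
    return SECCIONES[nombre]
```

Because Python dictionaries preserve insertion order, SECCIONES_LIST[0] is always the first entry listed in the configuration.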

Pagination System

Clean URL Routing

The platform uses clean, SEO-friendly URLs for pagination instead of query parameters:

http://localhost:1234/page/1
http://localhost:1234/page/2
http://localhost:1234/page/3

Backend Pagination Logic

On the backend, the API reads the page number from the pagina query parameter and builds the matching source URL for the selected section:

app.py:114-130

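# Read the requested page from the query string (defaults to 1)
pagina = int(request.args.get('pagina', 1))
# Fall back to the first configured section when none is given
if not seccion:
    seccion = SECCIONES_LIST[0]

# seccion_real is not defined in this excerpt; it is presumably
# resolved from seccion in the lines omitted here
url = SECCIONES[seccion_real]
# Page 1 is the bare section URL; later pages get a page/N suffix
if pagina > 1:
    if seccion_real != 'K-Drama':
        url = f"{url}/page/{pagina}"
    else:
        # K-Drama: the suffix is appended without a leading slash
        url = f"{url}page/{pagina}"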

The pagination logic special-cases the K-Drama section: its page suffix is appended without a leading slash, which suggests that its configured base URL already ends in one. All other sections get a /page/N suffix.
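
As an alternative to special-casing one section by name, the trailing slash can be normalized before the suffix is appended. The sketch below is not the code in app.py; parse_page and build_page_url are hypothetical helpers, and the example.com URLs are placeholders. It also shows one way to tolerate a malformed pagina value, which the bare int() call in the excerpt would reject with a ValueError.

```python
def parse_page(raw, default=1):
    """Return a positive page number, or `default` for missing or bad input."""
    try:
        page = int(raw)
    except (TypeError, ValueError):
        return default
    return page if page >= 1 else default


def build_page_url(base_url, page):
    """Append /page/N for pages after the first.

    Stripping any trailing slash first means the same rule works whether
    or not the configured base URL ends in "/".
    """
    if page <= 1:
        return base_url
    return f"{base_url.rstrip('/')}/page/{page}"


if __name__ == "__main__":
    print(build_page_url("https://example.com/peliculas", parse_page("2")))
    print(build_page_url("https://example.com/k-drama/", parse_page("3")))
```

With this approach both URL styles produce exactly one slash before page/N, so no section needs its own branch.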

Frontend Pagination Component

The Pagination component provides an intuitive navigation experience:

Key Features

Smart page range display: Shows up to 5 page numbers at a time
Current page highlighting: Active page displayed with neon cyan styling
Previous/Next navigation: Arrow buttons with disabled state handling
Responsive design: Hides labels on mobile, shows icons only
Loading state support: Disables navigation during data fetching

Pagination.tsx:17-43

const getPageNumbers = () => {
  const pages = [];
  const maxVisible = 5;
  let start = 1;
  let end = totalPages;
  
  if (totalPages <= maxVisible) {
    start = 1;
    end = totalPages;
  } else {
    if (currentPage <= 3) {
      start = 1;
      end = 5;
    } else if (currentPage >= totalPages - 2) {
      start = totalPages - 4;
      end = totalPages;
    } else {
      start = currentPage - 2;
      end = currentPage + 2;
    }
  }
  // ...
};
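
The windowing rule is easy to check against edge cases when ported to Python. This is a minimal sketch, assuming the elided part of getPageNumbers simply collects every page from start to end; page_window is a hypothetical name, not part of the codebase.

```python
MAX_VISIBLE = 5  # same limit as maxVisible in Pagination.tsx


def page_window(current_page, total_pages):
    """Return the page numbers to display around `current_page`."""
    if total_pages <= MAX_VISIBLE:
        # Few pages: show them all.
        start, end = 1, total_pages
    elif current_page <= 3:
        # Near the beginning: pin the window to the first five pages.
        start, end = 1, 5
    elif current_page >= total_pages - 2:
        # Near the end: pin the window to the last five pages.
        start, end = total_pages - 4, total_pages
    else:
        # Otherwise center the window on the current page.
        start, end = current_page - 2, current_page + 2
    return list(range(start, end + 1))


if __name__ == "__main__":
    for current in (1, 5, 9):
        print(current, page_window(current, 10))
```

The current page sits in the middle of the window except near either end of the range, where the window stops sliding.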

Catalog Grid Display

Responsive Grid Layout

The catalog uses a fully responsive grid that adapts to different screen sizes:

Screen Size	Columns	Breakpoint
Mobile	2	Default
Small	2	sm:
Medium	3	md:
Large	4	lg:
XL	5	xl:
2XL	6	2xl:

CatalogGrid.tsx:27

<div className="grid grid-cols-2 sm:grid-cols-2 md:grid-cols-3 lg:grid-cols-4 xl:grid-cols-5 2xl:grid-cols-6 gap-4">

Content Cards

Each catalog item is rendered as an interactive card with the following visual elements:

Poster image with 3:4 aspect ratio
Type badge (Movie/Series/Anime) with color coding
Hover overlay with play icon and metadata
Neon border effect on hover

Type-Based Color Coding

The platform uses distinct color themes for different content types:

CatalogGrid.tsx:48-56

${item.type === 'series'
  ? 'bg-electric-sky text-space-black border-electric-sky'
  : item.type === 'anime'
    ? 'bg-magenta-pink text-space-black border-magenta-pink'
    : 'bg-fuchsia-pink text-space-black border-fuchsia-pink'
}

Movies

Fuchsia Pink theme with matching glows

Series

Electric Sky (cyan) with neon effects

Anime

Magenta Pink for anime content

Loading States

The catalog includes skeleton loading states for smooth user experience:

CatalogGrid.tsx:12-24

if (loading) {
  return (
    <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-4 xl:grid-cols-5 gap-4">
      {Array.from({ length: 20 }).map((_, index) => (
        <div key={index} className="animate-pulse">
          <div className="bg-gray-700 rounded-lg aspect-[3/4] mb-2"></div>
          <div className="h-4 bg-gray-700 rounded mb-2"></div>
          <div className="h-3 bg-gray-700 rounded w-3/4"></div>
        </div>
      ))}
    </div>
  );
}

The loading state displays 20 skeleton cards with pulsing animations, matching the typical catalog page size.

Data Extraction

Catalog data is extracted from web sources using the generic_extractor module:

Extraction Process

HTML Fetching: Cloudflare-bypassed HTTP requests
DOM Parsing: BeautifulSoup selects article elements
Data Extraction: Metadata pulled from structured HTML
Image Handling: Smart lazy-loading detection and fallbacks
Type Classification: Automatic content type identification

generic_extractor.py:4-51

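from bs4 import BeautifulSoup

def extraer_listado(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Each catalog entry is an <article class="item"> element
    articulos = soup.select('article.item')
    datos = []
    for articulo in articulos:
        # Extract poster, title, image, year, genres, language, type
        # Handle lazy-loading images with fallbacks
        # Classify content type from CSS classes
        ...  # per-item extraction omitted in this excerpt
    # Presumably returns the collected items
    return datos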

Image Optimization

The extractor resolves the real image URL for lazy-loaded posters in three steps:

Check Data Attributes

Looks for data-srcset, data-src, data-lazy-src first

Fallback to Noscript

If placeholder detected, searches for noscript tag with actual image

Clean Srcset

Removes responsive image parameters and extracts primary URL
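
The three steps can be sketched without any HTML parser by working on a plain dictionary of img attributes. This is an illustration under stated assumptions, not the logic in generic_extractor.py: resolve_image_url and first_srcset_url are hypothetical helpers, and treating an inline data: URI as the placeholder is an assumption, since the original does not say how placeholders are detected.

```python
PLACEHOLDER_PREFIX = "data:"  # assumption: placeholders are inline data URIs


def first_srcset_url(srcset):
    """Return the URL of the first candidate in a srcset string.

    A srcset looks like "a.jpg 300w, b.jpg 600w"; the width or density
    descriptor after each URL is dropped. URLs that contain commas are
    not handled by this simple split.
    """
    first_candidate = srcset.split(",")[0].strip()
    return first_candidate.split()[0] if first_candidate else ""


def resolve_image_url(attrs, noscript_src=None):
    """Pick the best image URL from an <img> attribute dictionary."""
    # Step 1: prefer the lazy-loading data attributes, in order.
    for name in ("data-srcset", "data-src", "data-lazy-src"):
        value = attrs.get(name)
        if value:
            if name == "data-srcset":
                # Step 3: reduce a srcset to its primary URL.
                return first_srcset_url(value)
            return value.strip()
    # Step 2: if src is missing or a placeholder, use the noscript image.
    src = (attrs.get("src") or "").strip()
    if (not src or src.startswith(PLACEHOLDER_PREFIX)) and noscript_src:
        return noscript_src.strip()
    return src
```

Keeping the resolver independent of the parser makes it straightforward to unit test with hand-written attribute dictionaries.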

Performance Optimizations

Lazy Loading Strategy

Only the first image loads eagerly; all others use native lazy loading:

CatalogGrid.tsx:45

{...(idx === 0 ? {} : { loading: 'lazy' })}

Query Caching

The frontend uses React Query for intelligent data caching:

Stale time: 2 minutes
Automatic refetching: On window focus
Background updates: Seamless data refresh

User Experience Features

Hover Effects

Cards scale up by 5% and display a neon border with a play icon overlay

Smooth Transitions

All interactive elements use CSS transitions (300ms duration)

Keyboard Navigation

Error Handling

Graceful fallbacks for missing images and failed extractions

API Integration

The catalog connects to the backend via the /api/listado endpoint:

app.py:79-141

@app.route('/api/listado', methods=['GET'])
def api_listado():
    busqueda = request.args.get('busqueda')
    seccion = request.args.get('seccion')
    pagina = int(request.args.get('pagina', 1))
    
    # Handle search or section browsing
    # Return catalog items with metadata

The endpoint supports both section browsing and search functionality through the same interface.
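
A client can target both modes by varying the query string. The sketch below only builds request URLs, using the busqueda, seccion, and pagina parameters from the excerpt; listado_url is a hypothetical helper, and the backend host and port are placeholders because the original does not state where the API is served.

```python
from urllib.parse import urlencode


def listado_url(base_url, seccion=None, pagina=1, busqueda=None):
    """Build a GET URL for /api/listado from the given parameters.

    Parameters left as None are omitted from the query string. How the
    backend combines busqueda and seccion is not specified in the
    original, so this sketch simply forwards whatever is provided.
    """
    params = {}
    if busqueda:
        params["busqueda"] = busqueda
    if seccion:
        params["seccion"] = seccion
    params["pagina"] = pagina
    return f"{base_url.rstrip('/')}/api/listado?{urlencode(params)}"


if __name__ == "__main__":
    backend = "http://localhost:5000"  # placeholder host and port
    print(listado_url(backend, seccion="Series", pagina=2))
    print(listado_url(backend, busqueda="one piece"))
```

urlencode takes care of escaping, so section names and search terms with spaces or accents are safe to pass through.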

Best Practices

When working with the catalog system:

Always handle loading states - Show skeletons during data fetch
Implement error boundaries - Catch and display extraction failures
Use type-based routing - Direct users to correct player pages
Optimize images - Lazy load every image except the first one
Cache responses - Reduce server load with client-side caching

Related Features

Search

Learn about search and deep search functionality

Video Player

Explore the video player and streaming features
