
Overview

Web Scraping Hub offers two search modes. Quick search queries the site's JSON search API for fast results. Deep search scrapes the site's HTML search page for broader coverage. On the frontend, both modes are exposed as React Query hooks that treat results as fresh for 2 minutes (staleTime), and search input is debounced before a query fires.

Search Modes

Quick Search Implementation

Backend API Endpoint

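Quick search calls the site's DooPlay search endpoint (/wp-json/dooplay/search/). That endpoint returns JSON keyed by item ID, and the backend maps each entry to a common item shape: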
app.py:81-110
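busqueda = request.args.get('busqueda')
if busqueda:
    query = busqueda.strip()
    # Encode the keyword (quote_plus is already imported in app.py for deep search).
    # The nonce is hardcoded and stops working if the site invalidates it.
    url = f"https://sololatino.net/wp-json/dooplay/search/?keyword={quote_plus(query)}&nonce=84428a202e"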
    
    data = fetch_json(url)
    items = []
    if isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, dict):
                tipo_val = v.get("type", "").lower()
                # Spanish labels map to movie/series; any other value is treated as anime
                tipo = "movie" if tipo_val == "pelicula" else "series" if tipo_val == "serie" else "anime"
                items.append({
                    "id": k,
                    "slug": v.get("url", "").rstrip('/').split('/')[-1],
                    "title": v.get("title", ""),
                    "image": v.get("img", ""),
                    "year": v.get("extra", {}).get("date", ""),
                    "language": "Latino",
                    "type": tipo
                })
    return jsonify({"resultados": items, "seccion": "Busqueda", "pagina": 1})
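This endpoint returns structured JSON, so the backend does no HTML parsing at all, which keeps responses fast. The request depends on a hardcoded nonce (84428a202e). If the site invalidates that nonce, quick search will stop returning results.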

Frontend Search Hook

A React Query hook wraps the quick search call:
search.ts:68-75
export const useCatalogSearch = (query: string) => {
  return useQuery({
    queryKey: ['catalog-search', query],
    queryFn: () => searchApi.searchCatalog(query),
    enabled: !!query,
    staleTime: 2 * 60 * 1000, // 2 minutes
  });
};
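Key Features:
  • The query key includes the search term, so each distinct query is cached separately
  • enabled: !!query skips the request while the query is empty
  • 2-minute staleTime: results are served from cache without refetching for 2 minutes. Unused entries are garbage-collected after React Query's default of 5 minutes.
  • Once results are stale, the cached data is still shown while React Query refetches in the background. By default this happens on remount, window focus, or reconnect.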
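The staleTime behavior can be made concrete with a deliberately simplified Python model. This is not React Query's implementation; `StaleTimeModel` and `read` are hypothetical names, and the model refetches synchronously so it can be tested. An entry younger than `stale_time` is returned with no request. An older entry is still returned immediately, and a refetch replaces it for the next read, which is the stale-while-revalidate pattern.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class StaleTimeModel:
    """Toy model of React Query's staleTime (not its real implementation).

    Fresh entries are returned with no request. A stale entry is still returned
    immediately, and a refetch replaces it for the next read.
    """

    def __init__(self, stale_time: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.stale_time = stale_time
        self.clock = clock
        self.fetch_count = 0
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def read(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.stale_time:
            return entry[1]  # still fresh: no request
        self.fetch_count += 1
        data = fetch()  # React Query would run this in the background
        self._entries[key] = (now, data)
        # Stale hit: hand back the old data. First read: the caller gets the fetched data.
        return entry[1] if entry is not None else data
```

Keys are tuples such as `("catalog-search", "avengers")` because Python dict keys must be hashable; React Query hashes array keys like `['catalog-search', query]` itself. Garbage collection (gcTime, called cacheTime in older versions) is not modeled.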

Deep Search Implementation

Backend Endpoint

Deep search scrapes the site's regular search results page (?s=) to return more complete results:
app.py:143-156
@app.route('/api/deep-search', methods=['GET'])
def api_deep_search():
    query = request.args.get('query', '').strip()
    if not query:
        return jsonify({'error': 'Falta el parámetro de búsqueda'}), 400
    
    url = f"https://sololatino.net/?s={quote_plus(query)}"
    html = fetch_html(url)
    if not html:
        return jsonify({'error': 'No se pudo obtener resultados'}), 503
    
    resultados = extraer_listado(html)
    return jsonify(resultados)
Deep search uses the same extraer_listado function as catalog browsing, ensuring consistent data structure.

Deep Search Hook

search.ts:77-84
export const useDeepSearchCatalog = (query: string) => {
  return useQuery({
    queryKey: ['deep-search-catalog', query],
    queryFn: () => searchApi.deepSearchCatalog(query),
    enabled: !!query,
    staleTime: 2 * 60 * 1000,
  });
};

Data Normalization

Search Result Mapping

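Both search modes end up with the same CatalogItem fields. Quick search results already arrive from the backend with English field names. Deep search results are mapped on the client, with fallbacks for both Spanish and English field names: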
search.ts:36-63
async deepSearchCatalog(query: string): Promise<CatalogItem[]> {
  const data = await apiClient.get<any>('/deep-search', { query });
  
  return (data as any[]).map((item) => ({
    id: item.slug || item.url,
    slug: item.slug || item.url,
    title: item.titulo || item.title || item.nombre || '',
    alt: item.alt || '',
    image: item.imagen || item.image || '',
    year: item.año || item.year || '',
    genres: Array.isArray(item.generos) 
      ? item.generos
      : typeof item.generos === 'string' && item.generos.trim() !== ''
        ? item.generos.split(',').map((g: string) => g.trim())
        : [],
    language: item.idioma || item.language || 'Latino',
    url: item.url,
    type: item.tipo === 'pelicula' || item.type === 'movie'
      ? 'movie'
      : item.tipo === 'serie' || item.type === 'series'
        ? 'series'
        : 'anime'
  }));
}
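  • Multiple field fallbacks - Accepts Spanish (titulo, imagen, año) or English (title, image, year) field names
  • Genre parsing - Converts comma-separated genre strings into trimmed arrays
  • Type classification - Maps pelicula/movie to movie and serie/series to series; everything else becomes anime
  • Default values - Uses empty strings for missing text fields and "Latino" for a missing language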
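The same fallback rules can be transcribed to Python for backend tests or tooling. The sketch below mirrors deepSearchCatalog field by field; `normalize_item` and its helpers are hypothetical names and are not part of the codebase. It keeps one quirk of the original on purpose: `id` and `slug` have no final default, so they are None when a source item has neither `slug` nor `url`.

```python
from typing import Any, Dict, List


def _genres(raw: Any) -> List[str]:
    # Keep lists as-is, split non-empty comma-separated strings, otherwise return [].
    if isinstance(raw, list):
        return raw
    if isinstance(raw, str) and raw.strip() != "":
        return [g.strip() for g in raw.split(",")]
    return []


def _content_type(item: Dict[str, Any]) -> str:
    # Spanish ("tipo") or English ("type") labels; anything unrecognized becomes "anime".
    if item.get("tipo") == "pelicula" or item.get("type") == "movie":
        return "movie"
    if item.get("tipo") == "serie" or item.get("type") == "series":
        return "series"
    return "anime"


def normalize_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Python transcription of the deepSearchCatalog mapping (a sketch, not project code)."""
    key = item.get("slug") or item.get("url")  # no final default, same as the TS version
    return {
        "id": key,
        "slug": key,
        "title": item.get("titulo") or item.get("title") or item.get("nombre") or "",
        "alt": item.get("alt") or "",
        "image": item.get("imagen") or item.get("image") or "",
        "year": item.get("año") or item.get("year") or "",
        "genres": _genres(item.get("generos")),
        "language": item.get("idioma") or item.get("language") or "Latino",
        "url": item.get("url"),
        "type": _content_type(item),
    }
```

Python's `or` treats `None`, `""`, and empty lists as falsy, which matches how the TypeScript `||` chain behaves for these fields.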

Search Performance

Debouncing Strategy

Search input is debounced to reduce API calls:
1. User types: the user enters a search query in the input field.
2. Debounce delay: the system waits 300-500 ms after the last keystroke.
3. Query execution: the search query fires only after typing stops.
4. Cache check: React Query checks whether results for that query key are already cached.
5. Display results: results are shown from the cache or from a fresh fetch.
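The page does not show the debounce code itself; in the React frontend it would typically live in a hook or the input component. The same idea can be sketched in Python with asyncio. This is a hypothetical sketch: `Debouncer` and `simulate_typing` are illustrative names, and the 0.3 s delay is taken from the lower end of the 300-500 ms range above. Each keystroke cancels the pending timer, so the callback runs once, with the last value, after input has been quiet for `delay` seconds.

```python
import asyncio
from typing import Callable, List, Optional


class Debouncer:
    """Call `callback(value)` once input has been quiet for `delay` seconds."""

    def __init__(self, delay: float, callback: Callable[[str], None]) -> None:
        self.delay = delay
        self.callback = callback
        self._pending: Optional[asyncio.Task] = None

    def __call__(self, value: str) -> None:
        # Each keystroke cancels the timer started by the previous one...
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        # ...and starts a new one carrying the latest input.
        self._pending = asyncio.create_task(self._fire(value))

    async def _fire(self, value: str) -> None:
        await asyncio.sleep(self.delay)
        self.callback(value)


async def simulate_typing(keystrokes: List[str], gap: float, delay: float) -> List[str]:
    """Feed keystrokes `gap` seconds apart and record which queries would fire."""
    fired: List[str] = []
    search = Debouncer(delay, fired.append)
    for text in keystrokes:
        search(text)
        await asyncio.sleep(gap)
    await asyncio.sleep(delay * 2)  # give the last timer time to expire
    return fired


if __name__ == "__main__":
    print(asyncio.run(simulate_typing(["a", "av", "ave", "aven"], gap=0.05, delay=0.3)))
    # ['aven']: only the final query fires
```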
| Feature | Quick Search | Deep Search |
|---|---|---|
| Speed | ~300 ms | ~2 s |
| Data source | JSON API | HTML scraping |
| Coverage | Recent items | Full catalog |
| Cloudflare bypass | Not needed | Required |
| Results format | Structured JSON | Parsed HTML |
| staleTime (frontend) | 2 minutes | 2 minutes |
| Error rate | Low | Medium |

URL Encoding

Deep search properly encodes query parameters:
app.py:148
url = f"https://sololatino.net/?s={quote_plus(query)}"
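Always encode search queries with quote_plus. It turns spaces into + and percent-encodes characters such as &, #, and +, which would otherwise split the query string, start a URL fragment, or be read as a space.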
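The effect is easy to check with the standard library. The snippet below builds the deep search URL with and without quote_plus and decodes the `s` parameter with `urllib.parse.parse_qs`, which splits the query string the way a typical server does. The helper name `term_seen_by_server` is hypothetical.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit


def term_seen_by_server(url: str) -> list:
    """Decode the `s` parameter the way a typical server splits a query string."""
    return parse_qs(urlsplit(url).query).get("s", [])


query = "Fast & Furious"
raw_url = "https://sololatino.net/?s=" + query               # '&' starts a new parameter
safe_url = "https://sololatino.net/?s=" + quote_plus(query)  # ...?s=Fast+%26+Furious

print(term_seen_by_server(raw_url))   # ['Fast ']
print(term_seen_by_server(safe_url))  # ['Fast & Furious']
```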

Error Handling

Backend Error Responses

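The deep search endpoint returns 400 when the query is missing or blank, and 503 when the search page cannot be fetched:
if not query:
    return jsonify({'error': 'Falta el parámetro de búsqueda'}), 400  # "Missing search parameter"

html = fetch_html(url)
if not html:
    return jsonify({'error': 'No se pudo obtener resultados'}), 503  # "Could not fetch results"
The quick search path has no error response. If fetch_json does not return a dict, the endpoint still responds 200 with an empty resultados list, so a failed upstream request looks the same as a search with no matches.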

Frontend Error States

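React Query exposes failures through the hook's error field. On the client it retries a failed query 3 times by default, so error is only set after those retries are exhausted:
const { data, isLoading, error } = useCatalogSearch(query);

// error is typed unknown in React Query v4 (Error in v5), so narrow it before reading .message
if (error) {
  return <div>Error: {error instanceof Error ? error.message : 'Search failed'}</div>;
}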
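Those retries are spaced out. React Query's documented default retryDelay is `attemptIndex => Math.min(1000 * 2 ** attemptIndex, 30000)`. A Python sketch of that schedule helps explain the delay. The function names are hypothetical, and this models the documented defaults rather than reproducing React Query code. It shows why a failing search can take about 7 seconds, plus request time, before the error message appears.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays_ms(retries: int = 3, base_ms: int = 1000, cap_ms: int = 30_000) -> List[int]:
    """Wait before each retry, using the formula of React Query's default retryDelay."""
    return [min(base_ms * 2 ** attempt, cap_ms) for attempt in range(retries)]


def call_with_retries(fn: Callable[[], T], retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """One initial attempt plus `retries` retries; the last error propagates."""
    for delay_ms in backoff_delays_ms(retries):
        try:
            return fn()
        except Exception:
            sleep(delay_ms / 1000)  # back off before the next attempt
    return fn()  # final attempt: if it raises, the caller sees that error


print(backoff_delays_ms())       # [1000, 2000, 4000]
print(sum(backoff_delays_ms()))  # 7000
```

If that wait is too long for search, the hooks could pass a lower `retry` value, which is a real useQuery option. Whether that trade-off is worth it is a product decision.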

Search Features

Type Filtering

Results are automatically categorized as movie, series, or anime. Anything not identified as a movie or series falls back to anime.

Partial Matching

Deep search relies on the site's own search page, so it returns whatever partial matches that search produces

Real-time Updates

Quick search updates results as you type, once the debounce delay has elapsed

Cache Optimization

Previous searches load instantly from cache

Integration with Catalog

Search results display in the same catalog grid:
const { data: searchResults, isLoading } = useCatalogSearch(searchQuery);

return (
  <CatalogGrid 
    items={searchResults || []} 
    loading={isLoading} 
  />
);
Reusing the CatalogGrid component ensures consistent UI between browsing and searching.

Advanced Search Patterns

Search by Title

Avengers

Search by Year

2024 action

Search by Genre

comedy series

Search with Special Characters

Spider-Man: No Way Home
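Special characters are only safe when the query is URL-encoded. The deep search endpoint applies quote_plus, so characters such as : and & reach the site intact. Any endpoint that interpolates the raw query into its URL can cut off or misroute such searches at characters like & and #.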

Cloudflare Bypass

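Requests to the site go through a shared cloudscraper session in http_client.py. Deep search requires it, as the comparison table above notes. The excerpt below shows the fetch_html helper it uses: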
http_client.py:1-18
import cloudscraper

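# One module-level session: Cloudflare clearance cookies persist across requests
_scraper = cloudscraper.create_scraper()

def fetch_html(url):
    """Fetch a page through the shared scraper session; return its HTML, or None on failure."""
    try:
        response = _scraper.get(url, timeout=30)
        if response.status_code == 200:
            return response.text
        # Any other status (e.g. 403 or 404) falls through and returns None
    except Exception as e:
        # Network errors and timeouts are logged, then also return None
        print(f"[ERROR] fetch_html: {e}")
    return None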
The cloudscraper library handles Cloudflare's JavaScript challenge pages and stores the resulting cookies on the session, so later requests through the same _scraper reuse them.
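The excerpt only shows fetch_html; the fetch_json helper used by quick search is not shown. If it follows the same pattern, it could look like the hypothetical sketch below. Unlike the real helper, it takes the session as a parameter instead of using the module-level _scraper, so it can be exercised with a stub. It returns None for non-200 responses and non-JSON bodies, which is consistent with the quick search route's `isinstance(data, dict)` check.

```python
from typing import Any, Optional


def fetch_json(url: str, session: Any, timeout: float = 30) -> Optional[Any]:
    """Hypothetical JSON counterpart to fetch_html: parsed JSON, or None on any failure.

    `session` stands in for the shared cloudscraper scraper (a requests.Session
    subclass); it is a parameter here only so the function can be tested with a stub.
    """
    try:
        response = session.get(url, timeout=timeout)
    except Exception as e:
        print(f"[ERROR] fetch_json: request failed: {e}")
        return None
    if response.status_code != 200:
        print(f"[WARN] fetch_json: HTTP {response.status_code} for {url}")
        return None
    try:
        return response.json()
    except ValueError as e:
        # requests raises a ValueError subclass when the body is not JSON,
        # e.g. an HTML error page instead of the API response
        print(f"[ERROR] fetch_json: body is not JSON: {e}")
        return None
```

In the real module this would be called as `fetch_json(url, _scraper)`.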

Search Result Quality

Data Completeness

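Search results include:
  • ✅ Title, plus an alternative title where the source provides one (the quick search backend does not)
  • ✅ Poster image
  • ✅ Release year
  • ✅ Genres (deep search only; the quick search backend returns none)
  • ✅ Content type (movie/series/anime)
  • ✅ Content URL (deep search only; quick search returns a slug)
  • ✅ Language (defaults to "Latino")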

Missing Data Handling

search.ts:42-45
title: item.titulo || item.title || item.nombre || '',
alt: item.alt || '',
image: item.imagen || item.image || '',
year: item.año || item.year || '',
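Each of these fields ends in an empty-string default, so it is never undefined. The same does not hold for id, slug, and url, which are undefined when a source item has neither slug nor url.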

Best Practices

  • Implement 300-500ms debounce
  • Show loading indicator during search
  • Show a clear “No results” message when a search returns nothing
  • Provide search mode toggle if both available

API Reference

Quick Search Endpoint

GET /api/listado?busqueda={query}
Parameters:
  • busqueda (string): Search query
Response:
{
  "resultados": [
    {
      "id": "12345",
      "slug": "movie-title",
      "title": "Movie Title",
      "image": "https://...",
      "year": "2024",
      "language": "Latino",
      "type": "movie"
    }
  ],
  "seccion": "Busqueda",
  "pagina": 1
}

Deep Search Endpoint

GET /api/deep-search?query={query}
Parameters:
  • query (string, required): Search query
Response:
[
  {
    "id": "123",
    "slug": "content-slug",
    "titulo": "Content Title",
    "imagen": "https://...",
    "year": "2024",
    "generos": "Action, Drama",
    "idioma": "Latino",
    "tipo": "pelicula"
  }
]
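A client can build request URLs for both endpoints with `urllib.parse.urlencode`, which applies quote_plus to each value. This is a minimal sketch. `search_url` is a hypothetical helper, and the base URL `http://localhost:5000` (Flask's default development port) is an assumption about where the backend runs.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, deep: bool = False) -> str:
    """Build a request URL for /api/listado (quick) or /api/deep-search (deep)."""
    base = base_url.rstrip("/")
    if deep:
        return f"{base}/api/deep-search?{urlencode({'query': query})}"
    return f"{base}/api/listado?{urlencode({'busqueda': query})}"


print(search_url("http://localhost:5000", "Fast & Furious"))
# http://localhost:5000/api/listado?busqueda=Fast+%26+Furious
print(search_url("http://localhost:5000", "Spider-Man: No Way Home", deep=True))
# http://localhost:5000/api/deep-search?query=Spider-Man%3A+No+Way+Home
```

With the standard library, `json.load(urllib.request.urlopen(url))` would then return the documented response. `urlopen` raises `urllib.error.HTTPError` for the deep search's 400 and 503 responses.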

Catalog Browsing

Learn about section-based catalog navigation

Web Scraping

Understand the scraping technology behind search
