Overview
The Web Scrapping Hub backend is built with Flask, providing a RESTful API for scraping content from various sources. The application uses cloudscraper to bypass anti-bot protections and BeautifulSoup for HTML parsing.

Application Structure
The Flask application is defined in backend/app.py and follows a modular structure:
Configuration
The application configuration is centralized in config.py:
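The original code listing is not reproduced here; the following is a minimal sketch of the kind of values config.py would centralize. All names and values are assumptions, not the project's actual configuration:

```python
# config.py -- hypothetical sketch; real names and values may differ.
APP_VERSION = "1.0.0"              # local version reported by /api/version
BASE_URL = "https://example.com"   # target site to scrape (placeholder)
HOST = "0.0.0.0"                   # bind address for the development server
PORT = 1234                        # development server port
DEBUG = True                       # detailed error pages in development
```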
Flask App Initialization
The Flask app is initialized with CORS support and proper caching configuration.

Key Configuration Points
- CORS: Enabled to allow cross-origin requests from the frontend
- Caching Disabled: Prevents stale data by disabling ETags and file caching
- Debug Mode: Enabled in development for detailed error messages
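The initialization described above can be sketched as follows. The real app likely enables CORS via the flask-cors extension; to stay self-contained, this sketch sets the headers with a plain after_request hook instead, and the /api/ping route exists only for illustration:

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_headers(response):
    # CORS: allow the frontend (served from another origin) to call the API.
    response.headers["Access-Control-Allow-Origin"] = "*"
    # Caching disabled: drop any ETag and forbid caching so the frontend
    # never receives stale data.
    response.headers.pop("ETag", None)
    response.headers["Cache-Control"] = "no-store"
    return response

@app.route("/api/ping")
def ping():
    # Illustrative route so the headers above can be observed.
    return {"status": "ok"}
```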
API Endpoints
Version Management
GET /api/version
Checks for application updates by comparing local version with remote version:
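A sketch of how such an endpoint could work. The remote fetch is stubbed out so the example is self-contained; in the real app it would request a version file from a remote URL (which is not shown in this document):

```python
from flask import Flask, jsonify

app = Flask(__name__)

LOCAL_VERSION = "1.0.0"  # would normally come from config.py

def fetch_remote_version():
    # Stub: the real app fetches the latest published version over HTTP.
    return "1.1.0"

def parse_version(v):
    # "1.1.0" -> (1, 1, 0), so tuple comparison orders versions correctly.
    return tuple(int(part) for part in v.split("."))

@app.route("/api/version")
def version():
    remote = fetch_remote_version()
    return jsonify({
        "local": LOCAL_VERSION,
        "remote": remote,
        "update_available": parse_version(remote) > parse_version(LOCAL_VERSION),
    })
```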
Content Discovery
GET /api/secciones
Returns available content sections:
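A minimal sketch of such an endpoint; the section names below are assumptions for illustration, as the real list comes from the scraped site:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical section names -- the real app derives these from the site.
SECCIONES = ["peliculas", "series", "animes"]

@app.route("/api/secciones")
def secciones():
    return jsonify({"secciones": SECCIONES})
```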
GET /api/listado?seccion=<section>&pagina=<page>&busqueda=<query>
Fetches content listings with optional search:
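The handler builds a URL from the seccion, pagina, and busqueda parameters, fetches it (with cloudscraper in the real app), and parses the listing. The parsing step might look like this sketch over a canned page; the CSS selector and field names are assumptions about the target site's markup:

```python
from bs4 import BeautifulSoup

# Canned listing page standing in for a scraped response.
SAMPLE_PAGE = """
<article><a href="/pelicula/matrix">Matrix</a></article>
<article><a href="/pelicula/inception">Inception</a></article>
"""

def parse_listado(html):
    """Extract title/slug pairs from one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"titulo": a.get_text(strip=True),
         # "/pelicula/matrix" -> "matrix"
         "slug": a["href"].strip("/").split("/")[-1]}
        for a in soup.select("article a")
    ]
```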
Content Details
GET /api/pelicula/<slug>
Retrieves movie details and player information:
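The extraction behind this endpoint might look like the following sketch, which parses a canned movie page. Every selector and field name here is an assumption about the site's markup:

```python
from bs4 import BeautifulSoup

# Canned movie page standing in for a scraped response.
SAMPLE_PAGE = """
<h1>Matrix</h1>
<div class="sinopsis">Un hacker descubre la verdad.</div>
<iframe src="https://player.example/embed/1"></iframe>
<iframe src="https://player.example/embed/2"></iframe>
"""

def parse_pelicula(html):
    """Extract title, synopsis, and player iframe URLs from a movie page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "titulo": soup.select_one("h1").get_text(strip=True),
        "sinopsis": soup.select_one(".sinopsis").get_text(strip=True),
        "players": [f["src"] for f in soup.select("iframe")],
    }
```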
GET /api/serie/<slug>
Retrieves series/anime episodes organized by season:
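Organizing episodes by season is a plain grouping step once the episode list is scraped. A sketch, assuming a hypothetical episode shape (the real extractor's field names may differ):

```python
from collections import defaultdict

# Hypothetical flat episode list as a scraper might collect it.
EPISODES = [
    {"season": 2, "number": 1, "slug": "serie-2x1"},
    {"season": 1, "number": 2, "slug": "serie-1x2"},
    {"season": 1, "number": 1, "slug": "serie-1x1"},
]

def group_by_season(episodes):
    """Turn a flat, unordered episode list into {season: [episodes in order]}."""
    seasons = defaultdict(list)
    for ep in sorted(episodes, key=lambda e: (e["season"], e["number"])):
        seasons[ep["season"]].append(ep)
    return dict(seasons)
```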
GET /api/iframe_player?url=<url>
Extracts iframe player from a specific URL:
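A sketch of the endpoint, including the parameter validation that backs the 400 response described under Error Handling. The fetch is stubbed with a fixed document so the example is self-contained; the real app fetches the page with cloudscraper:

```python
from bs4 import BeautifulSoup
from flask import Flask, jsonify, request

app = Flask(__name__)

def fetch(url):
    # Stub: the real app retrieves the page with cloudscraper to get
    # past anti-bot checks.
    return '<iframe src="https://player.example/embed/1"></iframe>'

@app.route("/api/iframe_player")
def iframe_player():
    url = request.args.get("url")
    if not url:
        return jsonify({"error": "missing url parameter"}), 400
    iframe = BeautifulSoup(fetch(url), "html.parser").find("iframe")
    if iframe is None:
        return jsonify({"error": "no iframe found"}), 404
    return jsonify({"iframe": iframe["src"]})
```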
Deep Search
GET /api/deep-search?query=<query>
Performs a deep search across the site:
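Conceptually, a deep search walks every section's listings rather than one. A self-contained sketch over an in-memory catalog; in the real app the catalog would be built by scraping each section's pages, and the data shape shown here is an assumption:

```python
def deep_search(query, catalog):
    """Search every section's titles for a case-insensitive match.

    `catalog` maps section name -> list of {"titulo", "slug"} items.
    """
    q = query.casefold()
    hits = []
    for seccion, items in catalog.items():
        for item in items:
            if q in item["titulo"].casefold():
                # Tag each hit with the section it was found in.
                hits.append({**item, "seccion": seccion})
    return hits

# Illustrative catalog standing in for scraped section listings.
CATALOG = {
    "peliculas": [{"titulo": "Matrix", "slug": "matrix"}],
    "series": [{"titulo": "Dark", "slug": "dark"}],
}
```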
Frontend Integration
The Flask app serves the frontend build in production.

Utility Functions
Text Normalization
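A typical normalization helper for accent-insensitive search might look like this sketch; the function name and exact behavior are assumptions:

```python
import unicodedata

def normalizar(texto):
    """Lowercase and strip accents so searches match regardless of
    diacritics, e.g. "Misión" matches "mision"."""
    # NFD splits accented characters into base letter + combining mark.
    descompuesto = unicodedata.normalize("NFD", texto)
    # Drop the combining marks, keeping only the base letters.
    sin_acentos = "".join(c for c in descompuesto if not unicodedata.combining(c))
    return sin_acentos.lower().strip()
```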
Version Comparison
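A robust version comparison avoids the pitfalls of comparing strings directly ("1.10.0" sorts before "1.9.0" lexicographically). A sketch of such a helper, with assumed handling of a leading "v" and versions of different lengths:

```python
def is_newer(remote, local):
    """Return True if the remote version string is newer than the local one."""
    def parts(v):
        # "v1.10.0" -> [1, 10, 0]; numeric parts compare correctly.
        return [int(p) for p in v.lstrip("v").split(".")]
    a, b = parts(remote), parts(local)
    # Pad the shorter list with zeros so "1.2" equals "1.2.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a > b
```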
Running the Application
Development Server
The development server listens on http://0.0.0.0:1234 by default.
Production Deployment
For production, use a WSGI server like Gunicorn.

Error Handling
The API uses standard HTTP status codes:

- 200 OK: Successful request
- 400 Bad Request: Missing or invalid parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server-side error
- 503 Service Unavailable: External service unavailable
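Flask error handlers can guarantee that error responses are JSON rather than HTML error pages. A minimal sketch (whether the real app registers handlers this way is an assumption):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(404)
def not_found(error):
    # Unknown routes and missing resources return a JSON body.
    return jsonify({"error": "Resource not found"}), 404

@app.errorhandler(500)
def server_error(error):
    # Unhandled exceptions surface as a generic JSON error.
    return jsonify({"error": "Internal server error"}), 500
```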
Next Steps
Extractors
Learn how to create custom content extractors
Utilities
Explore utility modules for HTTP requests and parsing