
Quickstart Guide

This guide will take you from zero to watching your first video in under 10 minutes. We’ll use Docker for the fastest setup, but local development options are also available.
Prerequisites: Docker must be installed on your system.

Quick Start with Docker

The fastest way to get Web Scrapping Hub running is with Docker:
1. Clone the Repository

Get the source code from GitHub:
git clone https://github.com/UnfairAdventage/Web-Scrapping.git
cd Web-Scrapping
2. Build the Docker Image

Build the multi-stage Docker image (this will take a few minutes):
cd docker
docker build -t peliculas-casaos .
The build process compiles the React frontend and packages it with the Flask backend in a single container.
3. Run the Container

Start the application container:
docker run -d --name peliculas -p 1234:1234 peliculas-casaos
The -d flag runs the container in detached mode, and -p 1234:1234 publishes the container’s port 1234 on port 1234 of your host.
4. Access the Application

Open your browser and navigate to:
http://localhost:1234
You should see the Web Scrapping Hub homepage with the catalog of movies!
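If the page does not load right away, the container may still be starting. The following is a minimal sketch of a readiness check, assuming only that the app listens on the port published by the docker run command above. The helper name wait_for_port is hypothetical and not part of the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 1234) after docker run returns True as soon as the port accepts connections. A False result suggests looking at the output of docker logs peliculas.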

Watch Your First Video

Now let’s browse content and start streaming:
1. Browse the Catalog

The homepage displays the latest movies. Use the navigation to explore different sections:
  • Movies (/peliculas) - Latino movies
  • Series (/series) - TV series
  • Anime (/animes) - Anime content
Use the pagination controls at the bottom to browse through pages: /page/1, /page/2, etc.
2. Search for Content

Use the search bar to find specific content:
  1. Click the search icon in the navigation bar
  2. Type your search query (e.g., “Spider-Man”)
  3. Press Enter or click the search button
The search uses the API endpoint /api/listado?busqueda=YOUR_QUERY to fetch results.
3. Select a Movie or Show

Click on any movie poster or title card to view details. The app will navigate to:
  • Movies: /movie/<slug>
  • Series: /series/<slug>
  • Anime: /anime/<slug>
You’ll see:
  • Full synopsis
  • Genre tags
  • Release date
  • High-quality poster
4. Start Watching

Click the Watch or Play button. This will:
  1. Extract the video player iframe from the source
  2. Open a modal player with the embedded video
  3. Display episode navigation (for series/anime)
For series, you can navigate between episodes and seasons using the controls in the player footer.
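To make the "extract the video player iframe" step more concrete, here is a minimal sketch that pulls iframe URLs out of an HTML page using only the Python standard library. It is an illustration under assumptions, not the project's extractor: the backend lists beautifulsoup4 among its dependencies and may work differently, and the names IframeSrcParser and extract_iframe_sources are hypothetical.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_iframe_sources(html: str) -> list[str]:
    """Return the src URLs of all iframes found in the given HTML text."""
    parser = IframeSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources
```

A player modal would then embed one of the returned URLs. Pages that inject the iframe with JavaScript would not be covered by a static parse like this one.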

Understanding the Flow

Here’s what happens behind the scenes when you watch a video:
  1. User clicks on content → React Router navigates to detail page
  2. Detail page loads → TanStack Query fetches data from API
  3. User clicks Play → Modal component opens with player
  4. Player renders → Iframe displays video from extracted URL

Local Development Setup

Prefer to run the backend and frontend separately for development? Here’s how:
Run the Flask API server:
# Navigate to backend directory
cd backend

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the Flask server
python -m backend.app
The API will be available at http://localhost:1234
Backend dependencies:
flask
flask-cors
beautifulsoup4
adblockparser
cloudscraper
pytest
Run the React development server:
# Navigate to frontend directory
cd frontend/project

# Install dependencies
npm install

# Start Vite dev server
npm run dev
The frontend will be available at http://localhost:5173 (or the port Vite assigns)
The frontend expects the backend API to be running on http://localhost:1234. CORS is enabled in the Flask app.
Frontend dependencies:
{
  "dependencies": {
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-router-dom": "^7.6.3",
    "@tanstack/react-query": "^5.83.0",
    "lucide-react": "^0.344.0"
  }
}

Test the API Directly

You can test the backend API endpoints directly with curl or your browser:
curl http://localhost:1234/api/secciones
All endpoints return JSON. The application automatically checks for updates from the GitHub repository.
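The same check can be scripted. The sketch below fetches an endpoint and decodes the JSON body with the standard library. The base URL comes from this guide; the function name fetch_json and its opener parameter are assumptions made for illustration, and the shape of each endpoint's response is not specified here.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:1234"  # port used throughout this guide


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0, opener=urlopen):
    """GET base_url + path and decode the response body as JSON."""
    # `opener` defaults to urllib's urlopen; it is injectable so the function
    # can be exercised without a running server.
    with opener(base_url + path, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, fetch_json("/api/secciones") returns the parsed Python object for that endpoint.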

Common URL Patterns

Here are the URL patterns you’ll encounter in the application:
| Pattern | Example | Description |
|---|---|---|
| /page/:number | /page/1, /page/2 | Paginated catalog browsing |
| /peliculas | /peliculas | Movies section |
| /series | /series | Series section |
| /movie/:slug | /movie/spider-man | Movie detail page |
| /series/:slug | /series/breaking-bad | Series detail with episodes |
| /anime/:slug | /anime/naruto | Anime detail with episodes |
| /ver/:tipo/:slug | /ver/pelicula/spider-man | Video player modal |
The application uses React Router for client-side navigation, so moving between pages does not trigger a full page reload.
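To show how patterns with :param placeholders map onto concrete URLs, here is a small Python sketch that matches a path against the patterns listed above. It only imitates this style of matching for illustration. In the application itself the routing is done by React Router in the browser, and the names ROUTES, compile_route, and match_route are hypothetical.

```python
import re

# Patterns copied from the URL pattern list above.
ROUTES = [
    "/page/:number",
    "/peliculas",
    "/series",
    "/movie/:slug",
    "/series/:slug",
    "/anime/:slug",
    "/ver/:tipo/:slug",
]


def compile_route(pattern: str) -> re.Pattern:
    """Turn '/movie/:slug' into a regex with one named group per ':param'."""
    parts = []
    for segment in pattern.strip("/").split("/"):
        if segment.startswith(":"):
            parts.append(f"(?P<{segment[1:]}>[^/]+)")
        else:
            parts.append(re.escape(segment))
    # Anchored at both ends, so '/series' does not swallow '/series/<slug>'.
    return re.compile("^/" + "/".join(parts) + "/?$")


def match_route(path: str):
    """Return (pattern, params) for the first matching route, else None."""
    for pattern in ROUTES:
        match = compile_route(pattern).match(path)
        if match:
            return pattern, match.groupdict()
    return None
```

For example, match_route("/ver/pelicula/spider-man") yields the pattern /ver/:tipo/:slug with tipo set to "pelicula" and slug set to "spider-man".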

Verify Everything Works

Run through this checklist to ensure your installation is working correctly:
1. Homepage Loads

Visit http://localhost:1234. You should see movie posters and a navigation bar.

2. Pagination Works

Click “Next” or navigate to /page/2. New content should load.

3. Search Functions

Search for “Spider”. Results should appear.

4. Detail Pages Load

Click any movie. You should see the synopsis, poster, and genres.

5. Video Player Opens

Click “Watch” or “Play”. A modal should open with the embedded player.
If the video player doesn’t load, check the browser console for errors. Some video sources may require specific settings or may be temporarily unavailable.

Troubleshooting

If the container does not start, check its logs:
docker logs peliculas
Common issues:
  • Port 1234 already in use: Stop other services or change the port mapping
  • Build failed: Ensure Docker has enough memory (at least 2GB)
  • Network issues: Check your internet connection for npm/pip downloads
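For the "port already in use" case, a quick way to confirm the conflict before changing the mapping is to try binding the port yourself. This is a minimal sketch using only the standard library. The helper name is_port_free is hypothetical, and the check covers the local host only.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing on this machine is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # bind fails with OSError when another process already holds the port.
            probe.bind((host, port))
        except OSError:
            return False
        return True
```

If is_port_free(1234) returns False, either stop the other service or publish a different host port, for example -p 8080:1234, and browse to that port instead.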
If the catalog is empty or fails to load, possible causes are:
  • External source is down or blocked
  • Cloudflare protection updated (check cloudscraper version)
  • Network firewall blocking requests
Check API response:
curl "http://localhost:1234/api/listado?seccion=Películas&pagina=1"
If you get an error or empty results, the scraping source may need updating.
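The query string in the request above contains both an & and a non-ASCII character, which are easy to get wrong when typed by hand. The sketch below builds the same URL with proper percent-encoding. The parameter names seccion, pagina, and busqueda are taken from this guide; the helper name listado_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:1234"  # port used throughout this guide


def listado_url(base_url: str = BASE_URL, **params) -> str:
    """Build an /api/listado URL with percent-encoded query parameters."""
    # urlencode encodes non-ASCII text as UTF-8, so "Películas" becomes
    # "Pel%C3%ADculas", and it joins the parameters with "&".
    return f"{base_url}/api/listado?{urlencode(params)}"
```

For example, listado_url(seccion="Películas", pagina=1) produces a URL that can be passed to curl inside quotes, and listado_url(busqueda="Spider-Man") builds the search request in the same way.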
If the frontend cannot reach the API during local development:
  • Ensure Flask is running on port 1234
  • Check CORS is enabled in app.py:
    from flask_cors import CORS
    CORS(app)
    
  • Verify the frontend is making requests to the correct URL
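One way to confirm that CORS is active is to inspect the response headers of any API call. The sketch below checks a header mapping for the standard Access-Control-Allow-Origin header. It assumes the backend sends that header for the Vite origin, which is the usual effect of CORS(app) but is not verified here, and the function name cors_allows is hypothetical.

```python
def cors_allows(headers: dict, origin: str) -> bool:
    """Return True if the response headers permit cross-origin reads from origin."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin
```

The headers can be obtained with curl -i -H "Origin: http://localhost:5173" http://localhost:1234/api/secciones and passed in as a dictionary. A False result points at the backend configuration rather than the frontend.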
If the video player does not load, try these debugging steps:
  1. Check browser console for errors
  2. Verify iframe extractor is working:
    curl "http://localhost:1234/api/iframe_player?url=<MOVIE_URL>"
    
  3. Some video sources may be geo-restricted or require additional headers
  4. Check if the external site’s HTML structure has changed

Next Steps

Congratulations! You now have Web Scrapping Hub running. Here’s what to explore next:

  • Architecture Overview: Understand the system architecture and component interactions
  • API Reference: Explore all available API endpoints
  • Configuration: Customize target URLs and scraping sources
  • CasaOS Deployment: Deploy to CasaOS for home server use
Development Mode: For active development, use the local setup with hot reload enabled for both frontend (Vite) and backend (Flask debug mode).

Getting Help

If you encounter issues, first confirm which version you are running: the application version is 1.4.8 and can be checked via the /api/version endpoint.
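When comparing the running version against another release, compare the numeric parts rather than the raw strings, since "1.4.10" sorts before "1.4.8" as text. The sketch below starts from a plain dotted version string because the response format of /api/version is not described in this guide. The names parse_version and is_older are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Convert a dotted version such as '1.4.8' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_older(running: str, reference: str) -> bool:
    """Return True if `running` is numerically lower than `reference`."""
    return parse_version(running) < parse_version(reference)
```

For instance, is_older("1.4.8", "1.4.10") is True, which a plain string comparison would get wrong.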
