This guide walks you through setting up Web Scrapping Hub for local development. The application consists of a Flask backend (Python) and a React frontend (Vite).

Prerequisites

Make sure you have the following installed before proceeding:
  • Python 3.8 or higher - Required for the Flask backend
  • Node.js 18+ and npm - Required for the React frontend
  • Git - For cloning the repository
We recommend using a Python virtual environment to avoid dependency conflicts.
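
The prerequisites above can be checked with a short script before you start. This is a minimal sketch, not part of the repository: the helper names (`missing_tools`, `python_is_supported`, `in_virtualenv`) are illustrative. It only confirms that `node`, `npm`, and `git` are on your PATH, not that Node.js is version 18 or newer.

```python
import shutil
import sys

# Executables named in the prerequisites list (illustrative constant).
REQUIRED_TOOLS = ("node", "npm", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version=None, minimum=(3, 8)):
    """Return True if `version` (default: running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when running inside an environment created with `python -m venv`."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print("Virtual environment active:", in_virtualenv())
    print("Missing tools:", missing_tools() or "none")
```

Run it with the same interpreter you plan to use for the backend. Before the virtual environment is activated, the second line should report `False`.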

Installation Steps

1. Clone the repository

First, clone the Web Scrapping Hub repository to your local machine:
git clone <repository-url>
cd Web-Scrapping
2. Set up the backend

Navigate to the backend directory and create a Python virtual environment:
cd backend
python3 -m venv .venv
Activate the virtual environment (on Windows, run .venv\Scripts\activate instead):
source .venv/bin/activate
Install Python dependencies:
pip install -r requirements.txt
The backend uses the following key dependencies:
  • Flask - Web framework
  • Flask-CORS - Cross-Origin Resource Sharing support
  • BeautifulSoup4 - HTML parsing for web scraping
  • cloudscraper - Cloudflare bypass for stable data extraction
  • adblockparser - Ad blocking functionality
  • pytest - Testing framework
3. Set up the frontend

Navigate to the frontend project directory:
cd ../frontend/project
Install Node.js dependencies:
npm install
The frontend uses:
  • React 18.3 - UI framework
  • Vite 5.4 - Fast build tool and dev server
  • TailwindCSS - Utility-first CSS framework
  • React Router DOM - Client-side routing
  • TanStack Query - Data fetching and caching
  • Lucide React - Icon library
4. Start the development servers

You’ll need to run both the backend and frontend servers simultaneously, each in its own terminal.
Terminal 1 - Backend:
# From project root, with the virtual environment activated
python -m backend.app
The backend will start on port 1234.
Terminal 2 - Frontend:
# From frontend/project directory
npm run dev
The frontend development server will start on port 5173 (or the next available port).
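
To confirm that both servers came up, you can poll their ports from a third terminal. The sketch below is illustrative and not part of the project; `wait_for_port` is a hypothetical helper. It assumes the default ports named in this guide (1234 and 5173), so adjust the frontend port if Vite picked the next available one.

```python
import socket
import time


def wait_for_port(port, host="localhost", timeout=10.0, interval=0.25):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection tries every address that `host` resolves to
            # (IPv4 and IPv6), which matters when a server binds to only one.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Ports taken from this guide: 1234 (Flask backend), 5173 (Vite dev server).
    for name, port in (("backend", 1234), ("frontend", 5173)):
        status = "up" if wait_for_port(port, timeout=1.0) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

A successful connection only shows that something is listening on the port. It does not verify that the listener is the expected server.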
5. Access the application

Open your browser and navigate to:
http://localhost:5173
The Vite dev server will proxy API requests to the Flask backend running on port 1234.

Project Structure

Web-Scrapping/
├── backend/
│   ├── app.py              # Main Flask application
│   ├── config.py           # Configuration settings
│   ├── main.py             # Entry point
│   ├── requirements.txt    # Python dependencies
│   ├── utils/              # Utility functions
│   └── extractors/         # Modular web extractors
├── frontend/
│   └── project/
│       ├── src/
│       │   ├── hooks/      # React hooks organized by domain
│       │   │   ├── api/    # API hooks (catalog, search)
│       │   │   ├── ui/     # UI hooks (modal, pagination)
│       │   │   └── utils/  # Utility hooks (debounce, localStorage)
│       │   ├── components/ # React components
│       │   └── pages/      # Page components
│       ├── package.json    # Node.js dependencies
│       └── dist/           # Production build output
└── tests/                  # Unit and integration tests

Development Workflow

Running Tests

Backend tests:
cd backend
pytest
Frontend linting:
cd frontend/project
npm run lint

Building for Production

Build the frontend:
cd frontend/project
npm run build
The optimized build will be output to frontend/project/dist/.

Preview Production Build

cd frontend/project
npm run preview

Environment Configuration

The application configuration is managed in backend/config.py:
  • APP_VERSION: Current application version (1.4.8)
  • BASE_URL: Target scraping URL (sololatino.net)
  • TARGET_URLS: List of content sections and streaming services
The frontend automatically detects the server IP. No additional environment variables are required for basic local development.
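
For orientation, a configuration module with these three settings might be shaped roughly as follows. This is a sketch under stated assumptions, not the contents of backend/config.py: the `https` scheme is assumed, and `SECTION_PATHS` with its entries is a hypothetical placeholder for the real list of sections.

```python
from urllib.parse import urljoin

APP_VERSION = "1.4.8"                 # version quoted in this guide
BASE_URL = "https://sololatino.net/"  # host from this guide; https scheme is assumed

# Placeholder section paths (hypothetical); the real list lives in backend/config.py.
SECTION_PATHS = ["section-a", "section-b"]

# Build absolute URLs for each section from the base URL.
TARGET_URLS = [urljoin(BASE_URL, path) for path in SECTION_PATHS]

if __name__ == "__main__":
    print("Version:", APP_VERSION)
    for url in TARGET_URLS:
        print(url)
```

Deriving `TARGET_URLS` from `BASE_URL` means a change of target host needs only one edit. Whether the actual module does this is not stated here.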

Troubleshooting

If port 1234 is already occupied, you can modify the port in backend/app.py or stop the conflicting process:
# Find process using port 1234
lsof -i :1234

# Kill the process (replace PID with actual process ID)
kill -9 <PID>
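
If `lsof` is not available on your system, a rough cross-platform check is to try binding the port yourself. The helpers below (`port_is_free`, `first_free_port`) are hypothetical names for this sketch and do not exist in the project. The result is only a snapshot: another process can claim the port right after the check.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" (or permission denied).
            return False
    return True


def first_free_port(start=1234, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    if port_is_free(1234):
        print("Port 1234 is free.")
    else:
        print("Port 1234 is busy; next free port:", first_free_port(1235))
```

If 1234 is busy and you prefer not to stop the other process, the reported port is a candidate value to set in backend/app.py.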
If the browser reports CORS errors, make sure Flask-CORS is installed and configured. The backend should allow requests from the Vite dev server automatically.
If Python reports missing modules, make sure your virtual environment is activated and all dependencies are installed:
source .venv/bin/activate
pip install -r requirements.txt
If npm install fails, try clearing the npm cache and reinstalling:
npm cache clean --force
rm -rf node_modules package-lock.json
npm install

Next Steps

  • Docker Deployment - Deploy using Docker and docker-compose
  • CasaOS Deployment - Deploy to CasaOS for home server setup
