Deploy Web Scrapping Hub using Docker for a production-ready setup. This guide covers both single-container deployment and docker-compose orchestration.

Prerequisites

Ensure you have the following installed:
  • Docker (version 20.10 or higher)
  • Docker Compose (version 2.0 or higher, included with Docker Desktop)
  • Git (for cloning the repository)
The Docker image supports multiple architectures: AMD64, ARM64, and ARMv7 (Raspberry Pi compatible).

Quick Start with Docker

1. Clone the repository

git clone <repository-url>
cd Web-Scrapping
2. Build the Docker image

Navigate to the docker directory and build the image:
cd docker
docker build -t web-scrapping-hub .
The build process:
  1. Stage 1: Builds the React frontend using Node.js 20 Alpine
  2. Stage 2: Sets up Python 3.11 backend and serves the built frontend
3. Run the container

docker run -d \
  --name web-scrapping-hub \
  -p 1234:1234 \
  --restart unless-stopped \
  web-scrapping-hub
The application will be available at http://localhost:1234
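
The container can take a few seconds before it answers requests. The following is a minimal sketch of a readiness probe, assuming curl is installed on the host. The function name wait_for_url and its retry parameters are illustrative and not part of the project.

```shell
# Poll a URL until it responds successfully or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
# wait_for_url http://localhost:1234 && echo "application is up"
```

The function returns 0 as soon as one request succeeds and 1 if every attempt fails, so it can gate later steps in a deployment script.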

Docker Compose Deployment

For production deployments with proper configuration, use docker-compose.

Using the Provided docker-compose.yml

1. Review the configuration

The included docker/docker-compose.yml provides a production-ready setup:
docker-compose.yml
services:
  anxerstudios-streaming:
    image: zlosttk/anxerstudios-streaming:latest
    container_name: anxerstudios-streaming
    hostname: anxerstudios-streaming
    
    # Resource limits
    cpu_shares: 90
    deploy:
      resources:
        limits:
          memory: 3369M
    
    # Environment variables
    environment:
      - PGID=1000
      - PUID=1000
      - TZ=America/Bogota
      - FLASK_ENV=production
      - FLASK_DEBUG=false
    
    # Port mapping
    ports:
      - "1234:1234"
    
    # Persistent storage
    volumes:
      - /DATA/AppData/anxerstudios-streaming/app/config:/app/config
      - /DATA/AppData/anxerstudios-streaming/logs:/app/logs
    
    restart: unless-stopped
    
    # Health check
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:1234/api/secciones || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
    networks:
      - anxerstudios-streaming

networks:
  anxerstudios-streaming:
    name: anxerstudios-streaming
    driver: bridge
2. Customize environment variables

Edit the environment variables according to your needs:
  • PUID/PGID: User ID and group ID used for file permissions on the mounted volumes (default: 1000 for both)
  • TZ: Timezone (e.g., America/New_York, Europe/London)
  • FLASK_ENV: Set to production for deployment
  • FLASK_DEBUG: Keep as false in production
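
PUID and PGID usually need to match the host user that owns the mounted directories. As a small sketch, the helper below prints entries for the current user in the list syntax the compose file uses. The function name print_id_env is illustrative.

```shell
# Print environment entries that match the current host user.
print_id_env() {
  printf '      - PUID=%s\n' "$(id -u)"
  printf '      - PGID=%s\n' "$(id -g)"
}

print_id_env
```

The two printed lines can be pasted over the PUID and PGID entries in the environment block.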
3. Adjust volume paths

Update the volume paths to match your system:
volumes:
  - /path/to/your/config:/app/config
  - /path/to/your/logs:/app/logs
Create these directories before starting the container to avoid permission issues.
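
One way to prepare the directories is sketched below. It assumes the container writes to the volumes as the configured PUID/PGID. The function name prepare_volume_dirs is illustrative, and the example paths are the placeholders used above.

```shell
# Create host directories for the bind mounts and assign them to the
# UID/GID configured through PUID/PGID.
# Usage: prepare_volume_dirs UID GID DIR...
prepare_volume_dirs() {
  owner_uid="$1"
  owner_gid="$2"
  shift 2
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    chown "$owner_uid:$owner_gid" "$dir" || return 1
  done
}

# Example (changing ownership to another user usually requires root):
# prepare_volume_dirs 1000 1000 /path/to/your/config /path/to/your/logs
```

If Docker has to create a missing bind-mount directory itself, the directory ends up owned by root, which is the usual source of the permission issues mentioned above.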
4. Start the services

docker-compose up -d
This will:
  • Pull the image (or build if using local Dockerfile)
  • Create necessary networks
  • Start the container in detached mode
  • Apply the unless-stopped restart policy, which restarts the container after a crash or a reboot unless it was stopped manually
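
Compose v2 is normally invoked as docker compose (a Docker CLI plugin), while the standalone docker-compose binary is not present on every system. As a hedged sketch, the wrapper below uses whichever is available. The function name compose is illustrative.

```shell
# Run Compose through whichever command is installed:
# the v2 plugin ("docker compose") or the standalone binary ("docker-compose").
compose() {
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  else
    docker-compose "$@"
  fi
}

# Example, run from the directory that contains docker-compose.yml:
# compose up -d
# compose ps
```

compose ps lists the service together with its state, which is a quick way to confirm the container started.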

Dockerfile Architecture

The multi-stage Dockerfile builds the frontend in a separate stage, so Node.js and the frontend build toolchain stay out of the final image:

Stage 1: Frontend Build

FROM node:20-alpine AS frontend-build

WORKDIR /app/frontend/project
COPY ../frontend/project/ ./
RUN npm install && npm run build
  • Uses Node.js 20 Alpine for minimal image size
  • Installs dependencies and builds optimized production bundle
  • Output saved to /app/frontend/project/dist

Stage 2: Backend + Static Files

FROM python:3.11-alpine

# System dependencies for scraping and Python compilation
RUN apk update && \
    apk add --no-cache \
      build-base \
      glib \
      libxext \
      libsm \
      libxrender \
      jpeg-dev \
      zlib-dev \
      musl-dev \
      libffi-dev \
      openssl-dev

WORKDIR /app

# Copy backend code
COPY ../backend/ ./backend/

# Copy frontend build from previous stage
COPY --from=frontend-build /app/frontend/project/dist ./frontend/dist

# Install Python dependencies
RUN pip install --no-cache-dir -r ./backend/requirements.txt

EXPOSE 1234
CMD ["python", "-m", "backend.app"]
System dependencies are required for:
  • cloudscraper: Cloudflare bypass functionality
  • beautifulsoup4: HTML parsing
  • Compilation of native Python packages on Alpine Linux
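
Docker can only COPY files that lie inside the build context. The COPY instructions above reference ../frontend and ../backend, which fall outside the context when the build runs from the docker directory as in the Quick Start. If the build fails on those steps, one option is to use the repository root as the context and select the Dockerfile with -f. This is an assumption about the repository layout (docker/Dockerfile next to frontend/ and backend/), not something the project confirms. The function name build_from_root is illustrative, and the COPY sources may also need the ../ prefix removed.

```shell
# Build with the repository root as context so that frontend/ and backend/
# are visible to COPY. Assumes the Dockerfile lives at docker/Dockerfile.
# Usage: build_from_root [REPO_ROOT]
build_from_root() {
  repo_root="${1:-.}"
  docker build -f "$repo_root/docker/Dockerfile" -t web-scrapping-hub "$repo_root"
}

# Example, run from the repository root:
# build_from_root .
```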

Managing the Container

View Logs

docker logs -f web-scrapping-hub

Check Container Status

docker ps -a | grep web-scrapping-hub

Stop the Container

docker stop web-scrapping-hub

Restart the Container

docker restart web-scrapping-hub

Remove the Container

docker stop web-scrapping-hub
docker rm web-scrapping-hub

Updating the Application

1. Pull latest changes

git pull origin main
2. Rebuild the image

cd docker
docker build -t web-scrapping-hub .
3. Stop and remove old container

docker stop web-scrapping-hub
docker rm web-scrapping-hub
4. Start new container

docker run -d \
  --name web-scrapping-hub \
  -p 1234:1234 \
  --restart unless-stopped \
  web-scrapping-hub
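
The four steps above can be combined into one helper. This is a sketch rather than a script shipped with the project. The function name redeploy is illustrative, and it is meant to be run from the repository root.

```shell
# Pull, rebuild, then swap containers. The old container is only removed
# after the build succeeds, so a failed build leaves it running.
redeploy() {
  git pull origin main || return 1
  (cd docker && docker build -t web-scrapping-hub .) || return 1
  # rm -f stops and removes in one step; ignore the error if no container exists
  docker rm -f web-scrapping-hub >/dev/null 2>&1 || true
  docker run -d \
    --name web-scrapping-hub \
    -p 1234:1234 \
    --restart unless-stopped \
    web-scrapping-hub
}

# Example:
# redeploy
```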

Using Docker Compose

docker-compose down
docker-compose pull  # If using remote image
docker-compose up -d --build

Network Access

Local Network Access

Because port 1234 is published on all host interfaces, the application is reachable from any device on your network at:
http://<SERVER_IP>:1234

Port Configuration

Ensure port 1234 is open in your firewall:
# UFW (Ubuntu/Debian)
sudo ufw allow 1234/tcp

# Firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-port=1234/tcp
sudo firewall-cmd --reload

Health Monitoring

The docker-compose configuration includes a health check:
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:1234/api/secciones || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
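This requests the /api/secciones endpoint every 30 seconds, with a 10-second timeout per attempt. Docker marks the container unhealthy after 3 consecutive failures; failures during the first 40 seconds after startup are not counted. The check runs curl inside the container, so the image must include curl; the apk package list in the Dockerfile shown in this guide does not install it.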

Check Health Status

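The health check is defined only in docker-compose.yml, so inspect the container it creates (anxerstudios-streaming):
docker inspect --format='{{.State.Health.Status}}' anxerstudios-streaming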
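
In a deployment script it can be useful to block until the container reports healthy. A minimal sketch follows. The function name wait_healthy and its retry parameters are illustrative, and the container name in the example is the container_name from the compose file.

```shell
# Poll the container's health status until it reports "healthy".
# Usage: wait_healthy CONTAINER [ATTEMPTS] [DELAY_SECONDS]
wait_healthy() {
  container="$1"
  attempts="${2:-20}"
  delay="${3:-5}"
  status=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # An inspect error (missing container, no health check) leaves status empty
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || status=""
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "last status: ${status:-unknown}" >&2
  return 1
}

# Example:
# wait_healthy anxerstudios-streaming && echo "health check passed"
```

Docker reports one of starting, healthy, or unhealthy; the helper treats anything other than healthy as not ready yet.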

Troubleshooting

The Dockerfile is optimized for ARM architectures. If you encounter build issues:
  1. Ensure you’re using the correct platform:
docker build --platform linux/arm64 -t web-scrapping-hub .
  2. Check available disk space (builds require ~2GB)

If the container fails to start, check the logs for errors:
docker logs web-scrapping-hub
Common issues:
  • Missing environment variables
  • Port 1234 already in use
  • Insufficient memory

If the application is not reachable from other devices on the network:
  1. Verify the container is running: docker ps
  2. Check firewall settings (port 1234)
  3. Ensure the application inside the container listens on 0.0.0.0, not 127.0.0.1
  4. Test locally first: curl http://localhost:1234

To cap memory usage, lower the memory limit in docker-compose.yml:
deploy:
  resources:
    limits:
      memory: 2048M  # Reduced from 3369M
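
Before choosing a new limit, it helps to see how much memory the container actually uses. The helper below is a small sketch around docker stats. The function name container_memory is illustrative.

```shell
# Print one snapshot of a container's memory usage as "name usage / limit".
# Usage: container_memory CONTAINER
container_memory() {
  docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' "$1"
}

# Example:
# container_memory anxerstudios-streaming
```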

Next Steps

  • CasaOS Deployment: Deploy to CasaOS for easier management
  • Architecture Overview: Learn about the system architecture
