Overview
The Siigo Corprecam Scraper is a Node.js application that runs an Express server with Playwright automation. This guide covers how to start the application in both development and production environments.

Prerequisites
Before running the application, ensure you have completed:
- Installation - All dependencies and tools installed
- Configuration - Environment variables properly configured in `.env`
Development Mode
Development mode uses Node.js’s built-in watch mode for automatic restarts on file changes.

Start Development Server
Run the development script defined in package.json (package.json:6):
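Assuming the script is named `dev` (the usual convention; check package.json:6 for the exact name), the server starts with:

```shell
# Start the server in watch mode; it restarts automatically on file changes
npm run dev
```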
Expected Console Output
When the server starts successfully, the application will:
- Initialize an ngrok tunnel on port 3000 (server.ts:46-49)
- Register the tunnel URL with the remote API (server.ts:51)
The ngrok tunnel URL changes each time you restart the application unless you have a paid ngrok plan with reserved domains.
Production Mode
For production deployments, run the server directly without watch mode.

Start Production Server
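The exact command depends on how the project builds and runs TypeScript; assuming a `start` script is defined in package.json, a typical invocation is:

```shell
# Run the server once, without watch mode
NODE_ENV=production npm start
```

Setting `NODE_ENV=production` is a general Node.js convention; the application may or may not read it.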
Using Process Managers
For production environments, use a process manager to ensure the application stays running and restarts on crashes.

PM2 (Recommended)
Install PM2 globally:
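For example (the process name `corprecam-scraper` and the `start` script are assumptions; adjust them to your setup):

```shell
# Install PM2 globally
npm install -g pm2

# Start the application under PM2 with a recognizable name
pm2 start npm --name corprecam-scraper -- start

# Persist the process list so it can be restored after a reboot
pm2 save
```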
Using systemd (Linux)
Create a systemd service file at /etc/systemd/system/corprecam-scraper.service:
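A minimal unit file sketch; the install path, user, and start command are assumptions that must match your actual installation:

```ini
[Unit]
Description=Siigo Corprecam Scraper
After=network.target mysql.service

[Service]
Type=simple
User=node
WorkingDirectory=/opt/corprecam-scraper
EnvironmentFile=/opt/corprecam-scraper/.env
ExecStart=/usr/bin/npm start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable and start it with `sudo systemctl enable --now corprecam-scraper`.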
Application Startup Process
When the application starts, it follows this sequence:

1. Load Configuration (config.ts:1-16)
- Reads environment variables from the `.env` file.
2. Initialize Express Server (server.ts:7)
- Creates Express application
- Enables CORS for cross-origin requests (server.ts:17)
- Configures JSON body parsing (server.ts:19)
3. Register API Endpoints
The application registers a POST endpoint for scraping operations (server.ts:21-41).

4. Start HTTP Server (server.ts:43-52)
- Listens on the configured PORT (default: 3000)
- Creates an ngrok tunnel pointing to port 3000
- Registers the public ngrok URL with the remote API
The ngrok tunnel allows the remote Corprecam system to send webhook requests to your local server.
API Endpoints
Once running, the application exposes the following endpoint:

POST /scrapping
Purpose: Triggers the Playwright automation to create support documents in Siigo

Request Body:

When invoked, the endpoint:
- Fetches purchase data via `getCompras()` (server.ts:24)
- Retrieves purchase items via `getCompraItems()` (server.ts:26)
- Fetches material details via `getMateriales()` (server.ts:30)
- Gets microroute information via `getMicro()` (server.ts:32)
- Transforms data into document support format (server.ts:34)
- Executes Playwright automation via `run_playwright()` (server.ts:36)
Testing the Application
Once the server is running, you can test it using curl or any HTTP client:
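For example (the empty JSON body is an assumption; the actual payload expected by server.ts:21-41 may differ):

```shell
# Send a test request to the scraping endpoint on the default port
curl -X POST http://localhost:3000/scrapping \
  -H "Content-Type: application/json" \
  -d '{}'
```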
Monitoring and Logs

Console Logs
The application outputs logs to stdout:
- Server startup: `Server running on port 3000`
- Document processing logs from `run_playwright()` (main.ts:13-14)
Viewing Logs
Development: logs appear directly in the terminal running the server.
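In production, the command depends on your process manager; assuming the PM2 process name or systemd unit suggested earlier in this guide:

```shell
# PM2: tail logs for the named process
pm2 logs corprecam-scraper

# systemd: follow the unit's journal
journalctl -u corprecam-scraper -f
```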
Troubleshooting

Server Won’t Start
Issue: Port already in use
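A common fix on Linux/macOS is to find and stop the process holding the port (3000 assumed from the default configuration):

```shell
# Identify the process listening on port 3000
lsof -i :3000

# Stop it by PID (replace <PID> with the number reported above)
kill <PID>
```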
Ngrok Connection Fails
Issue: Invalid authtoken or tunnel limit exceeded

Solution:
- Verify NGROK_AUTHTOKEN in `.env`
- Check the ngrok dashboard for active tunnels
- Free tier allows only 1 tunnel at a time
- Ensure you’re not running multiple instances
Playwright Automation Errors
Issue: Browser crashes or timeouts

Solution:
- Check Siigo credentials in `.env`
- Ensure Playwright browsers are installed: `npx playwright install`
- On Linux, install system dependencies: `npx playwright install-deps`
- Check whether the Siigo website structure has changed
Database Connection Errors
Issue: Cannot connect to MySQL

Solution:
- Verify MySQL is running: `sudo systemctl status mysql`
- Test credentials: `mysql -h $DB_HOST -u $DB_USER -p`
- Check firewall rules if connecting to a remote MySQL server
- Verify DB_PORT is correct (default: 3306)
Module Resolution Errors
Issue: Cannot find module or import errors

Solution:
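A common remedy is a clean reinstall of dependencies (a general Node.js fix, not specific to this project):

```shell
# Remove the installed dependency tree, then reinstall from package.json
rm -rf node_modules
npm install
```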
Performance Considerations

Concurrent Requests
The /scrapping endpoint executes Playwright automation sequentially. For high-volume scenarios:
- Consider implementing a job queue (Bull, BullMQ)
- Add request rate limiting
- Monitor memory usage (Playwright can be memory-intensive)
Resource Limits
Memory: Playwright browsers can consume 200-500MB per instance
CPU: Browser automation is CPU-intensive

Recommended Production Specs:
- 2+ CPU cores
- 4GB+ RAM
- SSD storage for faster browser operations
Stopping the Application
Development: press Ctrl+C in the terminal running the watch process.
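For production, stop the application through whichever process manager you use (names assumed from the examples earlier in this guide):

```shell
# PM2
pm2 stop corprecam-scraper

# systemd
sudo systemctl stop corprecam-scraper
```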
Next Steps
Now that you can run the application:
- Review the source code to understand the scraping logic
- Customize the automation scripts for your specific needs
- Set up monitoring and alerting for production deployments
- Configure backups for your MySQL database
- Implement logging to a file or external service