Overview

The Siigo Corprecam Scraper is a Node.js application that runs an Express server with Playwright automation. This guide covers how to start the application in both development and production environments.

Prerequisites

Before running the application, ensure you have completed:
  1. Installation - All dependencies and tools installed
  2. Configuration - Environment variables properly configured in .env

Development Mode

Development mode uses Node.js’s built-in watch mode for automatic restarts on file changes.

Start Development Server

Run the development script defined in package.json (package.json:6):
npm run dev
This executes the following command:
node --watch server.ts
The --watch flag restarts the server whenever watched files change. Note that running server.ts directly with node relies on Node's built-in TypeScript type stripping, which requires a recent Node release (v22.6+ with --experimental-strip-types, enabled by default since v23.6).

Expected Console Output

When the server starts successfully, you should see:
Server running on port 3000
The application will also:
  1. Initialize an ngrok tunnel on port 3000 (server.ts:46-49)
  2. Register the tunnel URL with the remote API (server.ts:51)
The ngrok tunnel URL changes each time you restart the application unless you have a paid ngrok plan with reserved domains.

Production Mode

For production deployments, run the server directly without watch mode.

Start Production Server

node server.ts
Ensure your production .env file contains production-ready credentials and database connection details.

Using Process Managers

For production environments, use a process manager to ensure the application stays running and restarts on crashes. Install PM2 globally:
npm install -g pm2
Start the application:
pm2 start server.ts --name corprecam-scraper --interpreter node
Useful PM2 commands:
# View application status
pm2 status

# View logs
pm2 logs corprecam-scraper

# Restart application
pm2 restart corprecam-scraper

# Stop application
pm2 stop corprecam-scraper

# Remove from PM2
pm2 delete corprecam-scraper

# Save PM2 configuration
pm2 save

# Setup PM2 to start on system boot
pm2 startup
Use pm2 monit for a real-time monitoring dashboard.
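Rather than passing flags on the command line each time, the PM2 settings above can be kept in an ecosystem file. A minimal sketch (the restart limits are illustrative choices, not values from this project):

```javascript
// ecosystem.config.cjs - start with: pm2 start ecosystem.config.cjs
module.exports = {
  apps: [
    {
      name: "corprecam-scraper",
      script: "server.ts",
      interpreter: "node",
      env: { NODE_ENV: "production" },
      max_restarts: 10,    // give up after repeated crash loops
      restart_delay: 5000, // wait 5s between restarts
    },
  ],
};
```

Keeping the configuration in a file makes restarts reproducible and lets you commit the deployment settings alongside the code.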

Using systemd (Linux)

Create a systemd service file /etc/systemd/system/corprecam-scraper.service:
[Unit]
Description=Siigo Corprecam Scraper
After=network.target mysql.service

[Service]
Type=simple
User=your-username
WorkingDirectory=/path/to/playwright-corprecam
Environment="NODE_ENV=production"
ExecStart=/usr/bin/node server.ts
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable corprecam-scraper
sudo systemctl start corprecam-scraper

# Check status
sudo systemctl status corprecam-scraper

# View logs
sudo journalctl -u corprecam-scraper -f

Application Startup Process

When the application starts, it follows this sequence:

1. Load Configuration (config.ts:1-16)

import dotenv from "dotenv";
dotenv.config();
Environment variables are loaded from the .env file.
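Conceptually, this step reads process.env into a typed object. A minimal sketch of such a loader (PORT and NGROK_AUTHTOKEN appear in this guide; the validation and defaults shown here are illustrative assumptions, not the project's actual config.ts):

```typescript
interface Config {
  PORT: number;
  NGROK_AUTHTOKEN: string;
}

// Parse and validate environment variables into a typed config object.
function loadConfig(env: Record<string, string | undefined>): Config {
  const token = env.NGROK_AUTHTOKEN;
  if (!token) {
    throw new Error("NGROK_AUTHTOKEN is required - set it in .env");
  }
  return {
    PORT: Number(env.PORT ?? "3000"), // default matches this guide
    NGROK_AUTHTOKEN: token,
  };
}
```

Failing fast on missing credentials at startup gives a clearer error than a failed ngrok call later.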

2. Initialize Express Server (server.ts:7)

const app = express();
app.use(cors());
app.use(express.json());
  • Creates Express application
  • Enables CORS for cross-origin requests (server.ts:17)
  • Configures JSON body parsing (server.ts:19)

3. Register API Endpoints

The application registers a POST endpoint for scraping operations (server.ts:21-41):
app.post("/scrapping", async (req, res) => {
  // Scraping logic
});

4. Start HTTP Server (server.ts:43-52)

app.listen(config.PORT, async () => {
  console.log(`Server running on port ${config.PORT}`);
  
  const listener = await ngrok.forward({
    addr: 3000,
    authtoken: config.NGROK_AUTHTOKEN,
  });
  
  await setNgrok(listener.url());
});
The server:
  • Listens on the configured PORT (default: 3000)
  • Creates an ngrok tunnel pointing to port 3000
  • Registers the public ngrok URL with the remote API
The ngrok tunnel allows the remote Corprecam system to send webhook requests to your local server.

API Endpoints

Once running, the application exposes the following endpoint:

POST /scrapping

Purpose: Triggers the Playwright automation to create support documents in Siigo
Request Body:
{
  "compra": "COM12345"
}
Processing Flow (server.ts:21-41):
  1. Fetches purchase data via getCompras() (server.ts:24)
  2. Retrieves purchase items via getCompraItems() (server.ts:26)
  3. Fetches material details via getMateriales() (server.ts:30)
  4. Gets microroute information via getMicro() (server.ts:32)
  5. Transforms data into document support format (server.ts:34)
  6. Executes Playwright automation via run_playwright() (server.ts:36)
Response:
{
  "message": "ok"
}
This endpoint triggers browser automation which can take several seconds to complete. Consider implementing timeout handling in production.
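Timeout handling can be sketched with a Promise.race wrapper (a generic sketch, not code from this project; it assumes the automation call returns a Promise, and the 120-second budget is an illustrative choice):

```typescript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage inside the endpoint handler:
// await withTimeout(run_playwright(documentData), 120_000);
```

Note that this only bounds how long the request waits; the browser automation itself would still need to be cancelled separately.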

Testing the Application

Once the server is running, you can test it using curl or any HTTP client:
curl -X POST http://localhost:3000/scrapping \
  -H "Content-Type: application/json" \
  -d '{"compra": "COM12345"}'
You should receive:
{"message":"ok"}
Use tools like Postman, Insomnia, or HTTPie for interactive API testing during development.
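The same request can be made programmatically, which is handy for smoke tests. A hypothetical client helper (the endpoint path and payload shape come from this guide; the function itself is illustrative and uses Node 18+'s built-in fetch):

```typescript
// Trigger a scrape run against a running server instance.
async function triggerScrape(
  baseUrl: string,
  compra: string,
): Promise<{ message: string }> {
  const res = await fetch(`${baseUrl}/scrapping`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ compra }),
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return (await res.json()) as { message: string };
}
```

For example, `await triggerScrape("http://localhost:3000", "COM12345")` should resolve to `{ message: "ok" }` when the server is up.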

Monitoring and Logs

Console Logs

The application outputs logs to stdout:
  • Server startup: Server running on port 3000
  • Document processing logs from run_playwright() (main.ts:13-14)

Viewing Logs

Development:
# Logs appear in your terminal
npm run dev
PM2:
pm2 logs corprecam-scraper
pm2 logs corprecam-scraper --lines 100
systemd:
sudo journalctl -u corprecam-scraper -f
sudo journalctl -u corprecam-scraper --since "1 hour ago"

Troubleshooting

Server Won’t Start

Issue: Port already in use
Error: listen EADDRINUSE: address already in use :::3000
Solution:
# Find process using port 3000
lsof -i :3000
# Kill the process
kill -9 <PID>
# Or change PORT in .env
PORT=3001

Ngrok Connection Fails

Issue: Invalid authtoken or tunnel limit exceeded
Solution:
  1. Verify NGROK_AUTHTOKEN in .env
  2. Check ngrok dashboard for active tunnels
  3. Free tier allows only 1 tunnel at a time
  4. Ensure you’re not running multiple instances

Playwright Automation Errors

Issue: Browser crashes or timeouts
Solution:
  1. Check Siigo credentials in .env
  2. Ensure Playwright browsers are installed: npx playwright install
  3. On Linux, install system dependencies: npx playwright install-deps
  4. Check if the Siigo website structure has changed

Database Connection Errors

Issue: Cannot connect to MySQL
Solution:
  1. Verify MySQL is running: sudo systemctl status mysql
  2. Test credentials: mysql -h $DB_HOST -u $DB_USER -p
  3. Check firewall rules if connecting to remote MySQL
  4. Verify DB_PORT is correct (default: 3306)

Module Resolution Errors

Issue: Cannot find module or import errors
Solution:
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install

# Ensure TypeScript files use .ts extensions in imports
# The project uses ESM modules (package.json:9)

Performance Considerations

Concurrent Requests

The /scrapping endpoint executes Playwright automation sequentially. For high-volume scenarios:
  • Consider implementing a job queue (Bull, BullMQ)
  • Add request rate limiting
  • Monitor memory usage (Playwright can be memory-intensive)
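The queue idea can be illustrated with a minimal in-process serializer that chains jobs so only one Playwright run executes at a time (a sketch only; a production deployment would use Bull/BullMQ backed by Redis for persistence and retries):

```typescript
// Run submitted jobs strictly one after another, in submission order.
class SequentialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  add<T>(job: () => Promise<T>): Promise<T> {
    // Start the new job after the previous one settles, whether it
    // succeeded or failed, so one bad job does not stall the queue.
    const next = this.tail.then(job, job);
    this.tail = next.catch(() => undefined);
    return next;
  }
}

// Hypothetical usage in the endpoint handler:
// const queue = new SequentialQueue();
// await queue.add(() => run_playwright(documentData));
```

Unlike a real job queue, this loses pending work if the process crashes, but it is enough to prevent two browser instances from racing each other.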

Resource Limits

Memory: Playwright browsers can consume 200-500MB per instance
CPU: Browser automation is CPU-intensive
Recommended Production Specs:
  • 2+ CPU cores
  • 4GB+ RAM
  • SSD storage for faster browser operations

Stopping the Application

Development:
# Press Ctrl+C in the terminal
PM2:
pm2 stop corprecam-scraper
systemd:
sudo systemctl stop corprecam-scraper

Next Steps

Now that you can run the application:
  1. Review the source code to understand the scraping logic
  2. Customize the automation scripts for your specific needs
  3. Set up monitoring and alerting for production deployments
  4. Configure backups for your MySQL database
  5. Implement logging to a file or external service
For production deployments, consider implementing health check endpoints and integrating with monitoring tools like Datadog, New Relic, or Prometheus.