What is Siigo Corprecam Scraper?
Siigo Corprecam Scraper is an automated web scraping solution that integrates purchase order data from the Corprecam administrative system with Siigo accounting software. It uses Playwright to automate the creation of support documents (Documento Soporte) in Siigo Nube, eliminating manual data entry for purchase transactions.
Why It Exists
Manually entering purchase order data into Siigo is time-consuming and error-prone. For organizations like Corprecam and Reciclemos that process multiple purchase orders daily, this automation:
- Saves Time: Eliminates manual data entry for each purchase order line item
- Reduces Errors: Automated field population ensures accurate product codes, quantities, and prices
- Ensures Consistency: Applies standardized warehouse and account settings across all transactions
- Improves Traceability: Direct integration between purchase orders and accounting documents
How It Works
The scraper follows a multi-step workflow:
Fetch Related Data
The system retrieves:
- Purchase order details (com_codigo, comp_asociado, com_micro_ruta)
- Purchase items with quantities and prices
- Material codes and descriptions
- Micro-route information
Transform Data
Purchase items are categorized by company:
- Corprecam (emp_id_fk = 1): Uses NIT 900142913
- Reciclemos (emp_id_fk = 2): Uses NIT 901328575
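As an illustration of the categorization above, the following is a minimal sketch of grouping purchase items by company and attaching the matching NIT. Only `emp_id_fk` and the two NIT values come from this page; the `PurchaseItem` and `CompanyBatch` shapes, the other field names, and the `splitByCompany` function are hypothetical and may differ from what utils/transformDs.ts actually does.

```typescript
// Company IDs and NITs as listed above. Everything else in this sketch
// (type names, field names other than emp_id_fk) is an assumption.
const COMPANY_BY_ID: Record<number, { name: string; nit: string }> = {
  1: { name: "Corprecam", nit: "900142913" },
  2: { name: "Reciclemos", nit: "901328575" },
};

interface PurchaseItem {
  emp_id_fk: number;    // company the item belongs to (from the source data)
  materialCode: string; // hypothetical field name
  quantity: number;     // hypothetical field name
  price: number;        // hypothetical field name
}

interface CompanyBatch {
  name: string;
  nit: string;
  items: PurchaseItem[];
}

// Groups items into one batch per company, in order of first appearance.
function splitByCompany(items: PurchaseItem[]): CompanyBatch[] {
  const batches = new Map<number, CompanyBatch>();
  for (const item of items) {
    const company = COMPANY_BY_ID[item.emp_id_fk];
    if (!company) {
      // Fail loudly rather than silently filing items under the wrong NIT.
      throw new Error(`Unknown emp_id_fk: ${item.emp_id_fk}`);
    }
    let batch = batches.get(item.emp_id_fk);
    if (!batch) {
      batch = { name: company.name, nit: company.nit, items: [] };
      batches.set(item.emp_id_fk, batch);
    }
    batch.items.push(item);
  }
  return Array.from(batches.values());
}
```

Under this sketch, a purchase order with items from both companies would yield two batches, each of which could then become its own support document in Siigo.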
Key Features
Dual Company Support
The scraper handles purchase orders for both Corprecam and Reciclemos, applying company-specific configurations.
Retry Logic
All Playwright operations use retry-until-success patterns to handle Siigo’s dynamic Angular-based UI.
Ngrok Integration
The server automatically exposes itself via Ngrok and registers the public URL with the Corprecam system.
Architecture
API Layer (server.ts)
Express server that exposes the /scrapping endpoint and handles Ngrok tunneling.
Business Logic (main.ts)
Orchestrates the scraping workflow, determining which company configurations to apply.
Data Transformation (utils/transformDs.ts)
Converts database records into the DocumentoSoporte format, splitting products by company.
Playwright Automation (utils/functions.ts)
Contains all browser automation logic: login, product selection, form filling, and submission.
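The retry-until-success pattern mentioned under Key Features could be sketched as a small generic helper like the one below. This is an illustration under stated assumptions, not the project's actual code: the `retryUntilSuccess` name, the attempt cap, and the fixed delay are all hypothetical, and the real implementation may retry without a limit or use different timing.

```typescript
// Hypothetical helper: runs an async action repeatedly until it resolves,
// waiting a fixed delay between failed attempts. The attempt cap is an
// assumption added so that a permanently broken step eventually surfaces.
async function retryUntilSuccess<T>(
  action: () => Promise<T>,
  maxAttempts: number = 10,
  delayMs: number = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Give the dynamic UI time to settle before trying again.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: rethrow the most recent error.
  throw lastError;
}
```

In a Playwright context, a helper like this might wrap a single interaction, for example `await retryUntilSuccess(() => page.click(selector))`, so that a click on an element Angular has not finished rendering is retried instead of failing the whole run.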
External API Integration (api/php.ts)
Fetches purchase order data from the Corprecam administrative backend.
Data Flow
Technology Stack
- Runtime: Node.js with TypeScript
- Web Automation: Playwright (Firefox)
- API Framework: Express 5.x
- Database: MySQL 2 (for external API integration)
- Tunneling: Ngrok
- Environment Management: dotenv
Next Steps
- Quick Start: Get the scraper running in under 5 minutes
- Configuration: Learn about environment variables and settings
- API Reference: Explore the /scrapping endpoint
- Troubleshooting: Common issues and solutions