## Overview
The CONFOR geospatial worker is a background process that handles:

- **Shapefile imports**: Processing uploaded ZIP files containing Level 4 geometry
- **Surface recalculations**: Updating surface areas for Level 2 and Level 3 units
- **Geometry variations**: Tracking changes in polygon geometries over time
## Architecture
The worker uses a polling architecture with configurable intervals and batch sizes. Multiple worker instances can run simultaneously for load distribution.
## Quick Start
### Development Mode
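Assuming an npm script named `worker` (an assumption; check `package.json` for the actual script name), a single polling cycle can be run locally with the documented run-once flag:

```shell
# run one polling cycle against the local database, then exit
GEO_WORKER_RUN_ONCE=true npm run worker
```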
## Production Deployment
### Using PM2 (Recommended)
PM2 provides process management, automatic restarts, and monitoring.

#### 1. Install PM2
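Install PM2 globally:

```shell
npm install -g pm2
```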
#### 2. Configure Ecosystem File
The project includes `ecosystem.config.cjs` with optimized settings:
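A sketch of what such a file can contain (the script path is an assumption; the batch sizes are the documented production values):

```javascript
// ecosystem.config.cjs (illustrative; adjust paths to your build output)
module.exports = {
  apps: [
    {
      name: "confor-geo-worker",
      script: "dist/workers/geo-worker-scheduler.js", // assumed build path
      env_production: {
        NODE_ENV: "production",
        GEO_WORKER_INTERVAL_MS: 5000,
        GEO_IMPORT_BATCH_SIZE: 500,
        GEO_RECALC_BATCH_SIZE: 200,
        GEO_VARIATION_BATCH_SIZE: 300,
      },
    },
  ],
};
```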
Batch sizes in production (500/200/300) are significantly higher than development defaults (5/10/15) for optimal throughput.
#### 3. Build and Start
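Assuming a standard npm `build` script, compile the worker and launch it under PM2 with the production environment:

```shell
npm run build
pm2 start ecosystem.config.cjs --env production
```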
#### 4. Persist Configuration
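So the worker survives reboots, save the process list and install a boot hook:

```shell
pm2 save     # store the current process list
pm2 startup  # print the command that installs a boot hook for your init system
```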
### Using Docker
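A minimal image sketch (the base image, paths, and a prebuilt `dist/` directory are assumptions):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/workers/geo-worker-scheduler.js"]
```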
Create a dedicated worker container.

### Using Systemd (Linux)
Create a system service for the worker at `/etc/systemd/system/confor-worker.service`:
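A sketch of the unit file (the user, working directory, and env file location are assumptions):

```ini
[Unit]
Description=CONFOR geospatial worker
After=network.target postgresql.service

[Service]
Type=simple
User=confor
WorkingDirectory=/opt/confor
EnvironmentFile=/opt/confor/.env
ExecStart=/usr/bin/node dist/workers/geo-worker-scheduler.js
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```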
## Configuration Reference
### Environment Variables
| Variable | Type | Default | Production | Description |
|---|---|---|---|---|
| `GEO_WORKER_INTERVAL_MS` | number | `4000` | `5000` | Milliseconds between polling cycles |
| `GEO_IMPORT_BATCH_SIZE` | number | `5` | `500` | Max import jobs processed per cycle |
| `GEO_RECALC_BATCH_SIZE` | number | `10` | `200` | Max recalc jobs processed per cycle |
| `GEO_VARIATION_BATCH_SIZE` | number | `15` | `300` | Max variation jobs processed per cycle |
| `GEO_WORKER_RUN_ONCE` | boolean | `false` | `false` | Exit after one cycle (testing) |
| `GEO_WORKER_SECRET` | string | - | Required | Secret for worker API endpoints |
| `DATABASE_URL` | string | Required | Required | PostgreSQL connection string |
| `NODE_ENV` | string | `development` | `production` | Runtime environment |
### Batch Size Tuning
Choose batch sizes based on your server capacity. A **Light Load** profile suits small organizations with under 100 imports/day; **Medium Load** and **Heavy Load** profiles trade larger batch sizes for higher CPU and memory use.
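As a starting point, a light-load profile can simply reuse the development defaults from the table above:

```shell
GEO_WORKER_INTERVAL_MS=4000
GEO_IMPORT_BATCH_SIZE=5
GEO_RECALC_BATCH_SIZE=10
GEO_VARIATION_BATCH_SIZE=15
```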
## Worker Implementation
The worker is implemented in `src/workers/geo-worker-scheduler.ts`:
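A structural sketch of that scheduler, not the actual implementation: the job queue is stubbed in memory (the real worker reads `PENDING` jobs from PostgreSQL), while the env var names and defaults are the documented ones.

```typescript
// Sketch of the polling scheduler. The in-memory queue stands in for
// the PENDING job table in PostgreSQL.
type JobType = "import" | "recalc" | "variation";

const config = {
  intervalMs: Number(process.env.GEO_WORKER_INTERVAL_MS ?? 4000),
  runOnce: process.env.GEO_WORKER_RUN_ONCE === "true",
  batchSizes: {
    import: Number(process.env.GEO_IMPORT_BATCH_SIZE ?? 5),
    recalc: Number(process.env.GEO_RECALC_BATCH_SIZE ?? 10),
    variation: Number(process.env.GEO_VARIATION_BATCH_SIZE ?? 15),
  } as Record<JobType, number>,
};

// In-memory stand-in for the pending-job table.
const queue: { id: number; type: JobType }[] = [];

// Take up to batchSizes[type] jobs of each type and process them.
async function processCycle(): Promise<number> {
  let processed = 0;
  for (const type of ["import", "recalc", "variation"] as JobType[]) {
    const batch = queue
      .filter((job) => job.type === type)
      .slice(0, config.batchSizes[type]);
    for (const job of batch) {
      queue.splice(queue.indexOf(job), 1); // claim the job
      processed += 1; // real worker: parse, validate, and persist here
    }
  }
  return processed;
}

// Poll forever, or run a single cycle when GEO_WORKER_RUN_ONCE=true.
async function run(): Promise<void> {
  do {
    await processCycle();
    if (!config.runOnce) {
      await new Promise((resolve) => setTimeout(resolve, config.intervalMs));
    }
  } while (!config.runOnce);
}
```

The per-type batch limits are what let one cycle stay bounded regardless of queue depth.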
### Job Processing Flow
Each import job is processed through the following steps:

1. Parse shapefile ZIP
2. Validate geometry with PostGIS
3. Transform to EPSG:4326
4. Link to Level 2/3 hierarchy
5. Calculate surface areas
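The validation, reprojection, and area steps above typically map to PostGIS calls like these (table and column names are assumptions):

```sql
-- flag invalid geometries before import
SELECT id FROM staging_level4 WHERE NOT ST_IsValid(geom);

-- reproject everything to EPSG:4326
UPDATE staging_level4
SET geom = ST_Transform(geom, 4326)
WHERE ST_SRID(geom) <> 4326;

-- surface area in square meters via the geography type
SELECT id, ST_Area(geom::geography) AS area_m2 FROM staging_level4;
```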
## Monitoring and Observability
### Log Format
The worker outputs structured JSON logs.

### Metrics to Monitor
- **Job throughput**: jobs processed per minute/hour
- **Queue depth**: number of pending jobs
- **Error rate**: percentage of failed jobs
- **Processing time**: average time per job type

### Database Queries
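Queue depth and error rate can be read straight from the job table; table and column names here are assumptions:

```sql
-- pending jobs by type
SELECT job_type, COUNT(*) AS pending
FROM geo_jobs
WHERE status = 'PENDING'
GROUP BY job_type;

-- failure rate over the last hour
SELECT COUNT(*) FILTER (WHERE status = 'FAILED')::float
         / NULLIF(COUNT(*), 0) AS error_rate
FROM geo_jobs
WHERE updated_at > now() - interval '1 hour';
```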
## Scaling Strategies
### Horizontal Scaling
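With PM2, scaling out can look like this (the app name comes from the ecosystem file; the instance count is illustrative):

```shell
pm2 start ecosystem.config.cjs --env production
pm2 scale confor-geo-worker 3   # run three worker processes
```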
Multiple worker instances can run in parallel to distribute load.

### Vertical Scaling
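On a server with CPU and memory headroom, the production batch sizes from the configuration table can be applied directly:

```shell
GEO_IMPORT_BATCH_SIZE=500
GEO_RECALC_BATCH_SIZE=200
GEO_VARIATION_BATCH_SIZE=300
```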
Increase batch sizes on powerful servers, but only as far as CPU and memory allow.

### Database Optimization
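Indexes that match the worker's polling pattern help most; table and column names are assumptions:

```sql
CREATE INDEX IF NOT EXISTS idx_geo_jobs_status_type
  ON geo_jobs (status, job_type);

CREATE INDEX IF NOT EXISTS idx_geo_jobs_created_at
  ON geo_jobs (created_at);
```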
Add indexes that match the worker's queries.

## Troubleshooting
### Worker not processing jobs

**Symptoms**: Jobs stuck in `PENDING` status

**Checklist**:

1. Verify the worker is running: `pm2 status` or `ps aux | grep worker`
2. Check worker logs: `pm2 logs confor-geo-worker --lines 50`
3. Verify the database connection: test `DATABASE_URL` with `psql`
4. Check for exceptions in the logs
5. Verify batch sizes are not 0
### High error rate on imports

**Symptoms**: Many jobs with `FAILED` status

**Common causes**:

- Invalid shapefile format (missing .prj, .shx, etc.)
- Unsupported coordinate system
- Invalid geometries (self-intersections)
- Missing required attributes (nivel2_id, nivel3_id, nivel4_id)
- Hierarchy references that don't exist in the database
### Worker consuming too much memory

**Symptoms**: Out-of-memory errors, worker crashes

**Solutions**:

- Reduce batch sizes
- Increase `restart_delay` in the PM2 config
- Add a memory limit in the PM2 config
- Use horizontal scaling instead of large batches
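The PM2 memory limit can be set with its `max_memory_restart` option; the value here is illustrative:

```javascript
// in ecosystem.config.cjs: restart the worker above a memory ceiling
module.exports = {
  apps: [
    {
      name: "confor-geo-worker",
      script: "dist/workers/geo-worker-scheduler.js",
      max_memory_restart: "1G",
    },
  ],
};
```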
### Jobs processing too slowly

**Symptoms**: Queue depth growing, long processing times

**Solutions**:

- Scale horizontally: add more worker instances
- Increase batch sizes (if CPU/memory allows)
- Reduce the polling interval (`GEO_WORKER_INTERVAL_MS`)
- Optimize the database: add indexes (see Scaling Strategies > Database Optimization)
### Worker stops after some time

**Symptoms**: Worker runs for hours, then stops

**Possible causes** (check the logs for patterns before crashes):

- Database connection timeout
- Unhandled promise rejection
- Memory leak causing restarts
## Best Practices
- **Use PM2 in production**: PM2 provides automatic restarts, log management, and monitoring.
- **Monitor queue depth**: alert when pending jobs exceed a threshold (e.g. >1000).
- **Set conservative defaults**: start with low batch sizes and increase based on measured performance.
- **Separate worker server**: run workers on dedicated infrastructure in production.
- **Enable graceful shutdown**: handle SIGTERM/SIGINT to finish current jobs before exiting.
- **Implement a dead-letter queue**: move repeatedly failing jobs to a separate queue for investigation.
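The graceful-shutdown practice above can be sketched as follows; the loop body is a placeholder, not the real worker code:

```typescript
// Minimal graceful-shutdown sketch: a signal flips a flag, and the
// polling loop exits after finishing the cycle in flight.
let shuttingDown = false;

for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.on(signal, () => {
    console.log(`received ${signal}, finishing current cycle`);
    shuttingDown = true;
  });
}

async function processBatch(): Promise<void> {
  // Placeholder for real job processing.
}

async function runLoop(pollMs: number): Promise<void> {
  while (!shuttingDown) {
    await processBatch(); // the in-flight batch always completes
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  console.log("worker stopped cleanly");
}
```

Because the flag is only checked between cycles, PM2 or systemd can send SIGTERM without interrupting a half-processed job.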
## Next Steps
- **Environment Variables**: complete environment variable reference
- **Database Setup**: PostgreSQL and Prisma configuration
- **Shapefile Import**: learn about shapefile requirements
- **API Reference**: geospatial job API endpoints