Overview

The CONFOR geospatial worker is a background process that handles:
  • Shapefile imports: Processing uploaded ZIP files containing Level 4 geometry
  • Surface recalculations: Updating surface areas for Level 2 and Level 3 units
  • Geometry variations: Tracking changes in polygon geometries over time
The worker runs independently of the web application and can be scaled horizontally.

Architecture

The worker uses a polling architecture with configurable intervals and batch sizes. Multiple worker instances can run simultaneously for load distribution.

Quick Start

Development Mode

# Start the worker (runs until stopped)
pnpm worker:geo

# Or with npm
npm run worker:geo

# Worker output:
[2026-03-09T14:30:00.123Z] [geo-worker] started (interval=4000ms, importBatch=5, recalcBatch=10, variationBatch=15, runOnce=false)
[2026-03-09T14:30:05.456Z] [geo-worker] processed import_jobs=2 recalc_jobs=1 variation_jobs=0

Production Deployment

PM2 provides process management, automatic restarts, and monitoring.

1. Install PM2

npm install -g pm2

2. Configure Ecosystem File

The project includes ecosystem.config.cjs with optimized settings:
ecosystem.config.cjs
module.exports = {
  apps: [
    {
      name: "confor-web",
      cwd: ".",
      script: "cmd",
      args: "/c pnpm start",
      env: {
        NODE_ENV: "production",
      },
      autorestart: true,
      max_restarts: 10,
      restart_delay: 3000,
    },
    {
      name: "confor-geo-worker",
      cwd: ".",
      script: "cmd",
      args: "/c pnpm worker:geo",
      env: {
        NODE_ENV: "production",
        GEO_WORKER_INTERVAL_MS: "5000",
        GEO_IMPORT_BATCH_SIZE: "500",
        GEO_RECALC_BATCH_SIZE: "200",
        GEO_VARIATION_BATCH_SIZE: "300",
      },
      autorestart: true,
      max_restarts: 20,
      restart_delay: 3000,
    },
  ],
};
Batch sizes in production (500/200/300) are significantly higher than development defaults (5/10/15) for optimal throughput.

3. Build and Start

# Build the application
pnpm build

# Start all processes
pm2 start ecosystem.config.cjs

# Expected output:
┌─────┬──────────────────────┬─────────┬─────────┬───────────┐
│ id  │ name                 │ mode    │ status  │ cpu       │
├─────┼──────────────────────┼─────────┼─────────┼───────────┤
│ 0   │ confor-web           │ fork    │ online  │ 0%        │
│ 1   │ confor-geo-worker    │ fork    │ online  │ 0%        │
└─────┴──────────────────────┴─────────┴─────────┴───────────┘

4. Persist Configuration

# Save current process list
pm2 save

# Setup startup script (Windows)
pm2 startup

# Follow the instructions shown by PM2
# Then save again
pm2 save
On Windows, PM2 startup requires administrator privileges. Run the command PM2 displays in an elevated command prompt.

Using Docker

Create a dedicated worker container:
FROM node:20-alpine

# Install the PostgreSQL client and build tools for native modules
RUN apk add --no-cache \
    postgresql-client \
    python3 \
    make \
    g++

WORKDIR /app

# Copy package files
COPY package*.json pnpm-lock.yaml ./

# Install pnpm and dependencies
RUN npm install -g pnpm
RUN pnpm install --frozen-lockfile --prod

# Copy application code
COPY . .

# Generate Prisma client
RUN pnpm db:generate

# Set environment
ENV NODE_ENV=production

CMD ["pnpm", "worker:geo"]

Using Systemd (Linux)

Create a system service for the worker:
/etc/systemd/system/confor-worker.service
[Unit]
Description=CONFOR Geospatial Worker
After=network.target postgresql.service

[Service]
Type=simple
User=confor
WorkingDirectory=/opt/confor
Environment="NODE_ENV=production"
Environment="DATABASE_URL=postgresql://user:pass@localhost:5432/confor"
Environment="GEO_WORKER_INTERVAL_MS=5000"
Environment="GEO_IMPORT_BATCH_SIZE=500"
Environment="GEO_RECALC_BATCH_SIZE=200"
ExecStart=/usr/bin/pnpm worker:geo
Restart=always
RestartSec=3
StandardOutput=append:/var/log/confor/worker.log
StandardError=append:/var/log/confor/worker-error.log

[Install]
WantedBy=multi-user.target
Enable and start the service:
# Reload systemd
sudo systemctl daemon-reload

# Enable service
sudo systemctl enable confor-worker

# Start service
sudo systemctl start confor-worker

# Check status
sudo systemctl status confor-worker

# View logs
sudo journalctl -u confor-worker -f

Configuration Reference

Environment Variables

Variable                  Type     Default      Production   Description
GEO_WORKER_INTERVAL_MS    number   4000         5000         Milliseconds between polling cycles
GEO_IMPORT_BATCH_SIZE     number   5            500          Max import jobs processed per cycle
GEO_RECALC_BATCH_SIZE     number   10           200          Max recalc jobs processed per cycle
GEO_VARIATION_BATCH_SIZE  number   15           300          Max variation jobs processed per cycle
GEO_WORKER_RUN_ONCE       boolean  false        false        Exit after one cycle (testing)
GEO_WORKER_SECRET         string   -            Required     Secret for worker API endpoints
DATABASE_URL              string   Required     Required     PostgreSQL connection string
NODE_ENV                  string   development  production   Runtime environment

Batch Size Tuning

Choose batch sizes based on your server capacity. For example, settings suitable for a small organization (under 100 imports/day):
GEO_WORKER_INTERVAL_MS=5000
GEO_IMPORT_BATCH_SIZE=10
GEO_RECALC_BATCH_SIZE=20
GEO_VARIATION_BATCH_SIZE=30
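To sanity-check a configuration before deploying it, note that the theoretical ceiling is batch size times polling cycles per minute. A throwaway estimator in TypeScript (purely illustrative arithmetic, not part of the worker):

```typescript
// Upper bound on jobs per minute for one worker: it drains at most one
// full batch per cycle, and runs (60_000 / intervalMs) cycles per minute.
function maxJobsPerMinute(intervalMs: number, batchSize: number): number {
  return Math.floor(60_000 / intervalMs) * batchSize;
}
```

With the small-organization settings above (interval 5000 ms, import batch 10), the ceiling is 12 cycles × 10 = 120 imports per minute, comfortably above 100 imports per day.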

Worker Implementation

The worker is implemented in src/workers/geo-worker-scheduler.ts:
import "dotenv/config";
import { prisma } from "@/lib/prisma";
import { 
  processNextGeoVariationJob, 
  processNextPendingImportJob, 
  processNextRecalcJob 
} from "@/lib/geo-import-worker";

const intervalMs = Number.parseInt(
  process.env.GEO_WORKER_INTERVAL_MS ?? "4000", 10
);
const importBatchSize = Number.parseInt(
  process.env.GEO_IMPORT_BATCH_SIZE ?? "5", 10
);
const recalcBatchSize = Number.parseInt(
  process.env.GEO_RECALC_BATCH_SIZE ?? "10", 10
);
const variationBatchSize = Number.parseInt(
  process.env.GEO_VARIATION_BATCH_SIZE ?? "15", 10
);
const runOnce = process.env.GEO_WORKER_RUN_ONCE === "true";

async function runBatch() {
  // Process import jobs
  for (let i = 0; i < importBatchSize; i++) {
    const result = await processNextPendingImportJob();
    if (!result.processed) break;
  }
  
  // Process recalculation jobs
  for (let i = 0; i < recalcBatchSize; i++) {
    const result = await processNextRecalcJob();
    if (!result.processed) break;
  }
  
  // Process variation jobs
  for (let i = 0; i < variationBatchSize; i++) {
    const result = await processNextGeoVariationJob();
    if (!result.processed) break;
  }
}
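The excerpt above omits the scheduling loop itself. A minimal sketch of how `runOnce` and `intervalMs` might drive `runBatch`, with the per-queue draining pattern factored into a reusable helper; the stub processor is hypothetical and stands in for `processNextPendingImportJob` and friends:

```typescript
// Drain up to `batchSize` jobs from one queue; stop early when the
// processor reports nothing left to do. Returns the number processed.
async function drainQueue(
  processNext: () => Promise<{ processed: boolean }>,
  batchSize: number,
): Promise<number> {
  let processed = 0;
  for (let i = 0; i < batchSize; i++) {
    const result = await processNext();
    if (!result.processed) break;
    processed++;
  }
  return processed;
}

// Hypothetical stub: pretends the queue holds `jobsInQueue` jobs.
function makeStubProcessor(jobsInQueue: number) {
  let remaining = jobsInQueue;
  return async () => ({ processed: remaining-- > 0 });
}

// Scheduling loop sketch: run one batch, then either exit (runOnce)
// or sleep for intervalMs and poll again.
async function main(runOnce: boolean, intervalMs: number): Promise<void> {
  for (;;) {
    await drainQueue(makeStubProcessor(3), 5);
    if (runOnce) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the real queues, `drainQueue` would wrap each of the three `processNext*` functions in turn, exactly as `runBatch` does above.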

Job Processing Flow

1. Poll database: the worker queries for jobs in PENDING status.
2. Claim job: update the job status to PROCESSING with the worker instance ID.
3. Process the job:
   • Parse the shapefile ZIP
   • Validate geometry with PostGIS
   • Transform to EPSG:4326
   • Link to the Level 2/3 hierarchy
   • Calculate surface areas
4. Update status: mark the job COMPLETED, or FAILED with error details.
5. Repeat: continue to the next job in the batch.
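Because several workers may poll the same table, the claim step must be atomic: a worker owns a job only if it successfully flips the status from PENDING to PROCESSING in a single compare-and-set update. A self-contained, in-memory sketch of that claim rule (the job shape mirrors the statuses in the flow above; the real implementation would express the same guard as one conditional UPDATE in the database):

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface Job {
  id: number;
  status: JobStatus;
  workerId?: string;
}

// Compare-and-set claim: succeeds only if the job is still PENDING,
// so two workers racing for the same job cannot both win.
function claimJob(job: Job, workerId: string): boolean {
  if (job.status !== "PENDING") return false;
  job.status = "PROCESSING";
  job.workerId = workerId;
  return true;
}
```

With Prisma, the equivalent guard is an `updateMany` whose `where` clause includes `status: "PENDING"`; a returned count of 0 means another worker claimed the job first.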

Monitoring and Observability

Log Format

The worker outputs structured JSON logs:
{
  "timestamp": "2026-03-09T14:30:05.456Z",
  "level": "info",
  "message": "processed import_jobs=2 recalc_jobs=1 variation_jobs=0"
}

Metrics to Monitor

  • Job throughput: jobs processed per minute/hour
  • Queue depth: number of pending jobs
  • Error rate: percentage of failed jobs
  • Processing time: average time per job type

Database Queries

SELECT 
  status,
  COUNT(*) as count,
  AVG(EXTRACT(EPOCH FROM (finished_at - started_at))) as avg_duration_sec
FROM "GeospatialImportJob"
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY status;
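The same metrics can be derived in application code for alerting. A sketch of computing error rate and average duration from fetched job rows; the row shape here is an assumption that mirrors the columns used in the query above:

```typescript
interface JobRow {
  status: "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";
  startedAt?: Date;
  finishedAt?: Date;
}

// Error rate: failed jobs as a fraction of all finished jobs.
function errorRate(jobs: JobRow[]): number {
  const finished = jobs.filter(
    (j) => j.status === "COMPLETED" || j.status === "FAILED",
  );
  if (finished.length === 0) return 0;
  const failed = finished.filter((j) => j.status === "FAILED").length;
  return failed / finished.length;
}

// Average processing time in seconds, over jobs with both timestamps.
function avgDurationSec(jobs: JobRow[]): number {
  const timed = jobs.filter((j) => j.startedAt && j.finishedAt);
  if (timed.length === 0) return 0;
  const total = timed.reduce(
    (sum, j) =>
      sum + (j.finishedAt!.getTime() - j.startedAt!.getTime()) / 1000,
    0,
  );
  return total / timed.length;
}
```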

Scaling Strategies

Horizontal Scaling

Run multiple worker instances:
# With PM2
pm2 start ecosystem.config.cjs --only confor-geo-worker -i 3

# With Docker
docker compose up -d --scale worker=5
Multiple workers will compete for jobs. Ensure your batch sizes account for this:
  • 3 workers × 100 batch size = 300 jobs processed per cycle
  • Monitor for idle workers if queue depth is low

Vertical Scaling

Increase batch sizes on powerful servers:
# Single worker on 8-core server
GEO_IMPORT_BATCH_SIZE=1000
GEO_RECALC_BATCH_SIZE=500

Database Optimization

Add indexes for worker queries:
CREATE INDEX CONCURRENTLY idx_import_job_status_created 
  ON "GeospatialImportJob"(status, created_at) 
  WHERE status = 'PENDING';

CREATE INDEX CONCURRENTLY idx_recalc_job_status 
  ON "GeospatialRecalcJob"(status, created_at) 
  WHERE status = 'PENDING';

Troubleshooting

Symptoms: Jobs stuck in PENDING status
Checklist:
  1. Verify worker is running: pm2 status or ps aux | grep worker
  2. Check worker logs: pm2 logs confor-geo-worker --lines 50
  3. Verify database connection: Test DATABASE_URL with psql
  4. Check for exceptions in logs
  5. Verify batch sizes are not 0
Solutions:
# Restart worker
pm2 restart confor-geo-worker

# Run in debug mode
DEBUG=* pnpm worker:geo

# Check database connectivity
npx prisma db pull

Symptoms: Many jobs with FAILED status
Common causes:
  • Invalid shapefile format (missing .prj, .shx, etc.)
  • Unsupported coordinate system
  • Invalid geometries (self-intersections)
  • Missing required attributes (nivel2_id, nivel3_id, nivel4_id)
  • Hierarchy references don’t exist in database
Debug:
-- Find error patterns
SELECT 
  error_message,
  COUNT(*) as occurrences
FROM "GeospatialImportJob"
WHERE status = 'FAILED'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY error_message
ORDER BY occurrences DESC;

Symptoms: Out of memory errors, worker crashes
Solutions:
  • Reduce batch sizes:
    GEO_IMPORT_BATCH_SIZE=50
    GEO_RECALC_BATCH_SIZE=20
    
  • Increase restart_delay in PM2 config
  • Add memory limit to PM2:
    max_memory_restart: "500M"
    
  • Use horizontal scaling instead of large batches

Symptoms: Queue depth growing, long processing times
Solutions:
  1. Scale horizontally: Add more worker instances
  2. Increase batch sizes (if CPU/memory allows)
  3. Reduce polling interval:
    GEO_WORKER_INTERVAL_MS=2000
    
  4. Optimize database:
    VACUUM ANALYZE "GeospatialImportJob";
    VACUUM ANALYZE "PatrimonyLevel4";
    
  5. Add database indexes (see Scaling > Database Optimization)

Symptoms: Worker runs for hours then stops
Possible causes:
  • Database connection timeout
  • Unhandled promise rejection
  • Memory leak causing restart
Solutions:
// In ecosystem.config.cjs
{
  max_restarts: 50,           // Allow more restarts
  restart_delay: 5000,        // Wait 5s between restarts
  exp_backoff_restart_delay: 100,  // Exponential backoff
  max_memory_restart: "1G",   // Restart at 1GB
}
Check logs for patterns before crashes:
pm2 logs confor-geo-worker --lines 200 | grep -i error

Best Practices

  • Use PM2 in production: it provides automatic restarts, log management, and monitoring.
  • Monitor queue depth: alert when pending jobs exceed a threshold (e.g., >1000).
  • Set conservative defaults: start with low batch sizes and increase based on measured performance.
  • Use a separate worker server: run workers on dedicated infrastructure in production.
  • Enable graceful shutdown: handle SIGTERM/SIGINT to finish current jobs before exiting.
  • Implement a dead letter queue: move repeatedly failing jobs to a separate queue for investigation.
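The graceful-shutdown practice can be sketched as a flag that the batch loop checks between jobs: signal handlers only request the stop, and the loop exits once the in-flight job finishes. A minimal sketch; the wiring into the real batch loop is an assumption:

```typescript
let shuttingDown = false;

// Signal handlers only set the flag; they never kill work mid-job.
function requestShutdown(): void {
  shuttingDown = true;
}
process.once("SIGTERM", requestShutdown);
process.once("SIGINT", requestShutdown);

// Hypothetical batch loop: checks the flag before claiming each job,
// so the current job always runs to completion.
async function runUntilShutdown(
  processNext: () => Promise<{ processed: boolean }>,
): Promise<number> {
  let processed = 0;
  while (!shuttingDown) {
    const result = await processNext();
    if (!result.processed) break;
    processed++;
  }
  return processed;
}
```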

Next Steps

  • Environment Variables: complete environment variable reference
  • Database Setup: PostgreSQL and Prisma configuration
  • Shapefile Import: shapefile requirements
  • API Reference: geospatial job API endpoints
