Overview

The CONFOR geospatial worker is a background process that handles:
  • Shapefile imports: Processing uploaded ZIP files containing Level 4 geometry
  • Surface recalculations: Updating surface areas for Level 2 and Level 3 units
  • Geometry variations: Tracking changes in polygon geometries over time
The worker runs independently of the web application and can be scaled horizontally.

Architecture

The worker uses a polling architecture with configurable intervals and batch sizes. Multiple worker instances can run simultaneously for load distribution.

Quick Start

Development Mode

# Start the worker (runs until stopped)
pnpm worker:geo

# Or with npm
npm run worker:geo

# Worker output:
[2026-03-09T14:30:00.123Z] [geo-worker] started (interval=4000ms, importBatch=5, recalcBatch=10, variationBatch=15, runOnce=false)
[2026-03-09T14:30:05.456Z] [geo-worker] processed import_jobs=2 recalc_jobs=1 variation_jobs=0

Production Deployment

PM2 provides process management, automatic restarts, and monitoring.

1. Install PM2

npm install -g pm2

2. Configure Ecosystem File

The project includes ecosystem.config.cjs with optimized settings:
ecosystem.config.cjs
module.exports = {
  apps: [
    {
      name: "confor-web",
      cwd: ".",
      script: "cmd",
      args: "/c pnpm start",
      env: {
        NODE_ENV: "production",
      },
      autorestart: true,
      max_restarts: 10,
      restart_delay: 3000,
    },
    {
      name: "confor-geo-worker",
      cwd: ".",
      script: "cmd",
      args: "/c pnpm worker:geo",
      env: {
        NODE_ENV: "production",
        GEO_WORKER_INTERVAL_MS: "5000",
        GEO_IMPORT_BATCH_SIZE: "500",
        GEO_RECALC_BATCH_SIZE: "200",
        GEO_VARIATION_BATCH_SIZE: "300",
      },
      autorestart: true,
      max_restarts: 20,
      restart_delay: 3000,
    },
  ],
};
Batch sizes in production (500/200/300) are significantly higher than development defaults (5/10/15) for optimal throughput.

3. Build and Start

# Build the application
pnpm build

# Start all processes
pm2 start ecosystem.config.cjs

# Expected output:
┌─────┬──────────────────────┬─────────┬─────────┬───────────┐
│ id  │ name                 │ mode    │ status  │ cpu       │
├─────┼──────────────────────┼─────────┼─────────┼───────────┤
│ 0   │ confor-web           │ fork    │ online  │ 0%        │
│ 1   │ confor-geo-worker    │ fork    │ online  │ 0%        │
└─────┴──────────────────────┴─────────┴─────────┴───────────┘

4. Persist Configuration

# Save current process list
pm2 save

# Setup startup script (Windows)
pm2 startup

# Follow the instructions shown by PM2
# Then save again
pm2 save
On Windows, PM2 startup requires administrator privileges. Run the command PM2 displays in an elevated command prompt.

Using Docker

Create a dedicated worker container:
FROM node:20-alpine

# Install the PostgreSQL client and build tools for native modules
RUN apk add --no-cache \
    postgresql-client \
    python3 \
    make \
    g++

WORKDIR /app

# Copy package files
COPY package*.json pnpm-lock.yaml ./

# Install pnpm and dependencies
RUN npm install -g pnpm
RUN pnpm install --frozen-lockfile --prod

# Copy application code
COPY . .

# Generate Prisma client
RUN pnpm db:generate

# Set environment
ENV NODE_ENV=production

CMD ["pnpm", "worker:geo"]

Using Systemd (Linux)

Create a system service for the worker:
/etc/systemd/system/confor-worker.service
[Unit]
Description=CONFOR Geospatial Worker
After=network.target postgresql.service

[Service]
Type=simple
User=confor
WorkingDirectory=/opt/confor
Environment="NODE_ENV=production"
Environment="DATABASE_URL=postgresql://user:pass@localhost:5432/confor"
Environment="GEO_WORKER_INTERVAL_MS=5000"
Environment="GEO_IMPORT_BATCH_SIZE=500"
Environment="GEO_RECALC_BATCH_SIZE=200"
ExecStart=/usr/bin/pnpm worker:geo
Restart=always
RestartSec=3
StandardOutput=append:/var/log/confor/worker.log
StandardError=append:/var/log/confor/worker-error.log

[Install]
WantedBy=multi-user.target
Enable and start the service:
# Reload systemd
sudo systemctl daemon-reload

# Enable service
sudo systemctl enable confor-worker

# Start service
sudo systemctl start confor-worker

# Check status
sudo systemctl status confor-worker

# View logs
sudo journalctl -u confor-worker -f

Configuration Reference

Environment Variables

Variable                  Type     Default      Production   Description
GEO_WORKER_INTERVAL_MS    number   4000         5000         Milliseconds between polling cycles
GEO_IMPORT_BATCH_SIZE     number   5            500          Max import jobs processed per cycle
GEO_RECALC_BATCH_SIZE     number   10           200          Max recalc jobs processed per cycle
GEO_VARIATION_BATCH_SIZE  number   15           300          Max variation jobs processed per cycle
GEO_WORKER_RUN_ONCE       boolean  false        false        Exit after one cycle (testing)
GEO_WORKER_SECRET         string   -            Required     Secret for worker API endpoints
DATABASE_URL              string   Required     Required     PostgreSQL connection string
NODE_ENV                  string   development  production   Runtime environment

Batch Size Tuning

Choose batch sizes based on your server capacity. For example, settings suitable for a small organization (under 100 imports/day):
GEO_WORKER_INTERVAL_MS=5000
GEO_IMPORT_BATCH_SIZE=10
GEO_RECALC_BATCH_SIZE=20
GEO_VARIATION_BATCH_SIZE=30
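To sanity-check a configuration before deploying it, note that the theoretical ceiling is batch size times polling cycles per minute. A throwaway estimator in TypeScript (purely illustrative arithmetic, not part of the worker):

```typescript
// Upper bound on jobs per minute for one worker: it drains at most one
// full batch per cycle, and runs (60_000 / intervalMs) cycles per minute.
function maxJobsPerMinute(intervalMs: number, batchSize: number): number {
  return Math.floor(60_000 / intervalMs) * batchSize;
}
```

With the small-organization settings above (interval 5000 ms, import batch 10), the ceiling is 12 cycles × 10 = 120 imports per minute, comfortably above 100 imports per day.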

Worker Implementation

The worker is implemented in src/workers/geo-worker-scheduler.ts:
import "dotenv/config";
import { prisma } from "@/lib/prisma";
import { 
  processNextGeoVariationJob, 
  processNextPendingImportJob, 
  processNextRecalcJob 
} from "@/lib/geo-import-worker";

const intervalMs = Number.parseInt(
  process.env.GEO_WORKER_INTERVAL_MS ?? "4000", 10
);
const importBatchSize = Number.parseInt(
  process.env.GEO_IMPORT_BATCH_SIZE ?? "5", 10
);
const recalcBatchSize = Number.parseInt(
  process.env.GEO_RECALC_BATCH_SIZE ?? "10", 10
);
const variationBatchSize = Number.parseInt(
  process.env.GEO_VARIATION_BATCH_SIZE ?? "15", 10
);
const runOnce = process.env.GEO_WORKER_RUN_ONCE === "true";

async function runBatch() {
  // Process import jobs
  for (let i = 0; i < importBatchSize; i++) {
    const result = await processNextPendingImportJob();
    if (!result.processed) break;
  }
  
  // Process recalculation jobs
  for (let i = 0; i < recalcBatchSize; i++) {
    const result = await processNextRecalcJob();
    if (!result.processed) break;
  }
  
  // Process variation jobs
  for (let i = 0; i < variationBatchSize; i++) {
    const result = await processNextGeoVariationJob();
    if (!result.processed) break;
  }
}
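The excerpt above omits the scheduling loop itself. A minimal sketch of how `runOnce` and `intervalMs` might drive `runBatch`, with the per-queue draining pattern factored into a reusable helper; the stub processor is hypothetical and stands in for `processNextPendingImportJob` and friends:

```typescript
// Drain up to `batchSize` jobs from one queue; stop early when the
// processor reports nothing left to do. Returns the number processed.
async function drainQueue(
  processNext: () => Promise<{ processed: boolean }>,
  batchSize: number,
): Promise<number> {
  let processed = 0;
  for (let i = 0; i < batchSize; i++) {
    const result = await processNext();
    if (!result.processed) break;
    processed++;
  }
  return processed;
}

// Hypothetical stub: pretends the queue holds `jobsInQueue` jobs.
function makeStubProcessor(jobsInQueue: number) {
  let remaining = jobsInQueue;
  return async () => ({ processed: remaining-- > 0 });
}

// Scheduling loop sketch: run one batch, then either exit (runOnce)
// or sleep for intervalMs and poll again.
async function main(runOnce: boolean, intervalMs: number): Promise<void> {
  for (;;) {
    await drainQueue(makeStubProcessor(3), 5);
    if (runOnce) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the real queues, `drainQueue` would wrap each of the three `processNext*` functions in turn, exactly as `runBatch` does above.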

Job Processing Flow

1. Poll database: the worker queries for jobs in PENDING status.
2. Claim job: update the job status to PROCESSING with the worker instance ID.
3. Process the job:
   • Parse the shapefile ZIP
   • Validate geometry with PostGIS
   • Transform to EPSG:4326
   • Link to the Level 2/3 hierarchy
   • Calculate surface areas
4. Update status: mark the job COMPLETED, or FAILED with error details.
5. Repeat: continue to the next job in the batch.
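Because several workers may poll the same table, the claim step must be atomic: a worker owns a job only if it successfully flips the status from PENDING to PROCESSING in a single compare-and-set update. A self-contained, in-memory sketch of that claim rule (the job shape mirrors the statuses in the flow above; the real implementation would express the same guard as one conditional UPDATE in the database):

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface Job {
  id: number;
  status: JobStatus;
  workerId?: string;
}

// Compare-and-set claim: succeeds only if the job is still PENDING,
// so two workers racing for the same job cannot both win.
function claimJob(job: Job, workerId: string): boolean {
  if (job.status !== "PENDING") return false;
  job.status = "PROCESSING";
  job.workerId = workerId;
  return true;
}
```

With Prisma, the equivalent guard is an `updateMany` whose `where` clause includes `status: "PENDING"`; a returned count of 0 means another worker claimed the job first.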

Monitoring and Observability

Log Format

The worker outputs structured JSON logs:
{
  "timestamp": "2026-03-09T14:30:05.456Z",
  "level": "info",
  "message": "processed import_jobs=2 recalc_jobs=1 variation_jobs=0"
}

Metrics to Monitor

  • Job throughput: jobs processed per minute/hour
  • Queue depth: number of pending jobs
  • Error rate: percentage of failed jobs
  • Processing time: average time per job type

Database Queries

SELECT 
  status,
  COUNT(*) as count,
  AVG(EXTRACT(EPOCH FROM (finished_at - started_at))) as avg_duration_sec
FROM "GeospatialImportJob"
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY status;
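The same metrics can be derived in application code for alerting. A sketch of computing error rate and average duration from fetched job rows; the row shape here is an assumption that mirrors the columns used in the query above:

```typescript
interface JobRow {
  status: "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";
  startedAt?: Date;
  finishedAt?: Date;
}

// Error rate: failed jobs as a fraction of all finished jobs.
function errorRate(jobs: JobRow[]): number {
  const finished = jobs.filter(
    (j) => j.status === "COMPLETED" || j.status === "FAILED",
  );
  if (finished.length === 0) return 0;
  const failed = finished.filter((j) => j.status === "FAILED").length;
  return failed / finished.length;
}

// Average processing time in seconds, over jobs with both timestamps.
function avgDurationSec(jobs: JobRow[]): number {
  const timed = jobs.filter((j) => j.startedAt && j.finishedAt);
  if (timed.length === 0) return 0;
  const total = timed.reduce(
    (sum, j) =>
      sum + (j.finishedAt!.getTime() - j.startedAt!.getTime()) / 1000,
    0,
  );
  return total / timed.length;
}
```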

Scaling Strategies

Horizontal Scaling

Run multiple worker instances:
# With PM2
pm2 start ecosystem.config.cjs --only confor-geo-worker -i 3

# With Docker
docker compose up -d --scale worker=5
Multiple workers will compete for jobs. Ensure your batch sizes account for this:
  • 3 workers × 100 batch size = 300 jobs processed per cycle
  • Monitor for idle workers if queue depth is low

Vertical Scaling

Increase batch sizes on powerful servers:
# Single worker on 8-core server
GEO_IMPORT_BATCH_SIZE=1000
GEO_RECALC_BATCH_SIZE=500

Database Optimization

Add indexes for worker queries:
CREATE INDEX CONCURRENTLY idx_import_job_status_created 
  ON "GeospatialImportJob"(status, created_at) 
  WHERE status = 'PENDING';

CREATE INDEX CONCURRENTLY idx_recalc_job_status 
  ON "GeospatialRecalcJob"(status, created_at) 
  WHERE status = 'PENDING';

Troubleshooting

Symptoms: Jobs stuck in PENDING status
Checklist:
  1. Verify worker is running: pm2 status or ps aux | grep worker
  2. Check worker logs: pm2 logs confor-geo-worker --lines 50
  3. Verify database connection: Test DATABASE_URL with psql
  4. Check for exceptions in logs
  5. Verify batch sizes are not 0
Solutions:
# Restart worker
pm2 restart confor-geo-worker

# Run in debug mode
DEBUG=* pnpm worker:geo

# Check database connectivity
npx prisma db pull

Symptoms: Many jobs with FAILED status
Common causes:
  • Invalid shapefile format (missing .prj, .shx, etc.)
  • Unsupported coordinate system
  • Invalid geometries (self-intersections)
  • Missing required attributes (nivel2_id, nivel3_id, nivel4_id)
  • Hierarchy references don’t exist in database
Debug:
-- Find error patterns
SELECT 
  error_message,
  COUNT(*) as occurrences
FROM "GeospatialImportJob"
WHERE status = 'FAILED'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY error_message
ORDER BY occurrences DESC;

Symptoms: Out of memory errors, worker crashes
Solutions:
  • Reduce batch sizes:
    GEO_IMPORT_BATCH_SIZE=50
    GEO_RECALC_BATCH_SIZE=20
    
  • Increase restart_delay in PM2 config
  • Add memory limit to PM2:
    max_memory_restart: "500M"
    
  • Use horizontal scaling instead of large batches

Symptoms: Queue depth growing, long processing times
Solutions:
  1. Scale horizontally: Add more worker instances
  2. Increase batch sizes (if CPU/memory allows)
  3. Reduce polling interval:
    GEO_WORKER_INTERVAL_MS=2000
    
  4. Optimize database:
    VACUUM ANALYZE "GeospatialImportJob";
    VACUUM ANALYZE "PatrimonyLevel4";
    
  5. Add database indexes (see Scaling > Database Optimization)

Symptoms: Worker runs for hours then stops
Possible causes:
  • Database connection timeout
  • Unhandled promise rejection
  • Memory leak causing restart
Solutions:
// In ecosystem.config.cjs
{
  max_restarts: 50,           // Allow more restarts
  restart_delay: 5000,        // Wait 5s between restarts
  exp_backoff_restart_delay: 100,  // Exponential backoff
  max_memory_restart: "1G",   // Restart at 1GB
}
Check logs for patterns before crashes:
pm2 logs confor-geo-worker --lines 200 | grep -i error

Best Practices

  • Use PM2 in production: it provides automatic restarts, log management, and monitoring.
  • Monitor queue depth: alert when pending jobs exceed a threshold (e.g., >1000).
  • Set conservative defaults: start with low batch sizes and increase based on measured performance.
  • Use a separate worker server: run workers on dedicated infrastructure in production.
  • Enable graceful shutdown: handle SIGTERM/SIGINT to finish current jobs before exiting.
  • Implement a dead letter queue: move repeatedly failing jobs to a separate queue for investigation.
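The graceful-shutdown practice can be sketched as a flag that the batch loop checks between jobs: signal handlers only request the stop, and the loop exits once the in-flight job finishes. A minimal sketch; the wiring into the real batch loop is an assumption:

```typescript
let shuttingDown = false;

// Signal handlers only set the flag; they never kill work mid-job.
function requestShutdown(): void {
  shuttingDown = true;
}
process.once("SIGTERM", requestShutdown);
process.once("SIGINT", requestShutdown);

// Hypothetical batch loop: checks the flag before claiming each job,
// so the current job always runs to completion.
async function runUntilShutdown(
  processNext: () => Promise<{ processed: boolean }>,
): Promise<number> {
  let processed = 0;
  while (!shuttingDown) {
    const result = await processNext();
    if (!result.processed) break;
    processed++;
  }
  return processed;
}
```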

Next Steps

  • Environment Variables: complete environment variable reference
  • Database Setup: PostgreSQL and Prisma configuration
  • Shapefile Import: shapefile requirements
  • API Reference: geospatial job API endpoints
