Use this checklist to ensure your Runner application is production-ready. These recommendations come from real-world deployments and cover security, reliability, observability, and operations.

Build and Runtime

Requirement: Pin Node to a supported LTS line (>=18)
// package.json
{
  "engines": {
    "node": ">=18.0.0"
  }
}
Why: Runner requires Node.js 18+ for fetch, AbortController, and modern async features.
Check:
node --version  # Should be 18.x or higher
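The same check can also run at startup, so a misconfigured host fails fast instead of failing later on a missing global. This is a minimal sketch; `isSupportedNode` is a hypothetical helper, not part of Runner.

```typescript
// Hypothetical helper: true when `version` (e.g. "18.19.1" or "v18.19.1")
// has a major version of at least `minMajor`.
function isSupportedNode(version: string, minMajor: number = 18): boolean {
  const majorPart = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorPart, 10);
  return Number.isInteger(major) && major >= minMajor;
}
```

In an entry point this would typically be called with `process.versions.node`, exiting with a non-zero code when it returns `false`.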
Requirement: Build in CI with npm run qa
# .github/workflows/ci.yml
- name: Quality Assurance
  run: npm run qa
Why: Catches type errors, linting issues, and test failures before deployment.
What npm run qa does:
  • TypeScript type checking
  • ESLint/Prettier validation
  • Full test suite with 100% coverage enforcement
Requirement: Run from compiled output (no ts-node in production)
// package.json
{
  "scripts": {
    "build": "tsup",
    "start": "node dist/index.js"
  }
}
Why: ts-node adds significant startup time and memory overhead.
Never do this in production:
# Bad - slow startup, high memory
ts-node src/index.ts

# Good - fast startup, optimized
node dist/index.js

Security

Security misconfigurations can expose your application to unauthorized access. Review these carefully.
Requirement: Configure exposure auth for tunnels and avoid anonymous exposure
import { nodeExposure } from "@bluelibs/runner/node";

const exposure = nodeExposure("app.exposure", {
  http: {
    port: 3000,
    auth: {
      type: "bearer",
      verify: async (token) => {
        // Validate token against your auth system
        return verifyToken(token);
      },
    },
  },
});
Never expose without auth:
// Bad - anyone can call your tasks
const exposure = nodeExposure("app.exposure", {
  http: { port: 3000 },
});
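The `verifyToken` call in the bearer example is left undefined on this page. As a rough sketch of what sits behind it for a single shared secret, the helpers below extract the token from an `Authorization` header and compare it without exiting early on the first mismatch. Both function names are hypothetical, and real deployments often validate signed tokens instead.

```typescript
// Hypothetical helper: pulls the token out of a "Bearer <token>" header value.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? (match[1] ?? null) : null;
}

// Hypothetical helper: compares every position instead of returning on the
// first difference, so the comparison time does not reveal where it differs.
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}
```

In Node itself, `crypto.timingSafeEqual` on equal-length buffers is the sturdier choice for the comparison step.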
Requirement: Use allow-lists for remotely callable task/event ids
const exposure = nodeExposure("app.exposure", {
  http: {
    port: 3000,
    allowTaskIds: [
      "api.tasks.public.*",  // Public API tasks only
      "api.tasks.user.get",
    ],
    allowEventIds: [
      "api.events.public.*",
    ],
  },
});
Why: Prevents accidental exposure of internal tasks/events.
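This page does not spell out how the `*` patterns are matched. The sketch below assumes a trailing `*` matches any suffix and everything else must match exactly; `isIdAllowed` is a hypothetical helper that is handy in a unit test guarding the list, not Runner's own matcher.

```typescript
// Assumption: a trailing "*" matches any suffix; other patterns match exactly.
function isIdAllowed(id: string, allowList: readonly string[]): boolean {
  return allowList.some((pattern) =>
    pattern.endsWith("*") ? id.startsWith(pattern.slice(0, -1)) : id === pattern,
  );
}

const allowTaskIds = ["api.tasks.public.*", "api.tasks.user.get"];
```

A test can then assert that internal ids such as `internal.tasks.migrate` are rejected whenever the list changes.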
Requirement: Set payload limits for JSON/multipart traffic
const exposure = nodeExposure("app.exposure", {
  http: {
    port: 3000,
    bodyLimit: 10 * 1024 * 1024, // 10MB max
  },
});
Why: Protects against denial-of-service attacks via large payloads.
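To show what a body limit does conceptually, here is a small sketch that stops reading as soon as the running total passes the limit, instead of buffering the whole payload first. `collectWithLimit` is a hypothetical helper, and it takes a synchronous iterable of chunks for brevity; a real server would apply the same check while reading the request stream.

```typescript
// Hypothetical helper: concatenates chunks and aborts as soon as the
// running total exceeds `limit` bytes.
function collectWithLimit(chunks: Iterable<Uint8Array>, limit: number): Uint8Array {
  const parts: Uint8Array[] = [];
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > limit) {
      throw new RangeError(`Payload exceeds ${limit} bytes`);
    }
    parts.push(chunk);
  }
  // Copy the accepted chunks into one contiguous buffer.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    body.set(part, offset);
    offset += part.byteLength;
  }
  return body;
}
```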
Requirement: Review logs for sensitive data before enabling external sinks
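import { r, globals } from "@bluelibs/runner";

// The logger is injected as a dependency; "app.tasks.login" is an illustrative id
const login = r
  .task("app.tasks.login")
  .dependencies({ logger: globals.resources.logger })
  .run(async (user: { id: string; password: string }, { logger }) => {
    // Bad - leaks sensitive data
    await logger.info("User login", { data: { password: user.password } });

    // Good - sanitized
    await logger.info("User login", { data: { userId: user.id } });
  })
  .build();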
Common sensitive fields to avoid logging:
  • Passwords, tokens, API keys
  • Credit card numbers, SSNs
  • Email addresses (in some jurisdictions)
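One way to enforce the list above is to pass log payloads through a redaction step before they reach any sink. This is a minimal sketch; the key list and the `redact` helper are assumptions to adapt, and it only handles plain objects and arrays.

```typescript
// Assumed set of sensitive keys, compared case-insensitively.
const SENSITIVE_KEYS = new Set([
  "password", "token", "apikey", "authorization", "cardnumber", "ssn",
]);

// Hypothetical helper: returns a copy with sensitive fields replaced.
// Plain objects and arrays only; class instances such as Date are not preserved.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```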

Reliability

Requirement: Define timeout/retry/circuit-breaker defaults for external I/O tasks
import { r, globals } from "@bluelibs/runner";

const callExternalAPI = r
  .task("api.external")
  .middleware([
    globals.middleware.task.timeout.with({ ttl: 5000 }),
    globals.middleware.task.retry.with({ 
      retries: 3,
      delayStrategy: (attempt) => 100 * Math.pow(2, attempt),
    }),
    globals.middleware.task.circuitBreaker.with({
      failureThreshold: 5,
      resetTimeout: 30000,
    }),
  ])
  .run(async (url: string) => {
    return await fetch(url);
  })
  .build();
Why: External services fail. Proper error handling prevents cascading failures.
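To make the `failureThreshold` and `resetTimeout` options concrete, here is a minimal sketch of the state machine a circuit breaker usually follows. It is not Runner's implementation; `createBreaker` is hypothetical, and time is passed in explicitly so the behavior is easy to test.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical sketch of a circuit breaker's state transitions.
function createBreaker(failureThreshold: number, resetTimeoutMs: number) {
  let state: BreakerState = "closed";
  let failures = 0;
  let openedAt = 0;

  return {
    // True if a call may proceed at time `now` (milliseconds).
    canCall(now: number): boolean {
      if (state === "open" && now - openedAt >= resetTimeoutMs) {
        state = "half-open"; // allow a trial call after the cool-down
      }
      return state !== "open";
    },
    recordSuccess(): void {
      failures = 0;
      state = "closed";
    },
    recordFailure(now: number): void {
      failures += 1;
      // A failed trial call, or too many failures, (re)opens the circuit.
      if (state === "half-open" || failures >= failureThreshold) {
        state = "open";
        openedAt = now;
      }
    },
    getState(): BreakerState {
      return state;
    },
  };
}
```

With `failureThreshold: 5` and `resetTimeout: 30000`, the fifth failure opens the circuit, calls are rejected for 30 seconds, and a single trial call then decides whether it closes again.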
Requirement: Verify graceful shutdown path with SIGTERM in staging
import { run } from "@bluelibs/runner";

// Option 1: let Runner handle SIGINT/SIGTERM for you
const { dispose } = await run(app, {
  shutdownHooks: true,
});

// Option 2: with shutdownHooks disabled, handle the signal yourself.
// Pick one approach so dispose() is not triggered twice.
process.on("SIGTERM", async () => {
  console.log("Received SIGTERM, shutting down gracefully...");
  await dispose();
  process.exit(0);
});
Test in staging:
# Start your app
node dist/index.js &
PID=$!

# Send SIGTERM
kill -TERM $PID

# Should see graceful shutdown logs
Requirement: Ensure resource disposal order is validated in integration tests
import { r, run } from "@bluelibs/runner";

test("resources dispose in correct order", async () => {
  const disposed: string[] = [];

  const db = r
    .resource("app.db")
    .init(async () => ({ connected: true }))
    .dispose(async () => { disposed.push("db"); })
    .build();

  const server = r
    .resource("app.server")
    .dependencies({ db })
    .init(async () => ({ listening: true }))
    .dispose(async () => { disposed.push("server"); })
    .build();

  const app = r.resource("app").register([db, server]).build();
  const { dispose } = await run(app);
  await dispose();

  // Server depends on db, so server disposes first
  expect(disposed).toEqual(["server", "db"]);
});
Why: Incorrect disposal order can cause connection leaks or errors.
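The expected order in a test like this can be derived from the dependency graph rather than written by hand. The sketch below assumes disposal is the reverse of initialization order, which is what the test comment implies; `disposalOrder` is a hypothetical helper, not a Runner API.

```typescript
// `deps` maps each resource id to the ids it depends on.
function disposalOrder(deps: Record<string, readonly string[]>): string[] {
  const initOrder: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (visiting.has(id)) throw new Error(`Dependency cycle at "${id}"`);
    visiting.add(id);
    for (const dep of deps[id] ?? []) visit(dep); // dependencies initialize first
    visiting.delete(id);
    done.add(id);
    initOrder.push(id);
  };

  for (const id of Object.keys(deps)) visit(id);
  return initOrder.reverse(); // dependents are disposed before their dependencies
}
```

For `{ db: [], server: ["db"] }` this yields `["server", "db"]`, matching the assertion in the test.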

Observability

Without observability, you’re flying blind in production. These are the baseline requirements.
Requirement: Emit structured logs with stable source ids
import { r, globals } from "@bluelibs/runner";

const processOrder = r
  .task("orders.process")
  .dependencies({ logger: globals.resources.logger })
  .run(async (input, { logger }) => {
    await logger.info("Processing order", {
      data: {
        orderId: input.orderId,
        amount: input.amount,
      },
    });

    try {
      const result = await processPayment(input);
      await logger.info("Order processed", {
        data: { orderId: input.orderId, transactionId: result.id },
      });
      return result;
    } catch (error) {
      await logger.error("Order processing failed", {
        error,
        data: { orderId: input.orderId },
      });
      throw error;
    }
  })
  .build();
Log format:
  • timestamp: ISO 8601
  • level: debug/info/warn/error
  • source: task/resource ID
  • data: structured payload
  • error: stack trace and details
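As an illustration of the format listed above, the following sketch models one log entry and serializes it as a single JSON line. The `message` field and the `toLogLine` helper are assumptions added for the example; Runner's logger defines its own output shape.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogEntry {
  timestamp: string; // ISO 8601
  level: LogLevel;
  source: string; // task/resource id
  message: string; // assumed field, not in the list above
  data?: Record<string, unknown>;
  error?: { name: string; message: string; stack?: string };
}

// Hypothetical helper: builds one JSON line per log call.
function toLogLine(
  level: LogLevel,
  source: string,
  message: string,
  extra: { data?: Record<string, unknown>; error?: unknown } = {},
  now: Date = new Date(),
): string {
  const entry: LogEntry = { timestamp: now.toISOString(), level, source, message };
  if (extra.data) entry.data = extra.data;
  if (extra.error instanceof Error) {
    const { name, message: errorMessage, stack } = extra.error;
    entry.error = stack
      ? { name, message: errorMessage, stack }
      : { name, message: errorMessage };
  }
  return JSON.stringify(entry);
}
```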
Requirement: Track latency and error-rate metrics per critical task path
import { r, globals } from "@bluelibs/runner";

const criticalTask = r
  .task("critical.operation")
  .dependencies({ 
    logger: globals.resources.logger,
    metrics: metricsResource, // Your metrics collector
  })
  .run(async (input, { logger, metrics }) => {
    const start = Date.now();
    try {
      const result = await doWork(input);
      metrics.histogram("critical.operation.duration", Date.now() - start);
      metrics.increment("critical.operation.success");
      return result;
    } catch (error) {
      metrics.increment("critical.operation.error");
      throw error;
    }
  })
  .build();
Key metrics to track:
  • Request rate (requests/second)
  • Error rate (errors/second)
  • Latency (p50, p95, p99)
  • Task execution time
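Most metrics backends compute percentiles for you, but it helps to know what p50, p95, and p99 mean. This sketch uses the nearest-rank method on raw duration samples; `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep integer inputs exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

const durationsMs = [120, 80, 100, 90, 2000];
const p50 = percentile(durationsMs, 50); // 100
const p95 = percentile(durationsMs, 95); // 2000
```

The single slow request barely moves p50 but dominates p95, which is why tail percentiles are tracked alongside the median.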
Requirement: Export traces for cross-service flows
import { r } from "@bluelibs/runner";

const tracingMiddleware = r.middleware.task
  .configurable<{ serviceName: string }>()
  .run(async (config, input, { next }) => {
    const traceId = input.traceId || generateTraceId();
    const span = tracer.startSpan(config.serviceName, { traceId });
    
    try {
      const result = await next({ ...input, traceId });
      span.finish();
      return result;
    } catch (error) {
      span.setTag("error", true);
      span.log({ error: error instanceof Error ? error.message : String(error) });
      span.finish();
      throw error;
    }
  })
  .build();

const myTask = r
  .task("service.operation")
  .middleware([tracingMiddleware.with({ serviceName: "my-service" })])
  .run(async (input) => {
    // Operation is traced automatically
  })
  .build();
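For the trace id to survive a hop between services it has to travel in the request, and the W3C Trace Context `traceparent` header is one common carrier. The sketch below parses and formats version `00` of that header; the helper names are hypothetical, and a tracing SDK would normally handle this for you.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  parentId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Parses "00-<trace-id>-<parent-id>-<flags>"; returns null if malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const [, traceId = "", parentId = "", flags = "00"] = match;
  // All-zero ids are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (Number.parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.parentId}-${ctx.sampled ? "01" : "00"}`;
}
```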
Requirement: Configure baseline alerts for error-rate spikes and sustained p95 latency
Example alert rules:
| Metric | Threshold | Duration | Action |
| --- | --- | --- | --- |
| Error rate | > 5% | 5 minutes | Page on-call |
| p95 latency | > 1000ms | 10 minutes | Notify team |
| Availability | < 99.9% | 5 minutes | Page on-call |
| Memory usage | > 85% | 5 minutes | Auto-scale or alert |
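The Duration column means a rule fires only when the breach is sustained, not on a single bad sample. A rough sketch of that evaluation, assuming evenly spaced samples and a hypothetical `isSustainedBreach` helper (alerting platforms implement this natively):

```typescript
interface Sample {
  at: number; // epoch milliseconds
  value: number;
}

// Fires only if the window holds at least `minSamples` samples and every
// one of them is above the threshold.
function isSustainedBreach(
  samples: readonly Sample[],
  threshold: number,
  windowMs: number,
  now: number,
  minSamples: number,
): boolean {
  const windowStart = now - windowMs;
  const recent = samples.filter((s) => s.at >= windowStart && s.at <= now);
  return recent.length >= minSamples && recent.every((s) => s.value > threshold);
}
```

For the error-rate rule, this would be called with a threshold of 5 (percent), a 5-minute window, and one sample per minute.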
Common alerting platforms:
  • Datadog
  • Prometheus + Alertmanager
  • New Relic
  • Sentry

Operations

Requirement: Expose /health (or equivalent) and wire container/platform checks
import express from "express";
import { r, run } from "@bluelibs/runner";

const healthCheck = r
  .task("app.health")
  .dependencies({ db, cache })
  .run(async (_, { db, cache }) => {
    // Check critical dependencies
    const dbHealthy = await db.ping();
    const cacheHealthy = await cache.ping();

    if (!dbHealthy || !cacheHealthy) {
      throw new Error("Unhealthy");
    }

    return {
      status: "healthy",
      timestamp: new Date().toISOString(),
      dependencies: {
        db: dbHealthy ? "up" : "down",
        cache: cacheHealthy ? "up" : "down",
      },
    };
  })
  .build();

const server = r
  .resource("app.server")
  .dependencies({ healthCheck })
  .init(async (_, { healthCheck }) => {
    const app = express();
    
    app.get("/health", async (req, res) => {
      try {
        const health = await healthCheck.run({}, {});
        res.json(health);
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        res.status(503).json({ status: "unhealthy", error: message });
      }
    });

    const listener = app.listen(3000);
    return { app, listener };
  })
  .build();
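Because the task above throws as soon as any dependency is down, the response cannot say which one failed. An alternative is to collect the individual results and derive the HTTP status from them, as in this minimal sketch (`summarizeHealth` is a hypothetical helper):

```typescript
interface HealthReport {
  status: "healthy" | "unhealthy";
  dependencies: Record<string, "up" | "down">;
}

// Maps per-dependency check results to an HTTP status and a response body.
function summarizeHealth(
  checks: Record<string, boolean>,
): { httpStatus: 200 | 503; body: HealthReport } {
  const dependencies: Record<string, "up" | "down"> = {};
  for (const [name, ok] of Object.entries(checks)) {
    dependencies[name] = ok ? "up" : "down";
  }
  const healthy = Object.values(checks).every(Boolean);
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? "healthy" : "unhealthy", dependencies },
  };
}
```

A route handler can then respond with `res.status(httpStatus).json(body)`, so a 503 still reports which dependency is down.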
Requirement: Maintain runbooks for incident triage and rollback
Example runbook structure:
# Runbook: High Error Rate

## Symptoms
- Error rate > 5%
- Users reporting failures
- Alerts firing

## Triage Steps
1. Check recent deployments
2. Review error logs
3. Check external service status
4. Verify database connectivity

## Mitigation
1. Rollback to previous version
2. Scale up resources
3. Enable circuit breakers

## Rollback Procedure
```bash
# Rollback to previous deployment
kubectl rollout undo deployment/my-app

# Or with your deployment tool
pm2 reload my-app --update-env
```

## Escalation
- On-call: #oncall-team
- Engineering lead: @lead
- Incident commander: @ic
Requirement: Review release notes before upgrades and test migrations in staging
Upgrade process:
1. Review release notes:
  • Check [GitHub Releases](https://github.com/bluelibs/runner/releases)
  • Look for breaking changes
  • Read migration guides
2. Test in staging:
   ```bash
   # Update to new version
   npm install @bluelibs/runner@latest

   # Run tests
   npm run qa

   # Deploy to staging
   deploy staging

   # Run integration tests
   npm run test:integration

   # Monitor for 24 hours
   ```
3. Production deployment:
  • Deploy during low-traffic window
  • Use canary or blue-green deployment
  • Monitor metrics closely
  • Have rollback plan ready

Deployment Checklist

Use this final checklist before promoting to production:
| Category | Check | Status |
| --- | --- | --- |
| Build | Node.js >= 18 | [ ] |
| Build | CI runs npm run qa | [ ] |
| Build | Using compiled output (not ts-node) | [ ] |
| Security | Tunnel auth configured | [ ] |
| Security | Task/event allow-lists defined | [ ] |
| Security | Payload limits set | [ ] |
| Security | Logs sanitized | [ ] |
| Reliability | Timeout/retry configured | [ ] |
| Reliability | Graceful shutdown tested | [ ] |
| Reliability | Resource disposal order validated | [ ] |
| Observability | Structured logging enabled | [ ] |
| Observability | Metrics collection configured | [ ] |
| Observability | Distributed tracing enabled | [ ] |
| Observability | Alerts configured | [ ] |
| Operations | Health check endpoint exposed | [ ] |
| Operations | Runbooks documented | [ ] |
| Operations | Release notes reviewed | [ ] |

Support and SLAs

For enterprise deployments with SLA requirements, see the Enterprise Support guide. Currently supported release lines:
  • Stable: 5.x (current feature line)
  • Maintenance/LTS: 4.x (critical fixes only)
Security contact: [email protected]

Additional Resources

  • Observability Strategy: Detailed guide on logs, metrics, and traces
  • Enterprise Support: Professional and enterprise support plans
  • Migration Guide: Upgrading between Runner versions
  • Troubleshooting: Common issues and solutions
