From BullMQ 2.0 onwards, the QueueScheduler is not needed anymore. For manually checking stalled jobs, see the manually fetching jobs pattern.
What is a Stalled Job?
When a job is in the active state (being processed by a worker), the worker must continuously notify the queue that it is still working on it. This mechanism prevents a worker that crashes or enters an endless loop from keeping a job active forever. When a worker fails to notify the queue that it is still working on a job, the job is moved back to the waiting list or to the failed set. We say the job has stalled, and the queue emits a stalled event.
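For reference, a minimal worker looks roughly like this (the queue name and Redis connection details are assumptions for the sketch); lock renewal happens in the background while the processor runs:

```typescript
import { Worker } from 'bullmq';

// Queue name and connection details are illustrative.
const worker = new Worker(
  'paint',
  async (job) => {
    // While this processor runs, the worker periodically renews the job's
    // lock in Redis. If renewal stops (crash, blocked event loop), the job
    // is eventually moved back to waiting and a 'stalled' event is emitted.
    return job.data;
  },
  { connection: { host: 'localhost', port: 6379 } },
);
```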
From src/classes/job.ts:118:
There is no “stalled” state - only a stalled event emitted when a job is automatically moved from active to waiting state.
Stalled Job Limit
If a job stalls more than a predefined limit (see the maxStalledCount option), the job will be failed permanently with the error “job stalled more than allowable limit”.
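The limit is configurable per worker via maxStalledCount, and the check frequency via stalledInterval; a sketch with assumed values:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
  // How many times a job may stall before it is failed permanently.
  maxStalledCount: 3,
  // How often (in ms) to check for stalled jobs. Default: 30000.
  stalledInterval: 30000,
});
```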
Preventing Stalled Jobs
To avoid stalled jobs:
1. Avoid CPU-Intensive Operations
Make sure your worker does not keep the Node.js event loop busy for long stretches: while the loop is blocked, the worker cannot renew the job's lock, so the job is eventually flagged as stalled. The default stall check interval (stalledInterval) is 30 seconds.
2. Use Sandboxed Processors
Workers can spawn separate Node.js processes, running independently from the main process:
- Isolation: Crashes don’t affect the main process
- CPU utilization: Better use of multi-core systems
- Memory safety: Memory leaks are contained
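A sandboxed setup looks roughly like this (file names and queue name are assumptions for the sketch); the worker passes a file path instead of a processor function:

```typescript
// main.ts - the worker only supervises; the processor runs in a child process.
import { Worker } from 'bullmq';
import * as path from 'path';

const worker = new Worker('paint', path.join(__dirname, 'processor.js'), {
  connection: { host: 'localhost', port: 6379 },
});

// processor.js (separate file) - exports the processor function:
//
// module.exports = async (job) => {
//   // CPU-heavy work here cannot block the main worker's event loop.
//   return doHeavyWork(job.data); // doHeavyWork is hypothetical
// };
```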
3. Increase Stall Check Interval
If your jobs legitimately take a long time to process, raise the interval so they are not flagged as stalled mid-run:
Listening to Stalled Events
Monitor when jobs stall:
Worker Lock Extension
Workers automatically extend locks on jobs they’re processing. If a worker crashes or loses connection, the lock expires and the job is marked as stalled. From src/classes/job.ts:703:
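The stalled transition described above can be observed from any process via a QueueEvents instance; a sketch (queue name and connection are assumptions):

```typescript
import { QueueEvents } from 'bullmq';

const queueEvents = new QueueEvents('paint', {
  connection: { host: 'localhost', port: 6379 },
});

// Fired whenever a job is moved from active back to waiting.
queueEvents.on('stalled', ({ jobId }) => {
  console.warn(`Job ${jobId} stalled and was moved back to waiting`);
});
```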
Common Causes of Stalled Jobs
Worker Crashes
If a worker process crashes while processing a job:
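Because a crashed worker's job is re-delivered once the stall is detected, processors should be idempotent; a sketch (compute and upsertResult are hypothetical helpers):

```typescript
import { Worker } from 'bullmq';

// Hypothetical helpers for the sketch: an expensive computation and an
// idempotent, job-keyed write.
const compute = async (data: unknown) => data;
const upsertResult = async (jobId: string | undefined, result: unknown) => {
  /* write keyed by jobId so repeats overwrite rather than duplicate */
};

const worker = new Worker(
  'paint',
  async (job) => {
    // If a previous worker crashed mid-run, the same job is delivered again
    // after the stall check. Keying writes by job.id keeps the retry safe.
    const result = await compute(job.data);
    await upsertResult(job.id, result);
  },
  { connection: { host: 'localhost', port: 6379 } },
);
```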
Network Issues
Connection loss to Redis:
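Connection problems surface on the worker's error event; the underlying client reconnects automatically, but jobs active during a long outage may still stall because locks cannot be renewed. A sketch:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
});

worker.on('error', (err) => {
  // Lost Redis connections show up here; while disconnected the worker
  // cannot renew locks, so long outages cause active jobs to stall.
  console.error('Worker connection error:', err.message);
});
```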
Blocked Event Loop
CPU-intensive synchronous operations:
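A tight synchronous loop starves the timer that renews the job lock; yielding between chunks keeps the event loop responsive. A sketch (the chunk size is arbitrary):

```typescript
// Blocking version: a long synchronous loop prevents lock renewal entirely.
function sumBlocking(items: number[]): number {
  let total = 0;
  for (const n of items) total += n; // event loop is starved for the whole loop
  return total;
}

// Cooperative version: yield to the event loop between chunks so the
// worker's lock-renewal timer can fire.
async function sumChunked(items: number[], chunkSize = 1000): Promise<number> {
  let total = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, items.length);
    for (let j = i; j < end; j++) total += items[j];
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
  return total;
}
```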
Memory Pressure
Out of memory errors:
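Lowering concurrency bounds per-worker memory use; the value here is an assumption:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
  // Fewer parallel jobs per worker means a smaller peak heap; if the
  // process is OOM-killed, every job it was running stalls at once.
  concurrency: 2,
});
```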
Best Practices
Monitor Stalls
Track stalled jobs as a key metric. Frequent stalling indicates a problem.
Graceful Shutdown
Implement proper shutdown procedures to finish processing before terminating.
Use Sandboxing
Isolate job processing in separate processes for better reliability.
Set Realistic Limits
Configure stalledInterval and maxStalledCount based on your job characteristics.
Graceful Shutdown Example
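A sketch of a shutdown handler; worker.close() waits for jobs in flight to finish so they complete rather than being left to stall:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
});

const shutdown = async () => {
  // close() stops taking new jobs and waits for active ones to finish,
  // so nothing is left mid-flight to be flagged as stalled later.
  await worker.close();
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
```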
Debugging Stalled Jobs
When investigating stalled jobs:
- Check worker logs: Look for crashes or errors
- Monitor Redis connection: Ensure stable connection
- Profile CPU usage: Identify blocking operations
- Review job data: Large payloads can cause issues
- Check external dependencies: Timeouts from external services
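When hunting for jobs that were failed after stalling too often, the failed set can be filtered by reason; a sketch (queue name and connection are assumptions):

```typescript
import { Queue } from 'bullmq';

const queue = new Queue('paint', {
  connection: { host: 'localhost', port: 6379 },
});

async function findStalledFailures() {
  const failed = await queue.getFailed();
  // Jobs killed by the stall limit carry the characteristic failedReason.
  return failed.filter((job) => job.failedReason?.includes('stalled'));
}
```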
Read More
Worker Options
maxStalledCount API Reference
Manually Fetching Jobs
Pattern for manual stalled job checking
Sandboxed Processors
Learn about sandboxed processors
Queue Events
All available queue events
