From BullMQ 2.0 onwards, the QueueScheduler is not needed anymore. For manually checking stalled jobs, see the manually fetching jobs pattern.

What is a Stalled Job?

When a job is in an active state (being processed by a worker), it needs to continuously update the queue to notify that the worker is still working on it. This mechanism prevents a worker that crashes or enters an endless loop from keeping a job in an active state forever. When a worker fails to notify the queue that it’s still working on a job, that job is moved back to the waiting list or to the failed set. We say the job has stalled, and the queue emits a stalled event. From src/classes/job.ts:118:
export class Job {
  /**
   * Number of times where job has stalled.
   * @defaultValue 0
   */
  stalledCounter = 0;
}
There is no “stalled” state - only a stalled event emitted when a job is automatically moved from active to waiting state.

Stalled Job Limit

If a job stalls more than a predefined limit (see the maxStalledCount option), the job will be failed permanently with the error “job stalled more than allowable limit”.
The default maxStalledCount is 1, as stalled jobs should be a rare occurrence. You can increase this number if needed, but it is usually better to first investigate why jobs are stalling.
import { Worker } from 'bullmq';

const worker = new Worker(
  'Paint',
  async job => {
    // Process job
  },
  {
    maxStalledCount: 3, // Allow job to stall up to 3 times
  }
);
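To make the limit concrete, here is a simplified in-memory model of the decision taken each time a stall is detected. It is only a sketch, not BullMQ's actual implementation (which runs inside Redis). The `StalledJob` type and the `onStallDetected` function are hypothetical names, and the comparison assumes that a job fails once its stall count exceeds `maxStalledCount`, as described above.

```typescript
// Simplified model, not BullMQ internals: all names here are hypothetical.
interface StalledJob {
  id: string;
  stalledCounter: number;
  state: 'active' | 'waiting' | 'failed';
  failedReason?: string;
}

// Called when a stall is detected for an active job.
function onStallDetected(job: StalledJob, maxStalledCount: number): StalledJob {
  const stalledCounter = job.stalledCounter + 1;
  if (stalledCounter > maxStalledCount) {
    // Limit exceeded: fail the job permanently.
    return {
      ...job,
      stalledCounter,
      state: 'failed',
      failedReason: 'job stalled more than allowable limit',
    };
  }
  // Still within the limit: put the job back in the waiting list.
  return { ...job, stalledCounter, state: 'waiting' };
}

// With maxStalledCount = 1, the first stall requeues and the second one fails.
const firstStall = onStallDetected({ id: '1', stalledCounter: 0, state: 'active' }, 1);
const secondStall = onStallDetected({ ...firstStall, state: 'active' }, 1);
console.log(firstStall.state, secondStall.state); // prints "waiting failed"
```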

Preventing Stalled Jobs

To avoid stalled jobs:

1. Avoid CPU-Intensive Operations

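Make sure your worker doesn’t keep the Node.js event loop too busy. The default max stalled check duration is 30 seconds, so jobs should not stall as long as no synchronous operation blocks the event loop for longer than that.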
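import { Worker } from 'bullmq';

// Bad - blocks the event loop
const blockingWorker = new Worker('Paint', async job => {
  for (let i = 0; i < 1000000000; i++) {
    // CPU-intensive synchronous loop
  }
});

// Good - use async operations so the event loop can run between steps
// (processInChunks is a placeholder for your own chunked implementation)
const chunkedWorker = new Worker('Paint', async job => {
  await processInChunks(job.data);
});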
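The `processInChunks` call above is not part of BullMQ. One possible shape for such a helper is sketched below, assuming the job data can be treated as an array of items. The signature, the `handleItem` callback, and the default `chunkSize` are illustrative assumptions.

```typescript
// Hypothetical helper: processes items in slices and yields between slices.
async function processInChunks<T>(
  items: T[],
  handleItem: (item: T) => void,
  chunkSize = 1000,
): Promise<number> {
  let chunks = 0;
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      handleItem(item);
    }
    chunks++;
    // Yield to the event loop so timers and I/O callbacks can run.
    await new Promise<void>(resolve => setTimeout(resolve, 0));
  }
  return chunks;
}

// Usage: five items in chunks of two are handled in three chunks.
const seenItems: number[] = [];
const chunksDone = processInChunks([1, 2, 3, 4, 5], n => seenItems.push(n), 2);
chunksDone.then(count => console.log(`Processed ${seenItems.length} items in ${count} chunks`));
```

Yielding between chunks gives pending timers and I/O callbacks, including the worker's own background work, a chance to run while the job is still in progress.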

2. Use Sandboxed Processors

Workers can spawn separate Node.js processes, running independently from the main process:
import { Worker } from 'bullmq';

// Worker will load processor from separate file
const worker = new Worker('Paint', './painter.js');
Sandboxed processors provide:
  • Isolation: Crashes don’t affect the main process
  • CPU utilization: Better use of multi-core systems
  • Memory safety: Memory leaks are contained

3. Increase Stall Check Interval

If your jobs legitimately take a long time to process:
import { Worker } from 'bullmq';

const worker = new Worker(
  'Paint',
  async job => {
    // Long-running task
  },
  {
    stalledInterval: 60000, // Check for stalled jobs every 60 seconds
    maxStalledCount: 2,
  }
);

Listening to Stalled Events

Monitor when jobs stall:
import { QueueEvents } from 'bullmq';

const queueEvents = new QueueEvents('Paint');

// The stalled event payload only carries the job id
queueEvents.on('stalled', ({ jobId }) => {
  console.log(`Job ${jobId} stalled.`);
});
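To turn these events into a metric, a small sliding-window counter can be enough. The sketch below is independent of BullMQ. `StallRateTracker` is a hypothetical helper, and the 60-second window is an arbitrary choice for illustration.

```typescript
// Hypothetical helper: counts stall events inside a sliding time window.
class StallRateTracker {
  private timestamps: number[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record one stall event; `now` is injectable to keep the class testable.
  record(now: number = Date.now()): void {
    this.timestamps.push(now);
  }

  // Number of stalls recorded within the last `windowMs` milliseconds.
  count(now: number = Date.now()): number {
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    return this.timestamps.length;
  }
}

const tracker = new StallRateTracker(60_000);
tracker.record(0);
tracker.record(30_000);
tracker.record(70_000);
const recentStalls = tracker.count(80_000);
console.log(recentStalls); // prints 2 (the stall at t=0 is outside the window)
```

In a real setup, `tracker.record()` would be called from the stalled listener, and `tracker.count()` could feed an alert when it rises above a threshold you consider abnormal.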

Worker Lock Extension

Workers automatically extend locks on jobs they’re processing. If a worker crashes or loses connection, the lock expires and the job is marked as stalled. From src/classes/job.ts:703:
/**
 * Extend the lock for this job.
 *
 * @param token - unique token for the lock
 * @param duration - lock duration in milliseconds
 */
extendLock(token: string, duration: number): Promise<number> {
  return this.scripts.extendLock(this.id, token, duration);
}
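The interaction between lock expiry and extension can be illustrated with a small in-memory model. This is a sketch only: the real lock lives in Redis, `JobLock` is a hypothetical class, and the convention of returning 1 on success and 0 on failure is an assumption made to mirror the numeric return type shown above.

```typescript
// Simplified in-memory lock model; not BullMQ's Redis-based implementation.
class JobLock {
  private token: string | null = null;
  private expiresAt = 0;

  // Acquire the lock if it is free or has expired.
  acquire(token: string, duration: number, now: number): boolean {
    if (this.token !== null && now < this.expiresAt) return false;
    this.token = token;
    this.expiresAt = now + duration;
    return true;
  }

  // Extend only when the caller still owns an unexpired lock.
  // Returns 1 on success and 0 otherwise (assumed convention).
  extend(token: string, duration: number, now: number): number {
    if (this.token !== token || now >= this.expiresAt) return 0;
    this.expiresAt = now + duration;
    return 1;
  }
}

const lock = new JobLock();
lock.acquire('worker-a', 30_000, 0);
// Renewed in time: the lock is now valid until t = 45_000.
const renewed = lock.extend('worker-a', 30_000, 15_000);
// The worker went silent, so the lock expired before this renewal attempt.
const tooLate = lock.extend('worker-a', 30_000, 50_000);
// Another worker can now pick up the job.
const takenOver = lock.acquire('worker-b', 30_000, 50_000);
console.log(renewed, tooLate, takenOver); // prints "1 0 true"
```

The second renewal fails because nothing extended the lock between t = 15,000 and t = 45,000, which is the situation a crashed or blocked worker ends up in.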

Common Causes of Stalled Jobs

Worker process crashes while processing a job:
// Handle uncaught errors to prevent crashes
process.on('uncaughtException', (error) => {
  console.error('Uncaught exception:', error);
  // Graceful shutdown
});

const worker = new Worker('Paint', async job => {
  try {
    await processJob(job);
  } catch (error) {
    // Handle errors properly
    throw error;
  }
});
Connection loss to Redis:
const worker = new Worker(
  'Paint',
  async job => {
    // Process job
  },
  {
    connection: {
      host: 'redis-server',
      port: 6379,
      retryStrategy: (times) => {
        return Math.min(times * 50, 2000);
      },
    },
  }
);
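The `retryStrategy` above grows the reconnection delay linearly and caps it at 2 seconds. The snippet below evaluates the same formula for a few attempt numbers, assuming `times` is the count of reconnection attempts made so far.

```typescript
// Same formula as the retryStrategy above: linear growth, capped at 2000 ms.
const retryStrategy = (times: number): number => Math.min(times * 50, 2000);

const delays = [1, 2, 10, 40, 100].map(retryStrategy);
console.log(delays); // prints [ 50, 100, 500, 2000, 2000 ]
```

The cap is reached at the 40th attempt, after which the client retries every 2 seconds.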
CPU-intensive synchronous operations:
import { Worker } from 'bullmq';
import { Worker as WorkerThreads } from 'worker_threads';

const worker = new Worker('Paint', async job => {
  // Offload CPU-intensive work to worker threads
  return new Promise((resolve, reject) => {
    const workerThread = new WorkerThreads('./cpu-intensive.js');
    workerThread.on('message', resolve);
    workerThread.on('error', reject);
    workerThread.postMessage(job.data);
  });
});
Out of memory errors:
// Monitor memory usage
const worker = new Worker('Paint', async job => {
  const memUsage = process.memoryUsage();
  if (memUsage.heapUsed > 1024 * 1024 * 1024) { // 1GB
    console.warn('High memory usage:', memUsage);
  }
  
  await processJob(job);
});

Best Practices

Monitor Stalls

Track stalled jobs as a key metric. Frequent stalling indicates a problem.

Graceful Shutdown

Implement proper shutdown procedures to finish processing before terminating.

Use Sandboxing

Isolate job processing in separate processes for better reliability.

Set Realistic Limits

Configure stalledInterval and maxStalledCount based on your job characteristics.

Graceful Shutdown Example

import { Worker } from 'bullmq';

const worker = new Worker('Paint', async job => {
  // Process job
});

const shutdown = async () => {
  console.log('Shutting down worker...');
  
  // Stop picking up new jobs and wait for active jobs to finish
  await worker.close();
  
  console.log('Worker shut down gracefully');
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
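If an active job never finishes, waiting for the worker to close can hang the shutdown. One option is to bound the wait with a timeout. The helper below is a sketch that does not depend on BullMQ. `closeWithTimeout` is a hypothetical name and the timeout values are arbitrary.

```typescript
// Hypothetical helper: waits for `close()` but gives up after `timeoutMs`.
async function closeWithTimeout(
  close: () => Promise<void>,
  timeoutMs: number,
): Promise<'closed' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>(resolve => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([close().then(() => 'closed' as const), timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}

// A close that resolves immediately finishes before the timeout.
const fastClose = closeWithTimeout(async () => {}, 1000);
// A close that takes 200 ms loses against a 10 ms timeout.
const slowClose = closeWithTimeout(
  () => new Promise<void>(resolve => setTimeout(resolve, 200)),
  10,
);
Promise.all([fastClose, slowClose]).then(results => console.log(results));
```

In the shutdown handler above, this could be used as `await closeWithTimeout(() => worker.close(), 10000)`. Jobs still active when the timeout fires would then be left to the stalled job mechanism.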

Debugging Stalled Jobs

When investigating stalled jobs:
  1. Check worker logs: Look for crashes or errors
  2. Monitor Redis connection: Ensure stable connection
  3. Profile CPU usage: Identify blocking operations
  4. Review job data: Large payloads can cause issues
  5. Check external dependencies: Timeouts from external services
import { Job, Queue, QueueEvents } from 'bullmq';

const queue = new Queue('Paint');
const queueEvents = new QueueEvents('Paint');

queueEvents.on('stalled', async ({ jobId }) => {
  const job = await Job.fromId(queue, jobId);
  if (!job) {
    // The job may have been removed in the meantime
    return;
  }

  console.log('Stalled job details:', {
    id: job.id,
    name: job.name,
    data: job.data,
    stalledCounter: job.stalledCounter,
    attemptsMade: job.attemptsMade,
    processedBy: job.processedBy,
  });
});
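For the CPU profiling step, a rough way to spot blocking operations is to measure how late a timer fires. The sketch below is independent of BullMQ. `measureEventLoopLag` is a hypothetical helper, and the busy-wait only simulates a blocking task.

```typescript
// Hypothetical helper: resolves with how late a timer fired, in milliseconds.
function measureEventLoopLag(intervalMs = 100): Promise<number> {
  const start = Date.now();
  return new Promise(resolve => {
    setTimeout(() => {
      resolve(Math.max(0, Date.now() - start - intervalMs));
    }, intervalMs);
  });
}

// Schedule a measurement, then block the event loop for about 200 ms.
const lag = measureEventLoopLag(50);
const blockUntil = Date.now() + 200;
while (Date.now() < blockUntil) {
  // Busy wait standing in for a CPU-intensive synchronous operation
}
lag.then(ms => console.log(`Event loop lag: ~${ms} ms`));
```

A lag far above a few milliseconds suggests that synchronous work is holding the event loop. For continuous monitoring, Node.js also provides `monitorEventLoopDelay` in the `perf_hooks` module.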

Read More

Worker Options

maxStalledCount API Reference

Manually Fetching Jobs

Pattern for manual stalled job checking

Sandboxed Processors

Learn about sandboxed processors

Queue Events

All available queue events
