From BullMQ 2.0 onwards, the QueueScheduler is not needed anymore. For manually checking stalled jobs, see the manually fetching jobs pattern.
What is a Stalled Job?
When a job is in the active state (being processed by a worker), the worker must continuously notify the queue that it is still working on it. This mechanism prevents a worker that crashes or enters an endless loop from keeping a job active forever. When a worker fails to notify the queue that it is still working on a job, the job is moved back to the waiting list or to the failed set. We say the job has stalled, and the queue emits a stalled event.
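For reference, a minimal worker looks roughly like this (the queue name and Redis connection details are assumptions for the sketch); lock renewal happens in the background while the processor runs:

```typescript
import { Worker } from 'bullmq';

// Queue name and connection details are illustrative.
const worker = new Worker(
  'paint',
  async (job) => {
    // While this processor runs, the worker periodically renews the job's
    // lock in Redis. If renewal stops (crash, blocked event loop), the job
    // is eventually moved back to waiting and a 'stalled' event is emitted.
    return job.data;
  },
  { connection: { host: 'localhost', port: 6379 } },
);
```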
From src/classes/job.ts:118:
There is no “stalled” state - only a stalled event emitted when a job is automatically moved from active to waiting state.
Stalled Job Limit
If a job stalls more than a predefined limit (see the maxStalledCount option), the job will be failed permanently with the error “job stalled more than allowable limit”.
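The limit is configurable per worker via maxStalledCount, and the check frequency via stalledInterval; a sketch with assumed values:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
  // How many times a job may stall before it is failed permanently.
  maxStalledCount: 3,
  // How often (in ms) to check for stalled jobs. Default: 30000.
  stalledInterval: 30000,
});
```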
Preventing Stalled Jobs
To avoid stalled jobs:
1. Avoid CPU-Intensive Operations
Make sure your worker does not keep the Node.js event loop busy for long stretches: while the loop is blocked, the worker cannot renew the job's lock, so the job is eventually flagged as stalled. The default stall check interval (stalledInterval) is 30 seconds.
2. Use Sandboxed Processors
Workers can spawn separate Node.js processes, running independently from the main process:
- Isolation: Crashes don’t affect the main process
- CPU utilization: Better use of multi-core systems
- Memory safety: Memory leaks are contained
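A sandboxed setup looks roughly like this (file names and queue name are assumptions for the sketch); the worker passes a file path instead of a processor function:

```typescript
// main.ts - the worker only supervises; the processor runs in a child process.
import { Worker } from 'bullmq';
import * as path from 'path';

const worker = new Worker('paint', path.join(__dirname, 'processor.js'), {
  connection: { host: 'localhost', port: 6379 },
});

// processor.js (separate file) - exports the processor function:
//
// module.exports = async (job) => {
//   // CPU-heavy work here cannot block the main worker's event loop.
//   return doHeavyWork(job.data); // doHeavyWork is hypothetical
// };
```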
3. Increase Stall Check Interval
If your jobs legitimately take a long time to process, raise the interval so they are not flagged as stalled mid-run:
Listening to Stalled Events
Monitor when jobs stall:
Worker Lock Extension
Workers automatically extend locks on jobs they’re processing. If a worker crashes or loses connection, the lock expires and the job is marked as stalled. From src/classes/job.ts:703:
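The stalled transition described above can be observed from any process via a QueueEvents instance; a sketch (queue name and connection are assumptions):

```typescript
import { QueueEvents } from 'bullmq';

const queueEvents = new QueueEvents('paint', {
  connection: { host: 'localhost', port: 6379 },
});

// Fired whenever a job is moved from active back to waiting.
queueEvents.on('stalled', ({ jobId }) => {
  console.warn(`Job ${jobId} stalled and was moved back to waiting`);
});
```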
Common Causes of Stalled Jobs
Worker Crashes
If a worker process crashes while processing a job:
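Because a crashed worker's job is re-delivered once the stall is detected, processors should be idempotent; a sketch (compute and upsertResult are hypothetical helpers):

```typescript
import { Worker } from 'bullmq';

// Hypothetical helpers for the sketch: an expensive computation and an
// idempotent, job-keyed write.
const compute = async (data: unknown) => data;
const upsertResult = async (jobId: string | undefined, result: unknown) => {
  /* write keyed by jobId so repeats overwrite rather than duplicate */
};

const worker = new Worker(
  'paint',
  async (job) => {
    // If a previous worker crashed mid-run, the same job is delivered again
    // after the stall check. Keying writes by job.id keeps the retry safe.
    const result = await compute(job.data);
    await upsertResult(job.id, result);
  },
  { connection: { host: 'localhost', port: 6379 } },
);
```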
Network Issues
Connection loss to Redis:
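Connection problems surface on the worker's error event; the underlying client reconnects automatically, but jobs active during a long outage may still stall because locks cannot be renewed. A sketch:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
});

worker.on('error', (err) => {
  // Lost Redis connections show up here; while disconnected the worker
  // cannot renew locks, so long outages cause active jobs to stall.
  console.error('Worker connection error:', err.message);
});
```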
Blocked Event Loop
CPU-intensive synchronous operations:
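A tight synchronous loop starves the timer that renews the job lock; yielding between chunks keeps the event loop responsive. A sketch (the chunk size is arbitrary):

```typescript
// Blocking version: a long synchronous loop prevents lock renewal entirely.
function sumBlocking(items: number[]): number {
  let total = 0;
  for (const n of items) total += n; // event loop is starved for the whole loop
  return total;
}

// Cooperative version: yield to the event loop between chunks so the
// worker's lock-renewal timer can fire.
async function sumChunked(items: number[], chunkSize = 1000): Promise<number> {
  let total = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, items.length);
    for (let j = i; j < end; j++) total += items[j];
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
  return total;
}
```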
Memory Pressure
Out of memory errors:
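Lowering concurrency bounds per-worker memory use; the value here is an assumption:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
  // Fewer parallel jobs per worker means a smaller peak heap; if the
  // process is OOM-killed, every job it was running stalls at once.
  concurrency: 2,
});
```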
Best Practices
Monitor Stalls
Track stalled jobs as a key metric. Frequent stalling indicates a problem.
Graceful Shutdown
Implement proper shutdown procedures to finish processing before terminating.
Use Sandboxing
Isolate job processing in separate processes for better reliability.
Set Realistic Limits
Configure stalledInterval and maxStalledCount based on your job characteristics.
Graceful Shutdown Example
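A sketch of a shutdown handler; worker.close() waits for jobs in flight to finish so they complete rather than being left to stall:

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('paint', async (job) => job.data, {
  connection: { host: 'localhost', port: 6379 },
});

const shutdown = async () => {
  // close() stops taking new jobs and waits for active ones to finish,
  // so nothing is left mid-flight to be flagged as stalled later.
  await worker.close();
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
```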
Debugging Stalled Jobs
When investigating stalled jobs:
- Check worker logs: Look for crashes or errors
- Monitor Redis connection: Ensure stable connection
- Profile CPU usage: Identify blocking operations
- Review job data: Large payloads can cause issues
- Check external dependencies: Timeouts from external services
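When hunting for jobs that were failed after stalling too often, the failed set can be filtered by reason; a sketch (queue name and connection are assumptions):

```typescript
import { Queue } from 'bullmq';

const queue = new Queue('paint', {
  connection: { host: 'localhost', port: 6379 },
});

async function findStalledFailures() {
  const failed = await queue.getFailed();
  // Jobs killed by the stall limit carry the characteristic failedReason.
  return failed.filter((job) => job.failedReason?.includes('stalled'));
}
```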
Read More
Worker Options
maxStalledCount API Reference
Manually Fetching Jobs
Pattern for manual stalled job checking
Sandboxed Processors
Learn about sandboxed processors
Queue Events
All available queue events
