Job Lifecycle
From the moment a producer calls the add method on a Queue instance, a job enters a lifecycle where it transitions through different states until its completion or failure.
Queue-based Job Lifecycle
When a job is added to a queue using queue.add(), it starts in one of three initial states: wait, prioritized, or delayed.
All jobs enter a waiting list before they can be processed. This is the default state for new jobs.
import { Queue } from 'bullmq';

const queue = new Queue('tasks');
// Job enters 'wait' state
await queue.add('process-data', { data: 'value' });
Jobs with a priority value are placed in a prioritized set where higher priority jobs (lower priority number) are processed first.
// Higher priority job (processed first)
await queue.add('urgent-task', { data: 'urgent' }, { priority: 1 });
// Lower priority job (processed later)
await queue.add('normal-task', { data: 'normal' }, { priority: 10 });
Priorities range from 0 to 2^21, where 0 is the highest priority. This follows Unix process priority standards where a higher number means less priority.
Jobs with a delay are placed in a delayed set and wait for their timeout before being promoted to the waiting list or prioritized set.
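The choice between the three initial states can be sketched as a small pure function. This is an illustrative model only, not BullMQ's implementation: the names `initialState` and `AddOptions` are hypothetical, and it assumes a priority of 0 (or no priority) means a plain waiting job and that a delay takes precedence over a priority.

```typescript
// Simplified model of how a new job's initial state could be chosen.
// Hypothetical helper for illustration; BullMQ decides this inside Redis.
type InitialState = "wait" | "prioritized" | "delayed";

interface AddOptions {
  priority?: number; // assumed: 0 or omitted means a plain waiting job
  delay?: number;    // milliseconds before the job becomes eligible
}

const MAX_PRIORITY = 2 ** 21; // upper bound stated in the text above

function initialState(opts: AddOptions = {}): InitialState {
  const priority = opts.priority ?? 0;
  const delay = opts.delay ?? 0;
  if (priority < 0 || priority > MAX_PRIORITY) {
    throw new RangeError(`priority must be between 0 and ${MAX_PRIORITY}`);
  }
  // A delayed job is promoted to wait or prioritized only after its timeout.
  if (delay > 0) return "delayed";
  if (priority > 0) return "prioritized";
  return "wait";
}

const plainJob = initialState();                             // "wait"
const urgentJob = initialState({ priority: 1 });             // "prioritized"
const laterJob = initialState({ delay: 5000, priority: 1 }); // "delayed"
```

Note that a job with both a delay and a priority starts as delayed in this model, matching the promotion path described above.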
Active State
Once a worker picks up a job, it enters the active state. The job remains active while the worker’s process function executes.
Final States
Jobs end in one of two final states:
- Completed - Job processed successfully and returned a value
- Failed - Job threw an exception during processing
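The states described so far can be summarized as a transition table. The sketch below is a simplified model under stated assumptions: it leaves out retries, stalled-job recovery, and flows, and the names `transitions`, `canMove`, and `settle` are hypothetical rather than part of BullMQ.

```typescript
type JobState =
  | "wait" | "prioritized" | "delayed"
  | "active" | "completed" | "failed";

// Allowed transitions in this simplified model.
const transitions: Record<JobState, JobState[]> = {
  delayed: ["wait", "prioritized"],
  wait: ["active"],
  prioritized: ["active"],
  active: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canMove(from: JobState, to: JobState): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// Mimics what happens to an active job: returning a value completes it,
// throwing an exception fails it.
function settle<T>(processor: () => T): { state: JobState; result?: T; reason?: string } {
  try {
    return { state: "completed", result: processor() };
  } catch (err) {
    return { state: "failed", reason: err instanceof Error ? err.message : String(err) };
  }
}

const succeeded = settle(() => 42);
const crashed = settle<number>(() => { throw new Error("boom"); });
```

Here `succeeded.state` is "completed" with a result of 42, while `crashed.state` is "failed" with the reason "boom".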
Flow Producer Job Lifecycle
When jobs are added via a FlowProducer (for parent-child dependencies), there’s an additional state:
Waiting-Children State
Jobs that have children enter the waiting-children state. These jobs wait for all their children to complete before being processed.
Redis Data Structures
BullMQ leverages Redis data structures for efficient job management:
Lists
- Wait list - FIFO queue of jobs ready to be processed
- Used for standard job ordering
Sorted Sets
- Delayed set - Jobs sorted by timestamp, promoted when delay expires
- Prioritized set - Jobs sorted by priority value
- Active set - Currently processing jobs with timestamps for stall detection
Hashes
- Job data - Each job’s data, options, and state stored in a hash
- Queue metadata - Queue configuration and statistics
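To make the roles of the list and the sorted set concrete, here is a minimal in-memory sketch. It uses plain arrays as stand-ins for Redis and is not how BullMQ itself is written; `MiniQueue` and its methods are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```typescript
// In-memory stand-ins for the Redis structures described above.
interface DelayedEntry { jobId: string; readyAt: number } // score = timestamp in ms

class MiniQueue {
  private waitList: string[] = [];         // models the wait list (FIFO)
  private delayedSet: DelayedEntry[] = []; // models the delayed sorted set

  add(jobId: string, now: number, delayMs = 0): void {
    if (delayMs > 0) {
      this.delayedSet.push({ jobId, readyAt: now + delayMs });
      this.delayedSet.sort((a, b) => a.readyAt - b.readyAt); // keep ordered by score
    } else {
      this.waitList.push(jobId);
    }
  }

  // Move every delayed job whose timestamp has passed into the wait list.
  promoteDue(now: number): string[] {
    const promoted: string[] = [];
    let head = this.delayedSet[0];
    while (head !== undefined && head.readyAt <= now) {
      this.delayedSet.shift();
      this.waitList.push(head.jobId);
      promoted.push(head.jobId);
      head = this.delayedSet[0];
    }
    return promoted;
  }

  // Take the oldest waiting job, or undefined when the list is empty.
  next(): string | undefined {
    return this.waitList.shift();
  }
}

const mini = new MiniQueue();
mini.add("a", 0);
mini.add("b", 0, 1000);
mini.add("c", 0, 500);
const promotedEarly = mini.promoteDue(400);  // nothing is due yet
const promotedLate = mini.promoteDue(1000);  // "c" first, then "b"
const processingOrder = [mini.next(), mini.next(), mini.next()]; // "a", "c", "b"
```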
Keys
BullMQ uses Redis key prefixes to organize data.
Atomic Operations
BullMQ uses Redis Lua scripts to ensure atomic operations:
- Adding jobs - Atomically adds job data and enqueues it
- Moving jobs - Atomically moves jobs between states
- Processing jobs - Atomically claims jobs for processing
- Completing jobs - Atomically marks completion and handles dependencies
Because each operation runs as a single script, BullMQ provides:
- No race conditions between multiple workers
- Exactly-once processing semantics (in the best case)
- Consistent state even with crashes
Stalled Jobs
BullMQ automatically detects and recovers stalled jobs. Workers periodically check for jobs in the active state that haven’t been updated within the stall timeout.
import { Worker } from 'bullmq';

const worker = new Worker('tasks', async (job) => {
  // Process job
}, {
  stalledInterval: 30000, // Check for stalled jobs every 30 seconds
  maxStalledCount: 1, // Max times a job can be stalled before it is failed
});
Stalled jobs are automatically moved back to the wait state to be processed again by another worker.
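The stall check can be pictured with a small in-memory sketch. This is a simplified model and not BullMQ's code: `checkStalled`, `ActiveJob`, and the `lastSeen` field are hypothetical names, and the sketch assumes a job that stalls more than `maxStalledCount` times is failed rather than requeued.

```typescript
interface ActiveJob { id: string; lastSeen: number; stalledCount: number }

interface StallResult {
  stillActive: ActiveJob[];
  requeued: Array<{ id: string; stalledCount: number }>;
  failed: string[];
}

function checkStalled(
  active: ActiveJob[],
  now: number,
  stallTimeoutMs: number,
  maxStalledCount: number
): StallResult {
  const result: StallResult = { stillActive: [], requeued: [], failed: [] };
  for (const job of active) {
    if (now - job.lastSeen <= stallTimeoutMs) {
      result.stillActive.push(job); // updated recently, leave it alone
    } else if (job.stalledCount + 1 > maxStalledCount) {
      result.failed.push(job.id);   // stalled too many times
    } else {
      // Back to the wait state so another worker can pick it up.
      result.requeued.push({ id: job.id, stalledCount: job.stalledCount + 1 });
    }
  }
  return result;
}

const outcome = checkStalled(
  [
    { id: "fresh", lastSeen: 95000, stalledCount: 0 },
    { id: "stuck", lastSeen: 10000, stalledCount: 0 },
    { id: "repeat", lastSeen: 10000, stalledCount: 1 },
  ],
  100000, // current time in ms
  30000,  // stall timeout
  1       // maxStalledCount
);
```

With these inputs, "fresh" stays active, "stuck" is requeued with a stalled count of 1, and "repeat" is failed because it would exceed the limit.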
Connection Architecture
Each BullMQ class requires Redis connections:
Queue
- Uses 1 connection for adding jobs and management operations
- Connection can be reused across multiple Queue instances
Worker
- Uses 2 connections:
  - One for blocking operations (BZPOPMIN)
  - One for job processing and management
- Connection can be reused, but the worker still creates its own internal blocking connection
Workers require maxRetriesPerRequest: null to ensure they keep retrying failed commands indefinitely and don’t stop processing on temporary Redis connection issues.
QueueEvents
- Uses 1 blocking connection for listening to events
- Cannot reuse connections (requires dedicated blocking connection)
FlowProducer
- Uses 1 connection for adding job flows
- Connection can be reused
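The per-class rules above translate into simple arithmetic when sizing a Redis connection limit. The sketch below is a back-of-the-envelope estimate based only on the counts listed in this section; `Deployment`, `dedicatedConnections`, and `sharedConnections` are hypothetical names, and the shared case assumes a single connection is passed to every class that can reuse one.

```typescript
interface Deployment {
  queues: number;
  workers: number;
  queueEvents: number;
  flowProducers: number;
}

// Every instance opens its own connections.
function dedicatedConnections(d: Deployment): number {
  return d.queues + d.workers * 2 + d.queueEvents + d.flowProducers;
}

// One shared connection for everything that can reuse it; each worker and
// each QueueEvents instance still needs its own blocking connection.
function sharedConnections(d: Deployment): number {
  const reusers = d.queues + d.workers + d.flowProducers;
  const shared = reusers > 0 ? 1 : 0;
  return shared + d.workers + d.queueEvents;
}

const deployment: Deployment = { queues: 3, workers: 4, queueEvents: 1, flowProducers: 1 };
const separate = dedicatedConnections(deployment); // 3 + 8 + 1 + 1 = 13
const pooled = sharedConnections(deployment);      // 1 + 4 + 1 = 6
```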
Scaling Architecture
BullMQ is designed for horizontal scalability:
Multiple Workers
Add more workers to increase throughput.
Multiple Queues
Separate concerns with multiple queues.
Redis Cluster
For very high throughput, use Redis Cluster.
Polling-Free Design
Unlike many job queue systems, BullMQ uses a polling-free design for maximum efficiency:
- Workers use Redis’s blocking BZPOPMIN command to wait for jobs
- No CPU waste checking for new jobs
- Instant job processing as soon as jobs are added
- Minimal latency between job addition and processing
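The difference between polling and blocking can be illustrated without Redis. In the sketch below, a consumer that finds no job parks a callback instead of looping, and the producer hands the next job directly to it. This is only an in-memory analogy for a blocking pop, under the assumption that a callback stands in for the blocked connection; `BlockingQueue` is a hypothetical name.

```typescript
type Waiter = (jobId: string) => void;

class BlockingQueue {
  private jobs: string[] = [];
  private waiters: Waiter[] = [];

  // Consumer side: deliver a job now if one exists, otherwise park the callback.
  take(onJob: Waiter): void {
    const job = this.jobs.shift();
    if (job !== undefined) onJob(job);
    else this.waiters.push(onJob);
  }

  // Producer side: hand the job straight to a parked consumer if there is one.
  push(jobId: string): void {
    const waiter = this.waiters.shift();
    if (waiter !== undefined) waiter(jobId);
    else this.jobs.push(jobId);
  }
}

const received: string[] = [];
const blocking = new BlockingQueue();

blocking.take((id) => received.push(id)); // no job yet: the consumer waits, no loop runs
blocking.push("job-1");                   // delivered immediately to the waiting consumer
blocking.push("job-2");                   // nobody is waiting, so the job is stored
blocking.take((id) => received.push(id)); // delivered at once from the stored jobs
```

After these calls, `received` holds "job-1" followed by "job-2", and no code ever checked the queue in a loop.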
Performance Characteristics
Throughput
- Single Redis instance: 10,000+ jobs/second
- With Dragonfly: 100,000+ jobs/second
- Limited primarily by Redis performance and network latency
Latency
- Job addition to processing: < 1ms (local Redis)
- Job addition to processing: < 10ms (remote Redis)
- Minimal overhead from BullMQ itself
Memory
- Job data stored in Redis with configurable retention
- Completed jobs can be automatically removed
- Failed jobs can be kept for debugging
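Retention is usually expressed as "keep at most N finished jobs" and "keep nothing older than T". In BullMQ this is configured through job options such as removeOnComplete and removeOnFail; the sketch below does not call them and only models the pruning rule in memory. `applyRetention`, `KeepPolicy`, and `FinishedJob` are hypothetical names.

```typescript
interface FinishedJob { id: string; finishedOn: number } // finishedOn in ms

interface KeepPolicy { count?: number; ageSeconds?: number }

// Return the jobs that survive the policy; everything else would be removed.
function applyRetention(jobs: FinishedJob[], policy: KeepPolicy, now: number): FinishedJob[] {
  let kept = jobs.slice().sort((a, b) => b.finishedOn - a.finishedOn); // newest first
  if (policy.ageSeconds !== undefined) {
    const cutoff = now - policy.ageSeconds * 1000;
    kept = kept.filter((j) => j.finishedOn >= cutoff);
  }
  if (policy.count !== undefined) {
    kept = kept.slice(0, policy.count);
  }
  return kept;
}

const finished: FinishedJob[] = [
  { id: "j1", finishedOn: 1000 },
  { id: "j2", finishedOn: 50000 },
  { id: "j3", finishedOn: 90000 },
  { id: "j4", finishedOn: 99000 },
];

// Keep at most 2 jobs, none older than 60 seconds, evaluated at t = 100000 ms.
const retained = applyRetention(finished, { count: 2, ageSeconds: 60 }, 100000).map((j) => j.id);
```

With this policy, "j1" is dropped for age and "j2" for count, leaving "j4" and "j3".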
Next Steps
- Workers - Deep dive into worker configuration and features
- Jobs - Learn about job options and lifecycle
- Flows - Create complex job dependencies
- Going to Production - Best practices for production deployments
