Task Queues

Task queues are the routing mechanism that connects the Temporal Server to your workers. They act as a bridge between workflow executions in the History Service and worker processes that execute workflow and activity code.

Task Queue Basics

A task queue is a logical queue that workers poll for tasks. When the History Service needs workflow or activity code to execute, it dispatches tasks to the appropriate task queue.

Two Types of Tasks

Workflow Tasks

Prompt the worker to advance a workflow execution by replaying history and generating new commands.

Activity Tasks

Request that the worker execute a specific activity function with the given input.

These “task queue tasks” are user-visible concepts, distinct from internal History Service tasks (Transfer Tasks, Timer Tasks), which are implementation details.

Task Queue Architecture

The Matching Service manages all task queues.

Task Queue Partitioning

To handle high throughput, each task queue is split into multiple partitions:
// From service/matching/task_queue_partition_manager.go
type taskQueuePartitionManagerImpl struct {
    engine    *matchingEngineImpl
    partition tqid.Partition  // Which partition this manages
    ns        *namespace.Namespace
    config    *taskQueueConfig
    
    // Versioned queues for worker versioning
    versionedQueues     map[PhysicalTaskQueueVersion]physicalTaskQueueManager
    versionedQueuesLock sync.RWMutex
    
    userDataManager     userDataManager
    rateLimitManager    *rateLimitManager
    
    // Default queue future for initialization
    defaultQueueFuture *future.FutureImpl[physicalTaskQueueManager]
}
  • Scalability: A single queue partition is limited to one Matching Service instance; partitions spread load across instances.
  • Throughput: Each partition handles a subset of tasks, so more partitions mean higher total throughput.
  • Default: 4 partitions per task queue, configurable per namespace.
  • Trade-offs: More partitions increase dispatch overhead; the optimal partition count depends on the task rate.

Task Dispatching

When the History Service schedules a task, it goes through this flow:
  1. History creates a Transfer Task: The workflow/activity task decision creates a Transfer Task in the History shard’s queue.
  2. Queue processor picks up the task: A background queue processor reads the Transfer Task from persistence.
  3. Call the Matching Service: History makes an AddWorkflowTask or AddActivityTask RPC to the Matching Service.
  4. Matching routes to a partition: The Matching Service hashes the task queue name to select a partition.
  5. Attempt sync match: If a worker is polling, the task is delivered immediately (sync match).
  6. Or write to database: If no poller is available, the task is written to persistence (async match).
// From service/history/transfer_queue_active_task_executor.go
func (t *transferQueueActiveTaskExecutor) processWorkflowTask(
    ctx context.Context,
    task *tasks.WorkflowTask,
) error {
    // Load workflow state
    // Build task info
    // Call Matching Service
    return t.matchingClient.AddWorkflowTask(ctx, &matchingservice.AddWorkflowTaskRequest{
        NamespaceId:       task.NamespaceID,
        Execution:         task.WorkflowExecution,
        TaskQueue:         &taskqueuepb.TaskQueue{Name: taskQueueName},
        ScheduledEventId:  task.ScheduledEventID,
        ScheduleToStartTimeout: timeout,
    })
}

Sync Match vs Async Match

The Matching Service optimizes for low latency through sync matching:

Sync Match (Fast Path)

  1. Worker is already polling: The worker has an open long-poll connection waiting for a task.
  2. Task arrives at Matching: The History Service dispatches the task to the Matching Service.
  3. Instant match: The task is delivered directly to the waiting poller without a database write.
  4. Low latency: Total time from History to worker is typically 1-5 ms.

Async Match (Slow Path)

  1. No poller available: The task arrives, but no worker is currently polling.
  2. Write to database: The task is persisted to the Matching Service database.
  3. Worker polls later: When a worker polls, the task is read from the database and returned.
  4. Higher latency: The database round-trip adds latency but ensures the task is not lost.
// From service/matching/matcher.go
func (tm *TaskMatcher) Offer(ctx context.Context, task *internalTask) (bool, error) {
    // Check if we should prefer database due to backlog
    if !tm.isBacklogNegligible() {
        return false, nil  // Force async match for ordering
    }
    
    // Try sync match with waiting poller
    select {
    case tm.taskC <- task:  // Poller waiting!
        if task.responseC != nil {
            err := <-task.responseC  // Wait for poller to accept
            return true, err.startErr
        }
        return false, nil
    default:
        // No poller, try forwarding to parent partition
        return tm.fwdr.ForwardTask(ctx, task)
    }
}

Task Queue Forwarding

To improve sync match rate, Matching uses hierarchical forwarding:

Forwarding Mechanism

  • Forward task to parent: A child partition has a task but no poller, so it forwards the task to the parent/root.
  • Forward poller to parent: A child partition has a poller but no task, so it forwards the poller to the parent/root.
  • Match at root: The root partition has the best chance of matching tasks with pollers from all children.
  • Benefits: Higher sync match rate, better load distribution, faster task delivery.
  • Cost: Additional RPC hops and slightly higher latency for forwarded tasks.
// From service/matching/forwarder.go
type Forwarder struct {
    cfg       *forwarderConfig
    partition tqid.Partition
    client    matchingservice.MatchingServiceClient
    
    // Token bucket for rate limiting forwarding
    addReqTokenBucket atomic.Pointer[dynamicconfig.TaskqueueQuotaBucket]
    pollReqTokenBucket atomic.Pointer[dynamicconfig.TaskqueueQuotaBucket]
}

func (fwdr *Forwarder) ForwardTask(ctx context.Context, task *internalTask) error {
    // Get parent partition
    parent := fwdr.partition.Parent()
    
    // Forward to parent's Matching instance
    return fwdr.client.AddWorkflowTask(ctx, &matchingservice.AddWorkflowTaskRequest{
        // ... task details
        ForwardInfo: &matchingservice.ForwardInfo{
            SourcePartition: fwdr.partition.PartitionId(),
        },
    })
}

Task Queue Lifecycle

Loading

Task queue partitions are loaded on-demand:
  1. First poll or task for a task queue triggers loading
  2. Partition manager created for that partition
  3. State loaded from database (backlog, ack levels, etc.)
  4. Matcher started to match tasks with pollers

Unloading

Partitions unload when idle:
  • No pollers for configured duration (e.g., 5 minutes)
  • No tasks dispatched
  • Memory pressure on Matching Service
const (
    unloadCauseIdle = "idle"           // No activity
    unloadCauseConfigChange = "config"  // Dynamic config changed
    unloadCauseError = "error"         // Persistent errors
)
Unloading is an optimization: when the next task or poll arrives, the partition reloads automatically, and users see no interruption.

Task Queue Backlog

When the task arrival rate exceeds the worker processing rate, a backlog accumulates:

Backlog Management

Tasks that can’t sync match are written to persistence:
  • Cassandra: A separate table per task queue partition
  • SQL: Rows in a tasks table with task queue + partition columns
The backlog is FIFO, ordered by task ID (monotonically increasing).
// From service/matching/task_reader.go
type taskReader struct {
    tlMgr           taskQueuePartitionManager
    db              *taskQueueDB
    matcher         *TaskMatcher
    
    // Read batches of tasks from database
    taskBatchSize   int
}

// Continuously read backlog tasks and offer to matcher
func (tr *taskReader) dispatchBufferedTasks(ctx context.Context) {
    for !tr.stopped() {
        tasks := tr.db.ReadTasks(tr.taskBatchSize)  // Read a batch from persistence
        for _, task := range tasks {
            tr.matcher.Offer(ctx, task)  // Try to dispatch to a waiting poller
        }
    }
}
Key metrics:
  • Task Queue Backlog Age: How old is the oldest task?
  • Task Queue Backlog Size: How many tasks waiting?
  • Task Dispatch Latency: Time from scheduling to worker pickup

Task Queue Versioning

For worker versioning, each task queue has multiple physical queues:
// From service/matching/task_queue_partition_manager.go
type taskQueuePartitionManagerImpl struct {
    // One versioned queue per Build ID
    versionedQueues map[PhysicalTaskQueueVersion]physicalTaskQueueManager
    
    // User data manager tracks versioning rules
    userDataManager userDataManager
}

Default Queue

Unversioned tasks and workers without a Build ID use the default queue.

Versioned Queues

Each Build ID gets a dedicated queue. Tasks are routed by the versioning rules.

Task Queue Configuration

Dynamic Configuration

type taskQueueConfig struct {
    // How many partitions for this task queue
    NumReadPartitions int
    NumWritePartitions int
    
    // Task lifecycle
    MaxTaskQueueIdleTime time.Duration  // When to unload
    MaxTaskBatchSize int                // Backlog read batch size
    
    // Rate limiting
    TaskDispatchRPS float64
    
    // Forwarding
    ForwarderMaxChildrenPerNode int
    ForwarderMaxOutstandingPolls int
    ForwarderMaxOutstandingTasks int
}

Rate Limiting

Task queues support rate limiting task dispatch:
  • Global RPS limit per task queue
  • Smoothed delivery prevents spikes
  • Shared across partitions
  • Configurable per namespace
Rate limiting applies to task dispatch from Matching to workers, not task arrival from History Service. Tasks buffer in Matching if rate limited.

Sticky Task Queues

Workers can request “sticky” execution for performance:
  • Normal task queue: my-task-queue
  • Sticky task queue: __sticky__:${workerID}
Sticky queues:
  • Route workflow tasks back to same worker
  • Enable workflow cache reuse (skip replay)
  • Timeout quickly and fallback to normal queue
  • Per-worker, not shared across workers

Task Queue Observability

Key signals to monitor:

Metrics

  • task_queue_backlog_age_seconds: Age of oldest task
  • task_queue_backlog_size: Number of pending tasks
  • poll_success_rate: Poller success rate
  • sync_match_rate: Percentage of sync vs async matches
  • task_dispatch_latency: Time from schedule to worker

Troubleshooting

High Backlog Age

Cause: Not enough workers, or workers are too slow.
Solution: Scale workers, optimize activity code, increase concurrency.

Low Sync Match Rate

Cause: Tasks arrive when no pollers are waiting.
Solution: Increase the poller count, reduce the partition count, enable forwarding.

High Task Latency

Cause: Network issues, Matching overload, or database slowness.
Solution: Check Matching Service health, database performance, and network latency.

No Recent Poller

Cause: All workers are down or misconfigured.
Solution: Verify workers are running, check that the task queue name matches, review logs.
