Matching Service API

Overview

The Matching Service is an internal Cadence service that coordinates task distribution between workflow workers and the Cadence cluster. It manages task lists, handles long-polling from workers, and routes tasks efficiently.

Service Location: service/matching/handler/interfaces.go

The Matching Service API is internal to Cadence. Applications should use the Frontend Service API instead.

Service Architecture

The Matching Service operates on a per-task-list basis.

Health Check

Health

Check the health status of the matching service.

Health(context.Context) (*types.HealthStatus, error)

boolean

Whether the service is healthy

Msg

string

Health status message (e.g., “matching good”)

Task Addition APIs

AddDecisionTask

Add a decision task to a task list.

AddDecisionTask(context.Context, *types.AddDecisionTaskRequest) (*types.AddDecisionTaskResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
Execution (WorkflowExecution, required): Workflow execution that owns this task
TaskList (required): Task list to add the task to
ScheduleID (int64, required): Event ID of the decision task scheduled event
ScheduleToStartTimeoutSeconds (int32): Maximum time before task times out if not started
Source (TaskSource): Source of the task (History or DbBacklog)
ForwardedFrom (string): Address of the host that forwarded this task
PartitionConfig (map[string]string): Task list partition configuration

Response fields:

PartitionConfig (TaskListPartitionConfig): Current partition configuration after adding the task

Task Addition Flow:

Validates domain and task list
Applies rate limiting
Attempts sync match with waiting poller
If no poller, persists to task queue
Returns partition config for task list
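The first four steps of the flow above can be sketched with a toy in-memory task list. This is an illustration under stated assumptions, not the Cadence implementation: the Task and taskList types, the allow hook, and errBusy are hypothetical stand-ins, and the final step (returning the partition config) is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in for a decision or activity task.
type Task struct {
	DomainUUID string
	TaskList   string
	ScheduleID int64
}

// errBusy stands in for the ServiceBusyError described in this document.
var errBusy = errors.New("service busy")

// taskList is a toy in-memory task list (assumed shape, not Cadence's).
type taskList struct {
	pollers chan Task   // unbuffered: a send succeeds only if a poller is receiving
	backlog []Task      // stands in for the persisted task queue
	allow   func() bool // rate limiter hook
}

// addTask follows the listed steps and reports whether the task was sync matched.
func (tl *taskList) addTask(t Task) (bool, error) {
	if t.DomainUUID == "" || t.TaskList == "" { // 1. validate domain and task list
		return false, errors.New("domain and task list are required")
	}
	if !tl.allow() { // 2. apply rate limiting
		return false, errBusy
	}
	select {
	case tl.pollers <- t: // 3. sync match with a waiting poller
		return true, nil
	default: // 4. no poller is waiting, so persist the task
		tl.backlog = append(tl.backlog, t)
		return false, nil
	}
}

func main() {
	tl := &taskList{pollers: make(chan Task), allow: func() bool { return true }}
	synced, err := tl.addTask(Task{DomainUUID: "d1", TaskList: "orders", ScheduleID: 5})
	fmt.Println(synced, err, len(tl.backlog))
}
```

Using a non-blocking send on an unbuffered channel keeps the "is a poller waiting right now?" check and the hand-off in one atomic step.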

Rate Limiting:

Worker RPS limit per domain
Global matching service RPS limit
Returns ServiceBusyError if throttled

AddActivityTask

Add an activity task to a task list.

AddActivityTask(context.Context, *types.AddActivityTaskRequest) (*types.AddActivityTaskResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
Execution (WorkflowExecution, required): Workflow execution that owns this task
SourceDomainUUID (string): UUID of the source domain (for cross-domain activities)
TaskList (required): Task list to add the task to
ScheduleID (int64, required): Event ID of the activity task scheduled event
ScheduleToStartTimeoutSeconds (int32): Maximum time before task times out if not started
Source (TaskSource): Source of the task
ForwardedFrom (string): Address of forwarding host
ActivityTaskDispatchInfo: Additional dispatch information for the activity
PartitionConfig (map[string]string): Task list partition configuration

Response fields:

PartitionConfig (TaskListPartitionConfig): Current partition configuration

Task Polling APIs

PollForDecisionTask

Long poll for a decision task from a task list.

PollForDecisionTask(context.Context, *types.MatchingPollForDecisionTaskRequest) (*types.MatchingPollForDecisionTaskResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
PollerID (string, required): Unique identifier for this poller
PollRequest (PollForDecisionTaskRequest, required): The poll request details
ForwardedFrom (string): Address of forwarding host for partition routing

Response fields:

TaskToken ([]byte): Opaque task token for completing the task
WorkflowExecution: Workflow execution for this task
WorkflowType: The workflow type
PreviousStartedEventId (int64): Event ID of the previous decision task
StartedEventId (int64): Event ID when this decision task started
Attempt (int64): Retry attempt number
BacklogCountHint (int64): Approximate number of tasks in backlog
History: Workflow execution history
NextPageToken ([]byte): Token for fetching additional history
Query (WorkflowQuery): Query to execute if present
Queries (map[string]WorkflowQuery): Multiple queries to execute

Long Polling Behavior:

Blocks until a task is available or context timeout
Returns empty response on timeout (not an error)
Validates context has appropriate timeout (1-90 seconds recommended)
Supports sync match for immediate task dispatch

Sticky Task Lists (when sticky execution is enabled):

Decision tasks preferentially routed to same worker
Reduces history loading overhead
Falls back to normal task list if sticky worker unavailable

PollForActivityTask

Long poll for an activity task from a task list.

PollForActivityTask(context.Context, *types.MatchingPollForActivityTaskRequest) (*types.MatchingPollForActivityTaskResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
PollerID (string, required): Unique identifier for this poller
PollRequest (PollForActivityTaskRequest, required): The poll request details
ForwardedFrom (string): Address of forwarding host

Response fields:

TaskToken ([]byte): Opaque task token for completing the task
WorkflowExecution: Workflow execution for this activity
ActivityId (string): Activity ID
ActivityType: The activity type to execute
Input ([]byte): Serialized activity input
ScheduledTimestamp (int64): When the activity was scheduled
StartedTimestamp (int64): When the activity started
ScheduleToCloseTimeoutSeconds (int32): Total timeout from schedule to completion
StartToCloseTimeoutSeconds (int32): Timeout from start to completion
HeartbeatTimeoutSeconds (int32): Maximum time between heartbeats
Attempt (int32): Retry attempt number
ScheduledTimestampOfThisAttempt (int64): When this retry attempt was scheduled
HeartbeatDetails ([]byte): Details from last heartbeat
WorkflowType: Parent workflow type
WorkflowDomain (string): Parent workflow domain name
Header: Context propagation headers

Query APIs

QueryWorkflow

Query a workflow execution through the matching service.

QueryWorkflow(context.Context, *types.MatchingQueryWorkflowRequest) (*types.MatchingQueryWorkflowResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
TaskList (required): Task list where the query should be sent
QueryRequest (QueryWorkflowRequest, required): The query request
ForwardedFrom (string): Address of forwarding host

Response fields:

QueryResult ([]byte): Serialized query result
QueryRejected: Information if query was rejected

Query Routing:

Matches queries to pending decision tasks
Forwards to appropriate partition if partitioned
Returns query result synchronously
Supports query reject conditions

RespondQueryTaskCompleted

Respond to a query task.

RespondQueryTaskCompleted(context.Context, *types.MatchingRespondQueryTaskCompletedRequest) error

Request fields:

DomainUUID (string, required): UUID of the domain
TaskList (required): Task list for the query
TaskID (string, required): Query task ID
CompletedRequest (RespondQueryTaskCompletedRequest, required): Query completion details

Task List Management APIs

DescribeTaskList

Get information about a task list.

DescribeTaskList(context.Context, *types.MatchingDescribeTaskListRequest) (*types.DescribeTaskListResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
DescRequest (DescribeTaskListRequest, required): Description request parameters

Response fields:

Pollers ([]PollerInfo): List of currently active pollers
TaskListStatus: Status information if requested

TaskListStatus properties:

BacklogCountHint (int64): Approximate backlog size
ReadLevel (int64): Current read position in queue
AckLevel (int64): Last acknowledged task ID
RatePerSecond (double): Task dispatch rate
TaskIDBlock: Current task ID allocation block

PollerInfo includes:

Identity: Worker identity string
LastAccessTime: Last poll timestamp
RatePerSecond: Poll rate from this worker

ListTaskListPartitions

List partitions for a task list.

ListTaskListPartitions(context.Context, *types.MatchingListTaskListPartitionsRequest) (*types.ListTaskListPartitionsResponse, error)

Request fields:

Domain (string, required): Domain name
TaskList (required): Task list to query

Response fields:

ActivityTaskListPartitions ([]TaskListPartitionMetadata): Activity task list partitions
DecisionTaskListPartitions ([]TaskListPartitionMetadata): Decision task list partitions

Partition Metadata:

Key: Partition identifier
OwnerHostName: Host owning this partition

GetTaskListsByDomain

Retrieve all task lists in a domain.

GetTaskListsByDomain(context.Context, *types.GetTaskListsByDomainRequest) (*types.GetTaskListsByDomainResponse, error)

Request fields:

Domain (string, required): Domain name to query

Response fields:

DecisionTaskListMap (map[string]TaskListStatus): Map of decision task list names to their status
ActivityTaskListMap (map[string]TaskListStatus): Map of activity task list names to their status

UpdateTaskListPartitionConfig

Update partition configuration for a task list.

UpdateTaskListPartitionConfig(context.Context, *types.MatchingUpdateTaskListPartitionConfigRequest) (*types.MatchingUpdateTaskListPartitionConfigResponse, error)

Request fields:

DomainUUID (string, required): UUID of the domain
TaskList (required): Task list to update
TaskListType (required): Type (Decision or Activity)
PartitionConfig (TaskListPartitionConfig, required): New partition configuration

RefreshTaskListPartitionConfig

Refresh partition configuration from persistence.

RefreshTaskListPartitionConfig(context.Context, *types.MatchingRefreshTaskListPartitionConfigRequest) (*types.MatchingRefreshTaskListPartitionConfigResponse, error)

Poller Management APIs

CancelOutstandingPoll

Cancel an outstanding poll request.

CancelOutstandingPoll(context.Context, *types.CancelOutstandingPollRequest) error

Request fields:

DomainUUID (string, required): UUID of the domain
TaskListType (int32, required): Type of task list (0=Decision, 1=Activity)
TaskList (required): Task list being polled
PollerID (string, required): ID of the poller to cancel

Use Cases:

Worker shutdown
Connection errors
Task list reassignment

Task List Partitioning

Overview

Task list partitioning allows horizontal scaling of high-throughput task lists.

Partition Configuration

type TaskListPartitionConfig struct {
    Version              int32
    NumReadPartitions    int32
    NumWritePartitions   int32
}

NumReadPartitions: Number of partitions for polling
NumWritePartitions: Number of partitions for task addition
Version: Configuration version for consistency

Partition Routing

Tasks are routed to partitions using:

Workflow ID hash: Ensures tasks from same workflow go to same partition
Round-robin: For non-workflow-specific tasks
Isolation groups: For tenant isolation
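The workflow ID hash strategy can be illustrated with a stable hash reduced modulo the partition count. The choice of FNV-1a and the pickPartition name are assumptions made for this sketch; this document does not say which hash function the service uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickPartition maps a workflow ID to a partition index in [0, numPartitions).
// The same workflow ID always yields the same index for a given partition count.
func pickPartition(workflowID string, numPartitions int32) int32 {
	if numPartitions <= 1 {
		return 0 // unpartitioned task list
	}
	h := fnv.New32a()
	h.Write([]byte(workflowID)) // Write on a hash.Hash never returns an error
	return int32(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, id := range []string{"order-1", "order-2", "order-1"} {
		fmt.Printf("%s -> partition %d\n", id, pickPartition(id, 4))
	}
}
```

Note that changing the partition count changes the mapping, which is one reason repartitioning needs a versioned configuration.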

Dynamic Repartitioning

Update partition count via UpdateTaskListPartitionConfig
New tasks routed to new partition count
Existing pollers gradually migrate
Old partitions drain automatically

Rate Limiting

The Matching Service implements multiple rate limiting strategies:

Worker Rate Limiting

workerRateLimiter quotas.Policy

Applied to:

AddDecisionTask
AddActivityTask
PollForDecisionTask
PollForActivityTask

Configuration:

matching.workerRPS: Global worker RPS
matching.domainWorkerRPS: Per-domain worker RPS

User Rate Limiting

userRateLimiter quotas.Policy

Applied to:

QueryWorkflow
DescribeTaskList
ListTaskListPartitions
GetTaskListsByDomain

Configuration:

matching.userRPS: Global user RPS
matching.domainUserRPS: Per-domain user RPS

Rate Limit Errors

When a request is rate limited, the service returns:

errMatchingHostThrottle = &types.ServiceBusyError{
    Message: "Matching host rps exceeded",
}

Clients should implement exponential backoff.

Task Synchronization

Sync Match

When a task is added and pollers are waiting:

Task immediately dispatched to poller
No persistence overhead
Lowest possible latency (< 1ms typical)

Async Match

When no pollers are waiting:

Task persisted to task queue
Next poller retrieves from queue
Higher latency but guarantees delivery

Backlog Management

Task queues maintain:

Read Level: Current read position
Ack Level: Last acknowledged task
Backlog Count: Approximate pending tasks
Task ID Blocks: Pre-allocated ID ranges
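To make the relationship between read level and ack level concrete, here is a toy tracker. It assumes that the ack level may only advance over a contiguous prefix of acknowledged task IDs, which is a common design for this kind of queue but is not spelled out in this document. The ackTracker type and its methods are hypothetical.

```go
package main

import "fmt"

// ackTracker is a toy model of a task queue's read and ack levels.
type ackTracker struct {
	readLevel   int64          // highest task ID handed out so far
	ackLevel    int64          // every task ID <= ackLevel is acknowledged
	outstanding map[int64]bool // task ID -> acknowledged?
}

func newAckTracker() *ackTracker {
	return &ackTracker{outstanding: make(map[int64]bool)}
}

// read records that taskID was loaded from the queue; IDs must be increasing.
func (a *ackTracker) read(taskID int64) {
	a.outstanding[taskID] = false
	a.readLevel = taskID
}

// ack marks taskID done and advances ackLevel over the contiguous acked prefix.
func (a *ackTracker) ack(taskID int64) {
	if _, ok := a.outstanding[taskID]; !ok {
		return // unknown or already released task
	}
	a.outstanding[taskID] = true
	for next := a.ackLevel + 1; next <= a.readLevel; next++ {
		acked, ok := a.outstanding[next]
		if ok && !acked {
			break // an earlier task is still in flight
		}
		delete(a.outstanding, next)
		a.ackLevel = next
	}
}

// inFlight counts tasks that were read but are not yet acknowledged.
func (a *ackTracker) inFlight() int {
	n := 0
	for _, acked := range a.outstanding {
		if !acked {
			n++
		}
	}
	return n
}

func main() {
	tr := newAckTracker()
	for id := int64(1); id <= 3; id++ {
		tr.read(id)
	}
	tr.ack(2) // out of order: ack level cannot move past task 1
	fmt.Println(tr.ackLevel, tr.readLevel)
	tr.ack(1) // tasks 1 and 2 are now both done
	fmt.Println(tr.ackLevel, tr.readLevel)
}
```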

Performance Optimization

Local Dispatch

Tasks are preferentially dispatched to pollers on the same host:

Reduces network round-trips
Improves cache locality
Lowers tail latency

Task Batching

Multiple tasks can be batched for persistence:

Reduces database write load
Improves throughput
Slight increase in latency

Poller Management

Active poller tracking:

Maintains poller registry
Monitors poller health
Routes tasks to healthy pollers
Expires stale pollers

Isolation Groups

Isolation groups provide task routing based on worker capabilities:

type TaskListMetadata struct {
    IsolationGroups []string
}

Tasks can be routed to specific worker pools for:

GPU workers
Compliance zones
Resource-specific workers
Tenant isolation

Monitoring & Metrics

Key Metrics

matching.tasks.sync-match: Sync match success rate
matching.tasks.backlog: Task backlog size
matching.poll.latency: Poll latency distribution
matching.poll.timeouts: Poll timeout rate
matching.tasks.expired: Task expiration rate
matching.pollers.count: Active poller count

Health Indicators

High sync match rate (>90%): Healthy
Growing backlog: Need more workers or partitions
High poll timeouts: Need more tasks or reduce pollers
Task expirations: Increase timeouts or add workers

Error Handling

ServiceBusyError

Rate limiting or resource constraints:

&types.ServiceBusyError{Message: "Matching host rps exceeded"}

Recovery: Exponential backoff, increase capacity

EntityNotExistsError

Task list or domain not found:

&types.EntityNotExistsError{Message: "..."}

Recovery: Verify domain/task list name

StickyWorkerUnavailableError

Sticky worker not polling:

&types.StickyWorkerUnavailableError{}

Recovery: Automatic fallback to normal task list

Best Practices

Poll Timeout Configuration

Set context timeout: 60-90 seconds
Shorter timeouts increase overhead
Longer timeouts delay shutdown

Task List Design

One task list per use case
Avoid sharing task lists across workflows
Use partitioning for high throughput
Consider isolation groups for specialized workers

Poller Management

Maintain stable poller count
Gracefully shut down pollers
Monitor backlog and adjust capacity
Use sticky execution for decision tasks

Error Handling

Implement retry logic for transient errors
Monitor rate limiting errors
Handle task expiration gracefully
Log poller connectivity issues

See Also

Service APIs
Data Types