Overview
Domains are the primary namespace and isolation boundary in Cadence. Domain types define configuration, replication settings, and metadata for multi-tenant workflow execution.
Source location: common/types/shared.go
Core Domain Types
DomainInfo
Core domain information.
- Domain name (globally unique identifier)
- Current status (Registered, Deprecated, Deleted)
- Human-readable domain description
- Contact email for domain owner
- Arbitrary key-value metadata
- System-generated unique identifier
DomainConfiguration
Domain configuration settings.
- Number of days to retain closed workflow data (minimum: 1, maximum: domain-specific)
- Whether to emit domain-specific metrics
- Blocked worker binary checksums
- Whether history archival is enabled (Disabled, Enabled)
- URI for history archival storage (e.g., “s3://my-bucket/history”)
- Whether visibility archival is enabled
- URI for visibility archival storage
- Configuration for task isolation groups
- Configuration for async workflow execution
DomainReplicationConfiguration
Cross-cluster replication configuration.
- Name of the currently active cluster for this domain
- List of clusters where the domain is replicated
ClusterReplicationConfiguration
Per-cluster replication settings.
- Name of the cluster in the Cadence installation
Domain Status
DomainStatus
Enumeration of domain lifecycle states.
- Registered: domain is active and accepting workflows
- Deprecated: new workflows cannot start, but existing workflows continue
- Deleted: no operations allowed
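The lifecycle rules above can be sketched as a small enum with two predicates. This is an illustration only: the type and function names below are hypothetical stand-ins, not the definitions in common/types/shared.go.

```go
package main

import "fmt"

// DomainStatus is a hypothetical stand-in for the lifecycle enum described above.
type DomainStatus int

const (
	DomainStatusRegistered DomainStatus = iota
	DomainStatusDeprecated
	DomainStatusDeleted
)

// CanStartWorkflow reports whether new workflows may start in a domain.
// Only Registered domains accept new workflows.
func CanStartWorkflow(s DomainStatus) bool {
	return s == DomainStatusRegistered
}

// CanContinueWorkflow reports whether already-running workflows keep executing.
// Deprecated domains still run existing workflows; Deleted domains allow nothing.
func CanContinueWorkflow(s DomainStatus) bool {
	return s == DomainStatusRegistered || s == DomainStatusDeprecated
}

func main() {
	statuses := []DomainStatus{DomainStatusRegistered, DomainStatusDeprecated, DomainStatusDeleted}
	for _, s := range statuses {
		fmt.Println(s, CanStartWorkflow(s), CanContinueWorkflow(s))
	}
}
```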
Domain API Request Types
RegisterDomainRequest
Request to register a new domain.
- Unique domain name
- Domain description
- Owner contact email
- Retention period for closed workflows (1-30 days typical)
- Enable domain-specific metrics
- Clusters for global domain replication
- Active cluster for global domain
- Custom metadata key-value pairs
- Authorization token
- Whether this is a global (multi-cluster) domain
- Enable history archival
- Storage URI for archived history
- Enable visibility archival
- Storage URI for archived visibility data
DescribeDomainRequest
Request to describe a domain.
- Domain name (either Name or UUID required)
- Domain UUID (either Name or UUID required)
DescribeDomainResponse
Response containing domain details.
- Core domain information
- Domain configuration settings
- Replication settings for global domains
- Current failover version for conflict resolution
- Whether this is a global domain
UpdateDomainRequest
Request to update domain configuration.
- Domain name to update
- Updated description
- Updated owner email
- Updated metadata (merges with existing)
- Updated retention period
- Add blocked worker binaries
- Remove a blocked binary by checksum
- Change active cluster (triggers failover)
- Timeout for graceful failover
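Because updated metadata merges with the existing metadata rather than replacing it, an update only needs to carry the keys that change. The sketch below shows one plausible merge, under the assumption that an updated key overwrites the existing value for that key; mergeData is a hypothetical helper, not a Cadence function.

```go
package main

import "fmt"

// mergeData returns a new map holding the existing metadata overlaid with updates.
// Assumption: on a key conflict the updated value wins. Neither input is modified.
func mergeData(existing, updates map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(updates))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range updates {
		merged[k] = v
	}
	return merged
}

func main() {
	existing := map[string]string{"team": "orders", "tier": "1"}
	updates := map[string]string{"tier": "2", "oncall": "orders-oncall"}
	fmt.Println(mergeData(existing, updates))
}
```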
UpdateDomainResponse
Response after updating domain.
DeprecateDomainRequest
Request to deprecate a domain.
- Domain name to deprecate
- Authorization token
DeleteDomainRequest
Request to permanently delete a domain.
- Domain name to delete
ListDomainsRequest
Request to list domains.
- Number of domains per page (default: 100)
- Pagination token from previous response
ListDomainsResponse
Response with list of domains.
- List of domain descriptions
- Token for next page, or empty if no more results
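The page-size and token fields imply the usual pagination loop: pass the token from each response into the next request until it comes back empty. The sketch below shows that loop against a hypothetical lister function and an in-memory fake; none of these names are part of the Cadence client API.

```go
package main

import "fmt"

// listRequest and listResponse are hypothetical mirrors of the types described above.
type listRequest struct {
	PageSize      int32
	NextPageToken []byte
}

type listResponse struct {
	Domains       []string
	NextPageToken []byte
}

// lister abstracts whatever call actually serves one page of domains.
type lister func(listRequest) (listResponse, error)

// listAll follows page tokens until the server returns an empty token.
func listAll(list lister, pageSize int32) ([]string, error) {
	var all []string
	var token []byte
	for {
		resp, err := list(listRequest{PageSize: pageSize, NextPageToken: token})
		if err != nil {
			return nil, err
		}
		all = append(all, resp.Domains...)
		if len(resp.NextPageToken) == 0 {
			return all, nil
		}
		token = resp.NextPageToken
	}
}

// fakeLister serves names in pages, encoding the next start index in a one-byte token.
// It expects a positive page size and fewer than 256 names.
func fakeLister(names []string) lister {
	return func(req listRequest) (listResponse, error) {
		start := 0
		if len(req.NextPageToken) > 0 {
			start = int(req.NextPageToken[0])
		}
		end := start + int(req.PageSize)
		if end >= len(names) {
			return listResponse{Domains: names[start:]}, nil
		}
		return listResponse{Domains: names[start:end], NextPageToken: []byte{byte(end)}}, nil
	}
}

func main() {
	names := []string{"dev-orders", "staging-orders", "prod-orders"}
	all, err := listAll(fakeLister(names), 2)
	fmt.Println(all, err)
}
```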
Domain Failover Types
FailoverDomainRequest
Request to fail over a global domain to another cluster.
- Global domain name
- Target cluster to become active
- Graceful failover timeout (default: 0 for immediate failover)
Failover process:
- Validates that the target cluster exists in the domain configuration
- Increments the failover version
- Updates the active cluster name
- Replicates failover markers to all clusters
- Drains in-flight operations if a graceful timeout is specified
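The first three failover steps (validate the target, increment the version, switch the active cluster) can be sketched as a pure state transition. This is a simplification under stated assumptions: the struct and function are hypothetical, the version is bumped by exactly one, and marker replication and draining are left out.

```go
package main

import "fmt"

// replicationState is a hypothetical snapshot of a global domain's replication fields.
type replicationState struct {
	ActiveClusterName string
	Clusters          []string
	FailoverVersion   int64
}

// failover returns the state after making target the active cluster.
// It rejects targets that are not part of the domain's cluster list and
// treats a failover to the already-active cluster as a no-op.
func failover(s replicationState, target string) (replicationState, error) {
	found := false
	for _, c := range s.Clusters {
		if c == target {
			found = true
			break
		}
	}
	if !found {
		return s, fmt.Errorf("cluster %q is not in the domain's replication configuration", target)
	}
	if target == s.ActiveClusterName {
		return s, nil
	}
	// Assumption: a plain +1 stands in for the real version arithmetic.
	s.FailoverVersion++
	s.ActiveClusterName = target
	return s, nil
}

func main() {
	s := replicationState{
		ActiveClusterName: "cluster-a",
		Clusters:          []string{"cluster-a", "cluster-b"},
		FailoverVersion:   1,
	}
	next, err := failover(s, "cluster-b")
	fmt.Println(next.ActiveClusterName, next.FailoverVersion, err)
}
```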
FailoverDomainResponse
Response after domain failover.
- New failover version after successful failover
ListFailoverHistoryRequest
Request for the failover history of a domain.
- Domain name
ListFailoverHistoryResponse
Response with failover history.
FailoverEvent
Single failover event.
- Failover version
- When failover occurred (nanoseconds)
- Previous active cluster
- New active cluster
- Type of failover (Manual, Automatic)
Bad Binary Management
BadBinaries
Collection of blocked worker binaries.
- Map of binary checksum to info
BadBinaryInfo
Information about a blocked binary.
- Why this binary is blocked
- Who blocked this binary
- When binary was blocked (nanoseconds)
Use cases:
- Block buggy deployments
- Prevent incompatible worker versions
- Enforce compliance requirements
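Since blocked binaries are kept as a map from checksum to info, checking a worker's binary is a single lookup. The sketch below uses hypothetical mirrors of the two types described above; the field names are assumptions chosen to match the field descriptions.

```go
package main

import "fmt"

// badBinaryInfo is a hypothetical mirror of the per-binary record described above.
type badBinaryInfo struct {
	Reason          string
	Operator        string
	CreatedTimeNano int64
}

// badBinaries maps a binary checksum to the record explaining why it is blocked.
type badBinaries struct {
	Binaries map[string]badBinaryInfo
}

// isBlocked looks up a checksum and returns its record if the binary is blocked.
// A nil map is safe to read and simply reports "not blocked".
func (b badBinaries) isBlocked(checksum string) (badBinaryInfo, bool) {
	info, ok := b.Binaries[checksum]
	return info, ok
}

func main() {
	blocked := badBinaries{Binaries: map[string]badBinaryInfo{
		"abc123": {Reason: "non-deterministic workflow code", Operator: "oncall@example.com"},
	}}
	if info, ok := blocked.isBlocked("abc123"); ok {
		fmt.Printf("blocked by %s: %s\n", info.Operator, info.Reason)
	}
}
```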
Archival Types
ArchivalStatus
Archival enabled/disabled status.
- Archival is disabled
- Archival is enabled
History Archival
Archives complete workflow event history:
- Long-term storage for compliance
- Retention beyond standard retention period
- Search and retrieval capabilities
Example archival URIs:
- s3://bucket/prefix
- gs://bucket/prefix
- file:///path/to/directory
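An archival URI can be sanity-checked by parsing it and inspecting the scheme. The sketch below uses Go's standard net/url package; the set of accepted schemes is an assumption taken from the three example URIs above, not a statement of what a given Cadence deployment supports, and archivalScheme is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// knownSchemes lists the schemes that appear in the examples above (an assumption).
var knownSchemes = map[string]bool{"s3": true, "gs": true, "file": true}

// archivalScheme parses an archival URI and returns its scheme,
// or an error if the URI is malformed or uses a scheme outside knownSchemes.
func archivalScheme(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse archival URI: %w", err)
	}
	if !knownSchemes[u.Scheme] {
		return "", fmt.Errorf("unsupported archival scheme %q", u.Scheme)
	}
	return u.Scheme, nil
}

func main() {
	for _, raw := range []string{"s3://bucket/prefix", "gs://bucket/prefix", "file:///path/to/directory", "http://example.com"} {
		scheme, err := archivalScheme(raw)
		fmt.Println(raw, scheme, err)
	}
}
```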
Visibility Archival
Archives workflow metadata for search:
- Workflow ID, type, status
- Start/close times
- Search attributes
Isolation Groups
IsolationGroupConfiguration
Configuration for task routing isolation.
IsolationGroupPartition
Single isolation group definition.
- Isolation group name
- Group state (Healthy, Drained)
Use cases:
- Zone-based routing (us-east-1a, us-east-1b)
- Capability-based routing (gpu-workers, cpu-workers)
- Tenant isolation
- Blue/green deployments
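Given a name and a Healthy or Drained state per group, selecting the groups that may receive tasks is a simple filter. The sketch below uses hypothetical types that mirror the description above; it illustrates the idea only and is not Cadence's routing logic.

```go
package main

import "fmt"

// isolationGroupState is a hypothetical mirror of the group state described above.
type isolationGroupState int

const (
	stateHealthy isolationGroupState = iota
	stateDrained
)

// isolationGroupPartition pairs a group name with its current state.
type isolationGroupPartition struct {
	Name  string
	State isolationGroupState
}

// availableGroups returns the names of groups that are not drained, in input order.
func availableGroups(parts []isolationGroupPartition) []string {
	var names []string
	for _, p := range parts {
		if p.State == stateHealthy {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	parts := []isolationGroupPartition{
		{Name: "us-east-1a", State: stateHealthy},
		{Name: "us-east-1b", State: stateDrained},
	}
	fmt.Println(availableGroups(parts))
}
```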
Async Workflow Configuration
AsyncWorkflowConfiguration
Configuration for async workflow execution.
- Whether async execution is enabled
- Name of predefined async queue
- Type of queue (kafka, sqs, etc.)
- Queue-specific configuration
Domain Errors
DomainAlreadyExistsError
Attempt to register a domain that already exists.
DomainNotActiveError
Operation on a domain that is not active in the current cluster.
- Cluster where request was received
- Cluster where domain is currently active
EntityNotExistsError
Domain does not exist.
Best Practices
Domain Design
Isolation Boundaries
- One domain per application/team
- Use domains for security isolation
- Consider compliance requirements
Naming Conventions
- Use descriptive names: production-orders, staging-fulfillment
- Include environment: dev-, staging-, prod-
- Keep names URL-safe (no spaces or special characters)
Retention Periods
- Balance cost vs. compliance needs
- Consider archival for long-term retention
- Minimum 1 day, typical 7-30 days
Global Domains
Multi-Cluster Setup
- Deploy at least 2 clusters for redundancy
- Configure all clusters before registering domain
- Test failover procedures regularly
Failover Strategy
- Use graceful failover for planned maintenance
- Monitor failover version for consistency
- Document runbooks for emergency failover
Replication Lag
- Monitor cross-cluster replication lag
- Consider lag when failing over
- Use strong consistency reads when needed
Bad Binary Management
Blocking Strategy
- Block at domain level, not globally
- Document reason and operator
- Set up alerts for blocked binary usage attempts
Unblocking
- Use DeleteBadBinary in UpdateDomainRequest
- Verify fix before unblocking
- Communicate with affected teams
Archival
Storage Planning
- Estimate storage based on workflow count and history size
- Configure lifecycle policies on storage
- Monitor archival lag and failures
URI Configuration
- Use separate buckets for history and visibility
- Include environment in path: s3://cadence/prod/domain-name/
- Ensure proper IAM permissions
Isolation Groups
Group Design
- Align with infrastructure zones
- Keep group count manageable (< 10 per domain)
- Use consistent naming across domains
Draining
- Drain groups before maintenance
- Monitor backlog during drain
- Verify no workers polling before decommission
Monitoring & Operations
Key Metrics
- domain.registration.count: Number of domains
- domain.failover.latency: Failover duration
- domain.replication.lag: Cross-cluster lag
- domain.archival.failures: Archival error rate
- domain.bad.binary.usage: Blocked binary attempts
Health Checks
- Verify domain status is Registered
- Check active cluster matches expected
- Monitor retention period enforcement
- Validate archival is working if enabled
Troubleshooting
DomainNotActiveError:
- Check current active cluster
- Verify cluster connectivity
- Consider failover if needed
Archival failures:
- Check storage permissions
- Verify URI is accessible
- Monitor archival worker logs
Replication lag:
- Check network connectivity between clusters
- Verify replication task processing
- Scale replication workers if needed
See Also
- Frontend Service API - Domain management APIs
- Workflow Types - Workflow execution types
- Common Types - Shared type definitions