xCluster replication enables high-throughput asynchronous physical replication between independent YugabyteDB universes. It provides disaster recovery, multi-region active-active deployments, and low-latency writes without requiring synchronous consensus across regions.
Architecture
xCluster replicates Write-Ahead Log (WAL) records from source universe tablets to target universe tablets using a polling-based architecture:
- Source Universe: Produces WAL records as transactions commit
- Target Universe: Contains pollers that request WAL changes from source tablets
- CDC Streams: Track replication progress using the cdc_state table
- Pollers: Independent processes in the target universe that poll source tablets
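The relationship between streams, tablets, and checkpoints can be sketched as a minimal record type; this is an illustrative model, not the actual cdc_state schema:

```python
from dataclasses import dataclass

@dataclass
class CdcStateEntry:
    """One row of replication progress, keyed by (stream, tablet):
    the last OpId the target's poller has confirmed for this source
    tablet. Field names are illustrative, not the real schema."""
    stream_id: str
    tablet_id: str
    checkpoint: str  # last confirmed OpId, e.g. "term.index"

entry = CdcStateEntry(stream_id="stream-1", tablet_id="tablet-7",
                      checkpoint="1.42")
print(entry.checkpoint)  # → 1.42
```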
Replication Flow
Source Universe                     Target Universe
┌──────────────┐                    ┌──────────────┐
│   Tablet 1   │◄───────poll────────│   Poller 1   │
│   (Leader)   │──────changes──────►│              │
│ WAL: OpId N  │                    │  Writes to   │
└──────────────┘                    │  Tablet(s)   │
       │                            └──────────────┘
       │                                   │
 ┌─────▼────┐                        ┌─────▼────┐
 │cdc_state │                        │checkpoint│
 │  table   │                        │ tracking │
 └──────────┘                        └──────────┘
- Target pollers request changes starting from the last confirmed OpId
- Source tablets return batches of WAL records
- Pollers apply changes to target tablets with preserved commit times
- Progress is checkpointed in cdc_state on the source
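The four-step flow above can be simulated with a toy poller; fetch_changes, the WAL record layout, and the state dict are illustrative stand-ins for the real cross-universe RPCs and the cdc_state table:

```python
class SourceTablet:
    """Holds WAL records as (op_id, commit_time, writes) tuples."""
    def __init__(self, wal):
        self.wal = wal

    def fetch_changes(self, from_op_id):
        # Step 2: return the batch of records after the given OpId
        return [r for r in self.wal if r[0] > from_op_id]

class TargetTablet:
    def __init__(self):
        self.rows = {}

    def apply(self, writes, commit_time):
        # Step 3: apply writes with the source commit time preserved
        for key, value in writes.items():
            self.rows[key] = (value, commit_time)

def poll_once(source, target, state):
    # Step 1: request changes starting from the last confirmed OpId
    for op_id, commit_time, writes in source.fetch_changes(state["checkpoint"]):
        target.apply(writes, commit_time)
        # Step 4: checkpoint progress (stored in cdc_state in reality)
        state["checkpoint"] = op_id

source = SourceTablet([(1, 100, {"k1": "a"}), (2, 105, {"k2": "b"})])
target = TargetTablet()
state = {"checkpoint": 0}
poll_once(source, target, state)
print(state["checkpoint"], target.rows)  # → 2 {'k1': ('a', 100), 'k2': ('b', 105)}
```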
Replication Modes
Non-Transactional Replication
Use Case: Active-active multi-master deployments, low-latency reads
Characteristics:
- Writes allowed on both universes
- Last-writer-wins conflict resolution (by hybrid timestamp)
- Eventual consistency; may serve stale or torn reads
- Supports bidirectional replication
Limitations:
- Torn transactions possible during failover
- Index corruption risk with concurrent writes to the same row
- Foreign key and unique constraints not guaranteed
- No automatic conflict resolution
# Non-transactional setup
Replication Type: Bidirectional
Write Mode: Multi-master
Consistency: Eventual
Conflict Resolution: Last Writer Wins
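Under last-writer-wins, both universes converge because each side applies the same deterministic rule. A toy sketch, with hybrid timestamps reduced to plain integers:

```python
def lww_merge(local, remote):
    """Keep the version with the later hybrid timestamp; break ties
    deterministically (here: by value) so both sides converge."""
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or (ts, value) > (merged[key][1], merged[key][0]):
            merged[key] = (value, ts)
    return merged

west = {"user:1": ("west-write", 100)}
east = {"user:1": ("east-write", 120)}
# Both sides keep the later write regardless of merge order
print(lww_merge(west, east))  # → {'user:1': ('east-write', 120)}
print(lww_merge(east, west))  # → {'user:1': ('east-write', 120)}
```

Note what this rule does not do: the earlier write is silently discarded, which is exactly why torn transactions and constraint violations are possible in this mode.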
Transactional Replication
Use Case: Disaster recovery, active-standby deployments, consistent read replicas
Characteristics:
- Target universe is read-only
- Reads at “xCluster safe time” ensure atomicity
- Source transactions become visible atomically
- Point-in-Time Recovery (PITR) for failover
xCluster Safe Time: The target reads as of a time sufficiently in the past that all source transactions committed before that time have been fully replicated. Lags real-time by the current replication lag (typically 1-2 seconds).
# Transactional setup
Replication Type: Unidirectional
Write Mode: Source only
Consistency: Read-your-writes at safe time
Failover: PITR to latest safe time
Transactional Modes
- Automatic Mode (Recommended)
  - Automatic schema change replication
  - Simplest operational model
  - Supports most DDL operations
- Semi-Automatic Mode
  - Database-scoped replication
  - Manual DDL coordination required
  - Fewer setup steps than manual mode
- Manual Mode (Deprecated)
  - Table-level replication
  - Complex manual schema management
  - Not recommended for new deployments
Deployment Topologies
Active-Standby (Disaster Recovery)
Primary Universe (us-west)          Standby Universe (us-east)
┌──────────────────┐                ┌──────────────────┐
│   Read + Write   │───────────────►│    Read Only     │
│     Cluster      │    xCluster    │     Cluster      │
│                  │   (Txn Mode)   │                  │
└──────────────────┘                └──────────────────┘
         │ writes                           │ reads
         │                                  │
    Application                       Analytics/DR
Configuration:
- Transactional mode for consistency
- PITR enabled for clean failover
- Monitor replication lag
- Regular failover drills
Active-Active Multi-Master
Universe A (us-west)                Universe B (us-east)
┌──────────────────┐                ┌──────────────────┐
│   Read + Write   │◄──────────────►│   Read + Write   │
│     Cluster      │ Bidirectional  │     Cluster      │
│                  │    xCluster    │                  │
└──────────────────┘                └──────────────────┘
         │ writes                           │ writes
         │                                  │
     West Users                         East Users
Configuration:
- Non-transactional mode
- Two unidirectional replication flows
- Conflict resolution: last writer wins
- Application must handle eventual consistency
Active-active can lead to data inconsistencies. Carefully partition data or use application-level conflict resolution. Not recommended for YSQL.
Setup and Configuration
Prerequisites
- Source and Target Universes: Both must be running and accessible
- Network Connectivity: Target must reach source tablet servers
- Matching Schema: Tables must have identical schema (automatic mode handles this)
- Sufficient Resources: Target needs capacity for replication workload
Bootstrapping Replication
For existing source universe with data:
# Step 1: Checkpoint the source tables (returns a bootstrap ID per table)
yb-admin -master_addresses <source_masters> \
  bootstrap_cdc_producer <comma_separated_table_ids>
# Step 2: Back up the source universe
yb-admin -master_addresses <source_masters> \
  create_snapshot <keyspace> <table_name>
# Step 3: Restore the backup to the target universe
yb-admin -master_addresses <target_masters> \
  restore_snapshot <snapshot_id>
# Step 4: Set up xCluster replication, passing the bootstrap IDs from Step 1
yb-admin -master_addresses <target_masters> \
  setup_universe_replication <source_universe_id> \
  <source_masters> <comma_separated_table_ids> <comma_separated_bootstrap_ids>
YugabyteDB Anywhere provides GUI-based xCluster management:
- Navigate to Replication:
  - Select source universe
  - Click “Configure Replication”
- Configure Target:
  - Enter target universe UUID and master addresses
  - Select databases/tables to replicate
  - Set log retention duration
- Bootstrap (if needed):
  - Platform automatically handles backup/restore
  - Checkpoints source before data copy
  - Sets up replication streams
- Monitor:
  - View replication lag per table
  - Track source-target relationships
  - Configure alerting thresholds
CLI Configuration
# Setup replication from source to target
yb-admin -master_addresses <target_masters> \
  setup_universe_replication \
  <source_universe_id> \
  <source_master_addresses> \
  <comma_separated_table_ids>
# Check replication status
yb-admin -master_addresses <target_masters> \
  get_replication_status
# Pause replication
yb-admin -master_addresses <target_masters> \
  set_universe_replication_enabled <source_universe_id> 0
# Resume replication
yb-admin -master_addresses <target_masters> \
  set_universe_replication_enabled <source_universe_id> 1
# Delete replication
yb-admin -master_addresses <target_masters> \
  delete_universe_replication <source_universe_id>
Transaction Handling
Single-Shard Transactions
Commit generates a single WAL record containing all writes:
- Poller receives batched WAL record
- Examines each write to determine target tablet
- Forwards writes to appropriate target tablets
- Preserves commit timestamp
- Marks as external (prevents re-replication)
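Steps 2 and 3 above (examine each write, forward it to the owning target tablet) amount to key-range routing; the split keys and record layout below are assumptions for illustration:

```python
import bisect

def route_writes(record, split_keys):
    """Fan one batched commit record's writes out by the target's
    split points: a write routes to the tablet whose key range
    contains its key (tablet 0 owns keys below split_keys[0])."""
    buckets = {}
    for key, value in record["writes"].items():
        tablet = bisect.bisect_right(split_keys, key)
        buckets.setdefault(tablet, {})[key] = value
    return buckets

record = {"commit_time": 100,
          "writes": {"apple": 1, "mango": 2, "zebra": 3}}
# Target is split at "m": tablet 0 owns keys < "m", tablet 1 the rest
print(route_writes(record, ["m"]))  # → {0: {'apple': 1}, 1: {'mango': 2, 'zebra': 3}}
```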
Distributed Transactions
Multi-tablet transactions involve:
- Intent Records: Provisional writes on each tablet (WAL records)
- Transaction Status: Tracked on transaction status tablet
- Commit: Updates status tablet (generates WAL record)
- Apply: Asynchronously applies intents to tablets (generates apply WAL records)
Replication:
- Target creates inert provisional records (no locking)
- Apply WAL record distributed to all pollers managing relevant tablets
- Transaction becomes visible when apply completes on each target tablet
- Non-transactional mode: Tablets apply independently (torn reads possible)
- Transactional mode: Reads wait for xCluster safe time
xCluster Safe Time Calculation
# Pseudo-code
def xcluster_safe_time(database):
    # Safe time is the minimum apply-safe time across the database's tablets
    return min(tablet.xcluster_apply_safe_time
               for tablet in database.tablets)

# Each tablet's apply safe time ensures:
# - All transactions committed before this time are fully applied
# - The source generates apply WAL records every 250 ms, so the safe time
#   advances even during long-running transactions
Poller Distribution
Target tablets may need changes from multiple source tablets (and vice versa) due to:
- Different tablet counts between universes
- Tablet splits occurring at different times
- Different sharding boundaries
Optimization: One poller per source tablet prevents redundant cross-universe reads. When multiple target tablets need data from the same source tablet, one poller distributes changes to all relevant targets.
Source (3 tablets)              Target (2 tablets)
┌────────────┐                  ┌────────────┐
│  Tablet A  │◄───Poller 1─────│  Tablet X  │
│ Range: A-M │                  │ Range: A-P │
└────────────┘                  └────────────┘
┌────────────┐                        │
│  Tablet B  │◄───Poller 2──────────┤
│ Range: N-T │                  ┌────────────┐
└────────────┘                  │  Tablet Y  │
┌────────────┐                  │ Range: Q-Z │
│  Tablet C  │◄───Poller 2─────│            │
│ Range: U-Z │                  └────────────┘
└────────────┘
Tablet splits are replicated via WAL, automatically updating poller mappings.
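The mapping in the diagram can be derived from key-range overlap: one poller per source tablet, serving every target tablet whose range intersects. A sketch with half-open ranges (tablet names and bounds are illustrative):

```python
def overlaps(a, b):
    """Half-open key ranges [lo, hi) intersect iff each starts
    before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def assign_pollers(source_ranges, target_ranges):
    """One poller per source tablet; it fans changes out to every
    target tablet whose key range overlaps the source tablet's,
    avoiding redundant cross-universe reads."""
    return {src: [tgt for tgt, t in target_ranges.items() if overlaps(s, t)]
            for src, s in source_ranges.items()}

source = {"A": ("a", "n"), "B": ("n", "u"), "C": ("u", "{")}
target = {"X": ("a", "q"), "Y": ("q", "{")}
# Poller 2 (source tablet B) serves both X and Y, as in the diagram
print(assign_pollers(source, target))  # → {'A': ['X'], 'B': ['X', 'Y'], 'C': ['Y']}
```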
Schema Changes
Automatic Mode
DDL changes automatically replicate to target:
-- Source: Execute DDL normally
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Target: Change automatically applied
-- No manual intervention needed
Semi-Automatic and Manual Modes
Manual DDL coordination required:
- Pause replication (if necessary)
- Apply DDL to source universe
- Apply DDL to target universe (same schema)
- Resume replication
Replication auto-pauses when schema divergence is detected and resumes when the schemas match again.
Unsupported DDLs:
- Table rewrites (require drop and re-setup)
- TRUNCATE
- CREATE TABLE AS / SELECT INTO
Monitoring and Alerting
Key Metrics
-- Checkpoint progress per stream and tablet (YCQL system table on the source)
SELECT stream_id, tablet_id, checkpoint, last_replication_time
FROM system.cdc_state;
Replication lag itself is exposed through tserver metrics (for example, async_replication_sent_lag_micros and async_replication_committed_lag_micros) rather than through a SQL query.
- Lag Dashboard: Table-level replication lag
- Universe Health: Replication flow status
- Alerts: Configure thresholds for:
- Replication lag exceeding SLA
- Replication errors
- Schema drift detection
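An alert check like the thresholds above reduces to a scan over per-table lag values; in practice the lag would come from replication metrics, but the table names and numbers here are made up:

```python
def tables_over_sla(lag_ms_by_table, sla_ms=5000):
    """Return tables whose replication lag exceeds the SLA threshold,
    sorted for stable alert output."""
    return sorted(t for t, lag in lag_ms_by_table.items() if lag > sla_ms)

lags = {"orders": 1200, "users": 9000, "events": 5001}
print(tables_over_sla(lags))  # → ['events', 'users']
```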
Log Retention
Configure WAL retention to handle target downtime:
# Set WAL retention (in seconds) so pollers can catch up after downtime
yb-admin -master_addresses <source_masters> \
  set_wal_retention_secs <table_id> <seconds>
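Sizing the retention window is a back-of-envelope product of sustained WAL write rate and the downtime you want to survive; the 5 MB/s figure below is purely illustrative:

```python
def wal_retention_bytes(wal_mb_per_sec, retention_hours):
    """Rough extra storage needed to keep WAL for a retention window:
    sustained WAL rate x window length."""
    return wal_mb_per_sec * 1024 * 1024 * retention_hours * 3600

# Surviving 24h of target downtime at 5 MB/s of WAL:
gb = wal_retention_bytes(5, 24) / 1024**3
print(round(gb))  # → 422
```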
Failover and Recovery
Planned Failover (Transactional Mode)
- Stop writes to source universe
- Wait for replication to catch up (lag = 0)
- Verify xCluster safe time advanced to recent timestamp
- Switch application to target universe
- (Optional) Reverse replication direction
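Step 2 above (wait for replication to catch up) is typically a polling loop; get_lag_ms here is a caller-supplied probe, e.g. backed by replication metrics, and the whole function is an operational sketch rather than a YugabyteDB API:

```python
import time

def wait_for_catch_up(get_lag_ms, timeout_s=300, poll_s=1.0):
    """After source writes stop, block until replication lag reaches
    zero (all WAL drained) or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_lag_ms() == 0:
            return True
        time.sleep(poll_s)
    return False

# Simulated probe: lag drains over successive polls
samples = iter([3000, 800, 0])
print(wait_for_catch_up(lambda: next(samples), timeout_s=10, poll_s=0))  # → True
```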
Unplanned Failover (Disaster Recovery)
Transactional Mode:
- Rewind target to latest xCluster safe time using PITR
- Promote target to primary
- Reconfigure application endpoints
- Assess data loss: Maximum = replication lag at failure time
Non-Transactional Mode:
- Promote target to primary
- Manual reconciliation of torn transactions
- Verify data consistency (indexes, constraints)
Reversing Replication
After failover, reverse the replication direction:
# Setup reverse replication: old-target → old-source
yb-admin -master_addresses <old_source_masters> \
setup_universe_replication <old_target_id> ...
Best Practices
- Choose the Right Mode:
  - Transactional for DR and read replicas
  - Non-transactional only for YCQL active-active
- Monitor Replication Lag:
  - Set alerts for lag > 5 seconds
  - Investigate sustained lag immediately
- Test Failover Procedures:
  - Regular DR drills
  - Verify PITR recovery points
  - Document runbooks
- Schema Management:
  - Use automatic mode for new deployments
  - Test DDLs in staging first
  - Coordinate changes during low traffic
- Network Configuration:
  - Direct pod/node communication in Kubernetes
  - Dedicated network for replication traffic
  - Monitor network latency
- Capacity Planning:
  - Target needs resources for replication workload
  - WAL retention requires additional storage
  - Poller count scales with tablet count
Limitations
General
- Broadcast topology (A → B, A → C): Not supported
- Consolidation (B → A, C → A): Not supported
- Daisy chaining (A → B → C): Not supported
- Star topology (A ↔ B ↔ C): Not supported
YSQL Specific
- Array types with row/domain/multi-range base types
- Table columns with row types
- pgvector indexes
- Materialized views (semi-automatic/manual modes)
- Sequences (semi-automatic/manual modes)
Kubernetes
- Requires direct pod-to-pod communication
- Load balancer communication not supported
Learn More