xCluster replication enables high-throughput asynchronous physical replication between independent YugabyteDB universes. It provides disaster recovery, multi-region active-active deployments, and low-latency writes without requiring synchronous consensus across regions.
Architecture
xCluster replicates Write-Ahead Log (WAL) records from source universe tablets to target universe tablets using a polling-based architecture:
- Source Universe: Produces WAL records as transactions commit
- Target Universe: Contains pollers that request WAL changes from source tablets
- CDC Streams: Track replication progress using the cdc_state table
- Pollers: Independent processes in the target universe that poll source tablets
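The relationship between streams, tablets, and checkpoints can be sketched as a minimal record type; this is an illustrative model, not the actual cdc_state schema:

```python
from dataclasses import dataclass

@dataclass
class CdcStateEntry:
    """One row of replication progress, keyed by (stream, tablet):
    the last OpId the target's poller has confirmed for this source
    tablet. Field names are illustrative, not the real schema."""
    stream_id: str
    tablet_id: str
    checkpoint: str  # last confirmed OpId, e.g. "term.index"

entry = CdcStateEntry(stream_id="stream-1", tablet_id="tablet-7",
                      checkpoint="1.42")
print(entry.checkpoint)  # → 1.42
```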
Replication Flow
Source Universe                     Target Universe
┌──────────────┐                    ┌──────────────┐
│   Tablet 1   │◄───────poll────────│   Poller 1   │
│   (Leader)   │──────changes──────►│              │
│ WAL: OpId N  │                    │  Writes to   │
└──────────────┘                    │  Tablet(s)   │
       │                            └──────────────┘
       │                                   │
 ┌─────▼────┐                        ┌─────▼────┐
 │cdc_state │                        │checkpoint│
 │  table   │                        │ tracking │
 └──────────┘                        └──────────┘
- Target pollers request changes starting from the last confirmed OpId
- Source tablets return batches of WAL records
- Pollers apply changes to target tablets with preserved commit times
- Progress is checkpointed in cdc_state on the source
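The four-step flow above can be simulated with a toy poller; fetch_changes, the WAL record layout, and the state dict are illustrative stand-ins for the real cross-universe RPCs and the cdc_state table:

```python
class SourceTablet:
    """Holds WAL records as (op_id, commit_time, writes) tuples."""
    def __init__(self, wal):
        self.wal = wal

    def fetch_changes(self, from_op_id):
        # Step 2: return the batch of records after the given OpId
        return [r for r in self.wal if r[0] > from_op_id]

class TargetTablet:
    def __init__(self):
        self.rows = {}

    def apply(self, writes, commit_time):
        # Step 3: apply writes with the source commit time preserved
        for key, value in writes.items():
            self.rows[key] = (value, commit_time)

def poll_once(source, target, state):
    # Step 1: request changes starting from the last confirmed OpId
    for op_id, commit_time, writes in source.fetch_changes(state["checkpoint"]):
        target.apply(writes, commit_time)
        # Step 4: checkpoint progress (stored in cdc_state in reality)
        state["checkpoint"] = op_id

source = SourceTablet([(1, 100, {"k1": "a"}), (2, 105, {"k2": "b"})])
target = TargetTablet()
state = {"checkpoint": 0}
poll_once(source, target, state)
print(state["checkpoint"], target.rows)  # → 2 {'k1': ('a', 100), 'k2': ('b', 105)}
```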
Replication Modes
Non-Transactional Replication
Use Case: Active-active multi-master deployments, low-latency reads
Characteristics:
- Writes allowed on both universes
- Last-writer-wins conflict resolution (by hybrid timestamp)
- Eventual consistency; may serve stale or torn reads
- Supports bidirectional replication
Limitations:
- Torn transactions possible during failover
- Index corruption risk with concurrent writes to the same row
- Foreign key and unique constraints not guaranteed
- No automatic conflict resolution
# Non-transactional setup
Replication Type: Bidirectional
Write Mode: Multi-master
Consistency: Eventual
Conflict Resolution: Last Writer Wins
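Under last-writer-wins, both universes converge because each side applies the same deterministic rule. A toy sketch, with hybrid timestamps reduced to plain integers:

```python
def lww_merge(local, remote):
    """Keep the version with the later hybrid timestamp; break ties
    deterministically (here: by value) so both sides converge."""
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or (ts, value) > (merged[key][1], merged[key][0]):
            merged[key] = (value, ts)
    return merged

west = {"user:1": ("west-write", 100)}
east = {"user:1": ("east-write", 120)}
# Both sides keep the later write regardless of merge order
print(lww_merge(west, east))  # → {'user:1': ('east-write', 120)}
print(lww_merge(east, west))  # → {'user:1': ('east-write', 120)}
```

Note what this rule does not do: the earlier write is silently discarded, which is exactly why torn transactions and constraint violations are possible in this mode.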
Transactional Replication
Use Case: Disaster recovery, active-standby deployments, consistent read replicas
Characteristics:
- Target universe is read-only
- Reads at “xCluster safe time” ensure atomicity
- Source transactions become visible atomically
- Point-in-Time Recovery (PITR) for failover
xCluster Safe Time: The target reads as of a time sufficiently in the past that all source transactions committed before that time have been fully replicated. Lags real-time by the current replication lag (typically 1-2 seconds).
# Transactional setup
Replication Type: Unidirectional
Write Mode: Source only
Consistency: Read-your-writes at safe time
Failover: PITR to latest safe time
Transactional Modes
- Automatic Mode (Recommended)
  - Automatic schema change replication
  - Simplest operational model
  - Supports most DDL operations
- Semi-Automatic Mode
  - Database-scoped replication
  - Manual DDL coordination required
  - Fewer setup steps than manual mode
- Manual Mode (Deprecated)
  - Table-level replication
  - Complex manual schema management
  - Not recommended for new deployments
Deployment Topologies
Active-Standby (Disaster Recovery)
Primary Universe (us-west)          Standby Universe (us-east)
┌──────────────────┐                ┌──────────────────┐
│   Read + Write   │───────────────►│    Read Only     │
│     Cluster      │    xCluster    │     Cluster      │
│                  │   (Txn Mode)   │                  │
└──────────────────┘                └──────────────────┘
         │ writes                           │ reads
         │                                  │
    Application                       Analytics/DR
Configuration:
- Transactional mode for consistency
- PITR enabled for clean failover
- Monitor replication lag
- Regular failover drills
Active-Active Multi-Master
Universe A (us-west)                Universe B (us-east)
┌──────────────────┐                ┌──────────────────┐
│   Read + Write   │◄──────────────►│   Read + Write   │
│     Cluster      │ Bidirectional  │     Cluster      │
│                  │    xCluster    │                  │
└──────────────────┘                └──────────────────┘
         │ writes                           │ writes
         │                                  │
     West Users                         East Users
Configuration:
- Non-transactional mode
- Two unidirectional replication flows
- Conflict resolution: last writer wins
- Application must handle eventual consistency
Active-active can lead to data inconsistencies. Carefully partition data or use application-level conflict resolution. Not recommended for YSQL.
Setup and Configuration
Prerequisites
- Source and Target Universes: Both must be running and accessible
- Network Connectivity: Target must reach source tablet servers
- Matching Schema: Tables must have identical schema (automatic mode handles this)
- Sufficient Resources: Target needs capacity for replication workload
Bootstrapping Replication
For existing source universe with data:
# Step 1: Checkpoint the source tables (returns a bootstrap ID per table)
yb-admin -master_addresses <source_masters> \
  bootstrap_cdc_producer <comma_separated_table_ids>
# Step 2: Back up the source universe
yb-admin -master_addresses <source_masters> \
  create_snapshot <keyspace> <table_name>
# Step 3: Restore the backup to the target universe
yb-admin -master_addresses <target_masters> \
  restore_snapshot <snapshot_id>
# Step 4: Set up xCluster replication, passing the bootstrap IDs from Step 1
yb-admin -master_addresses <target_masters> \
  setup_universe_replication <source_universe_id> \
  <source_masters> <comma_separated_table_ids> <comma_separated_bootstrap_ids>
YugabyteDB Anywhere provides GUI-based xCluster management:
- Navigate to Replication:
  - Select source universe
  - Click “Configure Replication”
- Configure Target:
  - Enter target universe UUID and master addresses
  - Select databases/tables to replicate
  - Set log retention duration
- Bootstrap (if needed):
  - Platform automatically handles backup/restore
  - Checkpoints source before data copy
  - Sets up replication streams
- Monitor:
  - View replication lag per table
  - Track source-target relationships
  - Configure alerting thresholds
CLI Configuration
# Setup replication from source to target
yb-admin -master_addresses <target_masters> \
  setup_universe_replication \
  <source_universe_id> \
  <source_master_addresses> \
  <comma_separated_table_ids>
# Check replication status
yb-admin -master_addresses <target_masters> \
  get_replication_status
# Pause replication
yb-admin -master_addresses <target_masters> \
  set_universe_replication_enabled <source_universe_id> 0
# Resume replication
yb-admin -master_addresses <target_masters> \
  set_universe_replication_enabled <source_universe_id> 1
# Delete replication
yb-admin -master_addresses <target_masters> \
  delete_universe_replication <source_universe_id>
Transaction Handling
Single-Shard Transactions
Commit generates a single WAL record containing all writes:
- Poller receives batched WAL record
- Examines each write to determine target tablet
- Forwards writes to appropriate target tablets
- Preserves commit timestamp
- Marks as external (prevents re-replication)
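Steps 2 and 3 above (examine each write, forward it to the owning target tablet) amount to key-range routing; the split keys and record layout below are assumptions for illustration:

```python
import bisect

def route_writes(record, split_keys):
    """Fan one batched commit record's writes out by the target's
    split points: a write routes to the tablet whose key range
    contains its key (tablet 0 owns keys below split_keys[0])."""
    buckets = {}
    for key, value in record["writes"].items():
        tablet = bisect.bisect_right(split_keys, key)
        buckets.setdefault(tablet, {})[key] = value
    return buckets

record = {"commit_time": 100,
          "writes": {"apple": 1, "mango": 2, "zebra": 3}}
# Target is split at "m": tablet 0 owns keys < "m", tablet 1 the rest
print(route_writes(record, ["m"]))  # → {0: {'apple': 1}, 1: {'mango': 2, 'zebra': 3}}
```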
Distributed Transactions
Multi-tablet transactions involve:
- Intent Records: Provisional writes on each tablet (WAL records)
- Transaction Status: Tracked on transaction status tablet
- Commit: Updates status tablet (generates WAL record)
- Apply: Asynchronously applies intents to tablets (generates apply WAL records)
Replication:
- Target creates inert provisional records (no locking)
- Apply WAL record distributed to all pollers managing relevant tablets
- Transaction becomes visible when apply completes on each target tablet
- Non-transactional mode: Tablets apply independently (torn reads possible)
- Transactional mode: Reads wait for xCluster safe time
xCluster Safe Time Calculation
# Pseudo-code
def xcluster_safe_time(database):
    # Safe time is the minimum apply-safe time across the database's tablets
    return min(tablet.xcluster_apply_safe_time
               for tablet in database.tablets)

# Each tablet's apply safe time ensures:
# - All transactions committed before this time are fully applied
# - The source generates apply WAL records every 250 ms, so the safe time
#   advances even during long-running transactions
Poller Distribution
Target tablets may need changes from multiple source tablets (and vice versa) due to:
- Different tablet counts between universes
- Tablet splits occurring at different times
- Different sharding boundaries
Optimization: One poller per source tablet prevents redundant cross-universe reads. When multiple target tablets need data from the same source tablet, one poller distributes changes to all relevant targets.
Source (3 tablets)              Target (2 tablets)
┌────────────┐                  ┌────────────┐
│  Tablet A  │◄───Poller 1─────│  Tablet X  │
│ Range: A-M │                  │ Range: A-P │
└────────────┘                  └────────────┘
┌────────────┐                        │
│  Tablet B  │◄───Poller 2──────────┤
│ Range: N-T │                  ┌────────────┐
└────────────┘                  │  Tablet Y  │
┌────────────┐                  │ Range: Q-Z │
│  Tablet C  │◄───Poller 2─────│            │
│ Range: U-Z │                  └────────────┘
└────────────┘
Tablet splits are replicated via WAL, automatically updating poller mappings.
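The mapping in the diagram can be derived from key-range overlap: one poller per source tablet, serving every target tablet whose range intersects. A sketch with half-open ranges (tablet names and bounds are illustrative):

```python
def overlaps(a, b):
    """Half-open key ranges [lo, hi) intersect iff each starts
    before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def assign_pollers(source_ranges, target_ranges):
    """One poller per source tablet; it fans changes out to every
    target tablet whose key range overlaps the source tablet's,
    avoiding redundant cross-universe reads."""
    return {src: [tgt for tgt, t in target_ranges.items() if overlaps(s, t)]
            for src, s in source_ranges.items()}

source = {"A": ("a", "n"), "B": ("n", "u"), "C": ("u", "{")}
target = {"X": ("a", "q"), "Y": ("q", "{")}
# Poller 2 (source tablet B) serves both X and Y, as in the diagram
print(assign_pollers(source, target))  # → {'A': ['X'], 'B': ['X', 'Y'], 'C': ['Y']}
```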
Schema Changes
Automatic Mode
DDL changes automatically replicate to target:
-- Source: Execute DDL normally
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Target: Change automatically applied
-- No manual intervention needed
Semi-Automatic and Manual Modes
Manual DDL coordination required:
- Pause replication (if necessary)
- Apply DDL to source universe
- Apply DDL to target universe (same schema)
- Resume replication
Replication auto-pauses when schema divergence is detected and resumes when the schemas match again.
Unsupported DDLs:
- Table rewrites (require drop and re-setup)
- TRUNCATE
- CREATE TABLE AS / SELECT INTO
Monitoring and Alerting
Key Metrics
-- Checkpoint progress per stream and tablet (YCQL system table on the source)
SELECT stream_id, tablet_id, checkpoint, last_replication_time
FROM system.cdc_state;
Replication lag itself is exposed through tserver metrics (for example, async_replication_sent_lag_micros and async_replication_committed_lag_micros) rather than through a SQL query.
- Lag Dashboard: Table-level replication lag
- Universe Health: Replication flow status
- Alerts: Configure thresholds for:
- Replication lag exceeding SLA
- Replication errors
- Schema drift detection
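An alert check like the thresholds above reduces to a scan over per-table lag values; in practice the lag would come from replication metrics, but the table names and numbers here are made up:

```python
def tables_over_sla(lag_ms_by_table, sla_ms=5000):
    """Return tables whose replication lag exceeds the SLA threshold,
    sorted for stable alert output."""
    return sorted(t for t, lag in lag_ms_by_table.items() if lag > sla_ms)

lags = {"orders": 1200, "users": 9000, "events": 5001}
print(tables_over_sla(lags))  # → ['events', 'users']
```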
Log Retention
Configure WAL retention to handle target downtime:
# Set WAL retention (in seconds) so pollers can catch up after downtime
yb-admin -master_addresses <source_masters> \
  set_wal_retention_secs <table_id> <seconds>
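Sizing the retention window is a back-of-envelope product of sustained WAL write rate and the downtime you want to survive; the 5 MB/s figure below is purely illustrative:

```python
def wal_retention_bytes(wal_mb_per_sec, retention_hours):
    """Rough extra storage needed to keep WAL for a retention window:
    sustained WAL rate x window length."""
    return wal_mb_per_sec * 1024 * 1024 * retention_hours * 3600

# Surviving 24h of target downtime at 5 MB/s of WAL:
gb = wal_retention_bytes(5, 24) / 1024**3
print(round(gb))  # → 422
```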
Failover and Recovery
Planned Failover (Transactional Mode)
- Stop writes to source universe
- Wait for replication to catch up (lag = 0)
- Verify xCluster safe time advanced to recent timestamp
- Switch application to target universe
- (Optional) Reverse replication direction
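Step 2 above (wait for replication to catch up) is typically a polling loop; get_lag_ms here is a caller-supplied probe, e.g. backed by replication metrics, and the whole function is an operational sketch rather than a YugabyteDB API:

```python
import time

def wait_for_catch_up(get_lag_ms, timeout_s=300, poll_s=1.0):
    """After source writes stop, block until replication lag reaches
    zero (all WAL drained) or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_lag_ms() == 0:
            return True
        time.sleep(poll_s)
    return False

# Simulated probe: lag drains over successive polls
samples = iter([3000, 800, 0])
print(wait_for_catch_up(lambda: next(samples), timeout_s=10, poll_s=0))  # → True
```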
Unplanned Failover (Disaster Recovery)
Transactional Mode:
- Rewind target to latest xCluster safe time using PITR
- Promote target to primary
- Reconfigure application endpoints
- Assess data loss: Maximum = replication lag at failure time
Non-Transactional Mode:
- Promote target to primary
- Manual reconciliation of torn transactions
- Verify data consistency (indexes, constraints)
Reversing Replication
After failover, reverse the replication direction:
# Setup reverse replication: old-target → old-source
yb-admin -master_addresses <old_source_masters> \
setup_universe_replication <old_target_id> ...
Best Practices
- Choose the Right Mode:
  - Transactional for DR and read replicas
  - Non-transactional only for YCQL active-active
- Monitor Replication Lag:
  - Set alerts for lag > 5 seconds
  - Investigate sustained lag immediately
- Test Failover Procedures:
  - Regular DR drills
  - Verify PITR recovery points
  - Document runbooks
- Schema Management:
  - Use automatic mode for new deployments
  - Test DDLs in staging first
  - Coordinate changes during low traffic
- Network Configuration:
  - Direct pod/node communication in Kubernetes
  - Dedicated network for replication traffic
  - Monitor network latency
- Capacity Planning:
  - Target needs resources for replication workload
  - WAL retention requires additional storage
  - Poller count scales with tablet count
Limitations
General
- Broadcast topology (A → B, A → C): Not supported
- Consolidation (B → A, C → A): Not supported
- Daisy chaining (A → B → C): Not supported
- Star topology (A ↔ B ↔ C): Not supported
YSQL Specific
- Array types with row/domain/multi-range base types
- Table columns with row types
- pgvector indexes
- Materialized views (semi-automatic/manual modes)
- Sequences (semi-automatic/manual modes)
Kubernetes
- Requires direct pod-to-pod communication
- Load balancer communication not supported
Learn More