Architecture
YugabyteDB CDC is built on PostgreSQL logical replication and uses a publish-subscribe model:
- Publications define which tables to replicate
- Replication Slots track the streaming position and checkpoint progress
- Output Plugins (yboutput or pgoutput) format change events
- Debezium Connector converts the replication stream to Kafka messages
How CDC Works
YugabyteDB automatically shards tables into tablets, each with its own Write-Ahead Log (WAL). The CDC process:
- Snapshot Phase: On first connection, the connector takes a consistent snapshot of all configured tables
- Streaming Phase: After snapshot completion, the connector continuously streams changes from the WAL
- Change Events: Each INSERT, UPDATE, and DELETE operation produces a change event record
- Kafka Topics: Events are published to separate Kafka topics per table
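The two phases and the per-table topics can be modelled in a few lines. This is a toy sketch in Python, not connector code: the function name cdc_event_stream and the sample rows are invented for illustration, and the one-letter operation codes are the ones listed under Operation Types.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Event = Tuple[str, str, Row]  # (topic, op, row)


def cdc_event_stream(
    prefix: str,
    snapshot: Dict[str, List[Row]],
    wal_changes: Iterable[Tuple[str, str, Row]],
) -> Iterator[Event]:
    """Yield snapshot rows first, then WAL changes in their original order."""
    # Snapshot phase: every existing row is emitted as a read ("r") event.
    for table, rows in snapshot.items():
        for row in rows:
            yield (f"{prefix}.{table}", "r", row)
    # Streaming phase: each INSERT/UPDATE/DELETE arrives as a c/u/d event.
    for table, op, row in wal_changes:
        yield (f"{prefix}.{table}", op, row)


events = list(
    cdc_event_stream(
        "dbserver1",
        {"public.users": [{"id": 1}]},
        [("public.users", "c", {"id": 2}), ("public.orders", "d", {"id": 7})],
    )
)
```

Each tuple carries the topic it would be published to, so events for different tables never share a topic.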
Publications and Replication Slots
Creating a Publication
Publications define the tables to stream:
Managing Publications
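Publications are created, extended, and removed with SQL statements. The sketch below builds those statements in Python using standard PostgreSQL syntax, which is assumed here to be accepted by YugabyteDB as well. The helper names and the publication name my_publication are hypothetical, and identifiers are interpolated as-is, so they must come from trusted input.

```python
from typing import Sequence


def create_publication_sql(name: str, tables: Sequence[str]) -> str:
    """Build a CREATE PUBLICATION statement for an explicit table list."""
    if not tables:
        raise ValueError("at least one table is required")
    return f"CREATE PUBLICATION {name} FOR TABLE {', '.join(tables)};"


def add_table_sql(name: str, table: str) -> str:
    """Build the statement that adds one more table to a publication."""
    return f"ALTER PUBLICATION {name} ADD TABLE {table};"


def drop_publication_sql(name: str) -> str:
    """Build the statement that removes a publication."""
    return f"DROP PUBLICATION {name};"


statements = [
    create_publication_sql("my_publication", ["users", "orders"]),
    add_table_sql("my_publication", "departments"),
    drop_publication_sql("my_publication"),
]
```

The resulting strings can be run through any YSQL client.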
Creating Replication Slots
Replication slots checkpoint the streaming position.
Replica identity options:
- FULL - Include all column values before and after changes
- CHANGE - Only changed columns plus primary key (default)
- CHANGE_OLD_NEW - Changed columns with before/after values
- DEFAULT - PostgreSQL-compatible default behavior
- NOTHING - Minimal information (INSERT only)
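A slot is usually created over SQL with the standard PostgreSQL function pg_create_logical_replication_slot, and a table's replica identity is changed with ALTER TABLE ... REPLICA IDENTITY. The Python sketch below only builds the statements. The helper names and my_slot are hypothetical, and the identity list is limited to the four values that also appear in the Impact on Change Events table.

```python
import re

# PostgreSQL slot names may contain lowercase letters, digits, and underscores.
SLOT_NAME = re.compile(r"[a-z0-9_]+")
PLUGINS = ("yboutput", "pgoutput")
IDENTITIES = ("FULL", "CHANGE", "DEFAULT", "NOTHING")


def create_slot_sql(slot: str, plugin: str = "yboutput") -> str:
    """Build the call that creates a logical replication slot."""
    if not SLOT_NAME.fullmatch(slot):
        raise ValueError(f"invalid slot name: {slot!r}")
    if plugin not in PLUGINS:
        raise ValueError(f"unknown output plugin: {plugin!r}")
    return f"SELECT * FROM pg_create_logical_replication_slot('{slot}', '{plugin}');"


def set_replica_identity_sql(table: str, identity: str) -> str:
    """Build the statement that sets a table's replica identity."""
    if identity not in IDENTITIES:
        raise ValueError(f"unknown replica identity: {identity!r}")
    return f"ALTER TABLE {table} REPLICA IDENTITY {identity};"


slot_sql = create_slot_sql("my_slot")
identity_sql = set_replica_identity_sql("users", "FULL")
```

Validating the slot name before interpolation keeps the generated SQL safe to paste into a client.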
Using the Streaming Protocol
Alternatively, create slots via streaming protocol:
Dropping Replication Slots
The active-slot window is set by ysql_cdc_active_replication_slot_window_ms (default: 5 minutes).
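The following Python sketch shows one way to reason about that window before dropping a slot. It rests on two assumptions that this section does not spell out: that, as in PostgreSQL, a slot that is still active cannot be dropped, and that a slot counts as active while its last consumption falls inside the window. The helper names are hypothetical; pg_drop_replication_slot is the standard PostgreSQL function.

```python
# Default taken from the text above: 5 minutes, expressed in milliseconds.
DEFAULT_WINDOW_MS = 5 * 60 * 1000


def is_slot_active(last_consumed_ms: int, now_ms: int,
                   window_ms: int = DEFAULT_WINDOW_MS) -> bool:
    """Assumed rule: a slot is active if it was consumed within the window."""
    return now_ms - last_consumed_ms < window_ms


def drop_slot_sql(slot: str) -> str:
    """Build the call that drops a replication slot."""
    return f"SELECT pg_drop_replication_slot('{slot}');"


# A consumer that stopped 6 minutes ago is outside the default window.
six_minutes_ms = 6 * 60 * 1000
ready_to_drop = not is_slot_active(last_consumed_ms=0, now_ms=six_minutes_ms)
```

Under these assumptions, a drop issued right after stopping the connector may need to wait until the window has elapsed.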
Debezium Connector Setup
The YugabyteDB Debezium Connector streams changes to Kafka using the replication protocol.
Connector Configuration
Basic connector properties:
Key Configuration Properties
| Property | Description | Default |
|---|---|---|
| slot.name | Replication slot name to consume from | Required |
| slot.drop.on.stop | Drop slot when connector stops | false |
| publication.name | Publication defining tables to stream | Optional |
| publication.autocreate.mode | Auto-create publication | filtered |
| plugin.name | Output plugin (yboutput or pgoutput) | yboutput |
| snapshot.mode | Initial snapshot behavior (initial, never, initial_only) | initial |
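As a rough illustration, the properties in the table can be assembled into the name/config JSON document that the Kafka Connect REST API accepts. This Python sketch covers only the properties listed above plus topic.prefix. A real configuration also needs the connector class and database connection settings, which are left out because this section does not list them. The names yb-cdc-connector, my_slot, and my_publication are placeholders.

```python
import json
from typing import Dict

PLUGINS = ("yboutput", "pgoutput")
SNAPSHOT_MODES = ("initial", "never", "initial_only")


def connector_config(slot_name: str, publication_name: str, topic_prefix: str,
                     plugin_name: str = "yboutput",
                     snapshot_mode: str = "initial") -> Dict[str, str]:
    """Return the documented properties with their defaults filled in."""
    if plugin_name not in PLUGINS:
        raise ValueError(f"unknown plugin.name: {plugin_name!r}")
    if snapshot_mode not in SNAPSHOT_MODES:
        raise ValueError(f"unknown snapshot.mode: {snapshot_mode!r}")
    return {
        "slot.name": slot_name,
        "slot.drop.on.stop": "false",
        "publication.name": publication_name,
        "publication.autocreate.mode": "filtered",
        "plugin.name": plugin_name,
        "snapshot.mode": snapshot_mode,
        "topic.prefix": topic_prefix,
    }


payload = {
    "name": "yb-cdc-connector",  # placeholder connector name
    "config": connector_config("my_slot", "my_publication", "dbserver1"),
}
payload_json = json.dumps(payload, indent=2)
```

Rejecting unknown plugin and snapshot values early surfaces typos before the connector is registered.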
Snapshot Modes
- initial (default): Perform snapshot if no offset exists, then stream changes
- never: Skip snapshot; start streaming from slot creation or stored offset
- initial_only: Perform snapshot and stop before streaming
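The three modes reduce to a small decision table. The Python sketch below encodes it as described in the list above. The names Plan and plan_for are hypothetical. For initial_only the text does not say how an existing offset is treated, so the sketch assumes the snapshot runs regardless.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    snapshot: bool  # take a snapshot on startup
    stream: bool    # continue with streaming afterwards


def plan_for(mode: str, has_offset: bool) -> Plan:
    """Map a snapshot.mode value to the phases the connector would run."""
    if mode == "initial":
        # Snapshot only when there is no stored offset, then stream.
        return Plan(snapshot=not has_offset, stream=True)
    if mode == "never":
        return Plan(snapshot=False, stream=True)
    if mode == "initial_only":
        # Assumption: offset handling is unspecified, so always snapshot.
        return Plan(snapshot=True, stream=False)
    raise ValueError(f"unknown snapshot.mode: {mode!r}")


first_start = plan_for("initial", has_offset=False)
restart = plan_for("initial", has_offset=True)
```

With the default mode, a first start runs both phases and a restart resumes streaming only.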
Complete Setup Example
Output Plugins
yboutput Plugin
Native YugabyteDB output plugin with enhanced features: a set field indicates whether a column value was explicitly set.
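A consumer can use the set field to tell an untouched column from one that was explicitly written, including one written as NULL. The Python sketch below assumes each column arrives as an object with value and set keys, with untouched columns either null or marked set: false. That shape is an assumption for illustration, and the helper name explicitly_set is hypothetical.

```python
from typing import Any, Dict, Optional

Cell = Optional[Dict[str, Any]]


def explicitly_set(image: Dict[str, Cell]) -> Dict[str, Any]:
    """Keep only the columns whose set flag is true, unwrapping their values."""
    return {
        column: cell["value"]
        for column, cell in image.items()
        if cell is not None and cell.get("set")
    }


after_image = {
    "id": {"value": 1, "set": True},
    "email": {"value": None, "set": True},   # explicitly written as NULL
    "name": {"value": None, "set": False},   # not part of this change
    "phone": None,                           # not part of this change
}
changed = explicitly_set(after_image)
```

Here email survives with a None value because it was set on purpose, while name and phone are dropped.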
pgoutput Plugin
Standard PostgreSQL plugin for compatibility:
Replica Identity
Replica identity controls what information is included in UPDATE and DELETE events:
Impact on Change Events
| Identity | INSERT | UPDATE Before | UPDATE After | DELETE Before |
|---|---|---|---|---|
| FULL | All columns | All columns | All columns | All columns |
| DEFAULT | All columns | None | All columns | PK only |
| CHANGE | All columns | None | Changed + PK | PK only |
| NOTHING | All columns | N/A | N/A | N/A |
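The UPDATE columns of the table can be expressed as a function that derives the before and after images from the old and new row. This is a Python sketch of the table's rules, not of the plugin's implementation. The name update_images is hypothetical, and None is used both for an empty before image and, under NOTHING, for "no UPDATE event".

```python
from typing import Any, Dict, Optional, Sequence, Tuple

Row = Dict[str, Any]
Images = Tuple[Optional[Row], Row]  # (before, after)


def update_images(identity: str, pk: Sequence[str],
                  old: Row, new: Row) -> Optional[Images]:
    """Return the (before, after) images of an UPDATE for a replica identity."""
    if identity == "FULL":
        return dict(old), dict(new)
    if identity == "DEFAULT":
        return None, dict(new)
    if identity == "CHANGE":
        # After image: primary key plus the columns whose value changed.
        after = {column: new[column] for column in pk}
        after.update({c: v for c, v in new.items() if old.get(c) != v})
        return None, after
    if identity == "NOTHING":
        return None  # the table lists UPDATE as N/A
    raise ValueError(f"unknown replica identity: {identity!r}")


old_row = {"id": 1, "name": "Ann", "email": "ann@example.com"}
new_row = {"id": 1, "name": "Anna", "email": "ann@example.com"}
change_images = update_images("CHANGE", ["id"], old_row, new_row)
```

For this rename, CHANGE carries only id and name, while FULL would carry both complete rows.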
Streaming to Kafka
Topic Naming
By default, changes stream to topics named:
{topic.prefix}.{schema}.{table}
Example topics for the dbserver1 prefix:
- dbserver1.public.users
- dbserver1.public.departments
- dbserver1.public.orders
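The naming pattern is simple enough to reproduce when a consumer needs to compute or parse topic names. A minimal Python sketch follows. The helper names are hypothetical, and the parser assumes that schema and table names contain no dots.

```python
from typing import Tuple


def topic_name(prefix: str, schema: str, table: str) -> str:
    """Join prefix, schema, and table in the default topic naming pattern."""
    return f"{prefix}.{schema}.{table}"


def split_topic(topic: str) -> Tuple[str, str, str]:
    """Split a topic into (prefix, schema, table), reading from the right."""
    prefix, schema, table = topic.rsplit(".", 2)
    return prefix, schema, table


topics = [topic_name("dbserver1", "public", t)
          for t in ("users", "departments", "orders")]
```

Splitting from the right keeps a prefix that itself contains dots intact.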
Change Event Structure
Operation Types
- c - Create (INSERT)
- r - Read (snapshot)
- u - Update
- d - Delete
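A consumer typically branches on the operation code. The Python sketch below keeps an in-memory copy of a table up to date from a list of events. It assumes the Debezium-style field names op, before, and after, rows given as plain dictionaries, and a primary key column called id. The function name apply_event is hypothetical.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_event(replica: Dict[Any, Row], event: Dict[str, Any],
                key: str = "id") -> None:
    """Apply one change event to an in-memory copy of the table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        # Merge rather than replace, so partial after images also work.
        replica.setdefault(row[key], {}).update(row)
    elif op == "d":
        replica.pop(event["before"][key], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


replica: Dict[Any, Row] = {}
for event in [
    {"op": "r", "before": None, "after": {"id": 1, "name": "Ann"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": None, "after": {"id": 1, "name": "Anna"}},
    {"op": "d", "before": {"id": 2}, "after": None},
]:
    apply_event(replica, event)
```

Snapshot reads and inserts are handled the same way, which makes replaying a snapshot followed by streamed changes idempotent for this simple replica.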
CDC with Kafka Connect
Kafka Connector Setup
- Download the YugabyteDB Connector:
- Configure Kafka Connect:
- Start the Connector:
Monitoring and Observability
Catalog Views
Monitoring Replication Lag
Track the confirmed_flush_lsn to monitor consumer progress:
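One simple check is to sample confirmed_flush_lsn twice, for example from the standard pg_replication_slots view, and flag slots whose position did not move. The Python sketch below parses the textual LSN format PostgreSQL uses (two hexadecimal numbers separated by a slash) and compares two samples. The helper names and sample values are hypothetical.

```python
from typing import Dict, List


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/16B3748' into a comparable integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def stalled_slots(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return slots present in both samples whose LSN did not advance."""
    return sorted(
        name
        for name, lsn in current.items()
        if name in previous and parse_lsn(lsn) <= parse_lsn(previous[name])
    )


# Two samples of slot_name -> confirmed_flush_lsn, taken some time apart.
earlier = {"orders_slot": "0/A", "users_slot": "0/1F"}
later = {"orders_slot": "0/10", "users_slot": "0/1F"}
stalled = stalled_slots(earlier, later)
```

A slot also stays put when its tables receive no writes, so a stalled result is a prompt to investigate rather than proof of a stuck consumer.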
Use Cases
Microservice Event Streaming
Stream table changes to Kafka topics consumed by microservices:
Data Warehouse Integration
Replicate operational data to analytics systems:
Cache Invalidation
Invalidate application caches based on database changes:
Audit and Compliance
Capture all changes for audit trails:
Best Practices
- Choose Appropriate Replica Identity: Use CHANGE for efficiency, FULL when complete history is needed
- Monitor Slot Lag: Regularly check replication slot lag to prevent WAL accumulation
- Set Retention Policies: Configure ysql_cdc_active_replication_slot_window_ms appropriately
- Use Snapshots Wisely: For large tables, consider snapshot.mode=never after initial load
- Handle Schema Changes: Plan for DDL changes; some require recreating replication slots
- Secure Credentials: Use dedicated replication users with minimal privileges
- Partition Publications: Create separate publications for different use cases

