Skip to main content
Materialize can ingest data from MySQL (5.7+) databases using MySQL’s GTID-based binlog replication protocol for Change Data Capture (CDC). This allows you to continuously stream INSERT, UPDATE, and DELETE operations from your MySQL database into Materialize.

Prerequisites

Before creating a MySQL source, you must:
  1. Enable GTID-based binlog replication in your MySQL database
  2. Create a dedicated replication user with appropriate permissions
  3. Configure network access to allow Materialize to connect
  4. Ensure sufficient binlog retention (recommended: 30 days)
GTID-based binlog replication must be enabled before you can create a MySQL source. This may require a database restart depending on your configuration.

Step 1: Configure MySQL

Enable GTID-Based Binlog Replication

Verify and configure the following MySQL server variables:
-- In MySQL: Check current configuration
SHOW VARIABLES WHERE variable_name IN (
    'log_bin',
    'binlog_format',
    'binlog_row_image',
    'gtid_mode',
    'enforce_gtid_consistency',
    'binlog_expire_logs_seconds'
);
Required settings:
# In my.cnf or my.ini
log_bin = ON
binlog_format = ROW
binlog_row_image = FULL
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_expire_logs_seconds = 2592000  # 30 days
For managed MySQL services (RDS, Aurora, Cloud SQL), these settings may need to be configured through parameter groups or service-specific mechanisms.

Create a Replication User

Create a dedicated MySQL user with replication privileges:
-- In MySQL
CREATE USER 'materialize'@'%' IDENTIFIED BY '<password>';

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'materialize'@'%';

GRANT SELECT ON <database>.* TO 'materialize'@'%';

FLUSH PRIVILEGES;

Configure Binlog Retention

If Materialize tries to resume replication and finds GTID gaps due to missing binlog files, the source will enter an errored state and must be recreated.
For self-hosted MySQL:
-- Set binlog retention to 30 days (2592000 seconds)
SET PERSIST binlog_expire_logs_seconds = 2592000;
For Amazon RDS/Aurora:
-- RDS uses a separate parameter
CALL mysql.rds_set_configuration('binlog retention hours', 720);  -- 30 days

Step 2: Configure Network Security

Choose a network security option based on your deployment:

Step 3: Create a Connection in Materialize

CREATE SECRET mysqlpass AS '<MYSQL_PASSWORD>';

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'materialize',
    PASSWORD SECRET mysqlpass
);

Step 4: Create a Source

Create subsources for all tables across all schemas:
CREATE SOURCE mysql_source
    FROM MYSQL CONNECTION mysql_connection
    FOR ALL TABLES;

Handling Unsupported Types

If your tables contain unsupported data types, use the TEXT COLUMNS option:
CREATE SOURCE mysql_source
    FROM MYSQL CONNECTION mysql_connection (
        TEXT COLUMNS (mydb.table_name.column_with_unsupported_type)
    )
    FOR ALL TABLES;

Excluding Columns

MySQL doesn’t provide column-level filtering in binlog replication, but you can exclude columns in Materialize:
CREATE SOURCE mysql_source
    FROM MYSQL CONNECTION mysql_connection (
        EXCLUDE COLUMNS (mydb.users.password_hash, mydb.users.ssn)
    )
    FOR ALL TABLES;

How It Works

GTID-Based Replication

Materialize uses MySQL’s Global Transaction Identifiers (GTIDs) to track replication progress. GTIDs provide:
  • Transactional consistency: Operations in a MySQL transaction get the same timestamp in Materialize
  • Reliable resumption: Replication can resume from any point without data loss
  • No duplicate events: Each transaction is uniquely identified

Subsources

Materialize automatically creates a subsource for each table:
SHOW SOURCES;
         name         |   type
----------------------+-----------
 mysql_source          | mysql
 mysql_source_progress | progress
 users                 | subsource
 orders                | subsource
 products              | subsource

Progress Tracking

Monitor ingestion progress using the progress subsource:
SELECT transaction_id
FROM mysql_source_progress;
The transaction_id represents the next possible GTID and should increase as Materialize consumes binlog records.

Monitoring

Check Source Status

SELECT *
FROM mz_internal.mz_source_statuses
WHERE name = 'mysql_source';

Monitor Snapshotting Progress

SELECT
    name,
    snapshot_committed,
    bytes_received,
    messages_received
FROM mz_internal.mz_source_statistics s
JOIN mz_sources src ON s.id = src.id
WHERE src.name = 'mysql_source';

Monitor Replication Lag

In MySQL, check GTID execution status:
-- In MySQL
SHOW VARIABLES LIKE 'gtid_executed';

-- Compare with Materialize's progress
SELECT * FROM mysql_source_progress;

Schema Changes

Compatible Changes

Materialize automatically handles these schema changes:
  • Adding a column with a DEFAULT value
  • Adding a nullable column
  • Dropping a column
  • Changing column default values

Incompatible Changes

These changes require manual intervention:
  • Changing a column’s data type
  • Adding a NOT NULL constraint
  • Renaming columns or tables
  • Dropping and recreating tables
To handle incompatible changes:
-- 1. Drop the affected subsource
DROP SOURCE users;

-- 2. Add it back to the source
ALTER SOURCE mysql_source ADD SUBSOURCE users;

Handling Errors

Source in Error State

Check the error message:
SELECT error
FROM mz_internal.mz_source_statuses
WHERE name = 'mysql_source';
Common issues:
  • GTID gaps detected: Binlog files were purged; recreate the source
  • Connection refused: Check network security settings
  • Authentication failed: Verify credentials and permissions
  • Binlog not enabled: Enable binlog replication in MySQL

Recovering from GTID Gaps

If binlog files are purged while Materialize is offline, you must drop and recreate the source. There is no way to recover.
-- Drop the source and all subsources
DROP SOURCE mysql_source CASCADE;

-- Recreate with the same configuration
CREATE SOURCE mysql_source
    FROM MYSQL CONNECTION mysql_connection
    FOR ALL TABLES;

Best Practices

Resource Management

Create a dedicated cluster for MySQL sources:
CREATE CLUSTER mysql_ingest SIZE = '100cc';

SET CLUSTER = mysql_ingest;

CREATE SOURCE mysql_source FROM ...
For large initial snapshots, use a larger cluster temporarily:
-- Before creating the source
ALTER CLUSTER mysql_ingest SET (SIZE = '200cc');

-- After snapshotting completes
ALTER CLUSTER mysql_ingest SET (SIZE = '100cc');

Binlog Management

Always configure binlog retention for at least 30 days to handle extended outages or maintenance windows.
Monitor binlog disk usage:
-- In MySQL
SHOW BINARY LOGS;

SELECT
    ROUND(SUM(file_size)/1024/1024/1024, 2) AS binlog_size_gb
FROM information_schema.binary_logs;

Limit Data Volume

  • Only replicate schemas and tables you need
  • Use EXCLUDE COLUMNS for sensitive or unnecessary columns
  • Consider separate sources for high-volume tables

Performance Optimization

  • Use schema-specific sources instead of all-tables sources when possible
  • Monitor cluster resource usage and right-size for steady-state
  • Avoid running source creation during peak database hours

Troubleshooting

Source Not Progressing

Check MySQL replication status:
-- In MySQL
SHOW MASTER STATUS;
SHOW BINARY LOGS;
Verify GTID configuration:
-- In MySQL
SELECT @@gtid_mode, @@enforce_gtid_consistency;

High Memory Usage

MySQL sources with many tables or high update rates can consume significant memory:
  1. Increase cluster size to handle the workload
  2. Use separate sources for high-volume tables
  3. Limit the number of tables being replicated
  4. Monitor using mz_cluster_replica_metrics

Snapshotting Takes Too Long

  1. Scale up the cluster temporarily during snapshotting
  2. Reduce data volume by limiting tables
  3. Create sources for subsets of tables rather than all tables
  4. Run during off-peak hours to minimize database impact

Connection Issues

Test connectivity:
VALIDATE CONNECTION mysql_connection;
Common solutions:
  • Verify firewall rules allow connections on port 3306
  • Check that the MySQL user has proper permissions
  • Ensure GTID mode is enabled
  • Verify binlog format is set to ROW
  • Check that the database is accessible from Materialize IPs

Binlog Purged Error

If you see errors about missing binlog files:
  1. Check binlog retention settings in MySQL
  2. Increase retention to prevent future purges:
-- In MySQL
SET PERSIST binlog_expire_logs_seconds = 2592000;  -- 30 days

-- For RDS
CALL mysql.rds_set_configuration('binlog retention hours', 720);
  1. Recreate the source to resnapshot from current state

Supported MySQL Services

Materialize supports MySQL CDC from:
  • Amazon RDS for MySQL
  • Amazon Aurora MySQL
  • Google Cloud SQL for MySQL
  • Azure Database for MySQL
  • Self-hosted MySQL
  • MariaDB (with some limitations)
Some managed MySQL services require additional configuration steps. Consult your provider’s documentation for enabling GTID-based binlog replication.

Known Limitations

Unsupported Features

  • Partitioned tables: Some partition operations may not replicate correctly
  • Temporary tables: Not supported for replication
  • TRUNCATE operations: Not captured by binlog replication
  • Multi-source replication: Each source maintains its own connection

Data Type Considerations

  • JSON columns: Supported, but updates may show as full row replacements
  • ENUM and SET: Mapped to text types
  • Spatial types: Not supported; use TEXT COLUMNS option
  • Binary types: Supported as bytea

Next Steps

Transform Data

Create materialized views on your MySQL data

PostgreSQL CDC

Learn about ingesting data from PostgreSQL

Build docs developers (and LLMs) love