INSERT, UPDATE, and DELETE operations from your MySQL database into Materialize.
Prerequisites
Before creating a MySQL source, you must:- Enable GTID-based binlog replication in your MySQL database
- Create a dedicated replication user with appropriate permissions
- Configure network access to allow Materialize to connect
- Ensure sufficient binlog retention (recommended: 30 days)
Step 1: Configure MySQL
Enable GTID-Based Binlog Replication
Verify and configure the following MySQL server variables:For managed MySQL services (RDS, Aurora, Cloud SQL), these settings may need to be configured through parameter groups or service-specific mechanisms.
Create a Replication User
Create a dedicated MySQL user with replication privileges:Configure Binlog Retention
For self-hosted MySQL:Step 2: Configure Network Security
Choose a network security option based on your deployment:- AWS PrivateLink
- SSH Tunnel
- Static IP Allowlist
AWS PrivateLink provides secure, private connectivity between Materialize and your MySQL database in AWS.1. Create a target group for your MySQL instance:6. Retrieve and configure the AWS principal (see PostgreSQL guide for details)
- Target type: IP address
- Protocol: TCP
- Port: 3306 (or your custom port)
Step 3: Create a Connection in Materialize
- Direct Connection
- With AWS PrivateLink
- With SSH Tunnel
Step 4: Create a Source
- All Tables
- Specific Schemas
- Specific Tables
Create subsources for all tables across all schemas:
Handling Unsupported Types
If your tables contain unsupported data types, use theTEXT COLUMNS option:
Excluding Columns
MySQL doesn’t provide column-level filtering in binlog replication, but you can exclude columns in Materialize:How It Works
GTID-Based Replication
Materialize uses MySQL’s Global Transaction Identifiers (GTIDs) to track replication progress. GTIDs provide:- Transactional consistency: Operations in a MySQL transaction get the same timestamp in Materialize
- Reliable resumption: Replication can resume from any point without data loss
- No duplicate events: Each transaction is uniquely identified
Subsources
Materialize automatically creates a subsource for each table:Progress Tracking
Monitor ingestion progress using the progress subsource:transaction_id represents the next possible GTID and should increase as Materialize consumes binlog records.
Monitoring
Check Source Status
Monitor Snapshotting Progress
Monitor Replication Lag
In MySQL, check GTID execution status:Schema Changes
Compatible Changes
Materialize automatically handles these schema changes:- Adding a column with a
DEFAULTvalue - Adding a nullable column
- Dropping a column
- Changing column default values
Incompatible Changes
These changes require manual intervention:- Changing a column’s data type
- Adding a
NOT NULLconstraint - Renaming columns or tables
- Dropping and recreating tables
Handling Errors
Source in Error State
Check the error message:- GTID gaps detected: Binlog files were purged; recreate the source
- Connection refused: Check network security settings
- Authentication failed: Verify credentials and permissions
- Binlog not enabled: Enable binlog replication in MySQL
Recovering from GTID Gaps
Best Practices
Resource Management
Create a dedicated cluster for MySQL sources:Binlog Management
Monitor binlog disk usage:Limit Data Volume
- Only replicate schemas and tables you need
- Use
EXCLUDE COLUMNSfor sensitive or unnecessary columns - Consider separate sources for high-volume tables
Performance Optimization
- Use schema-specific sources instead of all-tables sources when possible
- Monitor cluster resource usage and right-size for steady-state
- Avoid running source creation during peak database hours
Troubleshooting
Source Not Progressing
Check MySQL replication status:High Memory Usage
MySQL sources with many tables or high update rates can consume significant memory:- Increase cluster size to handle the workload
- Use separate sources for high-volume tables
- Limit the number of tables being replicated
- Monitor using
mz_cluster_replica_metrics
Snapshotting Takes Too Long
- Scale up the cluster temporarily during snapshotting
- Reduce data volume by limiting tables
- Create sources for subsets of tables rather than all tables
- Run during off-peak hours to minimize database impact
Connection Issues
Test connectivity:- Verify firewall rules allow connections on port 3306
- Check that the MySQL user has proper permissions
- Ensure GTID mode is enabled
- Verify binlog format is set to ROW
- Check that the database is accessible from Materialize IPs
Binlog Purged Error
If you see errors about missing binlog files:- Check binlog retention settings in MySQL
- Increase retention to prevent future purges:
- Recreate the source to resnapshot from current state
Supported MySQL Services
Materialize supports MySQL CDC from:- Amazon RDS for MySQL
- Amazon Aurora MySQL
- Google Cloud SQL for MySQL
- Azure Database for MySQL
- Self-hosted MySQL
- MariaDB (with some limitations)
Some managed MySQL services require additional configuration steps. Consult your provider’s documentation for enabling GTID-based binlog replication.
Known Limitations
Unsupported Features
- Partitioned tables: Some partition operations may not replicate correctly
- Temporary tables: Not supported for replication
- TRUNCATE operations: Not captured by binlog replication
- Multi-source replication: Each source maintains its own connection
Data Type Considerations
- JSON columns: Supported, but updates may show as full row replacements
- ENUM and SET: Mapped to text types
- Spatial types: Not supported; use
TEXT COLUMNSoption - Binary types: Supported as bytea
Next Steps
Transform Data
Create materialized views on your MySQL data
PostgreSQL CDC
Learn about ingesting data from PostgreSQL