Sources describe external systems you want Materialize to read data from. They provide details about how to connect, decode, and interpret streaming data. In SQL terms, sources combine aspects of both tables (structured, queryable) and clients (responsible for reading data).
Sources continuously ingest data changes and make them immediately available for querying. Unlike traditional batch ETL, there is no batch window between data arriving in the source system and becoming queryable in Materialize.
Sources support different envelopes, which tell Materialize how to interpret incoming messages:
Upsert: Key-value updates (maintains latest value per key)
Debezium: Full CDC with before/after images
None: Append-only streams
Kafka sources are extremely efficient. Materialize creates a single replication stream and shares it across all downstream views, minimizing bandwidth and broker load.
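A Kafka source is defined much like the PostgreSQL and MySQL sources below; a minimal sketch using the upsert envelope (the broker address, credentials, and topic name are placeholders, and your authentication setup may differ):

```sql
-- Create a Kafka connection (broker and credentials are illustrative)
CREATE SECRET kafka_password AS '...';

CREATE CONNECTION kafka_conn TO KAFKA (
  BROKER 'broker.example.com:9092',
  SASL MECHANISMS = 'PLAIN',
  SASL USERNAME = 'materialize',
  SASL PASSWORD = SECRET kafka_password
);

-- Create an upsert source: Materialize keeps the latest value per key
CREATE SOURCE kafka_source
  FROM KAFKA CONNECTION kafka_conn (TOPIC 'events')
  KEY FORMAT TEXT
  VALUE FORMAT JSON
  ENVELOPE UPSERT;
```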
Capture real-time changes from PostgreSQL databases using logical replication.
```sql
-- Create a PostgreSQL connection
CREATE SECRET pg_password AS '...';

CREATE CONNECTION pg_conn TO POSTGRES (
  HOST 'postgres.example.com',
  PORT 5432,
  DATABASE 'production',
  USER 'materialize',
  PASSWORD SECRET pg_password
);

-- Create a source for all tables in a publication
CREATE SOURCE pg_source
  FROM POSTGRES CONNECTION pg_conn (PUBLICATION 'mz_source')
  FOR ALL TABLES;
```
How it works:
Materialize creates a replication slot in PostgreSQL (prefixed with materialize_)
Changes are streamed using PostgreSQL’s native replication protocol
Subsources are automatically created for each table in the publication
Transactional consistency is maintained — all changes in a transaction get the same timestamp
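The replication slot created in the first step can be inspected from the upstream database; a quick check (the exact slot name varies, but it carries the materialize_ prefix noted above):

```sql
-- Run on the upstream PostgreSQL server
SELECT slot_name, plugin, active
FROM pg_replication_slots
WHERE slot_name LIKE 'materialize_%';
```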
PostgreSQL 11 or higher is required. You must enable logical replication in the upstream database before creating a source.
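Enabling logical replication typically involves setting wal_level in the server configuration, granting the replication attribute to the Materialize user, and creating the publication; a sketch assuming the role and publication names from the example above:

```sql
-- In postgresql.conf (requires a server restart):
--   wal_level = logical

-- Run on the upstream database
ALTER ROLE materialize WITH REPLICATION;
CREATE PUBLICATION mz_source FOR ALL TABLES;
```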
Ingest changes from MySQL databases using GTID-based binlog replication.
```sql
-- Create a MySQL connection
CREATE SECRET mysql_password AS '...';

CREATE CONNECTION mysql_conn TO MYSQL (
  HOST 'mysql.example.com',
  PORT 3306,
  USER 'materialize',
  PASSWORD SECRET mysql_password
);

-- Create a source for specific schemas
CREATE SOURCE mysql_source
  FROM MYSQL CONNECTION mysql_conn
  FOR SCHEMAS (production, staging);
```
Requirements:
MySQL 5.7 or higher
GTID-based binary logging enabled
Row-based replication format
Configuration example:
```ini
# MySQL server configuration
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
binlog_row_image = FULL
```
```sql
-- Create a webhook source
CREATE SOURCE webhook_events FROM WEBHOOK
  BODY FORMAT JSON;

-- Get the webhook URL
SELECT url FROM mz_internal.mz_webhook_sources
WHERE name = 'webhook_events';
```
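Once events arrive, the JSON body can be queried directly; a sketch assuming the payload happens to carry event_type and user_id fields (your payload shape will differ):

```sql
-- With BODY FORMAT JSON, the body column holds the parsed payload
SELECT
  body ->> 'event_type' AS event_type,
  body ->> 'user_id'    AS user_id
FROM webhook_events;
```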
For upsert sources, Materialize maintains a copy of the key-value mapping so that it can correctly interpret each kind of change:
Insertion: New key appears
Update: Existing key with new value
Deletion: Key with null value
Upsert sources require additional memory to maintain the key-value state. If the upstream system can provide explicit INSERT/UPDATE/DELETE operations, consider an envelope that carries them (such as Debezium) instead.
```sql
CREATE MATERIALIZED VIEW order_summary AS
SELECT
  c.customer_name,
  COUNT(*) AS order_count,
  SUM(o.total) AS total_spent
FROM pg_source_orders o
JOIN pg_source_customers c ON o.customer_id = c.id
GROUP BY c.customer_name;
```
If Materialize encounters GTID gaps due to missing binlog files, the source enters an error state and must be recreated. Ensure sufficient binlog retention:
```sql
-- Check binlog retention (in seconds)
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

-- Set to 7 days
SET GLOBAL binlog_expire_logs_seconds = 604800;
```