How Data Sources Work
Data sources in Metadb stream changes continuously to keep your analytics database synchronized with the source systems. When data changes in the source, those changes flow through to Metadb, which updates its tables accordingly.
Metadb extends PostgreSQL with streaming data source capabilities, allowing you to build analytics on top of continuously updating data without manual ETL processes.
Supported Source Types
Currently, Metadb supports Kafka as a data source type, which enables integration with systems that use change data capture (CDC) to stream database changes.
Kafka Data Sources
Kafka data sources connect to Kafka brokers and consume messages from specified topics. You can configure:
- Brokers: Bootstrap servers for the Kafka cluster
- Topics: Regular expressions matching topics to read
- Consumer Groups: Kafka consumer group ID for offset management
- Security: SSL or plaintext protocols
- Filters: Schema and table filtering rules
Creating a Data Source
To create a new data source, use the create data source command:
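As a minimal sketch, the statement below defines a Kafka data source using the connection options listed above. The source name sensor, the broker address, the topic pattern, and the consumer group ID are placeholder values chosen for illustration. The option names (brokers, security, topics, consumer_group) are assumed to match your Metadb version, so check them against the command reference for your release. The comment line is an annotation for the reader and can be dropped.

```sql
-- Hypothetical example: "sensor" and every option value are placeholders.
CREATE DATA SOURCE sensor TYPE kafka OPTIONS (
    brokers 'kafka:29092',
    security 'plaintext',
    topics '^metadb_sensor_1\.',
    consumer_group 'metadb_sensor_1_1'
);
```

Here topics is a regular expression, so the escaped dot keeps the pattern from matching unrelated topics that merely share the prefix.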
1. Define source name and type: Choose a unique name for your data source and specify the type (currently kafka).
2. Configure connection options: Provide broker addresses, topics, consumer group, and security settings.
3. Set up filtering rules: Use schema_pass_filter, schema_stop_filter, and table_stop_filter to control which tables are synchronized.
Data Origin Tracking
The __origin column in Metadb tables allows you to track where data came from. This is especially useful when combining data from multiple sources into a single table.
| __id | __start | __origin | id | groupname | description |
|---|---|---|---|---|---|
| 8 | 2022-04-18 19:27:18-00 | west | 15 | undergrad | Undergraduate Student |
| 4 | 2022-04-17 17:42:25-00 | east | 10 | graduate | Graduate Student |
Origins allow grouping data independently of data sources. While data sources may be dictated by how data is collected (e.g., geographically in a sensor network), origins provide logical grouping based on your application needs.
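Because __origin is stored as a column alongside the data, it can be filtered and grouped in ordinary queries. The query below is a sketch that counts rows per origin. The table name library.patrongroup is a hypothetical stand-in for whichever table holds rows like those shown above.

```sql
-- library.patrongroup is a hypothetical table name used for illustration.
SELECT __origin, count(*) AS row_count
    FROM library.patrongroup
    GROUP BY __origin
    ORDER BY __origin;
```

Adding a condition such as WHERE __origin = 'west' would restrict any query to records from a single origin.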
Managing Data Sources
Modifying a Data Source
Change data source settings using alter data source:
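For example, the following sketch changes the consumer group of an existing data source. The name sensor and the new group ID are placeholders. The OPTIONS (SET ...) form is an assumption about the command syntax and should be checked against the command reference for your Metadb version.

```sql
-- Hypothetical example: "sensor" and the group ID are placeholders.
ALTER DATA SOURCE sensor OPTIONS (SET consumer_group 'metadb_sensor_1_2');
```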
Changes to data sources currently require restarting the Metadb server to take effect.
Removing a Data Source
Remove a data source configuration with the drop data source command.
Configuration Options
Schema and Table Filtering
Control which schemas and tables are synchronized with the schema_pass_filter, schema_stop_filter, and table_stop_filter options.
Schema Name Mapping
Modify schema names during synchronization:
- trim_schema_prefix: Remove a prefix from schema names
- add_schema_prefix: Add a prefix to schema names
- map_public_schema: Map tables from the public schema to a different target schema
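Filtering and name-mapping options are supplied in the same options list as the connection settings. The sketch below is a complete create data source statement that uses one option of each kind. It assumes that schema_stop_filter takes a regular expression and that add_schema_prefix takes a literal string. The source name, the schema name admin, and the prefix sensor_ are hypothetical values.

```sql
-- Hypothetical example: names and values are placeholders.
CREATE DATA SOURCE sensor TYPE kafka OPTIONS (
    brokers 'kafka:29092',
    topics '^metadb_sensor_1\.',
    consumer_group 'metadb_sensor_1_1',
    schema_stop_filter 'admin',
    add_schema_prefix 'sensor_'
);
```

Under these assumptions, tables from a source schema named admin would be skipped, and the remaining schemas would appear in the analytics database with sensor_ prepended to their names.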
Monitoring Data Sources
Check the status of your data sources:
Best Practices
Use Consumer Groups Wisely
Each Metadb instance should use a unique consumer group ID to maintain independent offset tracking.
Filter Unnecessary Tables
Use table and schema filters to reduce unnecessary data synchronization and improve performance.
Plan for Initial Snapshots
Initial snapshots can take significant time. Monitor the logs and wait for the “snapshot complete” message before running endsync.
Track Origins for Multi-Source Setups
When combining data from multiple sources, use the __origin column to distinguish the source of each record.