
Overview

If a Kafka data stream fails and cannot be resumed, it may be necessary to re-stream a complete snapshot of the data to Metadb. This synchronization procedure allows Metadb to accept re-streamed data and synchronize with the source.
During synchronization:
  • The Metadb database continues to be available to users
  • Streaming updates will be slower than usual
  • Some records may be temporarily missing (until they are re-streamed), and “extra” records may temporarily remain (records recently deleted in the source)
  • Periodic transforms and external SQL are paused

Resynchronization Procedure

1

Update data source configuration

Update the topics and consumer_group configuration settings for the new data stream:
ALTER DATA SOURCE sensor OPTIONS
    (SET topics '^metadb_sensor_2\.', SET consumer_group 'metadb_sensor_2_1');
Do not restart the Metadb server. Continue directly to Step 2.
2

Stop server and run sync

Stop the Metadb server and run metadb sync before starting it again:
metadb stop -D data
metadb sync -D data --source sensor
This process may take a significant amount of time to run. The database generally remains available to users during this period.
3

Start the server

Start the Metadb server to begin streaming the data:
nohup metadb start -D data -l metadb.log &
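Steps 2 and 3 can be wrapped in a small shell function so that the server is not started if the stop or sync command fails. This is a minimal sketch, not part of Metadb: the function name resync_and_start and the variables METADB, DATADIR, SOURCE, and LOGFILE are illustrative assumptions, and it assumes that the metadb commands return a nonzero exit status on failure.

```shell
#!/bin/sh
# Sketch of Steps 2 and 3. METADB, DATADIR, SOURCE, and LOGFILE are
# illustrative shell variables (assumptions), not Metadb settings.
METADB="${METADB:-metadb}"
DATADIR="${DATADIR:-data}"
SOURCE="${SOURCE:-sensor}"
LOGFILE="${LOGFILE:-metadb.log}"

resync_and_start() {
    # Step 2: stop the server, then run sync. Give up if either fails
    # (assumes a nonzero exit status signals failure).
    "$METADB" stop -D "$DATADIR" || return 1
    "$METADB" sync -D "$DATADIR" --source "$SOURCE" || return 1
    # Step 3: start the server in the background, as in the command above.
    nohup "$METADB" start -D "$DATADIR" -l "$LOGFILE" &
}
```

Calling resync_and_start with no arguments runs the three commands with the values used in this procedure (data, sensor, metadb.log). Set the variables beforehand to use other values.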
4

Monitor snapshot completion

Monitor the log file for the completion message. Metadb detects when snapshot data are no longer being received and writes:
source snapshot complete (deadline exceeded)
You can also check snapshot status using:
LIST STATUS;
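Instead of reading the log by hand, the check for the completion message can be scripted. The following is a rough sketch under stated assumptions: the function names snapshot_complete and wait_for_snapshot are hypothetical helpers, the match is on the message text only (the rest of the log line format is not assumed), and the log is assumed to contain no completion message from an earlier synchronization.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Metadb) for watching the server log.

# Succeeds if the completion message appears anywhere in the given log file.
snapshot_complete() {
    grep -q 'source snapshot complete' "$1"
}

# Polls the log file until the completion message appears.
# $1 = log file, $2 = seconds between checks (default 60).
wait_for_snapshot() {
    logfile="$1"
    interval="${2:-60}"
    until snapshot_complete "$logfile"; do
        sleep "$interval"
    done
}
```

For example, wait_for_snapshot metadb.log 300 checks the log every five minutes and returns once the message has been written. Because an older message in the same file would also match, rotate the log or check its timestamps if the server has been synchronized before.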
5

Run endsync

Once the new data have finished (or nearly finished) re-streaming, stop the server and run metadb endsync to remove old data that have not been refreshed:
metadb stop -D data
metadb endsync -D data --source sensor
You must run endsync to complete the synchronization process.
6

Restart the server

Start the server to resume normal operations:
nohup metadb start -D data -l metadb.log &

Timing Considerations

The timing of when to run endsync is up to the administrator, but it must be run to complete the synchronization process.
In most cases, it will be more convenient for users if endsync is run:
  • Too late (delaying removal of deleted records) rather than
  • Too early (removing records before they have been re-streamed)
The “source snapshot complete (deadline exceeded)” message in the log generally indicates a good time to run endsync.
Until a failed stream is re-streamed by following the process above, the Metadb database may continue to be unsynchronized with the source.
