Overview
If a Kafka data stream fails and cannot be resumed, it may be necessary to re-stream a complete snapshot of the data to Metadb. This synchronization procedure allows Metadb to accept re-streamed data and synchronize with the source.
During synchronization:
- The Metadb database continues to be available to users
- Streaming updates will be slower than usual
- There may temporarily be missing records (until re-streamed) or “extra” records (recently deleted in source)
- Periodic transforms and external SQL are paused
Resynchronization Procedure
Update data source configuration
Update the topics and consumer_group configuration settings for the new data stream.
Stop server and run sync
Stop the Metadb server and run metadb sync before starting it again. This process may take a significant amount of time to run. The database generally remains available to users during this period.
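The stop-then-sync order could be scripted so that the two commands always run together. The following is a minimal sketch, not an official Metadb tool: the `-D` and `--source` flags, the data directory name `data`, and the source name `sensor` are assumptions used for illustration and should be checked against the installed Metadb version. Starting the server again is left to whatever method is normally used.

```python
import subprocess
from typing import List


def sync_commands(data_dir: str, source: str) -> List[List[str]]:
    """Build the stop -> sync command sequence.

    The flag names (-D, --source) are assumptions, not confirmed by this page.
    """
    return [
        ["metadb", "stop", "-D", data_dir],
        ["metadb", "sync", "-D", data_dir, "--source", source],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step,
            # so sync is never attempted if the server failed to stop.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "data" and "sensor" are hypothetical names.
    run_all(sync_commands("data", "sensor"), dry_run=True)
```

Defaulting to a dry run prints the planned commands for review before anything touches the server.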
Monitor snapshot completion
Monitor the log file for the completion message. Metadb detects when snapshot data are no longer being received and writes this message to the log.
You can also check snapshot status using:
Run endsync
Once the new data have finished (or nearly finished) re-streaming, stop the server and run metadb endsync to remove old data that have not been refreshed.
Timing Considerations
When should I run endsync?
The timing of endsync is up to the administrator, but it must be run to complete the synchronization process. In most cases, it is more convenient for users if endsync is run too late (delaying removal of deleted records) rather than too early (removing records before they have been re-streamed).
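One way to avoid running endsync too early is to wait until the completion message has appeared in the log. The sketch below polls a log file for a marker string. The exact text of the completion message is not given here, so `marker` is a required parameter with no default; the function names and the polling interval are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def snapshot_complete(log_path: str, marker: str) -> bool:
    """Return True if any line of the log contains the marker text."""
    path = Path(log_path)
    if not path.exists():
        return False
    with path.open(encoding="utf-8", errors="replace") as log:
        return any(marker in line for line in log)


def wait_for_snapshot(
    log_path: str,
    marker: str,
    poll_seconds: float = 60.0,
    timeout_seconds: Optional[float] = None,
) -> bool:
    """Poll the log until the marker appears or the timeout expires.

    Returns True when the marker is found, False on timeout.
    """
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    while True:
        if snapshot_complete(log_path, marker):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

If the log file also holds messages from earlier synchronizations, a match may be stale. In that case it would be safer to consider only lines written after the current sync began.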