Metadb

A system for synchronizing databases for analytics applications, with continuous data streaming, transforms, and historical tracking.

What is Metadb?

Metadb extends PostgreSQL with features specifically designed for analytics workloads. It continuously synchronizes data from external sources (like transaction-processing databases or sensor networks) and maintains both current state and complete historical records. The system is built for scenarios where you need to:
  • Track how data changes over time
  • Query historical states at any point
  • Transform and flatten JSON or MARC data automatically
  • Support multiple concurrent data sources
  • Maintain a PostgreSQL-compatible query interface

Key Features

Continuous Synchronization

Stream data from Kafka sources with automatic schema detection and type inference

Historical Tracking

Every row includes temporal metadata (__start, __end, __current) for point-in-time queries

Data Transformation

Automatically flatten JSON objects and arrays into queryable columns

PostgreSQL Compatible

Query using standard SQL through a PostgreSQL-compatible interface

User Workspaces

Individual schemas for each user to create tables and save query results

Access Control

Granular privileges that persist across table recreations
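For example, read access can be granted per data source. This is a sketch; it assumes a data source named sensor and a database user named beth, and the exact authorization syntax should be confirmed against the CLI reference:

```sql
-- Grant a user read access to all tables streamed from a data source.
-- Unlike plain GRANT, this privilege persists when Metadb drops and
-- recreates tables during synchronization.
authorize select on all tables in data source sensor to beth;
```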

How It Works

1. Configure a Data Source

Define a Kafka data source with connection settings and schema filters:
create data source sensor type kafka options (
    brokers 'kafka:29092',
    topics '^metadb_sensor_1\.',
    consumer_group 'metadb_sensor_1_1',
    add_schema_prefix 'sensor_'
);
2. Stream Data Automatically

Metadb reads change events from Kafka and creates tables automatically. Each table includes metadata columns for historical tracking.
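The auto-created tables carry the temporal metadata columns alongside the columns inferred from the change events. A sketch of inspecting them (the schema and table names below are illustrative assumptions; actual names depend on the source topics and the configured schema prefix):

```sql
-- Inspect the metadata columns on an auto-created table
-- (sensor_data.measurement is a hypothetical table name)
select __start, __end, __current, __origin
    from sensor_data.measurement
    limit 5;
```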
3. Query Current and Historical Data

Query current data using a table's base name, or access complete history using the main table (the same name suffixed with __):
-- Current records only
select * from library.patrongroup;

-- All historical records
select * from library.patrongroup__;
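The temporal metadata also supports point-in-time queries: a row was visible at time T if its __start is at or before T and its __end is after T. A sketch, assuming __end extends past the query time for rows that were still current then (the timestamp is illustrative):

```sql
-- Records as they existed on 2023-01-01
select * from library.patrongroup__
    where __start <= timestamp '2023-01-01'
      and __end > timestamp '2023-01-01';
```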
4. Transform Complex Data

Configure JSON transformations to extract nested fields into columns:
create data mapping for json
    from table library.inventory__ column jsondata path '$'
    to 't';
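Once the mapping is defined, extracted JSON fields appear as ordinary queryable columns. The transformed table name below is an assumption based on the 't' suffix given in the mapping; check the generated tables in your instance:

```sql
-- Query fields extracted from the jsondata column
-- (library.inventory__t is the assumed transformed table name)
select * from library.inventory__t limit 5;
```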

Use Cases

  • Track circulation patterns, patron behavior, and collection usage over time. Metadb was designed with FOLIO library systems integration in mind.
  • Query data as it existed at any point in time. Useful for auditing, compliance reporting, and understanding how metrics evolved.
  • Keep an analytics database continuously synchronized with production systems without impacting source database performance.
  • Combine data from multiple sources into unified tables, using the __origin column to track provenance.
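Provenance tracking can be sketched with an aggregate over the __origin metadata column (the table name reuses the earlier example and is illustrative):

```sql
-- Count how many records each origin contributed to a combined table
select __origin, count(*)
    from library.patrongroup__
    group by __origin;
```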

Architecture

Metadb sits between your data sources and analytics users:
  1. Source Database → PostgreSQL with logical decoding enabled
  2. Kafka Connect/Debezium → Captures change events from source
  3. Kafka → Streams change events
  4. Metadb → Processes events and maintains synchronized database
  5. PostgreSQL → Stores current and historical data
  6. Analytics Users → Query via PostgreSQL-compatible interface
Metadb is not a database system itself — it manages data in PostgreSQL and provides a PostgreSQL-compatible query interface on port 8550 (default).
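Because the query interface is PostgreSQL-compatible, a standard client such as psql can connect to it on port 8550. The host, database name, and user below are placeholders:

```shell
psql -h metadb.example.com -p 8550 -d metadb -U beth
```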

Next Steps

Quickstart Guide

Get Metadb up and running in minutes

Installation

Build and configure Metadb for your environment

Core Concepts

Learn about data sources, table types, and transformations

CLI Reference

Explore all available commands