Metadb provides powerful transformation capabilities to reshape data from your sources into analytics-ready tables. The most common transformation is extracting JSON fields into relational columns, but the system supports various data processing workflows.

JSON Transformations

JSON transformations extract nested JSON objects and arrays into standard table columns, making the data easier to query and analyze.

Why Transform JSON?

Consider querying JSON data without transformation:
select __id,
       __start,
       id,
       jsonb_extract_path_text(jsondata, 'groupname') as groupname,
       jsonb_extract_path_text(jsondata, 'description') as description
    from library.patrongroup;
With transformation, the query becomes simpler:
select __id, __start, id, groupname, description 
    from library.patrongroup__t;
Transformations improve both query simplicity and performance by eliminating the need for JSON extraction functions.

Creating Data Mappings

Use the create data mapping command to define JSON transformations:
create data mapping for json
    from table library.patrongroup__ column jsondata path '$'
    to 't';
1. Specify the mapping type: Currently, json is the only supported mapping type, used for extracting JSON fields.
2. Identify the source table and column: Specify the main table (its name ends in __) and the column containing the JSON data.
3. Define the JSON path: Use JSONPath syntax to identify which object or array to transform:
  • '$' = root object
  • '$.metadata' = nested metadata object
  • '$.tags.tagList' = array within the tags object
4. Choose a target identifier: Provide a short identifier (lowercase, at most 16 characters) to be used in the transformed table name.
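To make the path syntax concrete, here is a minimal Python sketch (an illustration, not part of Metadb) that resolves the simple dotted paths above against a sample record:

```python
import json

def resolve_path(record: dict, path: str):
    """Resolve a simple dotted JSONPath ('$', '$.a.b') against a dict."""
    node = record
    for key in path.lstrip("$").strip(".").split("."):
        if key:  # '$' alone yields the root object
            node = node[key]
    return node

record = json.loads(
    '{"metadata": {"createdDate": "2022-04-18"},'
    ' "tags": {"tagList": ["reserve", "new"]}}'
)

print(resolve_path(record, "$"))               # the whole root object
print(resolve_path(record, "$.metadata"))      # the nested metadata object
print(resolve_path(record, "$.tags.tagList"))  # the array inside tags
```

Each of these paths identifies one object or array that a data mapping could transform into its own table.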

Generated Tables

When you create a data mapping, Metadb generates two new tables:
  • Main transformed table: table__{identifier}__ (with history)
  • Current transformed table: table__{identifier} (current records only)
For example, mapping patrongroup__ with identifier 't' creates:
  • patrongroup__t__ (main transformed)
  • patrongroup__t (current transformed)
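The naming rule can be expressed as a small Python helper (an illustration of the convention, not a Metadb API):

```python
def transformed_names(main_table: str, identifier: str) -> tuple[str, str]:
    """Derive the main and current transformed-table names from a
    main table name (ending in '__') and a mapping target identifier."""
    if not main_table.endswith("__"):
        raise ValueError("expected a main table name ending in '__'")
    if not (identifier.islower() and len(identifier) <= 16):
        raise ValueError("identifier must be lowercase, at most 16 characters")
    # Main transformed table keeps history; current table drops the trailing '__'.
    return main_table + identifier + "__", main_table + identifier

print(transformed_names("patrongroup__", "t"))
# ('patrongroup__t__', 'patrongroup__t')
```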

Nested JSON Transformations

For nested JSON structures, create multiple mappings from the root down to nested elements:
-- Root object
create data mapping for json
    from table library.inventory__ column jsondata path '$'
    to 't';

-- Nested metadata object
create data mapping for json
    from table library.inventory__ column jsondata path '$.metadata'
    to 'metadata';

-- Tags object
create data mapping for json
    from table library.inventory__ column jsondata path '$.tags'
    to 'tags';

-- Array within tags
create data mapping for json
    from table library.inventory__ column jsondata path '$.tags.tagList'
    to 'taglist';
An object or array will only be transformed if all its parent paths are also transformed. You must create mappings for '$' and '$.tags' before '$.tags.tagList' will be processed.
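The parent-path rule can be sketched in Python (a hypothetical helper, not Metadb code): a path is only eligible for transformation when every ancestor path has its own mapping:

```python
def missing_parents(path: str, mapped_paths: set[str]) -> list[str]:
    """Return the ancestor paths of `path` that lack a data mapping."""
    parts = path.lstrip("$").strip(".").split(".")
    ancestors = ["$"] + ["$." + ".".join(parts[:i]) for i in range(1, len(parts))]
    return [p for p in ancestors if p not in mapped_paths]

mapped = {"$", "$.tags"}
print(missing_parents("$.tags.tagList", mapped))        # [] -> eligible
print(missing_parents("$.metadata.createdBy", {"$"}))   # ['$.metadata']
```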

Transformation Hierarchy

Visual representation of the example above:
$ (root)
├── metadata
└── tags
    └── tagList
Each level requires its own data mapping.

Dynamic Schema Inference

JSON transformations are schemaless: columns are inferred from the actual data rather than from a predefined schema.
  • Metadb scans JSON records as they stream in
  • When a field is found, a column is created for it
  • New fields discovered later automatically add new columns
  • No manual schema definition required
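The idea can be simulated in a few lines of Python (an illustration of the concept, not Metadb's implementation): columns accumulate as new fields appear in the stream:

```python
import json

columns: set[str] = set()

# Records arriving from the stream; the third introduces a new field.
stream = [
    '{"groupname": "undergrad", "description": "Undergraduate Student"}',
    '{"groupname": "graduate", "description": "Graduate Student"}',
    '{"groupname": "staff", "description": "Staff Member", "expdate": "2030-01-01"}',
]

for raw in stream:
    record = json.loads(raw)
    for field in record:
        columns.add(field)  # a newly seen field implies a new column

print(sorted(columns))  # ['description', 'expdate', 'groupname']
```

Here the expdate column only comes into existence when the first record containing it streams in; no schema was declared up front.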

Example: Complete JSON Transformation

Let’s walk through a complete example:

Source Data

select __id, __start, id, jsondata from library.patrongroup;
 __id | __start                | id | jsondata
------+------------------------+----+------------------------------------------------------------------
    8 | 2022-04-18 19:27:18-00 | 15 | {"groupname": "undergrad", "description": "Undergraduate Student"}
    4 | 2022-04-17 17:42:25-00 | 10 | {"groupname": "graduate", "description": "Graduate Student"}

Create the Mapping

create data mapping for json
    from table library.patrongroup__ column jsondata path '$'
    to 't';

Wait for Next Update

The transformation takes effect when JSON data are next updated by the streaming process. You don’t need to manually trigger it.

Query the Transformed Table

select __id, __start, id, groupname, description 
    from library.patrongroup__t;
 __id | __start                | id | groupname | description
------+------------------------+----+-----------+-----------------------
    2 | 2022-04-18 19:27:18-00 | 15 | undergrad | Undergraduate Student
    5 | 2022-04-17 17:42:25-00 | 10 | graduate  | Graduate Student
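To see exactly what the transformation did, this Python snippet (an illustration only) extracts the same fields from the source rows' jsondata and reproduces the groupname and description columns:

```python
import json

# The source rows from library.patrongroup above.
source_rows = [
    {"id": 15, "jsondata": '{"groupname": "undergrad", "description": "Undergraduate Student"}'},
    {"id": 10, "jsondata": '{"groupname": "graduate", "description": "Graduate Student"}'},
]

# Flatten each JSON document into top-level columns alongside id.
transformed = [{"id": r["id"], **json.loads(r["jsondata"])} for r in source_rows]

for row in transformed:
    print(row["id"], row["groupname"], row["description"])
```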

Managing Data Mappings

List Configured Mappings

list data_mappings;

Remove a Mapping

drop data mapping for json
    from table library.inventory__ column jsondata path '$.tags.tagList';
Dropping a data mapping removes the configuration but does not delete the transformed tables. The tables will no longer be updated with new transformations.

External SQL Transformations

Metadb can schedule external SQL files to run on a regular basis for custom transformations and derived tables.

Configuration

External SQL is enabled automatically for certain modules (like “folio” or “reshare”) when specified in the data source:
alter system set external_sql_folio = 'refs/tags/v1.8.0';
This tells Metadb to run SQL from the folio-analytics repository at the specified Git reference.

SQL File Format

External SQL files should follow these guidelines:
--metadb:table library_patrons

drop table if exists library_patrons;

create table library_patrons as
select p.id,
       p.barcode,
       g.groupname,
       p.active
    from library.patron__t p
    join library.patrongroup__t g on p.patrongroup_id = g.id;
1. Add the table directive: Start the file with --metadb:table <table_name> to declare which table the SQL updates.
2. Use empty lines as separators: Separate SQL statements with empty lines.
3. Omit schema names: Do not specify a schema name in create table; Metadb will place the table in the appropriate schema.
4. Reference Metadb tables: Use fully qualified names when reading from Metadb tables.
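A minimal Python sketch of how such a file could be parsed (hypothetical; Metadb's actual parser may differ): read the --metadb:table directive from the first line, then split the remaining statements on empty lines:

```python
sql_file = """--metadb:table library_patrons

drop table if exists library_patrons;

create table library_patrons as
select p.id, g.groupname
    from library.patron__t p
    join library.patrongroup__t g on p.patrongroup_id = g.id;
"""

lines = sql_file.splitlines()
directive = lines[0]
assert directive.startswith("--metadb:table ")
table_name = directive.split(None, 1)[1]

# Statements are separated by empty lines.
statements = [s.strip() for s in "\n".join(lines[1:]).split("\n\n") if s.strip()]

print(table_name)        # library_patrons
print(len(statements))   # 2
```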

Monitoring External SQL

Check when external SQL tables were last updated:
select * from metadb.table_update 
    where table_name = 'library_patrons';
View the system log for execution details:
select * from mdblog() where message like '%external%';

Data Type Conversions

Metadb automatically converts data types in several scenarios:

Automatic Conversions (Record-at-a-Time)

These conversions happen as records stream:
  • Source types: boolean, smallint, integer, bigint, real, double precision, numeric, date/time types, uuid, and json
  • Possible targets: numeric, uuid, jsonb, and text (which targets apply depends on the source type)

Inferred Conversions

Metadb can infer uuid type from text or varchar columns:
alter table library.patron__ alter column patrongroup_id type uuid;
Type inference analyzes existing data to safely convert columns to more specific types like UUID.
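The inference step can be illustrated in Python (a sketch of the idea, not Metadb's algorithm): a column is a safe candidate for conversion only if every non-null value parses as a UUID:

```python
import uuid

def all_uuid(values) -> bool:
    """True if every non-null value is a valid UUID string."""
    for v in values:
        if v is None:
            continue  # nulls do not block the conversion
        try:
            uuid.UUID(v)
        except (ValueError, AttributeError, TypeError):
            return False
    return True

print(all_uuid(["6cbe4b02-9f2c-4a1e-9b0d-7e1f2a3b4c5d", None]))  # True
print(all_uuid(["not-a-uuid"]))                                  # False
```

Only when the check passes for the existing data can the column be altered to the more specific uuid type without losing values.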

User-Defined Transformations

Create custom transformation logic using database functions in your user workspace:
create table celia.westgroup as
    select * from library.patrongroup where __origin = 'west';
Share with other users:
grant usage on schema celia to rosalind;
grant select on celia.westgroup to rosalind;

Parameterized Reports

Package transformations as functions for reusable reports:
create function lisa.count_loans(
    start_date date default '2000-01-01',
    end_date date default '2050-01-01'
)
returns table(
    item_id uuid,
    loan_count integer
)
as $$
    select item_id,
           count(*) as loan_count
        from folio_circulation.loan__t
        where start_date <= loan_date and loan_date < end_date
        group by item_id
$$
language sql
stable
parallel safe;
Usage:
select * from lisa.count_loans(
    start_date => '2023-01-01', 
    end_date => '2024-01-01'
);

Best Practices

Transform Frequently-Queried JSON

Create data mappings for JSON fields that are regularly used in queries and reports.

Use Descriptive Identifiers

Choose meaningful target identifiers like 'metadata' instead of generic ones like 'm'.

Map Complete Hierarchies

When transforming nested JSON, map all parent levels from root down to the target.

Encapsulate Logic in Functions

Use database functions instead of views to package transformation logic and complex queries.

Monitor Transformation Updates

Regularly check metadb.table_update to ensure transformed tables are updating as expected.

Test Before Production

Test external SQL and complex transformations in a user workspace before deploying to production schemas.
