Data origins provide a way to tag and group records independently of their data sources. This is particularly useful when data sources are organized by collection method (e.g., geographic location in a sensor network) but you need to group data by logical categories.

Overview

An origin name can be stored by Metadb in the __origin column to tag individual records. The use and meaning of origins are typically defined by:
  • Your application logic
  • A Metadb module
  • Custom business rules
Data origins allow grouping data independently of data sources. This is useful when data sources are dictated by how data are collected (e.g., geographically in a sensor network), but you need different logical groupings.

Create Data Origin

Define a new data origin.

Syntax

create data origin origin_name

Parameters

origin_name (string, required): A unique name for the data origin to be created.

Example

Create a new origin for test data:
create data origin test_origin;

List Data Origins

View all configured data origins:
list data_origins;

Using Data Origins

Once created, origins can be referenced in your data processing and used to filter or group records. The __origin column in your tables will contain these origin identifiers.

Example Query

Filter records by origin:
select *
from sensor.temperature
where __origin = 'test_origin';

Group data by origin:
select __origin, count(*), avg(temperature)
from sensor.temperature
group by __origin;
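
The two queries above can be tried without a running Metadb instance. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in: the sensor_id column, the sample rows, and the helper function names are assumptions made for illustration, and in a real deployment Metadb itself populates the __origin column.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for sensor.temperature with an __origin column."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named "sensor" so that the
    # schema-qualified name sensor.temperature resolves as in the queries above.
    conn.execute("attach database ':memory:' as sensor")
    conn.execute(
        "create table sensor.temperature ("
        "sensor_id text, temperature real, __origin text)"
    )
    # Made-up sample rows (assumption); Metadb would set __origin during ingestion.
    conn.executemany(
        "insert into sensor.temperature values (?, ?, ?)",
        [
            ("s1", 20.0, "test_origin"),
            ("s2", 22.0, "test_origin"),
            ("s3", 30.0, "east_region"),
        ],
    )
    return conn


def rows_for_origin(conn: sqlite3.Connection, origin: str) -> list:
    """Filter records by origin, passing the origin name as a bound parameter."""
    cur = conn.execute(
        "select sensor_id, temperature from sensor.temperature "
        "where __origin = ? order by sensor_id",
        (origin,),
    )
    return cur.fetchall()


def summary_by_origin(conn: sqlite3.Connection) -> list:
    """Group records by origin and return (origin, row count, average temperature)."""
    cur = conn.execute(
        "select __origin, count(*), avg(temperature) "
        "from sensor.temperature group by __origin order by __origin"
    )
    return cur.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(rows_for_origin(demo, "test_origin"))
    print(summary_by_origin(demo))
```

Binding the origin name as a parameter, rather than formatting it into the SQL string, keeps the filter safe when the origin comes from user input.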

Use Cases

  • Use origins to identify which tenant or organization a record belongs to, even when all tenants share the same data source.
  • Tag records with geographic origins (e.g., 'east_region', 'west_region') while collecting from location-based data sources.
  • Distinguish between production, staging, and test data flowing through the same infrastructure.
  • Mark records with classification labels (e.g., 'public', 'internal', 'confidential') for access control purposes.

Data Origins vs Data Sources

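Data sources:
  • Define how data is collected
  • Represent physical or technical boundaries
  • Examples: specific Kafka topics, database connections
  • Cannot be changed after data is ingested

Data origins, by contrast, are logical tags stored in the __origin column. Their meaning is defined by your application logic, a Metadb module, or custom business rules.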

Best Practices

Plan Your Origin Strategy Early: Define your origin naming scheme before ingesting data. Consistent naming makes querying and reporting much easier.
Keep origin names descriptive but concise. Avoid putting sensitive information in origin names, because the names may appear in logs and metadata.
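
One way to enforce a consistent naming scheme is to validate names before issuing the create data origin command. The sketch below assumes a hypothetical convention (lowercase letters, digits, and underscores, starting with a letter, at most 32 characters). This convention is not a Metadb requirement, and the function names are illustrative only.

```python
import re

# Hypothetical naming convention (an assumption, not a Metadb rule):
# starts with a lowercase letter, then lowercase letters, digits, or
# underscores, with a total length of at most 32 characters.
ORIGIN_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,31}")


def is_valid_origin_name(name: str) -> bool:
    """Return True if the name follows the assumed naming convention."""
    return ORIGIN_NAME_PATTERN.fullmatch(name) is not None


def create_origin_statement(name: str) -> str:
    """Build a 'create data origin' command, rejecting names that break the convention."""
    if not is_valid_origin_name(name):
        raise ValueError(f"origin name does not follow the naming convention: {name!r}")
    return f"create data origin {name};"


if __name__ == "__main__":
    for candidate in ["east_region", "test_origin", "East Region"]:
        print(candidate, is_valid_origin_name(candidate))
```

Running the check in the script or tool that creates origins catches inconsistent names before any records are tagged with them.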

See Also

  • Data Sources: Configure external data sources
  • Access Control: Grant and revoke access to data
