Skip to main content

Overview

Datasources are the foundation of your Tinybird data platform. This starter kit uses 6 datasources:
  • 1 landing datasource for raw event ingestion
  • 5 materialized view target datasources for pre-aggregated metrics

Landing Datasource

analytics_events

The primary ingestion point for all web analytics events.
export const analyticsEvents = defineDatasource("analytics_events", {
  tokens: [{ token: trackerToken, scope: "APPEND" }],
  description: "Analytics events landing data source",
  schema: {
    timestamp: t.dateTime(),
    session_id: t.string().nullable(),
    action: t.string().lowCardinality(),
    version: t.string().lowCardinality(),
    payload: t.string(),
    tenant_id: t.string().default(""),
    domain: t.string().default(""),
  },
  engine: engine.mergeTree({
    partitionKey: "toYYYYMM(timestamp)",
    sortingKey: ["tenant_id", "domain", "timestamp"],
  }),
});

Schema Fields

timestamp
DateTime
required
Event timestamp - when the analytics event occurred
session_id
String
Unique session identifier for grouping user activity
action
LowCardinality(String)
required
Event action type (e.g., “page_hit”, “click”, “custom_event”)
Low cardinality optimization reduces storage for fields with limited distinct values
version
LowCardinality(String)
required
Tracker version for schema evolution support
payload
String
required
JSON-encoded event data containing:
  • domain: Website domain
  • locale: User locale
  • location: Geographic location
  • referrer: Traffic source URL
  • pathname: Page path
  • href: Full URL
  • user-agent: Browser user agent string
tenant_id
String
default:""
Multi-tenancy identifier for data isolation
domain
String
default:""
Website domain (can also be extracted from payload as fallback)

Engine Configuration

partitionKey
string
toYYYYMM(timestamp) - Monthly partitioning for efficient time-based queries
sortingKey
array
["tenant_id", "domain", "timestamp"] - Optimizes filtering and range scans
Performance: The sorting key is ordered by query filter frequency - most queries filter by tenant_id and domain first.

Materialized View Datasources

analytics_pages_mv

Aggregates page-level metrics for fast page analytics queries.
export const analyticsPagesMv = defineDatasource("analytics_pages_mv", {
  jsonPaths: false,
  schema: {
    date: t.date(),
    tenant_id: t.string(),
    domain: t.string(),
    device: t.string(),
    browser: t.string(),
    location: t.string(),
    pathname: t.string(),
    visits: t.aggregateFunction("uniq", t.string()),
    hits: t.aggregateFunction("count", t.uint64()),
  },
  engine: engine.aggregatingMergeTree({
    partitionKey: "toYYYYMM(date)",
    sortingKey: [
      "tenant_id",
      "domain",
      "date",
      "device",
      "browser",
      "location",
      "pathname",
    ],
  }),
});

Aggregate Functions

visits
AggregateFunction(uniq, String)
Unique session count using ClickHouse’s uniqState functionMerging: Use uniqMerge(visits) to combine aggregates
hits
AggregateFunction(count, UInt64)
Total page view count using ClickHouse’s countState functionMerging: Use countMerge(hits) to combine aggregates
AggregatingMergeTree automatically merges aggregate function states during background merges, enabling efficient incremental aggregation.

analytics_sessions_mv

Tracks session-level metrics including duration and hit counts.
export const analyticsSessionsMv = defineDatasource("analytics_sessions_mv", {
  jsonPaths: false,
  schema: {
    date: t.date(),
    session_id: t.string(),
    tenant_id: t.string(),
    domain: t.string(),
    device: t.simpleAggregateFunction("any", t.string()),
    browser: t.simpleAggregateFunction("any", t.string()),
    location: t.simpleAggregateFunction("any", t.string()),
    first_hit: t.simpleAggregateFunction("min", t.dateTime()),
    latest_hit: t.simpleAggregateFunction("max", t.dateTime()),
    hits: t.aggregateFunction("count", t.uint64()),
  },
  engine: engine.aggregatingMergeTree({
    partitionKey: "toYYYYMM(date)",
    sortingKey: ["tenant_id", "domain", "date", "session_id"],
  }),
});

Simple vs Regular Aggregates

Simple Aggregate Functions (any, min, max):
  • Lower overhead
  • Don’t require -Merge suffix for querying
  • Use anySimpleState, minSimpleState, maxSimpleState
Regular Aggregate Functions (uniq, count, sum):
  • More complex state
  • Require -Merge suffix: uniqMerge(), countMerge()
  • More flexible for complex aggregations
first_hit
SimpleAggregateFunction(min, DateTime)
Timestamp of first page view in session
latest_hit
SimpleAggregateFunction(max, DateTime)
Timestamp of most recent page view in session
Session duration = latest_hit - first_hit

analytics_sources_mv

Aggregates traffic source and referrer metrics.
export const analyticsSourcesMv = defineDatasource("analytics_sources_mv", {
  jsonPaths: false,
  schema: {
    date: t.date(),
    tenant_id: t.string(),
    domain: t.string(),
    device: t.string(),
    browser: t.string(),
    location: t.string(),
    referrer: t.string(),
    visits: t.aggregateFunction("uniq", t.string()),
    hits: t.aggregateFunction("count", t.uint64()),
  },
  engine: engine.aggregatingMergeTree({
    partitionKey: "toYYYYMM(date)",
    sortingKey: [
      "tenant_id",
      "domain",
      "date",
      "device",
      "browser",
      "location",
      "referrer",
    ],
  }),
});
referrer
String
Traffic source URL - external sites that referred visitors
The materialization filters out same-domain referrers to track only external traffic sources

tenant_actions_mv

Tracks all distinct action types per tenant and domain.
export const tenantActionsMv = defineDatasource("tenant_actions_mv", {
  description: "Materialized datasource for storing distinct actions by tenant and domain",
  jsonPaths: false,
  schema: {
    tenant_id: t.string(),
    domain: t.string(),
    action: t.string(),
    last_payload: t.simpleAggregateFunction("any", t.string()),
    last_seen: t.simpleAggregateFunction("max", t.dateTime()),
    total_occurrences: t.aggregateFunction("count", t.uint64()),
  },
  engine: engine.aggregatingMergeTree({
    partitionKey: "toYYYYMM(last_seen)",
    sortingKey: ["tenant_id", "domain", "action"],
  }),
});
Use Case: This datasource powers action discovery and custom event tracking across your analytics platform.
last_payload
SimpleAggregateFunction(any, String)
Sample payload from most recent occurrence of this action
total_occurrences
AggregateFunction(count, UInt64)
Total number of times this action has occurred

tenant_domains_mv

Tracks active domains for each tenant with activity timestamps.
export const tenantDomainsMv = defineDatasource("tenant_domains_mv", {
  description: "Materialized datasource for tracking domains per tenant",
  jsonPaths: false,
  schema: {
    tenant_id: t.string(),
    domain: t.string(),
    first_seen: t.simpleAggregateFunction("min", t.dateTime()),
    last_seen: t.simpleAggregateFunction("max", t.dateTime()),
    total_hits: t.aggregateFunction("count", t.uint64()),
  },
  engine: engine.aggregatingMergeTree({
    partitionKey: "toYYYYMM(last_seen)",
    sortingKey: ["tenant_id", "domain"],
  }),
});
Use Case: Enables tenant management features like domain listing, activity tracking, and multi-domain analytics.

Type Inference

The SDK provides TypeScript type inference for all datasources:
Type Exports
export type AnalyticsEventsRow = InferRow<typeof analyticsEvents>;
Usage:
Using Types
import { type AnalyticsEventsRow } from "./datasources";

const event: AnalyticsEventsRow = {
  timestamp: new Date(),
  session_id: "sess_123",
  action: "page_hit",
  version: "1.0",
  payload: JSON.stringify({ pathname: "/home" }),
  tenant_id: "tenant_1",
  domain: "example.com",
};

Best Practices

Partition Pruning

Always include date filters in queries to leverage partitioning:
WHERE date >= '2024-01-01' AND date <= '2024-01-31'

Sorting Key Order

Filter by sorting key columns in order for best performance:
WHERE tenant_id = 'x' AND domain = 'y' AND date >= ...

Low Cardinality

Use lowCardinality() for fields with less than 10,000 distinct values to reduce storage

JSON Payload

Keep the payload field flexible for schema evolution without datasource migrations
Important: Changing datasource schemas requires data migrations. Design schemas carefully and use the JSON payload field for flexible attributes.

Next Steps

Pipes

Learn how pipes transform raw data

Materialized Views

Understand how MVs populate these datasources

Architecture

See how datasources fit in the overall architecture

Build docs developers (and LLMs) love