Skip to main content
Engine configurations determine how data is stored, merged, and queried in ClickHouse. The SDK provides type-safe builders for all major MergeTree engine variants.

Basic Usage

import { engine } from '@tinybirdco/sdk';

engine.mergeTree({
  sortingKey: ['user_id', 'timestamp'],
  partitionKey: 'toYYYYMM(timestamp)',
  ttl: 'timestamp + INTERVAL 90 DAY',
});

Common Configuration Options

All MergeTree engines share these base configuration options:
sortingKey
string | string[]
required
Columns used for sorting data within parts. This is required for all MergeTree engines.
sortingKey: ['user_id', 'timestamp']
partitionKey
string
Expression for partitioning data (e.g., 'toYYYYMM(timestamp)'). Partitions enable efficient data management and pruning.
partitionKey: 'toYYYYMM(timestamp)'
primaryKey
string | string[]
Primary key columns. Defaults to sortingKey if not specified.
primaryKey: ['user_id']
ttl
string
TTL expression for automatic data expiration.
ttl: 'timestamp + INTERVAL 90 DAY'
settings
Record<string, string | number | boolean>
Additional engine settings.
settings: {
  index_granularity: 8192,
  min_bytes_for_wide_part: 0,
}

MergeTree

engine.mergeTree()

The most universal and functional table engine for high-load tasks.Best for: General-purpose analytics, logs, events
import { engine } from '@tinybirdco/sdk';

engine.mergeTree({
  sortingKey: ['user_id', 'timestamp'],
  partitionKey: 'toYYYYMM(timestamp)',
  ttl: 'timestamp + INTERVAL 90 DAY',
});

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const events = defineDatasource('events', {
  schema: {
    timestamp: t.dateTime(),
    user_id: t.string(),
    event_name: t.string(),
    properties: t.string(),
  },
  engine: engine.mergeTree({
    sortingKey: ['event_name', 'timestamp'],
    partitionKey: 'toYYYYMM(timestamp)',
    ttl: 'timestamp + INTERVAL 90 DAY',
  }),
});

ReplacingMergeTree

engine.replacingMergeTree()

Removes duplicate rows with the same sorting key during background merges.Best for: Maintaining latest state, upserts, slowly changing dimensions
engine.replacingMergeTree({
  sortingKey: ['id'],
  ver: 'updated_at', // Optional version column
});
ver
string
Optional version column. Rows with the highest version are kept during merges.
isDeleted
string
Optional flag column for clean mode (ClickHouse 23.2+). Rows with isDeleted = 1 are removed during merges.

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const users = defineDatasource('users', {
  schema: {
    id: t.string(),
    name: t.string(),
    email: t.string(),
    updated_at: t.dateTime(),
  },
  engine: engine.replacingMergeTree({
    sortingKey: ['id'],
    ver: 'updated_at',
  }),
});

SummingMergeTree

engine.summingMergeTree()

Sums numeric columns for rows with the same sorting key during background merges.Best for: Counters, metrics aggregation, pre-aggregated data
engine.summingMergeTree({
  sortingKey: ['date', 'metric_name'],
  columns: ['value'], // Columns to sum
});
columns
string[]
Columns to sum during merges. If not specified, all numeric columns are summed.

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const metrics = defineDatasource('metrics', {
  schema: {
    date: t.date(),
    metric_name: t.string(),
    value: t.int64(),
    count: t.int32(),
  },
  engine: engine.summingMergeTree({
    sortingKey: ['date', 'metric_name'],
    columns: ['value', 'count'],
  }),
});

AggregatingMergeTree

engine.aggregatingMergeTree()

For incremental data aggregation with AggregateFunction columns.Best for: Materialized views, incremental aggregation pipelines
engine.aggregatingMergeTree({
  sortingKey: ['date'],
});

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const dailyStats = defineDatasource('daily_stats', {
  schema: {
    date: t.date(),
    pathname: t.string(),
    views: t.simpleAggregateFunction('sum', t.uint64()),
    unique_sessions: t.aggregateFunction('uniq', t.string()),
  },
  engine: engine.aggregatingMergeTree({
    sortingKey: ['date', 'pathname'],
  }),
});

CollapsingMergeTree

engine.collapsingMergeTree()

For collapsing rows that cancel each other out.Best for: Changelog-style updates, mutable data with deletes
engine.collapsingMergeTree({
  sortingKey: ['id', 'timestamp'],
  sign: 'sign', // Column containing 1 or -1
});
sign
string
required
Column containing the sign (1 for state, -1 for cancel).

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const userActions = defineDatasource('user_actions', {
  schema: {
    id: t.string(),
    timestamp: t.dateTime(),
    action: t.string(),
    sign: t.int8(), // 1 or -1
  },
  engine: engine.collapsingMergeTree({
    sortingKey: ['id', 'timestamp'],
    sign: 'sign',
  }),
});

VersionedCollapsingMergeTree

engine.versionedCollapsingMergeTree()

For collapsing with versioning when events may arrive out of order.Best for: Changelog-style updates with potential out-of-order arrival
engine.versionedCollapsingMergeTree({
  sortingKey: ['id'],
  sign: 'sign',
  version: 'version',
});
sign
string
required
Column containing the sign (1 for state, -1 for cancel).
version
string
required
Column containing the version number for ordering.

Full Example

import { defineDatasource, t, engine } from '@tinybirdco/sdk';

export const changelog = defineDatasource('changelog', {
  schema: {
    id: t.string(),
    timestamp: t.dateTime(),
    data: t.string(),
    sign: t.int8(),
    version: t.uint64(),
  },
  engine: engine.versionedCollapsingMergeTree({
    sortingKey: ['id'],
    sign: 'sign',
    version: 'version',
  }),
});

Choosing the Right Engine

MergeTree

Use for append-only data like logs, events, and analytics.

ReplacingMergeTree

Use when you need to update rows by replacing old versions.

SummingMergeTree

Use for counters and metrics that can be summed.

AggregatingMergeTree

Use for complex aggregations in materialized views.

CollapsingMergeTree

Use for mutable data with changelog semantics.

VersionedCollapsingMergeTree

Use for mutable data with out-of-order updates.

Best Practices

Sorting Key Selection

Choose sorting keys based on your query patterns:
// Good: Most selective columns first
sortingKey: ['user_id', 'timestamp']

// Good: Match your WHERE clause patterns
sortingKey: ['date', 'country', 'event_type']

Partition Key Selection

Partition by time periods for efficient data management:
// Monthly partitions
partitionKey: 'toYYYYMM(timestamp)'

// Daily partitions (for high-volume data)
partitionKey: 'toYYYYMMDD(timestamp)'

// Yearly partitions (for low-volume data)
partitionKey: 'toYear(timestamp)'

TTL for Data Retention

Automatically expire old data:
// Keep 90 days
ttl: 'timestamp + INTERVAL 90 DAY'

// Keep 1 year
ttl: 'timestamp + INTERVAL 1 YEAR'
Changing the engine type requires recreating the datasource. Plan your engine choice carefully based on your use case.

Build docs developers (and LLMs) love