Architecture Overview

System Architecture

The Tinybird Web Analytics Starter Kit uses a Lambda Architecture pattern optimized for real-time analytics. Web analytics events flow through three main layers (landing, processing, and materialization), and an API layer serves the results.

Data Flow

Core Components

1. Landing Datasource

analytics_events serves as the primary ingestion point for all web analytics events:

Accepts raw event data with minimal processing
Partitioned by month: toYYYYMM(timestamp)
Sorted by: tenant_id, domain, timestamp
Optimized for high-throughput writes
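
To make "minimal processing" concrete, here is a small sketch of shaping an incoming event into a landing row without parsing its payload. Only timestamp, tenant_id, and domain come from the schema shown on this page; the action and payload fields and the toLandingRow helper are assumptions made for illustration, not the starter kit's actual code.

```typescript
// Hypothetical row shape: only timestamp, tenant_id and domain are documented
// on this page; action and payload are assumed for illustration.
interface LandingRow {
  timestamp: string; // ISO 8601, UTC
  tenant_id: string;
  domain: string;
  action: string;
  payload: string; // raw JSON string, parsed later by the processing layer
}

interface IncomingEvent {
  timestamp: Date;
  tenant_id?: string;
  domain?: string;
  action: string;
  payload: unknown;
}

// Shape an event for the landing datasource. The payload stays an opaque
// JSON string, and a missing tenant_id or domain falls back to "" in the
// same way as the schema defaults.
function toLandingRow(incoming: IncomingEvent): LandingRow {
  return {
    timestamp: incoming.timestamp.toISOString(),
    tenant_id: incoming.tenant_id === undefined ? "" : incoming.tenant_id,
    domain: incoming.domain === undefined ? "" : incoming.domain,
    action: incoming.action,
    payload: JSON.stringify(incoming.payload),
  };
}

const landingRow = toLandingRow({
  timestamp: new Date("2024-05-01T12:00:00Z"),
  action: "page_hit",
  payload: { pathname: "/pricing" },
});
console.log(JSON.stringify(landingRow));
```

Keeping the payload unparsed at this stage is what lets the write path stay cheap; all field extraction is deferred to the processing layer.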

2. Processing Layer

The analytics_hits pipe transforms raw events:

Parses JSON payloads to extract structured fields
Detects device types (desktop, mobile-android, mobile-ios, bot)
Identifies browsers (chrome, firefox, safari, opera, ie)
Handles domain resolution with fallback logic
Filters out bot traffic
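
The classification steps above can be sketched in TypeScript as follows. The real pipe runs as SQL inside Tinybird, so treat this only as an illustration of the logic: the user-agent patterns, the "unknown" browser fallback, and the domain fallback rule (explicit domain first, then the hostname of the page URL) are all assumptions.

```typescript
type DeviceType = "desktop" | "mobile-android" | "mobile-ios" | "bot";
type BrowserName = "chrome" | "firefox" | "safari" | "opera" | "ie" | "unknown";

// Illustrative patterns only; the real pipe may match different substrings.
function detectDevice(userAgent: string): DeviceType {
  const ua = userAgent.toLowerCase();
  if (/bot|crawler|spider|curl|wget/.test(ua)) return "bot";
  if (/android/.test(ua)) return "mobile-android";
  if (/iphone|ipad|ipod/.test(ua)) return "mobile-ios";
  return "desktop";
}

function detectBrowser(userAgent: string): BrowserName {
  const ua = userAgent.toLowerCase();
  // Order matters: Chrome user agents also contain "safari", and recent
  // Opera user agents also contain "chrome".
  if (/firefox/.test(ua)) return "firefox";
  if (/opera|opr\//.test(ua)) return "opera";
  if (/chrome|crios/.test(ua)) return "chrome";
  if (/msie|trident/.test(ua)) return "ie";
  if (/safari/.test(ua)) return "safari";
  return "unknown";
}

// Domain fallback (assumed rule): prefer the explicit domain, otherwise
// take the hostname from the page URL, otherwise return "".
function resolveDomain(domain: string, href: string): string {
  if (domain !== "") return domain;
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(href);
  return match ? match[1].toLowerCase() : "";
}

// Bot filtering: keep only hits whose device is not classified as "bot".
const sampleAgents = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Googlebot/2.1 (+http://www.google.com/bot.html)",
];
const humanAgents = sampleAgents.filter((ua) => detectDevice(ua) !== "bot");
console.log(humanAgents.length);
```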

3. Materialization Layer

Five materialized views pre-aggregate data for fast queries:

Pages

Aggregates page-level metrics by date, tenant, domain, device, browser, location, and pathname

Sessions

Tracks session metrics including first/last hit times and total hits per session

Sources

Aggregates traffic sources and referrers with visit and hit counts

Tenant Actions

Maintains distinct action types per tenant/domain with occurrence counts

Tenant Domains

Tracks active domains per tenant with first/last seen timestamps
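
As a rough illustration of what pre-aggregation means here, the sketch below folds individual hits into one row per session, tracking first hit, last hit, and total hits as the Sessions view is described to do. The Hit shape and the session_id field name are assumptions; the actual views are defined in Tinybird, not in application code.

```typescript
// Hypothetical hit shape; session_id is an assumed field name.
interface Hit {
  session_id: string;
  timestamp: number; // Unix epoch, milliseconds
}

interface SessionStats {
  first_hit: number;
  last_hit: number;
  hits: number;
}

// Fold hits into one entry per session, the way a sessions view
// conceptually collapses many raw rows into a single aggregate row.
function aggregateSessions(hits: Hit[]): Map<string, SessionStats> {
  const sessions = new Map<string, SessionStats>();
  for (const hit of hits) {
    const current = sessions.get(hit.session_id);
    if (current === undefined) {
      sessions.set(hit.session_id, {
        first_hit: hit.timestamp,
        last_hit: hit.timestamp,
        hits: 1,
      });
    } else {
      current.first_hit = Math.min(current.first_hit, hit.timestamp);
      current.last_hit = Math.max(current.last_hit, hit.timestamp);
      current.hits += 1;
    }
  }
  return sessions;
}

const sessionTable = aggregateSessions([
  { session_id: "s1", timestamp: 1000 },
  { session_id: "s2", timestamp: 1500 },
  { session_id: "s1", timestamp: 4000 },
  { session_id: "s1", timestamp: 2500 },
]);
console.log(sessionTable.size);
```

A query for session duration then reads one small row per session instead of scanning every hit.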

4. API Layer

Endpoints serve pre-aggregated data from materialized views and real-time data from the processing layer:

Real-time endpoints: Query analytics_hits directly for live data
Historical endpoints: Query materialized views for aggregated metrics
Hybrid endpoints: Combine both for flexible date ranges
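
One way to picture the three endpoint types is as a routing decision on the requested date range. The sketch below is a hypothetical helper, not part of the starter kit, and it assumes a simple rule: ranges that end before today are served from materialized views, ranges that start today are served live from analytics_hits, and anything spanning both is hybrid.

```typescript
type EndpointSource = "realtime" | "historical" | "hybrid";

// Dates are "YYYY-MM-DD" strings, which order correctly when compared as
// plain strings. The cutoff rule (today = live, earlier = pre-aggregated)
// is an assumption made for this sketch.
function pickSource(from: string, to: string, today: string): EndpointSource {
  if (from > to) throw new Error("from must not be after to");
  if (to < today) return "historical";
  if (from >= today) return "realtime";
  return "hybrid";
}

console.log(pickSource("2024-05-01", "2024-05-10", "2024-05-10"));
```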

Multi-tenancy Design

The architecture supports multi-tenancy through tenant_id and domain columns defined in the datasource schema:

schema: {
  timestamp: t.dateTime(),
  tenant_id: t.string().default(""),
  domain: t.string().default(""),
  // ... other fields
}

Benefits:

Efficient data isolation via tenant_id filtering
Multi-domain support within each tenant
Optimized query performance through proper sorting
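
The isolation model can be sketched as a filter that always scopes by tenant_id and optionally narrows by domain. The TenantRow shape and the scopeToTenant helper below are hypothetical and only illustrate the idea; in the real system this filtering happens in the endpoint queries.

```typescript
interface TenantRow {
  tenant_id: string;
  domain: string;
  pathname: string;
}

// Every read is scoped to exactly one tenant; domain is an optional,
// narrower filter within that tenant.
function scopeToTenant(rows: TenantRow[], tenantId: string, domain?: string): TenantRow[] {
  return rows.filter(
    (row) => row.tenant_id === tenantId && (domain === undefined || row.domain === domain)
  );
}

const allRows: TenantRow[] = [
  { tenant_id: "acme", domain: "acme.com", pathname: "/" },
  { tenant_id: "acme", domain: "shop.acme.com", pathname: "/cart" },
  { tenant_id: "globex", domain: "globex.io", pathname: "/" },
];
console.log(scopeToTenant(allRows, "acme").length);
console.log(scopeToTenant(allRows, "acme", "shop.acme.com").length);
```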

Performance Optimizations

Partitioning Strategy

All datasources use monthly partitioning:

partitionKey: "toYYYYMM(timestamp)"

This enables:

Fast data pruning for date-range queries
Efficient data lifecycle management
Optimal storage compression
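
To see why monthly partitions make date-range pruning cheap, the sketch below computes the partition IDs a query would need to touch. The toYYYYMM function mirrors the ClickHouse function of the same name for UTC timestamps (ClickHouse itself applies the column or server time zone); partitionsForRange is a hypothetical helper.

```typescript
// Mirrors ClickHouse toYYYYMM for a UTC timestamp, e.g. 2024-11-15 -> 202411.
function toYYYYMM(date: Date): number {
  return date.getUTCFullYear() * 100 + (date.getUTCMonth() + 1);
}

// List the monthly partitions that a date-range query has to read.
// Every other partition can be skipped without opening it.
function partitionsForRange(from: Date, to: Date): number[] {
  const first = from.getUTCFullYear() * 12 + from.getUTCMonth();
  const last = to.getUTCFullYear() * 12 + to.getUTCMonth();
  const partitions: number[] = [];
  for (let index = first; index <= last; index++) {
    const year = Math.floor(index / 12);
    const month = (index % 12) + 1;
    partitions.push(year * 100 + month);
  }
  return partitions;
}

const touched = partitionsForRange(
  new Date("2024-11-15T00:00:00Z"),
  new Date("2025-02-03T00:00:00Z")
);
console.log(touched.join(","));
```

A query covering mid-November to early February therefore reads four partitions, regardless of how many years of data the datasource holds.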

Aggregate Functions

Materialized views use ClickHouse aggregate functions:

Function Type	Use Case	Example
`aggregateFunction`	Mergeable aggregates	`uniq`, `count`
`simpleAggregateFunction`	Simple aggregates	`any`, `min`, `max`

Performance Tip: Materialized views enable sub-second queries on historical data by pre-computing aggregations at write time.
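
The distinction in the table can be illustrated with a small model of partial aggregation states. This is a sketch under stated assumptions: a Set stands in for the uniq state (ClickHouse's uniq is approximate, while this model is exact), and the Visit and PageState types are hypothetical.

```typescript
interface Visit {
  visitor: string;
  ts: number; // Unix epoch, seconds
}

// Partial state for one batch of rows.
interface PageState {
  hits: number;          // count-style state: merges by addition
  visitors: Set<string>; // uniq-style state: merges by set union
  firstSeen: number;     // min-style simple aggregate
  lastSeen: number;      // max-style simple aggregate
}

function stateFromBatch(batch: Visit[]): PageState {
  const state: PageState = {
    hits: 0,
    visitors: new Set<string>(),
    firstSeen: Infinity,
    lastSeen: -Infinity,
  };
  for (const visit of batch) {
    state.hits += 1;
    state.visitors.add(visit.visitor);
    state.firstSeen = Math.min(state.firstSeen, visit.ts);
    state.lastSeen = Math.max(state.lastSeen, visit.ts);
  }
  return state;
}

// Merging two partial states gives the same result as aggregating all
// rows at once, which is what makes write-time aggregation possible.
function mergeStates(a: PageState, b: PageState): PageState {
  const visitors = new Set<string>();
  a.visitors.forEach((v) => visitors.add(v));
  b.visitors.forEach((v) => visitors.add(v));
  return {
    hits: a.hits + b.hits,
    visitors,
    firstSeen: Math.min(a.firstSeen, b.firstSeen),
    lastSeen: Math.max(a.lastSeen, b.lastSeen),
  };
}

const morning = stateFromBatch([{ visitor: "a", ts: 10 }, { visitor: "b", ts: 20 }]);
const evening = stateFromBatch([{ visitor: "b", ts: 30 }, { visitor: "c", ts: 5 }]);
const fullDay = mergeStates(morning, evening);
console.log(fullDay.hits, fullDay.visitors.size);
```

Each batch here has two unique visitors, but the merged state has three, not four. A finished unique count cannot be summed across batches, so the view has to store a mergeable intermediate state, whereas min and max can be stored as plain values.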

Sorting Keys

Carefully chosen sorting keys optimize common query patterns:

sortingKey: [
  "tenant_id",    // Primary filter
  "domain",       // Secondary filter
  "date",         // Time-range queries
  "device",       // Grouping dimension
  "browser",      // Grouping dimension
  "location",     // Grouping dimension
  "pathname",     // High-cardinality dimension last
]
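
The effect of this ordering can be sketched as a lexicographic comparison over the key columns. The comparator below is illustrative only (ClickHouse handles ordering internally), and the PageRow type is an assumption based on the columns named in the sorting key.

```typescript
interface PageRow {
  tenant_id: string;
  domain: string;
  date: string; // "YYYY-MM-DD"
  device: string;
  browser: string;
  location: string;
  pathname: string;
}

const pageSortingKey: (keyof PageRow)[] = [
  "tenant_id", "domain", "date", "device", "browser", "location", "pathname",
];

// Compare column by column in sorting-key order; the first difference wins.
function compareBySortingKey(a: PageRow, b: PageRow): number {
  for (const column of pageSortingKey) {
    if (a[column] < b[column]) return -1;
    if (a[column] > b[column]) return 1;
  }
  return 0;
}

function makeRow(tenant_id: string, domain: string, date: string, pathname: string): PageRow {
  return { tenant_id, domain, date, device: "desktop", browser: "chrome", location: "US", pathname };
}

const sortedRows = [
  makeRow("b", "x.com", "2024-05-01", "/"),
  makeRow("a", "y.com", "2024-05-02", "/b"),
  makeRow("a", "y.com", "2024-05-01", "/z"),
  makeRow("a", "x.com", "2024-05-03", "/a"),
].sort(compareBySortingKey);

console.log(sortedRows.map((r) => r.tenant_id + "|" + r.domain + "|" + r.date).join("\n"));
```

After sorting, all rows for one tenant sit together, and within a tenant all rows for one domain sit together in date order, so a filter on tenant_id, domain, and a date range reads one contiguous block.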

Data Retention

Implement TTL policies based on your retention requirements. Raw events in the landing datasource consume more storage than materialized views.

Recommended retention strategy:

Landing datasource (analytics_events): 30-90 days
Materialized views: 1-2 years
Aggregate rollups: Indefinite
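
The recommended strategy can be modeled as a per-tier expiry check. This is a sketch under stated assumptions: the tier names are made up, and the day counts are the upper bounds of the ranges listed above (with two years taken as 730 days). In practice the policy would be expressed as a TTL on each datasource rather than in application code.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

type RetentionTier = "landing" | "materialized" | "rollups";

// Retention in days per tier; null means keep indefinitely.
const retentionDays: Record<RetentionTier, number | null> = {
  landing: 90,
  materialized: 730,
  rollups: null,
};

// True when a row written at rowTime is past its tier's retention window.
function isExpired(tier: RetentionTier, rowTime: Date, now: Date): boolean {
  const days = retentionDays[tier];
  if (days === null) return false;
  return now.getTime() - rowTime.getTime() > days * DAY_MS;
}

const checkTime = new Date("2025-01-01T00:00:00Z");
const written = new Date("2024-09-01T00:00:00Z"); // 122 days earlier
console.log(isExpired("landing", written, checkTime));
console.log(isExpired("materialized", written, checkTime));
```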

Scalability Considerations

Write Throughput

The architecture handles high write volumes through:

Minimal landing transformation: Raw events written directly
Async materialization: Background aggregation doesn’t block writes
Efficient engines: MergeTree and AggregatingMergeTree optimize for writes

Query Performance

Pre-aggregated data: Materialized views serve most queries
Indexed sorting keys: Fast filtering on common dimensions
Partition pruning: Date filters leverage partitioning

Resource Management

Materialized views consume CPU during materialization. Monitor resource usage and adjust refresh intervals if needed.

Next Steps

Datasources

Explore detailed schema definitions

Pipes

Learn about data transformation logic

Materialized Views

Understand aggregation strategies
