System Architecture
The Tinybird Web Analytics Starter Kit uses a Lambda Architecture pattern optimized for real-time analytics. The system processes web analytics events through three main layers.
Data Flow
Core Components
1. Landing Datasource
analytics_events serves as the primary ingestion point for all web analytics events:
- Accepts raw event data with minimal processing
- Partitioned by month: toYYYYMM(timestamp)
- Sorted by: tenant_id, domain, timestamp
- Optimized for high-throughput writes
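To make the partitioning and sorting keys above concrete, here is a minimal Python sketch of how rows would be grouped and ordered. It only imitates the layout outside the database. The sample events and the helper names to_yyyymm and sorting_key are hypothetical and are not part of the starter kit.

```python
from datetime import datetime, timezone


def to_yyyymm(ts: datetime) -> int:
    """Imitate ClickHouse toYYYYMM: 2024-03-15 becomes 202403."""
    return ts.year * 100 + ts.month


def sorting_key(event: dict) -> tuple:
    """Order rows the way the landing datasource sorts them."""
    return (event["tenant_id"], event["domain"], event["timestamp"])


# Hypothetical sample events, for illustration only.
events = [
    {"tenant_id": "t2", "domain": "b.example",
     "timestamp": datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)},
    {"tenant_id": "t1", "domain": "a.example",
     "timestamp": datetime(2024, 4, 1, 8, 30, tzinfo=timezone.utc)},
    {"tenant_id": "t1", "domain": "a.example",
     "timestamp": datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)},
]

# Group rows by monthly partition, then sort each partition by the key.
partitions: dict = {}
for event in events:
    partitions.setdefault(to_yyyymm(event["timestamp"]), []).append(event)
for rows in partitions.values():
    rows.sort(key=sorting_key)
```

Because tenant_id leads the sorting key, all rows for one tenant sit next to each other inside a partition, which is what keeps tenant-scoped reads cheap.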
2. Processing Layer
The analytics_hits pipe transforms raw events:
- Parses JSON payloads to extract structured fields
- Detects device types (desktop, mobile-android, mobile-ios, bot)
- Identifies browsers (chrome, firefox, safari, opera, ie)
- Handles domain resolution with fallback logic
- Filters out bot traffic
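The steps listed above can be sketched in Python as follows. This is an illustration of the idea, not the pipe's actual SQL. The substring rules, the payload field names (user-agent, href, pathname), the domain fallback to the hostname of href, and the "unknown" fallback values are all assumptions made for this example.

```python
import json
from urllib.parse import urlparse

# Assumed markers; the real pipe may use a different list.
BOT_MARKERS = ("bot", "crawler", "spider")


def detect_device(user_agent: str) -> str:
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return "bot"
    if "android" in ua:
        return "mobile-android"
    if "iphone" in ua or "ipad" in ua:
        return "mobile-ios"
    return "desktop"


def detect_browser(user_agent: str) -> str:
    ua = user_agent.lower()
    # Order matters: Chrome and Opera user agents also contain "safari".
    if "opr/" in ua or "opera" in ua:
        return "opera"
    if "firefox" in ua:
        return "firefox"
    if "msie" in ua or "trident" in ua:
        return "ie"
    if "chrome" in ua:
        return "chrome"
    if "safari" in ua:
        return "safari"
    return "unknown"  # assumed fallback, not listed in the source


def process(raw: dict):
    """Turn one raw landing row into a structured hit, or None for bots."""
    payload = json.loads(raw["payload"])
    user_agent = payload.get("user-agent", "")
    device = detect_device(user_agent)
    if device == "bot":
        return None  # bot traffic is filtered out
    # Assumed fallback: explicit domain first, then the hostname of href.
    domain = (raw.get("domain")
              or urlparse(payload.get("href", "")).hostname
              or "unknown")
    return {
        "tenant_id": raw.get("tenant_id", ""),
        "domain": domain,
        "device": device,
        "browser": detect_browser(user_agent),
        "pathname": payload.get("pathname", ""),
    }
```

The browser checks run from most specific to least specific because many user agent strings mention several browser names at once.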
3. Materialization Layer
Five materialized views pre-aggregate data for fast queries:
Pages
Aggregates page-level metrics by date, tenant, domain, device, browser, location, and pathname
Sessions
Tracks session metrics including first/last hit times and total hits per session
Sources
Aggregates traffic sources and referrers with visit and hit counts
Tenant Actions
Maintains distinct action types per tenant/domain with occurrence counts
Tenant Domains
Tracks active domains per tenant with first/last seen timestamps
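As an illustration of what a view such as Sessions pre-computes, the following Python sketch folds a stream of hits into per-session first/last hit times and hit counts. The session_id field and the SessionStats class are hypothetical names chosen for the example. The real views do this incrementally inside the database rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SessionStats:
    first_hit: datetime
    last_hit: datetime
    hits: int


def aggregate_sessions(hits: list) -> dict:
    """Fold hits into one SessionStats per (tenant, domain, session)."""
    sessions: dict = {}
    for hit in hits:
        key = (hit["tenant_id"], hit["domain"], hit["session_id"])
        ts = hit["timestamp"]
        stats = sessions.get(key)
        if stats is None:
            sessions[key] = SessionStats(first_hit=ts, last_hit=ts, hits=1)
        else:
            stats.first_hit = min(stats.first_hit, ts)
            stats.last_hit = max(stats.last_hit, ts)
            stats.hits += 1
    return sessions
```

Each update touches only the running minimum, maximum, and counter, so hits can arrive in any order and the result stays the same.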
4. API Layer
Endpoints serve pre-aggregated data from materialized views and real-time data from the processing layer:
- Real-time endpoints: Query analytics_hits directly for live data
- Historical endpoints: Query materialized views for aggregated metrics
- Hybrid endpoints: Combine both for flexible date ranges
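One way a hybrid endpoint could split a requested date range is sketched below in Python. It assumes that the current day is served from analytics_hits and that earlier days are served from the materialized views. That cutoff, the plan_query function, and the "materialized_views" label are assumptions for illustration, and the starter kit's endpoints may divide the range differently.

```python
from datetime import date, timedelta


def plan_query(date_from: date, date_to: date, today: date) -> list:
    """Return (source, start, end) segments covering the requested range."""
    if date_from > date_to:
        raise ValueError("date_from must not be after date_to")
    plan = []
    if date_from < today:
        # Completed days are read from the pre-aggregated views.
        historical_end = min(date_to, today - timedelta(days=1))
        plan.append(("materialized_views", date_from, historical_end))
    if date_to >= today:
        # The current day is read live from the processing layer.
        plan.append(("analytics_hits", max(date_from, today), date_to))
    return plan
```

A range that ends before today yields a single historical segment, and a range that starts today yields a single real-time segment.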
Multi-tenancy Design
The architecture supports multi-tenancy through:
- Efficient data isolation via tenant_id filtering
- Multi-domain support within each tenant
- Optimized query performance through proper sorting
Performance Optimizations
Partitioning Strategy
All datasources use monthly partitioning:
- Fast data pruning for date-range queries
- Efficient data lifecycle management
- Optimal storage compression
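To show why monthly partitioning helps date-range queries, this small Python sketch lists the toYYYYMM partitions a given range can touch. Every other partition can be skipped without reading it. The function name partitions_for_range is hypothetical. The database performs this pruning itself when a query filters on the timestamp.

```python
from datetime import date


def partitions_for_range(date_from: date, date_to: date) -> list:
    """List the monthly partition ids (YYYYMM) overlapping the range."""
    if date_from > date_to:
        raise ValueError("date_from must not be after date_to")
    result = []
    year, month = date_from.year, date_from.month
    while (year, month) <= (date_to.year, date_to.month):
        result.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result
```

For example, a query covering 2023-11-20 through 2024-02-03 needs only four partitions, however many months of data the datasource holds.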
Aggregate Functions
Materialized views use ClickHouse aggregate functions:

| Function Type | Use Case | Example |
|---|---|---|
| AggregateFunction | Mergeable aggregates | uniq, count |
| SimpleAggregateFunction | Simple aggregates | any, min, max |
Performance Tip: Materialized views enable sub-second queries on historical data by pre-computing aggregations at write time.
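The idea of a mergeable aggregate can be sketched in Python. Each write produces a partial state, and states combine later without rereading raw events. In this illustration a set stands in for the uniq state (ClickHouse uniq is approximate, whereas a set is exact), an integer stands in for count, and a plain minimum stands in for a simple aggregate whose state is just its value. PageState and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageState:
    visitors: set = field(default_factory=set)  # stand-in for a uniq state
    hits: int = 0                               # stand-in for a count state
    first_seen: Optional[int] = None            # simple aggregate: min

    def add(self, visitor_id: str, ts: int) -> None:
        """Update the state at write time with one event."""
        self.visitors.add(visitor_id)
        self.hits += 1
        if self.first_seen is None or ts < self.first_seen:
            self.first_seen = ts

    def merge(self, other: "PageState") -> "PageState":
        """Combine two partial states into a new one."""
        seen = [t for t in (self.first_seen, other.first_seen) if t is not None]
        return PageState(
            visitors=self.visitors | other.visitors,
            hits=self.hits + other.hits,
            first_seen=min(seen) if seen else None,
        )


# Two separate insert batches produce two partial states.
batch_a = PageState()
batch_a.add("alice", 100)
batch_a.add("bob", 120)

batch_b = PageState()
batch_b.add("bob", 200)
batch_b.add("carol", 210)

merged = batch_a.merge(batch_b)
unique_visitors = len(merged.visitors)
```

Unique visitors cannot be obtained by adding the two batch totals, because bob would be counted twice. This is why uniq needs a stored intermediate state, while min and max can keep only their current value.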
Sorting Keys
Carefully chosen sorting keys optimize common query patterns.
Data Retention
Recommended retention strategy:
- Landing datasource (analytics_events): 30-90 days
- Materialized views: 1-2 years
- Aggregate rollups: Indefinite
Scalability Considerations
Write Throughput
The architecture handles high write volumes through:
- Minimal landing transformation: Raw events written directly
- Async materialization: Background aggregation doesn’t block writes
- Efficient engines: MergeTree and AggregatingMergeTree optimize for writes
Query Performance
- Pre-aggregated data: Materialized views serve most queries
- Indexed sorting keys: Fast filtering on common dimensions
- Partition pruning: Date filters leverage partitioning
Resource Management
Next Steps
Datasources
Explore detailed schema definitions
Pipes
Learn about data transformation logic
Materialized Views
Understand aggregation strategies