Database Architecture
ClickHouse
Analytics & Time-Series Data
- Events (page views, custom events)
- Web Vitals metrics
- Error tracking
- Revenue transactions
- AI usage logs
- Uptime monitoring
PostgreSQL
Application Metadata
- Users & organizations
- Websites & domains
- API keys & authentication
- Feature flags
- Goals & funnels
- Links & annotations
Why Two Databases?
This separation provides several benefits:
- Performance: ClickHouse excels at analytical queries on billions of events
- Scalability: Analytics data can grow independently from metadata
- Cost: ClickHouse compresses data 10-100x, reducing storage costs
- Query Speed: Columnar storage makes aggregations lightning-fast
- Isolation: Critical user data is separated from high-volume analytics
ClickHouse Schema
ClickHouse stores all time-series and event data across three databases:
- analytics - Main analytics events and aggregations
- uptime - Uptime monitoring data
- observability - AI call spans and traces
Events Table
The main events table stores all page views and interactions. It is partitioned by month and ordered by client_id and time for optimal query performance.
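As a sketch of the shape described above (the column names and types here are illustrative assumptions, not the exact production schema):

```sql
-- Hypothetical events table; the real column list may differ
CREATE TABLE analytics.events
(
    client_id   String,
    event_name  LowCardinality(String),
    session_id  String,
    path        String,
    time        DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(time)   -- monthly partitions, as described above
ORDER BY (client_id, time);   -- sort key for fast per-site time-range queries
```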
Web Vitals Tables
Web Vitals use a spans-oriented design for efficient storage. Tracked metrics include:
- LCP (Largest Contentful Paint)
- FCP (First Contentful Paint)
- CLS (Cumulative Layout Shift)
- INP (Interaction to Next Paint)
- TTFB (Time to First Byte)
- FPS (Frames Per Second)
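A spans-oriented design typically stores one row per metric observation. A rough illustration, with assumed table and column names:

```sql
-- Hypothetical spans-style table: one row per Web Vital observation
CREATE TABLE analytics.web_vitals_spans
(
    client_id  String,
    session_id String,
    metric     LowCardinality(String),  -- 'LCP', 'FCP', 'CLS', 'INP', 'TTFB', 'FPS'
    value      Float64,
    timestamp  DateTime
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)  -- daily partitions, as used for spans
ORDER BY (client_id, timestamp);
```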
Hourly Aggregations
Web Vitals are pre-aggregated hourly for fast dashboard queries.
Error Tracking Tables
Errors are stored in a lightweight spans table.
Custom Events Table
Custom events tracked via the SDK are stored separately.
Revenue Tracking
Revenue transactions from Stripe/Paddle are stored for attribution. The ReplacingMergeTree engine handles updates (e.g., refunds) by replacing rows with matching transaction_id.
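A minimal sketch of such a table (names and types are assumptions). Because ReplacingMergeTree deduplicates rows sharing the same sorting key at merge time, transaction_id must be part of ORDER BY:

```sql
-- Hypothetical revenue table; a refund re-inserted with the same
-- transaction_id replaces the earlier row when parts are merged
CREATE TABLE analytics.revenue_transactions
(
    client_id      String,
    transaction_id String,
    amount         Decimal(18, 2),
    currency       LowCardinality(String),
    status         LowCardinality(String),  -- e.g. 'paid', 'refunded'
    time           DateTime
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (client_id, transaction_id);
```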
AI Call Spans
For AI observability, Databuddy tracks LLM API calls.
PostgreSQL Schema
PostgreSQL stores application metadata using Drizzle ORM. Key tables include:
Users & Organizations
Websites
API Keys
Feature Flags
Data Flow
Compression & Storage
ClickHouse uses aggressive compression:
- ZSTD(1): General-purpose compression for strings
- Delta encoding: Efficient for timestamps and sequential numbers
- Gorilla codec: Optimized for floating-point metrics
- LowCardinality: Dictionary encoding for enum-like columns
Typical compression ratios by table:
- Events: 15-20x compression
- Web Vitals: 30-40x compression
- Errors: 10-15x compression
1 billion events typically use 50-100 GB of disk space after compression.
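The codecs above are declared per column. An illustrative (not actual) set of column definitions showing each one:

```sql
-- Illustrative column definitions using the codecs described above
CREATE TABLE analytics.example_codecs
(
    time       DateTime CODEC(Delta, ZSTD(1)),   -- delta encoding for timestamps
    event_name LowCardinality(String),           -- dictionary encoding for enum-like values
    path       String CODEC(ZSTD(1)),            -- general-purpose string compression
    lcp_value  Float64 CODEC(Gorilla, ZSTD(1))   -- float-optimized codec for metrics
)
ENGINE = MergeTree
ORDER BY time;
```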
Partitioning Strategy
All ClickHouse tables are partitioned by time:
- Monthly partitions for events: PARTITION BY toYYYYMM(time)
- Daily partitions for spans: PARTITION BY toDate(timestamp)
Partitioning enables:
- Efficient TTL deletion of old data
- Fast queries on recent time ranges
- Parallel query execution across partitions
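For example, expiring an old month becomes a cheap metadata operation rather than a row-by-row delete (table and partition value shown are illustrative):

```sql
-- Drop an entire monthly partition in one operation
ALTER TABLE analytics.events DROP PARTITION '202401';
```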
Indexing
ClickHouse uses specialized indexes:
Primary Key (ORDER BY)
The primary sorting key determines data layout on disk. Queries filtering by client_id and time are extremely fast.
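A sketch of such a query, assuming an events table sorted by (client_id, time) as described above (the client_id value is a placeholder):

```sql
-- Filters match the sort key, so ClickHouse prunes partitions
-- and seeks directly to the relevant granules
SELECT count()
FROM analytics.events
WHERE client_id = 'site_123'            -- first sort-key column
  AND time >= now() - INTERVAL 7 DAY;   -- second sort-key column
```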
Bloom Filters
Probabilistic indexes for string matching, used for filtering by session_id, event_name, etc.
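A hedged sketch of how such a skip index could be declared (index name and target table are assumptions):

```sql
-- Bloom filter skip index on session_id with a 1% false-positive rate
ALTER TABLE analytics.events
    ADD INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 4;
```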
MinMax Indexes
Store min/max values for range queries. Useful for filtering numeric ranges (e.g., LCP > 2500).
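Illustratively, on the hypothetical spans table sketched earlier:

```sql
-- minmax skip index on a numeric metric column
ALTER TABLE analytics.web_vitals_spans
    ADD INDEX idx_value value TYPE minmax GRANULARITY 4;

-- Lets ClickHouse skip granules where no row can satisfy value > 2500
SELECT count() FROM analytics.web_vitals_spans WHERE value > 2500;
```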
Token Bloom Filters
Used for full-text search in error messages; enables fast substring searches.
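A sketch using ClickHouse's tokenbf_v1 index type (table and column names are assumptions):

```sql
-- Token bloom filter over error messages:
-- (filter size in bytes, hash functions, seed)
ALTER TABLE analytics.errors
    ADD INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- Speeds up token-based searches such as:
SELECT count() FROM analytics.errors WHERE message ILIKE '%timeout%';
```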
Query Examples
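As an example of a typical dashboard aggregation over the event data described above (table, columns, and the client_id value are illustrative):

```sql
-- Daily page views and sessions for one site over the last 30 days
SELECT
    toDate(time) AS day,
    count() AS page_views,
    uniq(session_id) AS sessions
FROM analytics.events
WHERE client_id = 'site_123'
  AND time >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```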
Data Retention
Configure TTL (Time To Live) policies to expire old data automatically.
Best Practices
Use Aggregates
Query hourly/daily aggregates instead of raw spans for date range queries.
Filter Early
Always filter by client_id and time range first:
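For instance (assuming the sort key (client_id, time) discussed earlier):

```sql
-- Good: client_id and time match the table's sort key, so ClickHouse
-- reads only the relevant partitions and granules
SELECT count()
FROM analytics.events
WHERE client_id = 'site_123'
  AND time >= now() - INTERVAL 1 DAY
  AND event_name = 'screen_view';  -- secondary filters come after
```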
Limit Results
Always use LIMIT on large tables:
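For example, when inspecting raw rows (names are illustrative):

```sql
-- Cap the result set so the query cannot return millions of rows
SELECT *
FROM analytics.events
WHERE client_id = 'site_123'
  AND time >= now() - INTERVAL 1 HOUR
ORDER BY time DESC
LIMIT 100;
```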
Materialize Views
Create materialized views for frequently-run aggregations to pre-compute results.
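A hedged sketch of such a view, pre-computing hourly page views (view name, engine choice, and columns are assumptions):

```sql
-- Hypothetical materialized view: hourly page-view counts per site
CREATE MATERIALIZED VIEW analytics.events_hourly_mv
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (client_id, hour)
AS SELECT
    client_id,
    toStartOfHour(time) AS hour,
    count() AS page_views
FROM analytics.events
GROUP BY client_id, hour;
```

Dashboards can then query the view instead of scanning raw events for every date-range request.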
Learn More
Privacy First
Understand data anonymization and privacy features
Event Tracking
Learn how events flow through the system
Sessions & Users
Explore user and session identification