Fully managed columnar database for real-time analytical data warehousing and OLAP workloads with advanced SQL query capabilities.
Aiven for ClickHouse is a fully managed distributed columnar database based on open-source ClickHouse. Purpose-built for online analytical processing (OLAP), it delivers blazing-fast SQL queries on large datasets for real-time analytical reporting and data warehousing.
ClickHouse is a highly scalable, fault-tolerant database designed specifically for analytical workloads. Its columnar storage format and advanced compression enable complex SQL queries on billions of rows with sub-second response times.
MergeTree is the default engine family for analytical tables:

```sql
CREATE TABLE analytics
(
    date Date,
    hour UInt8,
    user_id UInt64,
    metric Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, hour, user_id)
SETTINGS index_granularity = 8192;
```
Features:
Primary key indexing
Data partitioning
Data sampling
Data TTL
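The sampling and TTL features above are declared at table creation; a minimal sketch, with illustrative table and column names:

```sql
-- Illustrative table combining sampling and TTL.
-- The SAMPLE BY expression must also appear in the ORDER BY key.
CREATE TABLE sensor_readings
(
    timestamp DateTime,
    sensor_id UInt64,
    value Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (sensor_id, timestamp, intHash32(sensor_id))
SAMPLE BY intHash32(sensor_id)
TTL timestamp + INTERVAL 90 DAY;  -- rows older than 90 days are deleted in background merges
```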
MergeTree with replication:
```sql
CREATE TABLE replicated_events ON CLUSTER 'default'
(
    timestamp DateTime,
    event String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY timestamp;
```
Features:
Automatic data replication
Fault tolerance
Distributed queries
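Distributed queries are typically served through a Distributed table layered over the replicated table. A minimal sketch, assuming the cluster name 'default' from the example above:

```sql
-- Illustrative: a Distributed table that fans queries out across shards
CREATE TABLE replicated_events_all ON CLUSTER 'default'
AS replicated_events
ENGINE = Distributed('default', currentDatabase(), replicated_events, rand());

-- Queries against the Distributed table read from every shard in parallel
SELECT count() FROM replicated_events_all;
```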
The AggregatingMergeTree engine pre-aggregates data during background merges, storing intermediate aggregation states:
```sql
CREATE TABLE daily_stats
(
    date Date,
    metric_name String,
    count AggregateFunction(count),
    sum AggregateFunction(sum, Float64),
    avg AggregateFunction(avg, Float64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, metric_name);

-- Insert using -State functions
INSERT INTO daily_stats
SELECT
    toDate(timestamp) AS date,
    'page_views' AS metric_name,
    countState() AS count,
    sumState(duration) AS sum,
    avgState(duration) AS avg
FROM page_views
GROUP BY date;

-- Query using -Merge functions
SELECT
    date,
    countMerge(count) AS total_count,
    sumMerge(sum) AS total_sum,
    avgMerge(avg) AS average
FROM daily_stats
GROUP BY date;
```
Dictionaries cache reference data from external sources (such as PostgreSQL) in memory for fast key-value lookups:

```sql
-- Create dictionary from external source
CREATE DICTIONARY user_info
(
    user_id UInt64,
    name String,
    country String,
    created_at DateTime
)
PRIMARY KEY user_id
SOURCE(POSTGRESQL(
    host 'postgres-host'
    port 5432
    user 'readonly'
    password 'secret'
    db 'users'
    table 'user_profiles'
))
LAYOUT(HASHED())
LIFETIME(3600);

-- Use dictionary in queries
SELECT
    e.user_id,
    dictGet('user_info', 'name', e.user_id) AS user_name,
    dictGet('user_info', 'country', e.user_id) AS country,
    count() AS events
FROM events e
GROUP BY e.user_id;
```
Window functions support running totals, moving averages, ranking, and period-over-period comparisons:

```sql
SELECT
    date,
    revenue,
    -- Running total
    sum(revenue) OVER (ORDER BY date) AS cumulative_revenue,
    -- 7-day moving average
    avg(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma_7day,
    -- Rank by revenue
    row_number() OVER (ORDER BY revenue DESC) AS revenue_rank,
    -- Percent change vs. previous day (lagInFrame is ClickHouse's form of lag)
    (revenue - lagInFrame(revenue, 1) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW))
        / lagInFrame(revenue, 1) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
        * 100 AS pct_change
FROM daily_revenue
ORDER BY date;
```
Partitioning enables partition pruning at query time and cheap data lifecycle operations:

```sql
-- Partition by month
CREATE TABLE events
(
    timestamp DateTime,
    data String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY timestamp;

-- Query a specific partition (only matching partitions are read)
SELECT * FROM events
WHERE timestamp >= '2024-03-01' AND timestamp < '2024-04-01';

-- Drop old partitions
ALTER TABLE events DROP PARTITION '202401';
```
Indexing
In addition to the sparse primary index, ClickHouse supports secondary data-skipping indexes:
```sql
CREATE TABLE search_logs
(
    timestamp DateTime,
    query String,
    user_id UInt64,
    INDEX query_idx query TYPE tokenbf_v1(10000, 3, 0) GRANULARITY 4,
    INDEX user_idx user_id TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY timestamp;
```
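The token bloom-filter index above lets ClickHouse skip granules that cannot contain a search term; a sketch of a query it accelerates (the search term is illustrative):

```sql
-- hasToken() lookups can use the tokenbf_v1 index to skip non-matching granules
SELECT count()
FROM search_logs
WHERE hasToken(query, 'checkout');
```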
Query Optimization
Best practices:
Use PREWHERE for heavy filtering
Minimize SELECT *
Use appropriate data types
Leverage primary key ordering
Use sampling for exploratory queries
```sql
-- Use PREWHERE for filtering
SELECT user_id, count()
FROM events
PREWHERE date = today()
WHERE event_type = 'purchase'
GROUP BY user_id;

-- Sample data for fast approximate results
-- (requires the table to be created with a SAMPLE BY clause)
SELECT avg(revenue)
FROM events SAMPLE 0.1
WHERE date >= today() - 30;
```
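The "appropriate data types" practice can be sketched as follows; table and column names are illustrative:

```sql
-- Illustrative: compact types reduce storage and speed up scans
CREATE TABLE page_hits
(
    date Date,                       -- Date instead of DateTime when time of day is not needed
    country LowCardinality(String),  -- dictionary-encode low-cardinality strings
    status UInt16,                   -- smallest integer type that fits the range
    url String CODEC(ZSTD(3))        -- heavier compression for long text columns
)
ENGINE = MergeTree()
ORDER BY (date, country);
```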
Performance Tip: ClickHouse can process billions of rows per second. Design your schemas and queries to take advantage of columnar storage and parallel processing.