
Overview

ClickHouse is a column-oriented database built for real-time analytical queries on large datasets. It delivers fast aggregations across billions of rows. Zequel supports ClickHouse through the official @clickhouse/client driver (v1.16.0).

Supported Versions

  • ClickHouse 21.x
  • ClickHouse 22.x
  • ClickHouse 23.x
  • ClickHouse 24.x

Connection

HTTP/HTTPS Connection

ClickHouse exposes an HTTP interface for client connections:
Host: localhost
Port: 8123 (HTTP) or 8443 (HTTPS)
Database: default
Username: default
Password: (optional)
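With the official driver, these parameters map onto the client options. A minimal sketch (it only builds the options object, so it runs without a live server; the commented-out lines show where the driver's createClient would be called):

```javascript
// Connection options matching the defaults above.
const options = {
  url: 'http://localhost:8123', // HTTP interface
  username: 'default',
  password: '',                 // optional
  database: 'default',
};

// With the official driver these options would be passed to createClient:
// const { createClient } = require('@clickhouse/client');
// const client = createClient(options);

console.log(options.url);
// http://localhost:8123
```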

Connection URL

Zequel constructs URLs automatically:
http://default:password@localhost:8123
https://default:password@localhost:8443
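One way such a URL might be assembled from the connection settings (a sketch, not Zequel's actual implementation; the field names mirror the SSL config example below):

```javascript
// Sketch: build a ClickHouse HTTP(S) URL from connection settings.
// Credentials are percent-encoded so special characters survive.
function buildClickHouseUrl({ host, port, ssl, username, password }) {
  const scheme = ssl ? 'https' : 'http';
  const auth = password
    ? `${encodeURIComponent(username)}:${encodeURIComponent(password)}@`
    : `${encodeURIComponent(username)}@`;
  return `${scheme}://${auth}${host}:${port}`;
}

console.log(buildClickHouseUrl({
  host: 'localhost', port: 8123, ssl: false,
  username: 'default', password: 'password',
}));
// http://default:password@localhost:8123
```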

SSL/TLS Configuration

ClickHouse supports HTTPS connections:
{
  host: 'localhost',
  port: 8443,
  ssl: true,
  sslConfig: {
    rejectUnauthorized: true
  }
}

Features

Databases & Tables

  • Create and drop databases
  • Create, rename, and drop tables
  • Table comments and metadata
  • View table statistics (rows, bytes, compression)

Table Engines

ClickHouse requires specifying a table engine:

MergeTree Family (Most Common)

CREATE TABLE events (
  id UInt64,
  user_id UInt32,
  event_time DateTime,
  event_type String
)
ENGINE = MergeTree()
ORDER BY (user_id, event_time)
PRIMARY KEY (user_id);
  • MergeTree: General-purpose engine
  • ReplacingMergeTree: Deduplication on merge
  • SummingMergeTree: Auto-sum numeric columns
  • AggregatingMergeTree: Pre-aggregation
  • CollapsingMergeTree: Row cancellation

Other Engines

  • Log: Simple append-only tables
  • Memory: In-memory storage
  • Distributed: Sharded across cluster
  • Kafka: Read from Kafka topics
  • File: Query external files

Data Types

ClickHouse provides specialized analytical types:
  • Int8, Int16, Int32, Int64, Int128, Int256
  • UInt8, UInt16, UInt32, UInt64, UInt128, UInt256
  • Float32, Float64, Decimal(P, S)
  • String, FixedString(N)
  • Date, Date32, DateTime, DateTime64
  • UUID, Enum8, Enum16
  • Array(T), Tuple, Map(K, V)
  • LowCardinality(T)

Nullable Types

ClickHouse requires explicit Nullable wrapper:
CREATE TABLE users (
  id UInt64,
  name String,
  email Nullable(String),  -- Can be NULL
  age UInt8                 -- Cannot be NULL
)
ENGINE = MergeTree()
ORDER BY id;

Indexes

Primary Key / ORDER BY

The ORDER BY clause determines data layout:
CREATE TABLE logs (
  timestamp DateTime,
  user_id UInt32,
  message String
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp)     -- Data sorted by these columns
PRIMARY KEY (user_id);              -- Sparse index on user_id

Data-Skipping Indexes

Secondary indexes for query optimization:
-- Create table with data-skipping index
CREATE TABLE events (
  event_time DateTime,
  user_id UInt32,
  event_type String,
  INDEX idx_event_type event_type TYPE bloom_filter GRANULARITY 4
)
ENGINE = MergeTree()
ORDER BY event_time;

-- Add index to existing table
ALTER TABLE events ADD INDEX idx_user user_id TYPE minmax GRANULARITY 4;
Index types:
  • minmax: Min/max values per granule
  • set: Unique values per granule
  • bloom_filter: Bloom filter for string matching
  • tokenbf_v1: Token bloom filter for text search
  • ngrambf_v1: N-gram bloom filter

Partitions

Partition tables by date, month, or custom expression:
CREATE TABLE events (
  event_date Date,
  user_id UInt32,
  event_type String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)  -- Monthly partitions
ORDER BY (user_id, event_date);

-- Drop old partitions
ALTER TABLE events DROP PARTITION 202301;
View partitions:
SELECT 
  partition,
  name,
  rows,
  bytes_on_disk
FROM system.parts
WHERE table = 'events' AND active;

Materialized Views

Pre-compute aggregations for fast queries:
-- Create target table for aggregated data
CREATE TABLE daily_stats (
  event_date Date,
  user_id UInt32,
  event_count UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (event_date, user_id);

-- Create materialized view that populates target
CREATE MATERIALIZED VIEW daily_stats_mv TO daily_stats
AS SELECT
  toDate(event_time) as event_date,
  user_id,
  count() as event_count
FROM events
GROUP BY event_date, user_id;

ClickHouse-Specific Features

ALTER TABLE Operations

ClickHouse uses ALTER TABLE for data modifications:
-- Update rows (asynchronous)
ALTER TABLE events UPDATE event_type = 'click' WHERE event_type = 'view';

-- Delete rows (asynchronous)
ALTER TABLE events DELETE WHERE event_time < '2024-01-01';
UPDATE and DELETE in ClickHouse are implemented as asynchronous mutations and may take time to complete. Use lightweight DELETE (ClickHouse 23.3+) when rows must disappear from query results immediately.
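The lightweight form uses standard DELETE syntax (sketched here against the events table above):

```sql
-- Lightweight DELETE (ClickHouse 23.3+): matching rows are masked from
-- SELECT results immediately and physically removed on later merges.
DELETE FROM events WHERE event_time < '2024-01-01';
```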

User-Defined Functions

Create custom functions:
CREATE FUNCTION add_one AS (x) -> x + 1;

SELECT add_one(5); -- Returns 6

Distributed Tables

Query data across multiple nodes:
CREATE TABLE events_distributed AS events
ENGINE = Distributed(cluster_name, database, events, rand());

Array Functions

Powerful array manipulation:
SELECT 
  arrayMap(x -> x * 2, [1, 2, 3]) as doubled,
  arrayFilter(x -> x > 2, [1, 2, 3, 4]) as filtered,
  arrayReduce('sum', [1, 2, 3, 4]) as total;

Query Optimization

Best Practices

  1. Use the right ORDER BY
    -- Queries filtered by user_id and date are fast
    ORDER BY (user_id, event_date)
    
  2. Leverage partitions
    -- Only scans relevant partitions
    SELECT * FROM events WHERE event_date >= '2024-01-01';
    
  3. Use LowCardinality for enum-like columns
    event_type LowCardinality(String)
    
  4. Avoid SELECT *
    -- Faster - only reads needed columns
    SELECT user_id, count() FROM events GROUP BY user_id;
    
  5. Use PREWHERE for filtering
    -- Filter before reading other columns
    SELECT * FROM events PREWHERE event_date = today() WHERE event_type = 'click';
    

Query Profiling

-- Analyze query execution
EXPLAIN SELECT count() FROM events WHERE user_id = 123;

-- View query statistics
SELECT * FROM system.query_log WHERE type = 'QueryFinish' ORDER BY event_time DESC LIMIT 10;

Limitations

ClickHouse is designed for analytics, not transactions:
  • No foreign key constraints: ClickHouse doesn’t enforce referential integrity
  • No triggers: Not supported
  • No stored procedures: Use user-defined functions instead
  • Limited UPDATE/DELETE: Asynchronous and expensive
  • No multi-row transactions: Each INSERT is independent
  • No ALTER COLUMN type: Must recreate table

Performance Features

Parallel Query Execution

Automatic parallelization across CPU cores:
SET max_threads = 8;  -- Use 8 threads for queries

Compression

Automatic data compression (typically 10-30x):
-- View compression ratio
SELECT 
  table,
  formatReadableSize(sum(data_compressed_bytes)) as compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) as uncompressed,
  round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) as ratio
FROM system.parts
WHERE active
GROUP BY table;

Query Cancellation

Zequel can cancel running queries using AbortController.
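A sketch of how cancellation works with the standard AbortController API. The driver call is commented out so the example runs standalone; `abort_signal` is the option the @clickhouse/client query() method accepts for this purpose:

```javascript
// Sketch: cancelling an in-flight query with AbortController.
const controller = new AbortController();

// The signal would be passed alongside the query:
// const rows = await client.query({
//   query: 'SELECT count() FROM events',
//   abort_signal: controller.signal,
// });

// Cancel from elsewhere (e.g. a UI "stop" button):
controller.abort();
console.log(controller.signal.aborted);
// true
```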

Docker Development

Zequel includes ClickHouse in Docker Compose:
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ./docker/clickhouse/init.sql:/docker-entrypoint-initdb.d/init.sql
The seed database includes:
  • Tables with MergeTree engine
  • Partitioned tables
  • Materialized views
  • Data-skipping indexes
  • Sample analytical data

Use Cases

Ideal For:

✅ Real-time analytics dashboards
✅ Log aggregation and analysis
✅ Time-series data
✅ Event tracking and metrics
✅ Data warehousing
✅ High-throughput data ingestion
✅ Columnar data analysis

Not Ideal For:

❌ OLTP (transactional workloads)
❌ Frequent updates/deletes
❌ Small datasets (< 1M rows)
❌ Point queries on individual records
❌ Strong consistency requirements

Best Practices

  1. Choose appropriate table engine (usually MergeTree)
  2. Design ORDER BY based on query patterns
  3. Use partitions for time-series data
  4. Leverage materialized views for common aggregations
  5. Use LowCardinality for low-cardinality strings
  6. Avoid Nullable types when possible (slower and use more space)
  7. Batch inserts for better performance
  8. Monitor system.query_log for slow queries
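For item 7, prefer one multi-row INSERT over many single-row inserts, since every INSERT creates a new data part (sketched against the partitioned events table above):

```sql
-- One INSERT with many rows creates a single data part;
-- thousands of single-row INSERTs create thousands of parts
-- and force heavy background merging.
INSERT INTO events (event_date, user_id, event_type) VALUES
  ('2024-01-01', 1, 'click'),
  ('2024-01-01', 2, 'view'),
  ('2024-01-02', 1, 'click');
```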
