
Overview

Databuddy uses three primary data stores:
  • PostgreSQL - User accounts, organizations, websites, API keys
  • ClickHouse - Analytics events, metrics, and time-series data
  • Redis - Caching and session management
This guide covers setting up and initializing all database systems.

PostgreSQL Setup

PostgreSQL stores all relational data for Databuddy.

Installation

The included docker-compose.yaml provides PostgreSQL 17:
services:
  postgres:
    image: postgres:17
    container_name: databuddy-postgres
    environment:
      POSTGRES_DB: databuddy
      POSTGRES_USER: databuddy
      POSTGRES_PASSWORD: databuddy_dev_password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
Start PostgreSQL:
docker-compose up -d postgres

Database Connection

Configure the PostgreSQL connection in .env:
DATABASE_URL="postgres://databuddy:password@localhost:5432/databuddy"
Format: postgres://username:password@host:port/database
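If you are wiring this up in application code, a minimal sketch with Drizzle ORM and the postgres driver looks like the following (illustrative only; Databuddy's actual client setup lives in packages/db):
// Minimal sketch: open a Drizzle connection using DATABASE_URL.
// Illustrative only - the real client setup lives in packages/db.
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";

const client = postgres(process.env.DATABASE_URL!);
export const db = drizzle(client);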

Running Migrations

Databuddy uses Drizzle ORM for database migrations.
1. Generate migration files

After modifying the schema:
bun run db:migrate
This creates SQL migration files in packages/db/migrations/.
2. Apply migrations to the database

Push migrations to your database:
bun run db:push
For production:
bun run db:deploy
3. Verify schema

Check that tables were created:
psql $DATABASE_URL -c "\dt"
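The migration commands above are driven by a Drizzle Kit config. A minimal sketch of such a config, with paths assumed from the locations mentioned in this guide (the repository's actual drizzle.config.ts may differ):
// drizzle.config.ts - minimal sketch; paths assumed from this guide
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  dialect: "postgresql",
  schema: "./src/drizzle/schema.ts", // packages/db/src/drizzle/schema.ts
  out: "./migrations",               // packages/db/migrations/
  dbCredentials: {
    url: process.env.DATABASE_URL!,
  },
});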

Database Schema Overview

PostgreSQL stores the following tables:
User Management:
  • user - User accounts
  • account - OAuth provider accounts
  • session - User sessions
  • verification - Email verification tokens
Organizations:
  • organization - Organization/team records
  • organization_member - Team membership
  • organization_invite - Pending invites
Analytics Configuration:
  • website - Tracked websites/apps
  • api_key - API authentication keys
  • funnel - Conversion funnels
  • goal - Goal tracking configuration
Billing:
  • subscription - Payment subscriptions
  • usage_record - Usage tracking
View the complete schema in packages/db/src/drizzle/schema.ts.
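Each of these tables is defined with Drizzle's pgTable helper. A hypothetical, abridged example of what a definition in that file looks like (column names are illustrative, not the actual Databuddy schema):
// Hypothetical, abridged table definition in the Drizzle schema file.
import { pgTable, text, timestamp } from "drizzle-orm/pg-core";

export const website = pgTable("website", {
  id: text("id").primaryKey(),
  domain: text("domain").notNull(),
  organizationId: text("organization_id"),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});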

Database Backup

Backup your PostgreSQL database regularly:
# Create backup
pg_dump $DATABASE_URL > databuddy-backup-$(date +%Y%m%d).sql

# Restore from backup
psql $DATABASE_URL < databuddy-backup-20260301.sql
For automated backups:
# Add to crontab (daily at 2 AM)
0 2 * * * pg_dump postgres://user:pass@localhost:5432/databuddy | gzip > /backups/databuddy-$(date +\%Y\%m\%d).sql.gz

ClickHouse Setup

ClickHouse stores all analytics events and time-series metrics.

Installation

The included docker-compose.yaml provides ClickHouse 25.5.1:
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.5.1-alpine
    container_name: databuddy-clickhouse
    environment:
      CLICKHOUSE_DB: databuddy_analytics
      CLICKHOUSE_USER: default
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    ports:
      - "8123:8123"  # HTTP interface
      - "9000:9000"  # Native protocol
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
Start ClickHouse:
docker-compose up -d clickhouse

ClickHouse Connection

Configure the ClickHouse connection in .env:
CLICKHOUSE_URL="http://default:@localhost:8123/databuddy_analytics"
Format: http://username:password@host:port/database
If you set a ClickHouse password, include it in the URL: http://default:password@localhost:8123/database
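To connect from application code, a minimal sketch using the official @clickhouse/client package (illustrative; Databuddy's actual ClickHouse wrapper lives in packages/db):
// Minimal sketch: connect to ClickHouse over the HTTP interface.
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({
  url: "http://localhost:8123",
  username: "default",
  password: "",                      // set this if you configured a password
  database: "databuddy_analytics",
});

// Quick connectivity check
const result = await clickhouse.query({ query: "SELECT 1", format: "JSONEachRow" });
console.log(await result.json());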

Initialize ClickHouse Schema

Databuddy provides a script to initialize all ClickHouse tables:
bun run clickhouse:init
This creates:
  • Three databases: analytics, uptime, observability
  • All required tables and materialized views
If you need to create or inspect tables manually, the DDL lives in packages/db/src/clickhouse/schema.ts; copy the relevant CREATE statements from that file and run them with clickhouse-client.
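For reference, the database-creation part of the init step can be sketched in a few lines (illustrative; the real script in packages/db also creates every table and materialized view):
// Rough sketch of the database-creation step behind clickhouse:init.
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({ url: "http://localhost:8123" });

for (const name of ["analytics", "uptime", "observability"]) {
  await clickhouse.command({ query: `CREATE DATABASE IF NOT EXISTS ${name}` });
}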

ClickHouse Schema Overview

ClickHouse stores the following tables:
Analytics Database:
events - Primary table for pageviews, custom events, and user interactions (an illustrative insert sketch follows this overview). Key fields:
  • client_id - Website/app identifier
  • anonymous_id - User identifier
  • session_id - Session tracking
  • event_name - Event type (screen_view, click, etc.)
  • url, path - Page information
  • country, city - Geo data
  • browser_name, os_name, device_type - Device info
Engine: MergeTree partitioned by month
error_spans - JavaScript errors and exceptions. Key fields:
  • message - Error message
  • filename, lineno, colno - Error location
  • stack - Stack trace
  • error_type - Error classification
Engine: MergeTree with 90-day TTL
Core Web Vitals and performance data. Metrics tracked:
  • LCP (Largest Contentful Paint)
  • FCP (First Contentful Paint)
  • CLS (Cumulative Layout Shift)
  • INP (Interaction to Next Paint)
  • TTFB (Time to First Byte)
  • FPS (Frames Per Second)
Engine: MergeTree
Server-side events sent via API. Key fields:
  • owner_id - Organization ID
  • event_name - Custom event name
  • namespace - Optional grouping
  • properties - JSON event data
Engine: MergeTree
Revenue tracking from Stripe/Paddle. Key fields:
  • transaction_id - Unique transaction
  • provider - Payment processor
  • amount - Transaction amount
  • currency - Currency code
  • anonymous_id - Attribution to user
Engine: ReplacingMergeTree (handles refunds)
AI crawlers and assistant traffic. Key fields:
  • bot_type - ai_crawler or ai_assistant
  • bot_name - Detected bot (GPTBot, Claude-Web, etc.)
  • path - Page accessed
Engine: MergeTree
Blocked requests from bots and bad actors. Key fields:
  • block_reason - Why blocked
  • block_category - Blocking rule
  • bot_name - Detected bot if applicable
  • ip, user_agent - Request details
Engine: MergeTree with 6-month TTL
Uptime Database:
  • uptime_monitor - Website uptime monitoring data
Observability Database:
  • ai_call_spans - LLM API call tracking (tokens, cost, latency)
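As a point of reference for the events fields listed above, an abridged insert with @clickhouse/client might look like this (purely illustrative; production traffic flows through the tracking SDK and ingest API, not direct inserts):
// Illustrative only: an abridged row for analytics.events.
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({ url: "http://localhost:8123" });

await clickhouse.insert({
  table: "analytics.events",
  values: [{
    client_id: "web_123",     // website/app identifier
    anonymous_id: "anon_456",
    session_id: "sess_789",
    event_name: "screen_view",
    path: "/pricing",
  }],
  format: "JSONEachRow",
});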

Materialized Views

Databuddy uses materialized views for pre-aggregated data:
  • error_hourly_mv → error_hourly - Hourly error aggregates
  • web_vitals_hourly_mv → web_vitals_hourly - Hourly performance stats
  • daily_pageviews_mv → daily_pageviews - Daily pageview counts
These views automatically populate as new data arrives.

ClickHouse Optimization

For production deployments:
1. Set appropriate TTL policies

Data is automatically cleaned up based on TTL settings:
-- Example: Shorten error retention to 30 days
ALTER TABLE analytics.error_spans 
MODIFY TTL toDateTime(timestamp) + INTERVAL 30 DAY;
2. Configure compression

ClickHouse compresses column data on disk (LZ4 by default; ZSTD can be set per column codec). Monitor compression ratios:
SELECT 
  table,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
  round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, 2) AS ratio
FROM system.parts
WHERE database = 'analytics'
GROUP BY table;
3. Optimize queries

ClickHouse performs best with:
  • Filters on client_id (primary key)
  • Date range filters aligned to partitions
  • Aggregations instead of individual row queries
Use EXPLAIN to analyze query performance:
EXPLAIN SELECT count() FROM analytics.events 
WHERE client_id = 'xxx' AND time >= today() - 7;

ClickHouse Backup

Backup ClickHouse data:
# Backup all databases to directory
clickhouse-client --query "BACKUP DATABASE analytics TO Disk('backups', 'analytics-backup.zip')"

# Restore from backup
clickhouse-client --query "RESTORE DATABASE analytics FROM Disk('backups', 'analytics-backup.zip')"
For Docker:
# Create volume backup
docker run --rm -v databuddy_clickhouse_data:/data -v $(pwd):/backup alpine tar czf /backup/clickhouse-backup.tar.gz /data

# Restore volume
docker run --rm -v databuddy_clickhouse_data:/data -v $(pwd):/backup alpine tar xzf /backup/clickhouse-backup.tar.gz -C /

Redis Setup

Redis provides caching and session storage.

Installation

The included docker-compose.yaml provides Redis 7:
services:
  redis:
    image: redis:7-alpine
    container_name: databuddy-redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --maxmemory 512mb
Start Redis:
docker-compose up -d redis

Redis Connection

REDIS_URL="redis://localhost:6379"
For authenticated Redis:
REDIS_URL="redis://:password@localhost:6379"

Redis Configuration

Databuddy uses Redis for:
  • Session storage
  • Query result caching (TTL: 5 minutes)
  • Rate limiting
  • Background job queues
No manual setup required - Redis is used automatically when configured.
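To illustrate the query-result caching pattern with the 5-minute TTL mentioned above, a hypothetical cache-aside helper (names and structure are illustrative, not Databuddy's actual caching layer):
// Hypothetical cache-aside helper; assumes ioredis. Illustrative only.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const QUERY_CACHE_TTL_SECONDS = 5 * 60; // matches the 5-minute query cache TTL

async function cachedQuery<T>(key: string, run: () => Promise<T>): Promise<T> {
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit) as T;

  const result = await run();
  await redis.set(key, JSON.stringify(result), "EX", QUERY_CACHE_TTL_SECONDS);
  return result;
}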

Database Monitoring

PostgreSQL Monitoring

-- Check database size
SELECT pg_size_pretty(pg_database_size('databuddy'));

-- Active connections
SELECT count(*) FROM pg_stat_activity;

-- Slow queries (requires the pg_stat_statements extension)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;

ClickHouse Monitoring

-- Check table sizes
SELECT 
  table,
  formatReadableSize(sum(bytes)) AS size,
  sum(rows) AS rows
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table;

-- Query performance
SELECT 
  query,
  count() as executions,
  avg(query_duration_ms) as avg_duration_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY query
ORDER BY avg_duration_ms DESC
LIMIT 10;

Troubleshooting

Check connection string and PostgreSQL service:
# Test connection
psql $DATABASE_URL -c "SELECT version();"

# Check the container status (use systemctl status postgresql for a native install)
docker-compose ps postgres
Verify ClickHouse is running and accessible:
# Test HTTP interface
curl http://localhost:8123/ping

# Check databases
clickhouse-client --query "SHOW DATABASES"

# Re-run initialization
bun run clickhouse:init
Check for schema conflicts:
# Inspect the current schema in Drizzle Studio
bun run db:studio

# Force reset (CAUTION: deletes data)
psql $DATABASE_URL -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
bun run db:push
Increase memory limits or adjust query settings:
SET max_memory_usage = 10000000000; -- 10 GB
In docker-compose.yaml:
deploy:
  resources:
    limits:
      memory: 8G

Next Steps

  • Environment Variables - Configure authentication, integrations, and features
  • Configuration - Advanced configuration and tuning
