
Overview

Open Wearables uses PostgreSQL as its primary database and Alembic for schema migrations. This guide covers database setup, migration management, backup strategies, and troubleshooting.

Database Requirements

PostgreSQL Version: 18+ (recommended), 15+ (minimum)

Extensions Required:
  • uuid-ossp - UUID generation functions (auto-enabled)
Recommended Configuration:
  • Minimum RAM: 2GB
  • Minimum Storage: 20GB (varies with data volume)
  • Timezone: UTC (strongly recommended)
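Once connected, you can confirm the server meets the minimum version. A small sketch (the function names are ours; it only parses the text returned by `SELECT version();`):

```python
import re

MIN_MAJOR = 15  # minimum supported PostgreSQL major version

def postgres_major(version_string: str) -> int:
    """Extract the major version from the output of SELECT version();"""
    match = re.search(r"PostgreSQL (\d+)", version_string)
    if not match:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1))

def meets_minimum(version_string: str, minimum: int = MIN_MAJOR) -> bool:
    """True if the server's major version is at or above the minimum."""
    return postgres_major(version_string) >= minimum
```

Feed it the string returned by the connection-test query later in this guide.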

Quick Setup

1. Configure database connection

Set database credentials in backend/config/.env:
.env
DB_HOST=db
DB_PORT=5432
DB_NAME=open-wearables
DB_USER=open-wearables
DB_PASSWORD=open-wearables
See Environment Variables for production configuration.
2. Start database service

Using Docker Compose:
docker compose up -d db
The database will be available on localhost:5432 with health checks enabled.
3. Run migrations

Migrations run automatically when the backend starts; to run them manually:
# Using Makefile
make migrate

# Or directly with Docker
docker compose exec app alembic upgrade head
4. Verify setup

Check that tables were created:
docker compose exec db psql -U open-wearables -d open-wearables -c "\dt"
You should see tables for users, providers, workouts, sleep data, and more.

Database Schema

The Open Wearables database schema includes:

Core Tables

  • users - User accounts and authentication
  • api_keys - API keys for programmatic access
  • invitations - User invitation management
  • providers - Wearable provider definitions (Garmin, Polar, etc.)
  • user_connections - OAuth connections to providers

Data Tables

  • series_type_definitions - Standardized metric definitions
  • workouts - Exercise activities and sessions
  • workout_data_points - Time-series data within workouts
  • sleep_sessions - Sleep tracking sessions
  • sleep_data_points - Time-series sleep stage data
  • heart_rate_samples - Heart rate measurements
  • samples - Generic time-series data

Background Processing

  • background_tasks - Celery task tracking
  • sync_logs - Provider synchronization history
The schema is designed to normalize data from multiple wearable providers into a consistent format using series_type_definitions.
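To illustrate the normalization idea (the mappings and field names below are hypothetical, not the actual schema; real definitions live in series_type_definitions), a provider-specific payload might be mapped onto a shared series type like this:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class NormalizedSample:
    series_type: str      # key into series_type_definitions, e.g. "heart_rate_bpm"
    recorded_at: datetime
    value: float

# Hypothetical per-provider field mappings for the same underlying metric.
PROVIDER_MAPPINGS = {
    "garmin": {"field": "heartRate", "series_type": "heart_rate_bpm"},
    "polar": {"field": "hr", "series_type": "heart_rate_bpm"},
}

def normalize(provider: str, raw: dict) -> NormalizedSample:
    """Map one raw provider payload onto the shared series-type format."""
    mapping = PROVIDER_MAPPINGS[provider]
    return NormalizedSample(
        series_type=mapping["series_type"],
        recorded_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        value=float(raw[mapping["field"]]),
    )
```

Downstream queries then filter on one series_type regardless of which provider produced the data.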

Migration Management

Running Migrations

# Apply all pending migrations
make migrate

# Or using docker compose directly
docker compose exec app alembic upgrade head
Migration behavior:
  • Migrations run automatically on application startup
  • Safe to run multiple times (idempotent)
  • Applied in order along the revision chain (each migration's down_revision link)

Creating Migrations

When you modify database models in app/models/, create a migration:
1. Make model changes

Edit files in backend/app/models/:
app/models/user.py
from uuid import UUID

from sqlalchemy.orm import Mapped, mapped_column

from app.database import BaseDbModel
from app.mappings import PrimaryKey, Unique, email, str_100

class User(BaseDbModel):
    id: Mapped[PrimaryKey[UUID]]
    email: Mapped[Unique[email]]
    # Add new field
    timezone: Mapped[str_100 | None] = mapped_column(default="UTC")
2. Generate migration

Create an autogenerated migration:
# Using Makefile
make create_migration m="Add timezone to users"

# Or directly
docker compose exec app alembic revision --autogenerate -m "Add timezone to users"
This creates a new file in backend/migrations/versions/.
3. Review migration

Always review the generated migration file:
migrations/versions/xxx_add_timezone_to_users.py
def upgrade() -> None:
    op.add_column('users', 
        sa.Column('timezone', sa.String(length=100), 
                 nullable=True, server_default='UTC'))

def downgrade() -> None:
    op.drop_column('users', 'timezone')
Alembic’s autogenerate is helpful but not perfect. Always review migrations before applying them, especially:
  • Index changes
  • Constraint modifications
  • Data migrations
  • Column type changes
4. Apply migration

make migrate

Migration Best Practices

Write migrations that can be rolled back:
def upgrade() -> None:
    # Add column as nullable first
    op.add_column('users', sa.Column('new_field', sa.String(100), nullable=True))
    
    # Populate data for existing rows
    op.execute("UPDATE users SET new_field = 'default' WHERE new_field IS NULL")
    
    # Make it non-nullable after data is populated
    op.alter_column('users', 'new_field', nullable=False)

def downgrade() -> None:
    op.drop_column('users', 'new_field')
For complex data transformations, separate schema and data changes:
def upgrade() -> None:
    # Schema change
    op.add_column('workouts', sa.Column('distance_meters', sa.Numeric(15, 5)))
    
    # Data migration
    connection = op.get_bind()
    # SQLAlchemy 2.x requires raw SQL to be wrapped in text()
    connection.execute(sa.text(
        "UPDATE workouts SET distance_meters = distance_km * 1000 "
        "WHERE distance_km IS NOT NULL"
    ))
    
    # Cleanup old column
    op.drop_column('workouts', 'distance_km')

def downgrade() -> None:
    op.add_column('workouts', sa.Column('distance_km', sa.Numeric(10, 2)))
    connection = op.get_bind()
    connection.execute(sa.text(
        "UPDATE workouts SET distance_km = distance_meters / 1000 "
        "WHERE distance_meters IS NOT NULL"
    ))
    op.drop_column('workouts', 'distance_meters')
For tables with millions of rows, use concurrent operations:
def upgrade() -> None:
    # Add index concurrently (doesn't lock table)
    op.create_index(
        'ix_workouts_user_id_date',
        'workouts',
        ['user_id', 'start_time'],
        postgresql_concurrently=True
    )

def downgrade() -> None:
    op.drop_index(
        'ix_workouts_user_id_date',
        'workouts',
        postgresql_concurrently=True
    )
Concurrent operations cannot run inside a transaction. In Alembic, wrap them in `with op.get_context().autocommit_block():`, or set connection.execution_options(isolation_level="AUTOCOMMIT") on the migration connection.

Rolling Back Migrations

# Rollback the last migration
docker compose exec app alembic downgrade -1
Rolling back migrations can result in data loss. Always backup the database before performing rollbacks in production.

Migration History

# View migration history
docker compose exec app alembic history

# View current migration version
docker compose exec app alembic current

# Show the head revision(s) defined in the migration scripts
docker compose exec app alembic heads

Database Connection

Connection String Format

Open Wearables uses SQLAlchemy with the psycopg driver:
postgresql+psycopg://{user}:{password}@{host}:{port}/{database}
Example:
postgresql+psycopg://open-wearables:mypassword@localhost:5432/open-wearables
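The application assembles this URI from the DB_* variables. A minimal sketch (the function name is ours; note that special characters in credentials must be percent-encoded):

```python
from urllib.parse import quote_plus

def build_db_uri(host: str, port: int, name: str, user: str, password: str) -> str:
    """Build a SQLAlchemy URI for the psycopg driver, escaping the credentials."""
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )
```

For example, a password containing "@" is encoded as "%40" so it cannot be confused with the host separator.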

Connection Pooling

SQLAlchemy manages the connection pool automatically; Open Wearables uses these settings:
  • Pool size: 5 connections
  • Max overflow: 10 additional connections
  • Pool timeout: 30 seconds
  • Pool recycle: 3600 seconds (1 hour)
These values work well for most deployments. Adjust if needed:
backend/app/database.py
engine = create_engine(
    settings.db_uri,
    pool_size=10,
    max_overflow=20,
    pool_timeout=30,
    pool_recycle=3600,
)
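When sizing the pool, remember that each application process can hold up to pool_size + max_overflow connections, and the total across all workers must stay under PostgreSQL's max_connections. A quick check (the worker count here is illustrative):

```python
def max_app_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections the application can hold open at once."""
    return workers * (pool_size + max_overflow)

# e.g. 4 workers with the settings above: 4 * (10 + 20) = 120,
# which exceeds PostgreSQL's default max_connections of 100
```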

Testing Database Connection

# Test connection using psql
docker compose exec db psql -U open-wearables -d open-wearables -c "SELECT version();"

Seed Data

Initial Data

On first startup, the application automatically creates:
  1. Admin account - From ADMIN_EMAIL and ADMIN_PASSWORD environment variables
  2. Series type definitions - Standardized metric definitions for normalizing provider data
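Startup seeding of this kind typically reads the variables and fails fast when either is missing. A hedged sketch (the helper name is ours, not the application's):

```python
import os

def read_admin_credentials() -> tuple[str, str]:
    """Fetch ADMIN_EMAIL and ADMIN_PASSWORD, raising early if either is unset."""
    email = os.environ.get("ADMIN_EMAIL")
    password = os.environ.get("ADMIN_PASSWORD")
    if not email or not password:
        raise RuntimeError(
            "ADMIN_EMAIL and ADMIN_PASSWORD must be set before first startup"
        )
    return email, password
```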

Sample Data (Development)

Load sample test data for development:
make seed
This creates:
  • Sample users
  • Example workouts with data points
  • Sleep sessions with sleep stages
  • Heart rate samples
  • Provider connections
The seed command is for development only. Never run it in production, as it inserts test users and sample data.

Custom Seed Scripts

Create custom seed scripts in backend/scripts/:
backend/scripts/seed_custom.py
from app.database import get_db
from app.models import User
from app.schemas import UserCreateInternal
from app.services import user_service

def seed_users():
    db = next(get_db())
    
    users = [
        UserCreateInternal(email="alice@example.com", first_name="Alice"),
        UserCreateInternal(email="bob@example.com", first_name="Bob"),
    ]
    
    for user_data in users:
        user_service.create(db, user_data)
        print(f"Created user: {user_data.email}")

if __name__ == "__main__":
    seed_users()
Run your script:
docker compose exec app python scripts/seed_custom.py

Backup and Restore

Creating Backups

# Backup entire database
docker compose exec -T db pg_dump -U open-wearables -d open-wearables > backup_$(date +%Y%m%d_%H%M%S).sql

Restoring Backups

Restoring a backup will overwrite all existing data. Always backup current data before restoring.
1. Stop application services

docker compose stop app celery-worker celery-beat
2. Drop and recreate database

docker compose exec db psql -U open-wearables -d postgres -c "DROP DATABASE IF EXISTS \"open-wearables\";"
docker compose exec db psql -U open-wearables -d postgres -c "CREATE DATABASE \"open-wearables\";"
3. Restore backup

cat backup.sql | docker compose exec -T db psql -U open-wearables -d open-wearables
4. Restart services

docker compose start app celery-worker celery-beat

Automated Backup Strategy

For production, implement automated backups:
backup.sh
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/backups/postgres"
RETENTION_DAYS=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Create backup
BACKUP_FILE="$BACKUP_DIR/open_wearables_$(date +%Y%m%d_%H%M%S).sql.gz"
docker compose exec -T db pg_dump -U open-wearables -d open-wearables | gzip > "$BACKUP_FILE"

# Upload to S3 (optional)
aws s3 cp "$BACKUP_FILE" s3://my-backups/postgres/

# Delete old backups
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: $BACKUP_FILE"
Add to crontab:
# Daily backup at 2 AM
0 2 * * * /path/to/backup.sh
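The find -mtime pruning step can equally be scripted in Python using only the standard library; a sketch (the function name and directory layout are assumptions):

```python
import time
from pathlib import Path

def prune_backups(backup_dir: str, retention_days: int = 30) -> list[str]:
    """Delete .sql.gz backups older than retention_days; return deleted names."""
    cutoff = time.time() - retention_days * 86400
    deleted = []
    for path in Path(backup_dir).glob("*.sql.gz"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Unlike the shell version, this returns the list of deleted files, which is convenient for logging or alerting.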
When using managed PostgreSQL services (AWS RDS, Azure Database, etc.):
  • Enable automated daily backups
  • Set retention period (7-35 days recommended)
  • Configure backup window during low-traffic hours
  • Enable point-in-time recovery (PITR)
  • Test restore procedures regularly
AWS RDS Example:
  • Automated backups: Enabled
  • Backup retention: 30 days
  • Backup window: 02:00-04:00 UTC
  • PITR: Enabled (allows restore to any second in retention period)

Performance Optimization

Indexes

The schema includes indexes on frequently queried columns:
-- User lookups
CREATE UNIQUE INDEX ix_users_email ON users(email);

-- Workout queries
CREATE INDEX ix_workouts_user_id ON workouts(user_id);
CREATE INDEX ix_workouts_start_time ON workouts(start_time);
CREATE INDEX ix_workouts_provider_id ON workouts(provider_id);

-- Time-series data
CREATE INDEX ix_workout_data_points_workout_id ON workout_data_points(workout_id);
CREATE INDEX ix_sleep_data_points_session_id ON sleep_data_points(sleep_session_id);
Indexes speed up queries but slow down inserts/updates. Monitor query performance and add indexes based on actual usage patterns.

Monitoring Query Performance

-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1 second
SELECT pg_reload_conf();

Vacuuming and Maintenance

PostgreSQL requires periodic maintenance:
-- Manual vacuum (usually automatic)
VACUUM ANALYZE;

-- Check table bloat
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
PostgreSQL’s autovacuum runs automatically but can be tuned:
-- Increase autovacuum frequency for high-traffic tables
ALTER TABLE workouts SET (autovacuum_vacuum_scale_factor = 0.05);
ALTER TABLE workout_data_points SET (autovacuum_vacuum_scale_factor = 0.05);

Production Database

Managed Database Services

For production, use managed PostgreSQL services:
AWS RDS

Recommended configuration:
  • Instance: db.t4g.medium or larger
  • Storage: General Purpose SSD (gp3) with auto-scaling
  • Multi-AZ: Enabled for high availability
  • Encryption: Enabled at rest and in transit
  • Automated backups: 30-day retention
  • Monitoring: Enhanced monitoring enabled
Connection:
.env
DB_HOST=open-wearables.abcd1234.us-east-1.rds.amazonaws.com
DB_PORT=5432
DB_NAME=open_wearables_prod
DB_USER=ow_admin
DB_PASSWORD=your-secure-password
Azure Database for PostgreSQL

Recommended configuration:
  • Tier: General Purpose
  • Compute: 2-4 vCores
  • Storage: 100GB with auto-growth
  • Backup retention: 30 days
  • High availability: Zone-redundant
  • SSL enforcement: Enabled
Connection:
.env
DB_HOST=open-wearables.postgres.database.azure.com
DB_PORT=5432
DB_NAME=open_wearables_prod
DB_USER=ow_admin@open-wearables
DB_PASSWORD=your-secure-password
Google Cloud SQL

Recommended configuration:
  • Machine type: db-custom-2-7680 (2 vCPU, 7.5GB RAM)
  • Storage: 100GB SSD with auto-increase
  • Availability: Regional (high availability)
  • Backups: Automated daily with 30-day retention
  • Encryption: Enabled
Connection:
.env
DB_HOST=10.1.2.3  # Private IP
DB_PORT=5432
DB_NAME=open_wearables_prod
DB_USER=ow_admin
DB_PASSWORD=your-secure-password

Security Best Practices

Production Database Security Checklist:
  1. Network isolation - Use VPC/private network, no public access
  2. Strong passwords - Use generated passwords with high entropy
  3. Encryption - Enable at-rest and in-transit encryption
  4. SSL/TLS - Require SSL connections
  5. Least privilege - Application user should not have superuser access
  6. Regular updates - Apply security patches promptly
  7. Audit logging - Enable and monitor database logs
  8. Backup encryption - Encrypt backup files
  9. Access control - Use IAM authentication when available
  10. Monitoring - Set up alerts for suspicious activity
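For item 2, Python's secrets module generates a high-entropy password in a few lines (the helper name and alphabet choice are ours):

```python
import secrets
import string

def generate_db_password(length: int = 32) -> str:
    """Generate a random password from letters and digits (~190 bits at length 32)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Restricting the alphabet to alphanumerics avoids shell-quoting and URI-encoding surprises while keeping ample entropy.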

Troubleshooting

Symptoms: psycopg.OperationalError: connection refused

Solutions:
  1. Verify database container is running:
    docker compose ps db
    
  2. Check database logs:
    docker compose logs db
    
  3. Verify connection settings in .env:
    docker compose exec app python -c "from app.config import settings; print(settings.db_uri)"
    
  4. Test connection manually:
    docker compose exec db psql -U open-wearables -d open-wearables
    
Symptoms: alembic.util.exc.CommandError or migration failures

Solutions:
  1. Check current migration version:
    docker compose exec app alembic current
    
  2. View migration history:
    docker compose exec app alembic history
    
  3. If migrations are out of sync, stamp the database:
    # Stamp database to specific revision (use with caution)
    docker compose exec app alembic stamp head
    
  4. For conflicts, manually resolve or rollback:
    docker compose exec app alembic downgrade -1
    docker compose exec app alembic upgrade head
    
Symptoms: API responses are slow, database CPU is high

Diagnosis:
  1. Enable slow query logging:
    ALTER SYSTEM SET log_min_duration_statement = 1000;
    SELECT pg_reload_conf();
    
  2. View running queries:
    SELECT pid, now() - query_start as duration, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY duration DESC;
    
  3. Check for missing indexes:
    SELECT schemaname, tablename, attname
    FROM pg_stats
    WHERE schemaname = 'public'
    AND n_distinct > 100
    AND correlation < 0.1;
    
Solutions:
  • Add indexes on frequently filtered columns
  • Optimize queries to reduce data scanned
  • Use pagination for large result sets
  • Consider database scaling if consistently slow
Symptoms: No space left on device errors

Solutions:
  1. Check volume usage:
    docker system df -v
    
  2. Check database size:
    SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname))
    FROM pg_database
    ORDER BY pg_database_size(pg_database.datname) DESC;
    
  3. Clean up old data:
    -- Example: Delete old sync logs
    DELETE FROM sync_logs WHERE created_at < NOW() - INTERVAL '90 days';
    -- VACUUM FULL reclaims disk space but takes an exclusive lock; run during low traffic
    VACUUM FULL sync_logs;
    
  4. Increase volume size or enable auto-scaling in managed services

Next Steps

Docker Deployment

Complete deployment guide with Docker Compose

Environment Variables

Configure all environment settings
