This project uses Docker to containerize the PostgreSQL database, making it easy to set up, run, and tear down the data warehouse environment.

Docker Compose Overview

Docker Compose allows you to define and run the PostgreSQL container with all necessary configurations in a single YAML file.

Configuration File

The docker-compose.yml file defines the complete Docker setup:
docker-compose.yml
services:
  db:
    image: postgres:16-alpine
    container_name: datawarehouse
    restart: unless-stopped

    environment:
      POSTGRES_USER:     ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB:       ${POSTGRES_DB}
      PGDATA:            /var/lib/postgresql/data/pgdata

    ports:
      - "5432:5432"

    volumes:
      - pg_data:/var/lib/postgresql/data
      - ./datasets:/datasets:ro
volumes:
  pg_data:

Configuration Breakdown

Service Name: db

The main database service with the following settings:
  • Image: postgres:16-alpine - Uses PostgreSQL 16 on Alpine Linux for a lightweight container
  • Container Name: datawarehouse - Fixed name for easy reference
  • Restart Policy: unless-stopped - Automatically restarts unless manually stopped
The container uses environment variables from your .env file:
  • POSTGRES_USER - PostgreSQL superuser name
  • POSTGRES_PASSWORD - Password for the superuser
  • POSTGRES_DB - Default database name
  • PGDATA - Data directory path inside the container
These variables are loaded from the .env file in your project root directory.
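A minimal .env might look like this (the values below are placeholders — substitute your own credentials):

```shell
# .env — loaded automatically by Docker Compose from the project root
POSTGRES_USER=warehouse_admin
POSTGRES_PASSWORD=change_me_in_production
POSTGRES_DB=datawarehouse
```

Keep this file out of version control (add it to .gitignore), since it contains credentials.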
Port Configuration: 5432:5432
  • Host Port: 5432 - The port on your local machine
  • Container Port: 5432 - PostgreSQL’s default port inside the container
This allows you to connect to the database using:
Host: localhost
Port: 5432
Database: datawarehouse
Username: warehouse_admin
Password: <your_password>
If you have another PostgreSQL instance running on port 5432, change the host port to avoid conflicts (e.g., 5433:5432).
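For example, to expose the container on host port 5433 instead:

```yaml
ports:
  - "5433:5432"   # host port 5433 -> container port 5432
```

Clients on your machine would then connect with Port: 5433, while PostgreSQL still listens on 5432 inside the container.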
Two volumes are configured:

1. Persistent Data Volume

- pg_data:/var/lib/postgresql/data
  • Named Volume: pg_data (defined in the volumes: section)
  • Purpose: Persists database data even when the container is removed
  • Location: Managed by Docker

2. Datasets Volume

- ./datasets:/datasets:ro
  • Bind Mount: Maps local ./datasets directory to /datasets in container
  • Mode: :ro (read-only) - Prevents accidental modification of source data
  • Purpose: Makes CSV files accessible to PostgreSQL for data import
The read-only mount ensures your source CSV files remain unchanged during ETL operations.
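As a sketch of how the mount is used, a CSV can be bulk-loaded with a server-side COPY, since the file path exists inside the container. The table and file names below are illustrative, not part of this project's schema:

```shell
# Hypothetical example: load a CSV from the read-only /datasets mount.
# Adjust the table name and file name to match your actual schema and data.
docker exec -i datawarehouse psql -U warehouse_admin -d datawarehouse \
  -c "COPY bronze.crm_customers FROM '/datasets/crm_customers.csv' WITH (FORMAT csv, HEADER true);"
```

Server-side COPY reads the file as the PostgreSQL server process, which is why the bind mount is required.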

Container Management

Starting the Container

docker compose up -d
Use -d (detached mode) to run containers in the background, or omit it to see real-time logs.
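Note that `up -d` returns before PostgreSQL finishes initializing. If a follow-up script needs the database, a short readiness loop (a sketch, assuming the user name from your .env) can guard it:

```shell
# Start the container, then wait up to ~30s for PostgreSQL to accept connections
docker compose up -d
for i in $(seq 1 30); do
  if docker exec datawarehouse pg_isready -U warehouse_admin >/dev/null 2>&1; then
    echo "PostgreSQL is ready"
    break
  fi
  sleep 1
done
```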

Stopping the Container

docker compose stop
Caution: docker compose down -v will delete all database data. Only use this when you want to start completely fresh.

Viewing Container Status

# List running containers
docker ps

# View container logs
docker logs datawarehouse

# Follow logs in real-time
docker logs -f datawarehouse

# View container resource usage
docker stats datawarehouse

Database Access

Direct PostgreSQL Access

Connect to the PostgreSQL prompt inside the container:
docker exec -it datawarehouse psql -U warehouse_admin -d datawarehouse
Replace warehouse_admin with your POSTGRES_USER value from the .env file.

Running SQL Scripts

Execute SQL scripts from the host machine:
# Run a single script
docker exec -i datawarehouse psql -U warehouse_admin -d datawarehouse < scripts/bronze/create_tables.sql

# Run a script from inside the container
# (note: this requires mounting your ./scripts directory into the container,
#  which the compose file above does not do by default)
docker exec -it datawarehouse psql -U warehouse_admin -d datawarehouse -f /scripts/bronze/load_data.sql
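To run every script in a layer in order, a simple loop works (a sketch — it assumes the scripts sort correctly by filename):

```shell
# Run all bronze-layer scripts in filename order, stopping on the first error
for f in scripts/bronze/*.sql; do
  echo "Running $f"
  docker exec -i datawarehouse psql -U warehouse_admin -d datawarehouse \
    -v ON_ERROR_STOP=1 < "$f"
done
```

The `-v ON_ERROR_STOP=1` flag makes psql exit with a non-zero status on the first SQL error instead of continuing.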

External Database Connections

Connect using your favorite database client:

pgAdmin

  • Host: localhost
  • Port: 5432
  • Database: datawarehouse
  • Username: From .env
  • Password: From .env

DBeaver

  • Connection Type: PostgreSQL
  • Server: localhost
  • Port: 5432
  • Database: datawarehouse
  • Authentication: Database Native
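If you have the psql client installed locally, you can also connect directly from the host without entering the container:

```shell
psql -h localhost -p 5432 -U warehouse_admin -d datawarehouse
```

You will be prompted for the password from your .env file.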

Volume Management

Inspecting Volumes

# List all volumes
docker volume ls

# Inspect the pg_data volume
docker volume inspect <project>_pg_data

# View volume usage
docker system df -v

Backup and Restore

Create a backup of your database:
docker exec datawarehouse pg_dump -U warehouse_admin datawarehouse > backup.sql
Or create a compressed backup:
docker exec datawarehouse pg_dump -U warehouse_admin datawarehouse | gzip > backup.sql.gz
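To restore a backup into the running container, pipe it back through psql:

```shell
# Restore a plain-SQL backup
docker exec -i datawarehouse psql -U warehouse_admin -d datawarehouse < backup.sql

# Restore a compressed backup
gunzip -c backup.sql.gz | docker exec -i datawarehouse psql -U warehouse_admin -d datawarehouse
```

Restore into an empty database to avoid conflicts with existing objects.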

Troubleshooting

If the container fails to start, check the logs:
docker logs datawarehouse
Common causes:
  • Missing or invalid .env file
  • Port 5432 already in use
  • Insufficient Docker resources
  • Corrupted volume data
If connections fail, verify the container is running:
docker ps | grep datawarehouse
Test the connection:
docker exec -it datawarehouse pg_isready -U warehouse_admin
Check network connectivity:
nc -zv localhost 5432
If CSV files aren't visible inside the container, verify the mount:
docker exec -it datawarehouse ls -la /datasets
Check local directory permissions:
ls -la datasets/
chmod -R 755 datasets/
Clean up unused resources:
# Remove unused containers, networks, and images
docker system prune

# Remove unused volumes (caution!)
docker volume prune

# View disk usage
docker system df

Advanced Configuration

Custom PostgreSQL Configuration

To use a custom PostgreSQL configuration file:
  1. Create a postgresql.conf file in your project root
  2. Add a volume mount in docker-compose.yml:
volumes:
  - ./postgresql.conf:/etc/postgresql/postgresql.conf:ro
  3. Update the command to use the custom config:
command: postgres -c config_file=/etc/postgresql/postgresql.conf
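Putting the pieces together, the relevant parts of docker-compose.yml would look like this (other settings from the original file omitted for brevity):

```yaml
services:
  db:
    image: postgres:16-alpine
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
    volumes:
      - pg_data:/var/lib/postgresql/data
      - ./datasets:/datasets:ro
      - ./postgresql.conf:/etc/postgresql/postgresql.conf:ro
```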

Performance Tuning

Optimize PostgreSQL for data warehousing by adding environment variables:
environment:
  POSTGRES_INITDB_ARGS: "-E UTF8 --locale=C"
  POSTGRES_HOST_AUTH_METHOD: trust  # Development only!
Never use POSTGRES_HOST_AUTH_METHOD: trust in production environments.
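Server parameters can also be set directly via command-line flags in the compose file. The values below are illustrative starting points for analytical workloads, not tuned recommendations for your hardware:

```yaml
command: >
  postgres
  -c shared_buffers=1GB
  -c work_mem=64MB
  -c maintenance_work_mem=256MB
  -c effective_cache_size=3GB
```

Flags set this way override defaults without needing a full custom postgresql.conf.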

Next Steps

ETL Pipeline

Learn how to extract, transform, and load data

Data Model

Explore the star schema and analytics model

Bronze Layer

Dive into the Bronze, Silver, and Gold layer pipeline

Data Quality Testing

Set up data quality tests and validation
