Docker Compose Overview
Docker Compose allows you to define and run the PostgreSQL container with all necessary configurations in a single YAML file.
Configuration File
The `docker-compose.yml` file defines the complete Docker setup:
docker-compose.yml
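The sketch below is assembled from the settings described in the Configuration Breakdown and may differ from the project's actual file. Two details are assumptions: the `${...}` substitution, which relies on Compose reading `.env` from the project directory, and the mount target for `pg_data`, which is taken to be the image's default data directory.

```yaml
# Sketch of docker-compose.yml reconstructed from the settings in this guide.
services:
  db:
    image: postgres:16-alpine
    container_name: datawarehouse
    restart: unless-stopped
    environment:
      # Values are substituted from the .env file next to this file.
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
      PGDATA: ${PGDATA}
    ports:
      - "5432:5432"
    volumes:
      # Assumption: the named volume is mounted at the default data directory.
      - pg_data:/var/lib/postgresql/data
      - ./datasets:/datasets:ro

volumes:
  pg_data:
```

For data to persist, `PGDATA` should point at the volume's mount path or a subdirectory of it.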
Configuration Breakdown
Service Configuration
Service Name: `db`
The main database service with the following settings:
- Image: `postgres:16-alpine` - Uses PostgreSQL 16 on Alpine Linux for a lightweight container
- Container Name: `datawarehouse` - Fixed name for easy reference
- Restart Policy: `unless-stopped` - Automatically restarts unless manually stopped
Environment Variables
The container uses environment variables from your `.env` file:
| Variable | Description |
|---|---|
| `POSTGRES_USER` | PostgreSQL superuser name |
| `POSTGRES_PASSWORD` | Password for the superuser |
| `POSTGRES_DB` | Default database name |
| `PGDATA` | Data directory path inside container |
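For illustration, a `.env` file covering these variables might look like the following. The user and database names are the example values used elsewhere in this guide. The password is a placeholder, and the `PGDATA` path is an assumption (a subdirectory of the image's default data directory).

```shell
# .env (example values only; never commit real credentials)
POSTGRES_USER=warehouse_admin
POSTGRES_PASSWORD=change_me
POSTGRES_DB=datawarehouse
# Assumption: a subdirectory of the default data directory inside the container.
PGDATA=/var/lib/postgresql/data/pgdata
```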
Port Mappings
Port Configuration: `5432:5432`
- Host Port: `5432` - The port on your local machine
- Container Port: `5432` - PostgreSQL’s default port inside the container
Volume Mounts
Two volumes are configured:
1. Persistent Data Volume
- Named Volume: `pg_data` (defined in the `volumes:` section)
- Purpose: Persists database data even when the container is removed
- Location: Managed by Docker
2. Datasets Volume
- Bind Mount: Maps the local `./datasets` directory to `/datasets` in the container
- Mode: `:ro` (read-only) - Prevents accidental modification of source data
- Purpose: Makes CSV files accessible to PostgreSQL for data import
The read-only mount ensures your source CSV files remain unchanged during ETL operations.
Container Management
Starting the Container
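A minimal sketch, assuming Docker Compose v2 (the `docker compose` plugin) and a shell opened in the directory that contains `docker-compose.yml`. Older installations use the standalone `docker-compose` binary with the same arguments.

```shell
# Create the db container if needed and start it in the background.
docker compose up -d
```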
Stopping the Container
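Two common ways to stop the database, sketched under the same assumption of Docker Compose v2 run from the project directory:

```shell
# Stop the container but keep it, so it can be started again later.
docker compose stop

# Or stop and remove the container. The pg_data named volume is kept;
# adding -v would also delete the volume and the data in it.
docker compose down
```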
Viewing Container Status
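A sketch of checking the container state, again assuming Docker Compose v2 run from the project directory:

```shell
# List the project's running containers with their state and port mappings.
docker compose ps

# Include stopped containers as well.
docker compose ps --all
```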
Database Access
Direct PostgreSQL Access
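A minimal sketch of the command, assuming the container name `datawarehouse` from the Compose file, with `warehouse_admin` and `datawarehouse` as the example user and database names:

```shell
# Assumed example values; take the real ones from your .env file.
DB_USER=warehouse_admin
DB_NAME=datawarehouse

# -it allocates an interactive terminal for the psql prompt.
docker exec -it datawarehouse psql -U "$DB_USER" -d "$DB_NAME"
```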
Connect to the PostgreSQL prompt inside the container by running `psql` through `docker exec`. Replace `warehouse_admin` with your `POSTGRES_USER` value from the `.env` file.
Running SQL Scripts
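One way to run a script that lives on the host is to pipe it into `psql` inside the container. This is a sketch only: `scripts/init.sql` is a hypothetical path, and the user and database names are the example values used in this guide.

```shell
# Assumed example values; adjust to your .env file and script location.
DB_USER=warehouse_admin
DB_NAME=datawarehouse
SQL_FILE=scripts/init.sql   # hypothetical path on the host

# -i keeps stdin open so the redirected file reaches psql.
# ON_ERROR_STOP=1 makes psql abort at the first failing statement.
docker exec -i datawarehouse psql -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 < "$SQL_FILE"
```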
Execute SQL scripts from the host machine against the database running in the container.
External Database Connections
Connect using your favorite database client:
pgAdmin
- Host: `localhost`
- Port: `5432`
- Database: `datawarehouse`
- Username: `POSTGRES_USER` value from `.env`
- Password: `POSTGRES_PASSWORD` value from `.env`
DBeaver
- Connection Type: PostgreSQL
- Server: `localhost`
- Port: `5432`
- Database: `datawarehouse`
- Authentication: Database Native
Volume Management
Inspecting Volumes
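A sketch of locating and inspecting the data volume. Compose normally prefixes volume names with the project name (by default the directory name), so the full name is looked up with a filter instead of being hard-coded.

```shell
# Name fragment of the volume declared in docker-compose.yml.
VOLUME_FILTER=pg_data

# List volumes whose name contains the fragment.
docker volume ls --filter "name=$VOLUME_FILTER"

# Show details (driver, mount point on the host) for the first match.
VOLUME_NAME=$(docker volume ls --quiet --filter "name=$VOLUME_FILTER" | head -n 1)
docker volume inspect "$VOLUME_NAME"
```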
Backup and Restore
Backup: Create a backup of your database, either as a plain SQL file or as a compressed backup.
Restore: Load a backup file back into the database.
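Backup and restore can be sketched with `pg_dump` and `psql` run through `docker exec`. The file name `backup.sql` is hypothetical, and the user and database names are the example values from this guide.

```shell
# Assumed example values; adjust to your .env file.
DB_USER=warehouse_admin
DB_NAME=datawarehouse
BACKUP_FILE=backup.sql   # hypothetical file name on the host

# Backup: plain SQL dump written to the host.
docker exec datawarehouse pg_dump -U "$DB_USER" "$DB_NAME" > "$BACKUP_FILE"

# Backup: the same dump, gzip-compressed on the host.
docker exec datawarehouse pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$BACKUP_FILE.gz"

# Restore: feed a plain dump back into psql (-i keeps stdin open).
docker exec -i datawarehouse psql -U "$DB_USER" -d "$DB_NAME" < "$BACKUP_FILE"

# Restore from the compressed dump.
gunzip -c "$BACKUP_FILE.gz" | docker exec -i datawarehouse psql -U "$DB_USER" -d "$DB_NAME"
```

Restoring into a database that already contains the same objects will report "already exists" errors, so a restore is usually run against an empty database.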
Troubleshooting
Container won't start
Check the logs for the error that stopped startup. Common causes:
- Missing or invalid `.env` file
- Port 5432 already in use
- Insufficient Docker resources
- Corrupted volume data
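The log check and the first two causes above can be investigated as sketched here, assuming Docker Compose v2 run from the project directory and that `lsof` is installed on the host:

```shell
# Service name from docker-compose.yml.
SERVICE=db

# Show the most recent log lines for the database service.
docker compose logs --tail 100 "$SERVICE"

# Render the Compose file with .env values substituted. Unset variables
# produce warnings, which helps spot a missing or invalid .env file.
docker compose config

# Check whether another process is already listening on host port 5432.
lsof -i :5432
```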
Cannot connect to database
Work through these checks in order:
- Verify the container is running
- Test the connection
- Check network connectivity
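The three checks can be sketched as follows. The names are the example values from this guide, `pg_isready` is the readiness utility that ships with PostgreSQL, and the last command assumes `nc` (netcat) is available on the host.

```shell
# Assumed example values; adjust to your setup.
CONTAINER=datawarehouse
DB_USER=warehouse_admin
DB_NAME=datawarehouse

# 1. Is the container running? An empty list means it is stopped or missing.
docker ps --filter "name=$CONTAINER"

# 2. Is PostgreSQL accepting connections inside the container?
docker exec "$CONTAINER" pg_isready -U "$DB_USER" -d "$DB_NAME"

# 3. Is the published port reachable from the host?
nc -zv localhost 5432
```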
Datasets not accessible
Verify the mount inside the container, then check the permissions of the local directory.
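A sketch of both checks, assuming the container name `datawarehouse` and a shell opened in the project directory:

```shell
# Assumed container name from the Compose file.
CONTAINER=datawarehouse

# List the files PostgreSQL can see at the mount point inside the container.
docker exec "$CONTAINER" ls -la /datasets

# Show the mounts Docker actually applied to the container.
docker inspect --format '{{ json .Mounts }}' "$CONTAINER"

# On the host: the directory must exist and be readable.
ls -ld ./datasets
```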
Out of disk space
Clean up unused resources:
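One possible cleanup sequence, sketched with standard Docker CLI commands. Pruning is destructive, so review what each command reports before confirming.

```shell
# Show how much space images, containers, volumes, and build cache use.
docker system df

# Remove stopped containers, unused networks, dangling images, and build cache.
# Docker asks for confirmation first. Volumes are not touched unless
# --volumes is added, which can delete database data held in unused volumes.
docker system prune
```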
Advanced Configuration
Custom PostgreSQL Configuration
To use a custom PostgreSQL configuration file:
- Create a `postgresql.conf` file in your project root
- Add a volume mount in `docker-compose.yml`
- Update the command to use the custom config
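The mount and command changes from the steps above might look like this. It is a sketch following the pattern documented for the official `postgres` image; the in-container path `/etc/postgresql/postgresql.conf` is an assumption, and any readable path works as long as both lines agree.

```yaml
services:
  db:
    volumes:
      # Keep the existing pg_data and datasets entries, then add:
      - ./postgresql.conf:/etc/postgresql/postgresql.conf:ro
    # Make the server read the mounted file instead of the default in PGDATA.
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
```

A hand-written `postgresql.conf` should set `listen_addresses = '*'`. Otherwise PostgreSQL listens only on localhost inside the container and the published port will not accept connections.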
Performance Tuning
Optimize PostgreSQL for data warehousing by adding environment variables.
Next Steps
ETL Pipeline
Learn how to extract, transform, and load data
Data Model
Explore the star schema and analytics model
Bronze Layer
Dive into the Bronze, Silver, and Gold layer pipeline
Data Quality Testing
Set up data quality tests and validation