Prerequisites
Before you begin, ensure you have the following installed:

- Docker: Docker Engine 20.10 or higher for containerized PostgreSQL
- PostgreSQL Client: psql or any PostgreSQL client for database interaction
Setup Instructions
Follow these steps to set up your SQL Data Warehouse with Bronze, Silver, and Gold layers.

Configure Environment Variables
Create a .env file in the root directory with your PostgreSQL credentials:
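A minimal sketch of such a file, assuming the standard variable names read by the official postgres image (use whichever names docker-compose.yml actually references, and replace the placeholder values):

```
# PostgreSQL credentials consumed by docker-compose.yml (values are placeholders)
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=datawarehouse
```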
These credentials will be used by the PostgreSQL container defined in docker-compose.yml.

Start PostgreSQL Container
Launch the PostgreSQL database using Docker Compose:

docker compose up -d

This will start a PostgreSQL 16 container named datawarehouse on port 5432 with the datasets mounted as read-only volumes.

Verify the container is running:

docker ps

Create Database Schemas
Connect to the database and create the Bronze, Silver, and Gold schemas. This creates three schemas following the Medallion Architecture:
- Bronze: Raw data layer
- Silver: Cleaned and standardized data
- Gold: Business-ready analytical models
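The schema-creation step amounts to something like the following (a sketch; the repository's own DDL script may differ):

```sql
-- One schema per Medallion layer
CREATE SCHEMA IF NOT EXISTS bronze;  -- raw data
CREATE SCHEMA IF NOT EXISTS silver;  -- cleaned and standardized data
CREATE SCHEMA IF NOT EXISTS gold;    -- business-ready analytical models
```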
Create Bronze Layer Tables
Set up the Bronze layer tables for raw data ingestion. This creates tables for:
- CRM data: crm_cust_info, crm_prd_info, crm_sales_details
- ERP data: erp_cust_az12, erp_loc_a101, erp_px_cat_g1v2
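To give a feel for the shape of these tables, here is a hypothetical Bronze definition. The column names and types are illustrative assumptions, not the project's actual DDL:

```sql
-- Illustrative Bronze table: raw data lands as-is, no constraints
CREATE TABLE IF NOT EXISTS bronze.crm_cust_info (
    cst_id  INT,
    cst_key VARCHAR(50)
);
```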
Create Silver Layer Tables
Set up the Silver layer tables for cleaned data. These tables mirror the Bronze layer structure but include data quality improvements and tracking timestamps.
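A sketch of the Silver counterpart, assuming the tracking timestamp is a default-populated column (the column name dwh_create_date is an assumption):

```sql
-- Illustrative Silver table: mirrors Bronze plus a load-tracking timestamp
CREATE TABLE IF NOT EXISTS silver.crm_cust_info (
    cst_id          INT,
    cst_key         VARCHAR(50),
    dwh_create_date TIMESTAMP DEFAULT NOW()  -- when this row was loaded
);
```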
Running the ETL Pipeline
Once the setup is complete, execute the ETL pipeline to populate your data warehouse.

Load Bronze Layer

Load raw data from CSV files into Bronze layer tables:

CALL bronze.load_bronze();

This procedure:
- Loads CRM customer, product, and sales data
- Imports ERP customer demographics, location, and product category data
- Truncates existing data before loading
- Provides execution time metrics for each table
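A simplified sketch of what such a load procedure can look like, shown for a single table. The CSV path and the COPY options are assumptions; the repository's procedure handles all six tables and timing metrics:

```sql
CREATE OR REPLACE PROCEDURE bronze.load_bronze()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Truncate before loading so reruns are idempotent
    TRUNCATE TABLE bronze.crm_cust_info;

    -- Bulk-load the CSV mounted into the container (path is an assumption)
    COPY bronze.crm_cust_info
        FROM '/datasets/source_crm/cust_info.csv'
        WITH (FORMAT csv, HEADER true);
END;
$$;
```

Note that server-side COPY reads from the container's filesystem, which is why the datasets directory must be mounted as a volume.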
Load Silver Layer

Transform and clean data from Bronze to Silver layer:

CALL silver.load_silver();

This procedure applies data quality rules and standardization to prepare data for analytics.
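A hedged example of the kind of Bronze-to-Silver transformation such a procedure applies (the specific rules and columns here are assumptions):

```sql
-- Illustrative standardization: trim and upper-case keys, drop rows
-- with no identifier, before inserting into the Silver layer
INSERT INTO silver.crm_cust_info (cst_id, cst_key)
SELECT cst_id,
       UPPER(TRIM(cst_key))
FROM bronze.crm_cust_info
WHERE cst_id IS NOT NULL;
```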
Verifying the Setup
Use queries such as the following to verify your data warehouse is set up correctly:

SELECT COUNT(*) FROM bronze.crm_cust_info;
SELECT COUNT(*) FROM silver.crm_cust_info;

Next Steps
Now that your data warehouse is set up, explore these topics:

Architecture Overview
Learn about the Medallion Architecture and data flow
Bronze Layer Tables
Explore the raw data schema in the Bronze layer
ETL Procedures
Deep dive into ETL procedures and transformations
Gold Layer Analytics
Understand the star schema for analytics and reporting
Troubleshooting
Container won't start

Issue: Docker container fails to start or exits immediately

Solutions:
- Ensure port 5432 is not already in use: lsof -i :5432
- Check Docker logs: docker logs datawarehouse
- Verify your .env file has all required variables
- Ensure Docker has sufficient resources allocated
Cannot connect to database

Issue: Connection refused when trying to connect via psql

Solutions:
- Verify container is running: docker ps
- Check PostgreSQL is listening: docker exec datawarehouse pg_isready
- Confirm credentials in .env match connection string
- Wait a few seconds after container starts for PostgreSQL to initialize
ETL procedure fails

Issue: CALL bronze.load_bronze() or CALL silver.load_silver() fails

Solutions:
- Check that CSV files exist in the datasets/ directory
- Verify the datasets volume is mounted: docker inspect datawarehouse
- Review error messages in the procedure output
- Ensure all schema and table DDL scripts ran successfully
- Check file permissions on the datasets directory
Missing data in Gold layer

Issue: Gold layer views return no data or incomplete data

Solutions:
- Ensure Bronze and Silver layers are populated first
- Verify the ETL procedures completed successfully
- Check for NULL values in join keys between layers
- Review the view definitions in scripts/gold/ddl_gold.sql
- Run the verification queries for each layer
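For the NULL-join-key check, a query along these lines can help (the table and column names are assumptions based on the Bronze tables listed earlier):

```sql
-- Silver rows with NULL keys that would be dropped by Gold-layer joins
SELECT COUNT(*) AS null_keys
FROM silver.crm_cust_info
WHERE cst_key IS NULL;
```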