Skip to main content

Prerequisites

Before you begin, ensure you have the following installed:

Docker

Docker Engine 20.10 or higher for containerized PostgreSQL

PostgreSQL Client

psql or any PostgreSQL client for database interaction

Setup Instructions

Follow these steps to set up your SQL Data Warehouse with Bronze, Silver, and Gold layers:
1

Clone the Repository

Clone the project repository to your local machine:
git clone https://github.com/LeoBellier/sql-data-warehouse-project.git
cd sql-data-warehouse-project
2

Configure Environment Variables

Create a .env file in the root directory with your PostgreSQL credentials:
.env
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DB=datawarehouse
These credentials will be used by the PostgreSQL container defined in docker-compose.yml
3

Start PostgreSQL Container

Launch the PostgreSQL database using Docker Compose:
docker-compose up -d
This will start a PostgreSQL 16 container named datawarehouse on port 5432 with the datasets mounted as read-only volumes.
Verify the container is running: docker ps
4

Create Database Schemas

Connect to the database and create the Bronze, Silver, and Gold schemas:
psql -h localhost -U your_username -d datawarehouse -f scripts/SchemaCreator.sql
This creates three schemas following the Medallion Architecture:
  • Bronze: Raw data layer
  • Silver: Cleaned and standardized data
  • Gold: Business-ready analytical models
5

Create Bronze Layer Tables

Set up the Bronze layer tables for raw data ingestion:
psql -h localhost -U your_username -d datawarehouse -f scripts/bronze/ddl_bronze.sql
This creates tables for:
  • CRM data: crm_cust_info, crm_prd_info, crm_sales_details
  • ERP data: erp_cust_az12, erp_loc_a101, erp_px_cat_g1v2
6

Create Silver Layer Tables

Set up the Silver layer tables for cleaned data:
psql -h localhost -U your_username -d datawarehouse -f scripts/silver/ddl_silver.sql
These tables mirror the Bronze layer structure but include data quality improvements and tracking timestamps.
7

Create Gold Layer Views

Set up the Gold layer analytical views:
psql -h localhost -U your_username -d datawarehouse -f scripts/gold/ddl_gold.sql
This creates star schema components:
  • dim_customer: Customer dimension
  • dim_product: Product dimension
  • fact_sales: Sales fact table

Running the ETL Pipeline

Once the setup is complete, execute the ETL pipeline to populate your data warehouse:
Load raw data from CSV files into Bronze layer tables:
CALL bronze.load_bronze();
This procedure:
  • Loads CRM customer, product, and sales data
  • Imports ERP customer demographics, location, and product category data
  • Truncates existing data before loading
  • Provides execution time metrics for each table
Transform and clean data from Bronze to Silver layer:
CALL silver.load_silver();
This procedure applies data quality rules and standardization to prepare data for analytics.

Verifying the Setup

Use these queries to verify your data warehouse is set up correctly:
-- Check Bronze layer tables
SELECT 'crm_cust_info' as table_name, COUNT(*) as row_count 
FROM bronze.crm_cust_info
UNION ALL
SELECT 'crm_prd_info', COUNT(*) FROM bronze.crm_prd_info
UNION ALL
SELECT 'crm_sales_details', COUNT(*) FROM bronze.crm_sales_details
UNION ALL
SELECT 'erp_cust_az12', COUNT(*) FROM bronze.erp_cust_az12
UNION ALL
SELECT 'erp_loc_a101', COUNT(*) FROM bronze.erp_loc_a101
UNION ALL
SELECT 'erp_px_cat_g1v2', COUNT(*) FROM bronze.erp_px_cat_g1v2;

Next Steps

Now that your data warehouse is set up, explore these topics:

Architecture Overview

Learn about the Medallion Architecture and data flow

Bronze Layer Tables

Explore the raw data schema in the Bronze layer

ETL Procedures

Deep dive into ETL procedures and transformations

Gold Layer Analytics

Understand the star schema for analytics and reporting

Troubleshooting

Issue: Docker container fails to start or exits immediatelySolutions:
  • Ensure port 5432 is not already in use: lsof -i :5432
  • Check Docker logs: docker logs datawarehouse
  • Verify your .env file has all required variables
  • Ensure Docker has sufficient resources allocated
Issue: Connection refused when trying to connect via psqlSolutions:
  • Verify container is running: docker ps
  • Check PostgreSQL is listening: docker exec datawarehouse pg_isready
  • Confirm credentials in .env match connection string
  • Wait a few seconds after container starts for PostgreSQL to initialize
Issue: CALL bronze.load_bronze() or CALL silver.load_silver() failsSolutions:
  • Check that CSV files exist in the datasets/ directory
  • Verify the datasets volume is mounted: docker inspect datawarehouse
  • Review error messages in the procedure output
  • Ensure all schema and table DDL scripts ran successfully
  • Check file permissions on the datasets directory
Issue: Gold layer views return no data or incomplete dataSolutions:
  • Ensure Bronze and Silver layers are populated first
  • Verify the ETL procedures completed successfully
  • Check for NULL values in join keys between layers
  • Review the view definitions in scripts/gold/ddl_gold.sql
  • Run the verification queries for each layer

Build docs developers (and LLMs) love