Skip to main content

Welcome to the SQL Data Warehouse

This project demonstrates a comprehensive data warehousing and analytics solution built with PostgreSQL. It showcases industry best practices in data engineering, from raw data ingestion to business-ready analytics using the Medallion Architecture pattern.
This project consolidates sales data from multiple source systems (ERP and CRM) into a unified, analytical data model optimized for business intelligence and reporting.

Medallion Architecture Overview

The data warehouse follows the Medallion Architecture with three distinct layers:

Bronze Layer

Raw Data StorageIngests raw data as-is from source systems (CSV files) into PostgreSQL tables. No transformations applied - preserving data lineage and enabling reprocessing.

Silver Layer

Cleaned & StandardizedApplies data cleansing, standardization, and normalization. Handles data quality issues, deduplication, and business rule transformations.

Gold Layer

Analytics-ReadyStar schema dimensional model with fact and dimension tables optimized for analytical queries and BI reporting.

Key Features

Three-layer architecture (Bronze-Silver-Gold) ensuring data quality progression from raw ingestion to analytics-ready models.
Stored procedures automate the extract, transform, and load process:
  • bronze.load_bronze() - Loads raw CSV data from source systems
  • silver.load_silver() - Cleanses and standardizes data with quality checks
Optimized dimensional model with:
  • Fact Table: fact_sales - Sales transactions with metrics
  • Dimensions: dim_customer, dim_product - Business entities for slicing and dicing
Built-in data quality checks including:
  • Deduplication logic
  • Date format validation and correction
  • Null handling and default values
  • Cross-field validation (e.g., sales = quantity × price)
Integrates data from two source systems:
  • CRM System: Customer info, product info, sales details
  • ERP System: Customer demographics, location data, product categories
Containerized PostgreSQL 16 deployment with:
  • Docker Compose configuration for easy setup
  • Persistent volume storage
  • Pre-mounted dataset directory

Project Objectives

Data Engineering Focus

1

Consolidate Multi-Source Data

Import and integrate data from ERP and CRM systems provided as CSV files into a unified PostgreSQL data warehouse.
2

Ensure Data Quality

Cleanse and resolve data quality issues including duplicates, invalid dates, inconsistent formats, and missing values before analysis.
3

Build Analytical Model

Combine sources into a user-friendly star schema data model optimized for analytical queries and business intelligence.
4

Enable Analytics & Reporting

Provide a solid foundation for SQL-based analytics to deliver insights into customer behavior, product performance, and sales trends.

Get Started

Quickstart

Get your data warehouse up and running in minutes with Docker

Architecture

Deep dive into the Medallion Architecture and data flow

Schema Reference

Explore the complete data model and table structures

Repository Structure

data-warehouse-project/

├── datasets/              # Raw CSV data (ERP and CRM sources)

├── docs/                  # Architecture diagrams and documentation
│   ├── DataWareHouse.png  # Project architecture overview
│   ├── data_flow.png      # Data flow visualization
│   └── data_model.drawio.png  # Data model diagram

├── scripts/               # SQL scripts for ETL and transformations
│   ├── bronze/           # Raw data extraction and loading
│   ├── silver/           # Data cleaning and transformation
│   └── gold/             # Analytical model creation (star schema)

├── tests/                # Data quality validation scripts

└── docker-compose.yml    # PostgreSQL container configuration
This project focuses on the latest dataset only - data historization is not implemented in this version.

Build docs developers (and LLMs) love