Welcome to the SQL Data Warehouse
This project demonstrates a comprehensive data warehousing and analytics solution built with PostgreSQL. It showcases industry best practices in data engineering, from raw data ingestion to business-ready analytics using the Medallion Architecture pattern.This project consolidates sales data from multiple source systems (ERP and CRM) into a unified, analytical data model optimized for business intelligence and reporting.
Medallion Architecture Overview
The data warehouse follows the Medallion Architecture with three distinct layers:Bronze Layer
Raw Data StorageIngests raw data as-is from source systems (CSV files) into PostgreSQL tables. No transformations applied - preserving data lineage and enabling reprocessing.
Silver Layer
Cleaned & StandardizedApplies data cleansing, standardization, and normalization. Handles data quality issues, deduplication, and business rule transformations.
Gold Layer
Analytics-ReadyStar schema dimensional model with fact and dimension tables optimized for analytical queries and BI reporting.
Key Features
Medallion Architecture
Medallion Architecture
Three-layer architecture (Bronze-Silver-Gold) ensuring data quality progression from raw ingestion to analytics-ready models.
Automated ETL Pipelines
Automated ETL Pipelines
Stored procedures automate the extract, transform, and load process:
bronze.load_bronze()- Loads raw CSV data from source systemssilver.load_silver()- Cleanses and standardizes data with quality checks
Star Schema Design
Star Schema Design
Optimized dimensional model with:
- Fact Table:
fact_sales- Sales transactions with metrics - Dimensions:
dim_customer,dim_product- Business entities for slicing and dicing
Data Quality Validation
Data Quality Validation
Built-in data quality checks including:
- Deduplication logic
- Date format validation and correction
- Null handling and default values
- Cross-field validation (e.g., sales = quantity × price)
Multi-Source Integration
Multi-Source Integration
Integrates data from two source systems:
- CRM System: Customer info, product info, sales details
- ERP System: Customer demographics, location data, product categories
Docker-Based PostgreSQL
Docker-Based PostgreSQL
Containerized PostgreSQL 16 deployment with:
- Docker Compose configuration for easy setup
- Persistent volume storage
- Pre-mounted dataset directory
Project Objectives
Data Engineering Focus
Consolidate Multi-Source Data
Import and integrate data from ERP and CRM systems provided as CSV files into a unified PostgreSQL data warehouse.
Ensure Data Quality
Cleanse and resolve data quality issues including duplicates, invalid dates, inconsistent formats, and missing values before analysis.
Build Analytical Model
Combine sources into a user-friendly star schema data model optimized for analytical queries and business intelligence.
Get Started
Quickstart
Get your data warehouse up and running in minutes with Docker
Architecture
Deep dive into the Medallion Architecture and data flow
Schema Reference
Explore the complete data model and table structures