Medallion Architecture

What is Medallion Architecture?

The Medallion Architecture is a data design pattern that organizes data into three progressive layers - Bronze, Silver, and Gold. Each layer serves a specific purpose in the data refinement process, transforming raw source data into analytics-ready datasets.

The medallion pattern provides clear separation of concerns, making data lineage transparent and enabling teams to work at different layers simultaneously.

Architecture Layers

The three layers work together to progressively refine data quality and structure:

Bronze Layer
Silver Layer
Gold Layer

Bronze Layer: Raw Data Ingestion

The Bronze layer serves as the landing zone for raw data from source systems. Data is ingested with minimal transformation to preserve the original state.

Characteristics

Raw Data Storage

Stores data exactly as received from ERP and CRM systems, maintaining complete fidelity to source systems.

Single Source of Truth

Serves as the authoritative record of what was received from source systems, enabling data lineage tracking.

Minimal Transformation

Only basic type conversions and schema validation are performed during ingestion from CSV files.

Data Preservation

Maintains historical snapshots of source data for audit, debugging, and reprocessing scenarios.

Ingestion Process

CSV File Reception

Source systems (ERP and CRM) export data as CSV files to a designated ingestion location.

Schema Validation

Validate that incoming CSV files match expected schema definitions before loading.

Data Loading

Load CSV data into PostgreSQL Bronze tables using COPY commands or ETL scripts.

Metadata Capture

Record ingestion timestamp, source file name, and row counts for tracking and auditing.

Data Sources

ERP System Data

Customer demographic data, product categories, and location information:

Customer profiles with birth dates and gender
Product categorization and maintenance flags
Geographic location mappings

CRM System Data

Customer interactions, product details, and sales transactions:

Customer contact information and preferences
Product catalog with pricing and lifecycle dates
Sales order details with quantities and amounts

Bronze layer data may contain quality issues, duplicates, and inconsistencies inherited from source systems. Never query Bronze tables directly for analytics.

Silver Layer: Cleansing & Standardization

The Silver layer transforms raw Bronze data into cleaned, standardized, and validated datasets ready for business modeling.

Transformation Activities

Data Cleansing

Remove or correct data quality issues:

Handle null values and missing data appropriately
Correct invalid or out-of-range values
Standardize text fields (trim whitespace, fix casing)
Remove or flag corrupted records

Standardization

Apply consistent formats and conventions:

Standardize date and timestamp formats
Normalize naming conventions across sources
Apply standard code mappings and lookups
Convert to canonical data types

Normalization

Structure data for consistency:

Remove redundant data and duplicates
Split combined fields into atomic elements
Apply referential integrity rules
Ensure consistent granularity

Data Validation

Verify data quality and business rules:

Check referential integrity across tables
Validate business rule constraints
Flag anomalies for investigation
Generate data quality metrics

Silver Tables

The Silver layer contains cleaned versions of Bronze tables:

silver.crm_cust_info - Standardized customer information from CRM
silver.crm_prd_info - Cleansed product catalog from CRM
silver.crm_sales_details - Validated sales transactions
silver.erp_cust_az12 - Customer demographics from ERP
silver.erp_px_cat_g1v2 - Product categorization from ERP
silver.erp_loc_a101 - Location data from ERP

Silver tables use consistent naming conventions and data types, making them easier to join and query compared to Bronze tables.

Quality Improvements

Examples of quality enhancements in the Silver layer:

-- Handle missing gender by falling back to ERP source
CASE 
  WHEN crm.cst_gndr != 'n/a' THEN crm.cst_gndr 
  ELSE COALESCE(erp.gen, 'n/a') 
END AS gender

Gold Layer: Business-Ready Analytics

The Gold layer contains curated, business-focused datasets modeled for analytical queries and reporting. Data is organized in a star schema with fact and dimension tables.

Purpose and Benefits

Optimized for Analytics

Star schema design enables fast, intuitive queries for common business questions.

Business Terminology

Uses business-friendly names and structures that analysts and stakeholders understand.

Pre-Joined Data

Dimension and fact tables are pre-modeled to simplify query complexity.

Performance Optimized

Denormalized structure reduces join complexity and improves query performance.

Star Schema Components

Fact Tables

Central tables containing business metrics and foreign keys to dimensions:fact_sales: Sales transactions with measures and dimension references

Measures: sales_amount, quantity, price
Dimensions: customer_key, product_key, order_date, shipping_date

Fact tables answer questions like “What were total sales?” and “How many units sold?”

Dimension Tables

Descriptive attributes that provide context for analyzing facts:dim_customer: Customer attributes for segmentation and analysis

Demographics: gender, marital_status, birth_date
Identifiers: customer_number, first_name, last_name

dim_product: Product attributes for performance analysis

Product details: product_name, category, subcategory
Attributes: product_cost, product_line, maintenance flag

Dimensions answer “Who?”, “What?”, and “Where?” questions about business events.

Analytical Capabilities

The Gold layer enables powerful analytical queries:

Customer Analytics

Analyze customer behavior, segmentation, and lifetime value using dim_customer attributes combined with sales facts.

Product Performance

Track product sales, profitability, and category performance using dim_product joined with fact_sales.

Sales Trends

Examine sales patterns over time, seasonal trends, and growth metrics using date dimensions and sales aggregations.

Cross-Dimensional Analysis

Combine multiple dimensions to answer complex questions like “Which customer segments buy which product categories?”

The Gold layer is the primary interface for BI tools, SQL analysts, and reporting applications. All analytics should query Gold tables.

Layer Comparison

Understanding the differences between layers helps determine where to focus efforts:

Aspect	Bronze	Silver	Gold
Purpose	Raw data preservation	Data cleansing & prep	Analytics & reporting
Data Quality	As-is from source	Validated & cleaned	Business-ready
Structure	Source system schemas	Normalized tables	Star schema (denormalized)
Users	Data engineers	Data engineers	Analysts, BI tools
Queryable	For debugging only	Yes, for ETL	Primary analytics interface
Naming	Source system names	Standardized technical	Business-friendly

Data Flow Between Layers

Data progresses through the layers in a controlled pipeline:

Each layer transformation is implemented as SQL scripts and views in the /workspace/source/scripts/ directory, organized by layer.

Best Practices

Always Preserve Bronze Data

Never delete or modify Bronze layer data. It serves as your source of truth for reprocessing and debugging.

Document Transformations

Clearly document all transformations between layers, including business logic and data quality rules.

Test at Each Layer

Implement data quality tests at Silver and Gold layers to catch issues before they reach analytics.

Use Appropriate Layer for Task

Query Bronze for debugging, Silver for ETL development, and Gold for all analytics and reporting.

Get Started

Architecture

Setup & Deployment

Data Pipeline

Data Quality

What is Medallion Architecture?

Architecture Layers

Bronze Layer: Raw Data Ingestion

Characteristics

Raw Data Storage

Single Source of Truth

Minimal Transformation

Data Preservation

Ingestion Process

Data Sources

Silver Layer: Cleansing & Standardization

Transformation Activities

Silver Tables

Quality Improvements

Gold Layer: Business-Ready Analytics

Purpose and Benefits

Optimized for Analytics

Business Terminology

Pre-Joined Data

Performance Optimized

Star Schema Components

Analytical Capabilities

Layer Comparison

Data Flow Between Layers

Best Practices

Build docs developers (and LLMs) love

Get Started

Architecture

Setup & Deployment

Data Pipeline

Data Quality

​What is Medallion Architecture?

​Architecture Layers

​Bronze Layer: Raw Data Ingestion

​Characteristics

Raw Data Storage

Single Source of Truth

Minimal Transformation

Data Preservation

​Ingestion Process

​Data Sources

​Silver Layer: Cleansing & Standardization

​Transformation Activities

​Silver Tables

​Quality Improvements

​Gold Layer: Business-Ready Analytics

​Purpose and Benefits

Optimized for Analytics

Business Terminology

Pre-Joined Data

Performance Optimized

​Star Schema Components

​Analytical Capabilities

​Layer Comparison

​Data Flow Between Layers

​Best Practices

Build docs developers (and LLMs) love

What is Medallion Architecture?

Architecture Layers

Bronze Layer: Raw Data Ingestion

Characteristics

Ingestion Process

Data Sources

Silver Layer: Cleansing & Standardization

Transformation Activities

Silver Tables

Quality Improvements

Gold Layer: Business-Ready Analytics

Purpose and Benefits

Star Schema Components

Analytical Capabilities

Layer Comparison

Data Flow Between Layers

Best Practices