Skip to main content

What is Medallion Architecture?

The Medallion Architecture is a data design pattern that organizes data into three progressive layers - Bronze, Silver, and Gold. Each layer serves a specific purpose in the data refinement process, transforming raw source data into analytics-ready datasets.
The medallion pattern provides clear separation of concerns, making data lineage transparent and enabling teams to work at different layers simultaneously.

Architecture Layers

The three layers work together to progressively refine data quality and structure:

Bronze Layer: Raw Data Ingestion

The Bronze layer serves as the landing zone for raw data from source systems. Data is ingested with minimal transformation to preserve the original state.

Characteristics

Raw Data Storage

Stores data exactly as received from ERP and CRM systems, maintaining complete fidelity to source systems.

Single Source of Truth

Serves as the authoritative record of what was received from source systems, enabling data lineage tracking.

Minimal Transformation

Only basic type conversions and schema validation are performed during ingestion from CSV files.

Data Preservation

Maintains historical snapshots of source data for audit, debugging, and reprocessing scenarios.

Ingestion Process

1

CSV File Reception

Source systems (ERP and CRM) export data as CSV files to a designated ingestion location.
2

Schema Validation

Validate that incoming CSV files match expected schema definitions before loading.
3

Data Loading

Load CSV data into PostgreSQL Bronze tables using COPY commands or ETL scripts.
4

Metadata Capture

Record ingestion timestamp, source file name, and row counts for tracking and auditing.

Data Sources

ERP System Data

Customer demographic data, product categories, and location information:
  • Customer profiles with birth dates and gender
  • Product categorization and maintenance flags
  • Geographic location mappings
Customer interactions, product details, and sales transactions:
  • Customer contact information and preferences
  • Product catalog with pricing and lifecycle dates
  • Sales order details with quantities and amounts
Bronze layer data may contain quality issues, duplicates, and inconsistencies inherited from source systems. Never query Bronze tables directly for analytics.

Layer Comparison

Understanding the differences between layers helps determine where to focus efforts:
AspectBronzeSilverGold
PurposeRaw data preservationData cleansing & prepAnalytics & reporting
Data QualityAs-is from sourceValidated & cleanedBusiness-ready
StructureSource system schemasNormalized tablesStar schema (denormalized)
UsersData engineersData engineersAnalysts, BI tools
QueryableFor debugging onlyYes, for ETLPrimary analytics interface
NamingSource system namesStandardized technicalBusiness-friendly

Data Flow Between Layers

Data progresses through the layers in a controlled pipeline:
Each layer transformation is implemented as SQL scripts and views in the /workspace/source/scripts/ directory, organized by layer.

Best Practices

Never delete or modify Bronze layer data. It serves as your source of truth for reprocessing and debugging.
Clearly document all transformations between layers, including business logic and data quality rules.
Implement data quality tests at Silver and Gold layers to catch issues before they reach analytics.
Query Bronze for debugging, Silver for ETL development, and Gold for all analytics and reporting.

Build docs developers (and LLMs) love