The Gold layer implements a star schema - a dimensional modeling approach that organizes data into fact and dimension tables. This design pattern is optimized for analytical queries and business intelligence.
Star schemas are named for their visual appearance: a central fact table connected to multiple dimension tables, resembling a star.
The fact table is implemented as a view in ~/workspace/source/scripts/gold/ddl_gold.sql:
create view gold.fact_sales asselect si.sls_ord_num as order_number, pr.product_key, cu.customer_key, si.sls_ord_dt as order_date, si.sls_ship_dt as shipping_date, si.sls_sales as sales_amount, si.sls_quantity as quantity, si.sls_price as pricefrom silver.crm_sales_details sileft join gold.dim_product pr on si.sls_prd_key = pr.product_numberleft join gold.dim_customer cu on si.sls_cust_id = cu.customer_id;
Customer dimension for demographic and segmentation analysis.
View Definition
create view gold.dim_customer asselect ROW_NUMBER() over (order by cst_id) as customer_key, ci.cst_id as customer_id, ci.cst_key as customer_number, ci.cst_firstname as first_name, ci.cst_lastname as last_name, ci.cst_marital_status as marital_status, case when ci.cst_gndr !='n/a' then ci.cst_gndr else coalesce(ca.gen, 'n/a') end as gender, ca.bdate as birth_date, ci.cst_create_date as create_datefrom silver.crm_cust_info cileft join silver.erp_cust_az12 ca on ci.cst_key = ca.cidleft join silver.erp_loc_a101 la on ci.cst_key = la.cid;
Date the customer record was created. Track customer acquisition and tenure.
The gender field implements data quality logic: it uses the CRM value unless it’s ‘n/a’, then falls back to the ERP source using COALESCE. See gold/ddl_gold.sql:10-12.
Product dimension for product performance and category analysis.
View Definition
create view gold.dim_product asselect row_number() over (order by pn.prd_start_dt, pn.prd_key) as product_key, pn.prd_id as product_id, pn.prd_key as product_number, pn.prd_name as product_name, pn.cat_id as category_id, pc.cat as category, pc.subcat as subcategory, pc.manteinance, pn.prd_cost as product_cost, pn.prd_line as product_line, pn.prd_start_dt as product_start_datefrom silver.crm_prd_info pnleft join silver.erp_px_cat_g1v2 pc on pn.cat_id = pc.idwhere pn.prd_end_dt is null;
Maintenance flag indicating if the product requires ongoing maintenance or support.
The dim_product view filters to only active products using WHERE prd_end_dt IS NULL (see gold/ddl_gold.sql:41). Discontinued products are excluded from the dimension.
The star schema enables a wide range of analytical queries:
Customer Segmentation Analysis
Analyze sales patterns by customer demographics:
SELECT c.gender, c.marital_status, COUNT(DISTINCT c.customer_key) as customer_count, SUM(f.sales_amount) as total_sales, AVG(f.sales_amount) as avg_order_valueFROM gold.fact_sales fJOIN gold.dim_customer c ON f.customer_key = c.customer_keyGROUP BY c.gender, c.marital_statusORDER BY total_sales DESC;
Product Performance Analysis
Track product and category performance metrics:
SELECT p.category, p.subcategory, COUNT(DISTINCT f.order_number) as order_count, SUM(f.quantity) as units_sold, SUM(f.sales_amount) as revenue, AVG(f.price) as avg_priceFROM gold.fact_sales fJOIN gold.dim_product p ON f.product_key = p.product_keyGROUP BY p.category, p.subcategoryORDER BY revenue DESC;
Sales Trends Over Time
Examine sales trends and seasonality:
SELECT DATE_TRUNC('month', f.order_date) as month, SUM(f.sales_amount) as monthly_sales, COUNT(DISTINCT f.order_number) as order_count, COUNT(DISTINCT f.customer_key) as unique_customersFROM gold.fact_sales fGROUP BY DATE_TRUNC('month', f.order_date)ORDER BY month;
Customer Lifetime Value
Calculate customer lifetime metrics:
SELECT c.customer_key, c.first_name || ' ' || c.last_name as customer_name, COUNT(f.order_number) as total_orders, SUM(f.sales_amount) as lifetime_value, AVG(f.sales_amount) as avg_order_value, MIN(f.order_date) as first_order_date, MAX(f.order_date) as last_order_dateFROM gold.fact_sales fJOIN gold.dim_customer c ON f.customer_key = c.customer_keyGROUP BY c.customer_key, c.first_name, c.last_nameORDER BY lifetime_value DESCLIMIT 100;
All dimension tables use surrogate keys (integer sequences) rather than natural keys:
Benefits
Faster joins compared to multi-column or string natural keys
Insulates fact tables from changes to source system keys
Simplifies relationships and improves query performance
Enables consistent key type across all dimensions
Natural keys (customer_id, product_id) are retained as attributes for lineage and debugging.
Gold layer objects are implemented as views rather than materialized tables:
Rationale
Simplifies ETL: no need to manage insert/update logic
Always reflects latest Silver layer data
Reduces storage requirements
Easier to modify and evolve as requirements change
Consideration
For very large datasets, consider materializing views as tables with scheduled refresh for better query performance.
Product categories are denormalized directly into dim_product:
Why?
Eliminates additional joins for common queries
Category changes are infrequent, reducing update complexity
Improves query performance for category-based analysis
Simplifies query writing for analysts
The trade-off is potential redundancy, but this is acceptable for analytical workloads prioritizing read performance.
dim_product filters to active products only (no end date):
Benefits
Focuses analysis on current product portfolio
Reduces dimension size and improves performance
Prevents historical products from cluttering reports
Historical Note
This design choice aligns with project requirements to focus on current state. For historical analysis across product lifecycles, you would need a Type 2 slowly changing dimension approach.