Data quality testing ensures that data transformations produce accurate, complete, and consistent results across all layers of the data warehouse. Our testing strategy focuses on validating data integrity at both the Silver (cleaned and standardized) and Gold (dimensional model) layers.
All quality tests are designed to return zero rows when data quality is satisfactory. Any results indicate quality issues that need investigation.
Ensures no duplicate or null customer IDs exist in the system.
-- Check for duplicate or null customer IDs-- expected: no resultsSELECT cst_id, COUNT(*) FROM silver.crm_cust_infoGROUP BY cst_idHAVING COUNT(*) > 1 OR cst_id IS NULL;
Whitespace Validation
Verifies that customer data fields are properly trimmed and free of leading/trailing spaces.
-- Check for unwanted spaces in customer keys-- Expected: no resultsSELECT cst_key FROM silver.crm_cust_infoWHERE cst_key != TRIM(cst_key);-- Check for unwanted spaces in customer names-- expected: no resultsSELECT cst_firstnameFROM silver.crm_cust_infoWHERE cst_firstname != TRIM(cst_firstname); SELECT cst_lastnameFROM silver.crm_cust_infoWHERE cst_lastname != TRIM(cst_lastname);
Data Standardization
Validates that categorical values are standardized to expected values.
-- Check for NULLs or duplicates in Primary Key-- expected: no resultsSELECT prd_id, COUNT(*) FROM silver.crm_prd_infoGROUP BY prd_idHAVING COUNT(*) > 1 OR prd_id IS NULL;
Data Cleanliness
Validates product names are properly trimmed.
-- Check for unwanted spaces in product names-- expected: no resultsSELECT prd_nameFROM silver.crm_prd_infoWHERE prd_name != TRIM(prd_name);
Business Rule Validation
Ensures product costs are valid (non-negative and non-null).
-- Check for NULLs or Negative values in product cost-- expected: no resultsSELECT prd_costFROM silver.crm_prd_infoWHERE prd_cost < 0 OR prd_cost IS NULL;
Temporal Integrity
Validates that product date ranges are logically consistent.
-- Check for Invalid Dates Order-- expected: no resultsSELECT *FROM silver.crm_prd_infoWHERE prd_end_dt < prd_start_dt;
Ensures dates are in valid format and within reasonable ranges.
-- Check for Invalid Dates-- expected: no resultsSELECT COALESCE(sls_due_dt, 0) AS sls_due_dtFROM bronze.crm_sales_detailsWHERE sls_due_dt <= 0 OR LENGTH((sls_due_dt::TEXT)) != 8 OR sls_due_dt > 20500101 OR sls_due_dt < 19000101;
Date Logic Validation
Validates that date sequences make logical sense (order before ship/due).
-- Check for Invalid Date Orders (Order Date > Shipping/Due Dates)-- Expected: No ResultsSELECT *FROM silver.crm_sales_detailsWHERE sls_ord_dt > sls_ship_dt OR sls_ord_dt > sls_due_dt;
Mathematical Consistency
Ensures sales calculations are accurate (sales = quantity × price).
-- Check Data Consistency: sales = Quantity * Price-- Expected: No ResultsSELECT sls_sales, sls_quantity, sls_priceFROM silver.crm_sales_detailsWHERE sls_sales != sls_quantity * sls_price OR sls_sales IS NULL OR sls_quantity IS NULL OR sls_price IS NULL OR sls_quantity <= 0 OR sls_price <= 0 OR sls_sales <= 0ORDER BY sls_sales, sls_quantity, sls_price;
Validates that all foreign keys in fact tables reference existing dimension records.
-- Foreign key integrity checks-- Expected: 0 rows (all keys should match)SELECT *FROM gold.fact_sales f LEFT JOIN gold.dim_customer cu ON cu.customer_key = f.customer_keyLEFT JOIN gold.dim_product p ON p.product_key = f.product_keyWHERE cu.customer_key IS NULL OR p.product_key IS NULL;
This test ensures data integrity across the star schema. Any results indicate orphaned records that need investigation.