Interactive data transformation walkthrough using the MotherDuck UI
This walkthrough demonstrates data transformation and analysis using the MotherDuck UI. The tutorial uses a wine list dataset to showcase common data cleaning, transformation, and analytical queries.
This walkthrough is designed to be run one query at a time from the MotherDuck UI. While you can run these queries from the DuckDB CLI, the UI provides a better interactive experience.
Exercise Goal: Find the difference in price per bottle between:
The highest rated, least expensive bottle
The lowest rated, most expensive bottle
For bottles with a vintage of 1990 or later.Rating calculation: Use coalesce(coalesce("WA score","Vinous score"),-1)Price calculation: Use offer_price / qty as price_per_bottle
volume_cl: Volume in centiliters (e.g., “75” from “6x75cl”)
6
Calculate price per bottle
Transform the price field and calculate normalized prices:
SELECT * EXCLUDE("Unit size"), CAST(SUBSTR("Unit size", 1, INSTR("Unit size", 'x') - 1) AS INTEGER) as qty, CAST( SUBSTR( "Unit size", INSTR("Unit size", 'x') + 1, LENGTH("Unit size") - INSTR("Unit size", 'x') - 2 ) AS INTEGER ) as volume_cl, CAST( REPLACE(REPLACE("Offer price", '$', ''), ',', '') AS DECIMAL(10, 2) ) as offer_price, offer_price / (qty * volume_cl) * 75 as price_per_75cl, offer_price / qty as price_per_bottleFROM winelistWHERE vintage > 1000;
This transformation:
Removes $ and , from prices
Converts to decimal format
Calculates price per standard 75cl bottle
Calculates price per bottle (regardless of size)
7
Create a clean view
Save the cleaned data for analysis:
CREATE OR REPLACE VIEW winelist_clean ASSELECT * EXCLUDE("Unit size"), CAST(SUBSTR("Unit size", 1, INSTR("Unit size", 'x') - 1) AS INTEGER) as qty, CAST( SUBSTR( "Unit size", INSTR("Unit size", 'x') + 1, LENGTH("Unit size") - INSTR("Unit size", 'x') - 2 ) AS INTEGER ) as volume_cl, CAST( REPLACE(REPLACE("Offer price", '$', ''), ',', '') AS DECIMAL(10, 2) ) as offer_price, offer_price / (qty * volume_cl) * 75 as price_per_75cl, offer_price / qty as price_per_bottleFROM winelistWHERE vintage > 1000;
8
Solve the exercise
Find the price difference between the best value and worst value bottles:
WITH cte_cheap_but_good AS ( SELECT * FROM winelist_clean WHERE Vintage >= 1990 AND ("WA score" IS NOT NULL OR "Vinous score" IS NOT NULL) ORDER BY COALESCE(COALESCE("WA score", "Vinous score"), -1) DESC, price_per_bottle LIMIT 1),cte_expensive_and_bad AS ( SELECT * FROM winelist_clean WHERE Vintage >= 1990 AND ("WA score" IS NOT NULL OR "Vinous score" IS NOT NULL) ORDER BY COALESCE(COALESCE("WA score", "Vinous score"), -1) ASC, price_per_bottle DESC LIMIT 1)SELECT a.price_per_bottle - b.price_per_bottle as difference FROM cte_cheap_but_good aJOIN cte_expensive_and_bad b ON true
This query:
Finds the highest-rated, cheapest bottle (1990+)
Finds the lowest-rated, most expensive bottle (1990+)
-- Use the first non-null value from multiple columnsCOALESCE(COALESCE("WA score", "Vinous score"), -1)-- Filter for rows with at least one scoreWHERE "WA score" IS NOT NULL OR "Vinous score" IS NOT NULL
WITH cte_cheap_but_good AS ( -- First subquery SELECT ...),cte_expensive_and_bad AS ( -- Second subquery SELECT ...)SELECT ... FROM cte_cheap_but_good aJOIN cte_expensive_and_bad b ON true
CTEs make complex queries more readable by breaking them into logical steps. They’re particularly useful for comparing different subsets of data.
Once you’ve cleaned the data, try these analytical queries:
SELECT "Wine Name", Vintage, COALESCE("WA score", "Vinous score") as rating, price_per_bottleFROM winelist_cleanWHERE "WA score" IS NOT NULL OR "Vinous score" IS NOT NULLORDER BY rating DESCLIMIT 10
Apply similar data cleaning techniques to other datasets
Build more complex analytical queries
Create materialized views for performance
Share your queries with team members
Remember that this example uses CREATE OR REPLACE TABLE which will overwrite any existing winelist table. In production environments, use more careful table management strategies.