sql/queries.sql that demonstrate advanced PostgreSQL features like window functions, CTEs, and statistical aggregations.
Query Categories
These queries provide insights into:- Movie duration trends by decade
- Rating variability and distribution
- Comparison between IMDb and Metascore ratings
- Year-based ranking of top movies
1. Top Decades by Average Duration
This query identifies which decades produced the longest movies on average.SQL Query
How It Works
- Decade Calculation:
(year / 10) * 10converts years to decades- Example: 1994 → (1994 / 10) * 10 = 199 * 10 = 1990
- Aggregation:
AVG(duration_minutes)calculates mean duration per decade - Rounding:
ROUND(...::numeric, 2)formats to 2 decimal places - Sorting: Orders by longest average duration first
Use Cases
- Analyze historical trends in movie length
- Identify eras of epic cinema vs. shorter films
- Support film studies and industry analysis
2. Rating Standard Deviation by Year
Measures the variability of IMDb ratings within each year to identify years with consensus vs. divisive movies.SQL Query
How It Works
- Filter:
WHERE rating IS NOT NULLexcludes movies without ratings - Group: Aggregates by year
- Count:
COUNT(*)shows how many rated movies per year - Standard Deviation:
STDDEV(rating)measures rating dispersion - Precision: Rounds to 3 decimal places for readability
Interpretation
- Low stddev (< 0.5): Year had movies with similar ratings (consensus)
- High stddev (> 1.0): Year had wide range from excellent to poor movies
- Useful for understanding quality consistency by year
3. IMDb vs Metascore Comparison
Detects movies where public opinion (IMDb) significantly differs from critical opinion (Metascore).SQL Query
How It Works
- Normalization:
metascore / 10.0converts 0-100 scale to 0-10 scale - Absolute Difference:
ABS(rating - metascore / 10.0)calculates gap - Percentage Difference: Divides by rating and multiplies by 100
- Filter: Only shows movies with > 20% difference
- NULL Handling: Excludes movies missing either rating
Use Cases
- Find “critic-proof” movies loved by audiences but panned by critics
- Identify “critic darlings” that didn’t resonate with general audiences
- Analyze discrepancies between professional and user reviews
Example Results
| Title | IMDb Rating | Metascore (0-10) | Difference | % Difference |
|---|---|---|---|---|
| Blockbuster X | 8.5 | 5.0 | 3.5 | 41.18% |
| Indie Film Y | 6.2 | 8.5 | 2.3 | 37.10% |
4. Movie Rankings by Year (Window Functions)
Uses window functions to rank movies within each year, demonstratingPARTITION BY and ORDER BY clauses.
SQL Query
Window Function Breakdown
RANK() OVER (PARTITION BY year ORDER BY rating DESC)
- RANK(): Assigns ranking with gaps for ties
- Example: If two movies tie for #1, next is #3 (not #2)
- PARTITION BY year: Creates separate ranking groups per year
- Resets ranking for each new year
- ORDER BY rating DESC: Ranks from highest to lowest rating
How It Works
- Window function creates partitions for each year
- Within each partition, movies are sorted by rating (descending)
- RANK() assigns position within that year’s ranking
- Final ORDER BY arranges chronologically by year, then by rank
Use Cases
- Generate “Best of [Year]” lists automatically
- Compare top movies across different years
- Filter to show only top N movies per year:
Alternative Window Functions
- ROW_NUMBER(): No gaps in ranking (1, 2, 3…) even with ties
- DENSE_RANK(): Gaps for ties but consecutive numbers (1, 1, 2, 3…)
- NTILE(n): Divides into n equal buckets (quartiles, percentiles, etc.)
Window Functions in Action
The index created in the schema optimizes these queries:- PARTITION BY year (groups by year)
- ORDER BY rating DESC (sorts within partitions)
Performance Considerations
Extending the Queries
These queries can be extended with:-
CTEs (Common Table Expressions): Break complex queries into readable steps
-
Multiple Window Functions: Combine different analytics
Related Pages
Database Schema
Complete schema with tables, indexes, and constraints
CSV Export
Export query results to CSV for external analysis