Overview
The database schema consists of:- movies - Stores movie metadata from IMDb
- actors - Stores unique actor names
- movie_actor - Junction table for the N:M relationship
Entity-Relationship Model
The schema implements a many-to-many (N:M) relationship between movies and actors:- One movie can have multiple actors
- One actor can appear in multiple movies
- The
movie_actortable serves as a junction/bridge table
Tables
movies
Stores core movie information extracted from IMDb.| Column | Type | Constraints | Description |
|---|---|---|---|
id | SERIAL | PRIMARY KEY | Auto-incrementing unique identifier |
imdb_id | VARCHAR(15) | UNIQUE, NOT NULL | Official IMDb ID (e.g., tt0111161) |
title | TEXT | NOT NULL, length > 0 | Movie title |
year | INT | 1888 to current year + 1 | Release year |
rating | FLOAT | 0.0 to 10.0 | IMDb user rating |
duration_minutes | INT | 1 to 600 | Runtime in minutes |
metascore | INT | 0 to 100 | Metacritic score |
- Year must be between 1888 (first film) and next year
- Rating must be between 0 and 10
- Duration must be reasonable (1-600 minutes, up to 10 hours)
- Metascore must be between 0 and 100
actors
Stores unique actor names.| Column | Type | Constraints | Description |
|---|---|---|---|
id | SERIAL | PRIMARY KEY | Auto-incrementing unique identifier |
name | TEXT | UNIQUE, NOT NULL, length > 1 | Actor’s full name |
- Name must be unique across all actors
- Name must have at least 2 characters
movie_actor
Junction table implementing the N:M relationship between movies and actors.| Column | Type | Constraints | Description |
|---|---|---|---|
movie_id | INT | PRIMARY KEY (composite), FOREIGN KEY | References movies(id) |
actor_id | INT | PRIMARY KEY (composite), FOREIGN KEY | References actors(id) |
- Composite primary key ensures each movie-actor pair is unique
- Foreign keys with CASCADE DELETE ensure referential integrity
- When a movie is deleted, all related actor associations are removed
- When an actor is deleted, all related movie associations are removed
Indexes
The schema includes a composite index for performance optimization:- Speeds up queries filtering or sorting by year and rating
- Optimizes common analytical queries (e.g., “top movies by year”)
- Supports queries in
sql/queries.sqlthat partition by year
Complete Schema SQL
Here’s the complete schema definition fromsql/01_schema.sql:
Database Functions
The schema is complemented by upsert functions insql/02_procedures.sql that handle INSERT … ON CONFLICT operations:
- upsert_movie() - Inserts or ignores movies based on
imdb_id - upsert_actor() - Inserts or ignores actors based on
name - upsert_movie_actor() - Inserts movie-actor relationships without duplicates
Database Views
The schema includes views insql/03_views.sql to simplify common queries:
vw_movie_actor_main
A convenience view that shows each movie with its primary actor (the actor with the lowest ID):Related Pages
Analytical Queries
Advanced SQL queries with window functions and CTEs
CSV Export
Exporting and loading data from CSV files