Explore the most important data models for data-intensive applications - relational, document, and graph - and learn how to choose the right model for your use case.
Data models are one of the most important parts of developing software, because they have such a profound effect on how we think about the problem we’re solving and how we write the software.Each layer hides the complexity of the layers below by providing a clean data model. These abstractions allow different groups of people to work together effectively.This chapter explores the most important data models for data-intensive applications:
Proposed by Edgar Codd in 1970, the relational model organizes data into relations (tables), where each relation is a collection of tuples (rows).Key characteristics:
Data organized in tables with fixed schema
Relationships represented by foreign keys
Queries use SQL (Structured Query Language)
ACID transactions for consistency
Example schema:
CREATE TABLE users ( id SERIAL PRIMARY KEY, name VARCHAR(100), email VARCHAR(100) UNIQUE, created_at TIMESTAMP DEFAULT NOW());CREATE TABLE posts ( id SERIAL PRIMARY KEY, user_id INTEGER REFERENCES users(id), title VARCHAR(200), content TEXT, created_at TIMESTAMP DEFAULT NOW());CREATE TABLE comments ( id SERIAL PRIMARY KEY, post_id INTEGER REFERENCES posts(id), user_id INTEGER REFERENCES users(id), text TEXT, created_at TIMESTAMP DEFAULT NOW());
Problem: Many entities reference the same data (e.g., many users in same city)Relational advantages for many-to-one:
No duplication: City name stored once
Consistency: Update city name in one place
Type safety: Can enforce valid cities
Easy updates: Rename city affects all users
Document challenges for many-to-one:
Duplication: “San Francisco” stored in many documents
Inconsistency risk: Some docs might have “SF”, others “San Francisco”
Hard to update: Must update all documents
# Relational: Easy updatedb.execute("UPDATE cities SET name = 'San Francisco Bay Area' WHERE id = 5")# All users automatically see new name# Document: Hard updatedb.users.update_many( {"city": "San Francisco"}, {"$set": {"city": "San Francisco Bay Area"}})# Must find and update all documents
Many-to-many relationships are even more challenging for documents, often requiring either denormalization with duplication or references (which reduce the benefits of document storage).
Schema-on-write vs schema-on-readExample: Adding a new fieldRelational (schema-on-write):
-- Must alter tableALTER TABLE users ADD COLUMN phone VARCHAR(20);-- For large tables, this can lock the table for hours!-- All existing rows get NULL for phone
Document (schema-on-read):
// Just start writing documents with new fielddb.users.insertOne({ name: "Charlie", email: "[email protected]", phone: "555-1234" // New field, no migration needed!})// Application must handle both old and new documentsconst user = db.users.findOne({_id: userId})const phone = user.phone || "Not provided" // Handle missing field
Use schema-on-write when all documents have the same structure and you need strong consistency guarantees. Use schema-on-read when data structure varies or you need rapid iteration with frequent schema changes.
SQL is the dominant declarative query language for relational databases.Common SQL patterns:
-- Basic querySELECT name, email FROM users WHERE age > 18;-- Join multiple tablesSELECT users.name, COUNT(posts.id) as post_countFROM usersLEFT JOIN posts ON posts.user_id = users.idGROUP BY users.id, users.nameHAVING COUNT(posts.id) > 5;-- SubquerySELECT * FROM postsWHERE user_id IN ( SELECT id FROM users WHERE location = 'SF');-- Window functionSELECT name, salary, AVG(salary) OVER (PARTITION BY department) as dept_avgFROM employees;
Graph databases excel at traversing relationships.Example queries:
// 1. Find all friends of AliceMATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(friend)RETURN friend.name;// 2. Find friends of friends (excluding direct friends and Alice herself)MATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->()-[:FRIENDS_WITH]->(foaf)WHERE NOT (alice)-[:FRIENDS_WITH]->(foaf) AND foaf <> aliceRETURN DISTINCT foaf.name;// 3. Find people who live in same city as AliceMATCH (alice:Person {name: 'Alice'})-[:LIVES_IN]->(city)<-[:LIVES_IN]-(person)WHERE person <> aliceRETURN person.name, city.name;// 4. Shortest path between two peopleMATCH path = shortestPath( (alice:Person {name: 'Alice'})-[:FRIENDS_WITH*]-(diana:Person {name: 'Diana'}))RETURN path;
The same queries in SQL require complex recursive queries:
-- Find friends of friends (painful in SQL!)WITH RECURSIVE friends_of_friends AS ( -- Base case: Direct friends SELECT friend_id, 1 as depth FROM friendships WHERE user_id = (SELECT id FROM users WHERE name = 'Alice') UNION -- Recursive case: Friends of those friends SELECT f.friend_id, fof.depth + 1 FROM friendships f JOIN friends_of_friends fof ON f.user_id = fof.friend_id WHERE fof.depth < 2)SELECT DISTINCT u.nameFROM friends_of_friends fofJOIN users u ON u.id = fof.friend_idWHERE fof.depth = 2 AND fof.friend_id NOT IN ( SELECT friend_id FROM friendships WHERE user_id = (SELECT id FROM users WHERE name = 'Alice') );
Graph databases excel at complex traversal queries, but they may not be the best choice for simple CRUD operations or when you need strong ACID transactions across the entire database.
# Banking application# - Strong ACID guarantees needed# - Complex transactions between accounts# - Structured financial reportsclass BankingApp: def transfer_money(self, from_account, to_account, amount): with db.transaction(): # Atomic: Both succeed or both fail db.execute( "UPDATE accounts SET balance = balance - %s WHERE id = %s", [amount, from_account] ) db.execute( "UPDATE accounts SET balance = balance + %s WHERE id = %s", [amount, to_account] ) db.execute( "INSERT INTO transactions (from, to, amount) VALUES (%s, %s, %s)", [from_account, to_account, amount] )
Use document when:
# Content management system# - Articles with varying structure# - Embedded comments and metadata# - Rapid iteration on schemaclass BlogApp: def create_article(self, article_data): # Flexible structure, can evolve over time db.articles.insert_one({ "title": article_data["title"], "author": article_data["author"], "content": article_data["content"], "tags": article_data.get("tags", []), "metadata": article_data.get("metadata", {}), "comments": [], # Embedded "versions": [], # Keep history "created_at": datetime.now() })
Use graph when:
# Social network# - Complex relationship queries# - Friend recommendations# - Influence analysisclass SocialApp: def recommend_friends(self, user_id): # Find friends of friends with common interests query = """ MATCH (user:Person {id: $user_id})-[:FRIENDS_WITH]->(friend) -[:FRIENDS_WITH]->(foaf) WHERE NOT (user)-[:FRIENDS_WITH]->(foaf) AND foaf <> user MATCH (user)-[:INTERESTED_IN]->(interest)<-[:INTERESTED_IN]-(foaf) RETURN foaf.name, COUNT(interest) as common_interests ORDER BY common_interests DESC LIMIT 10 """ return graph.execute(query, user_id=user_id)
Data models are perhaps the most important abstraction in software development.Key takeaways:
Start with the data model that most naturally fits your data
Many applications use multiple databases (polyglot persistence)
The data model shapes how you think about the problem
Consider access patterns, not just data structure
Schema flexibility vs. data integrity is a key trade-off
Quick reference: When to use each model
Relational: Structured data with complex queries and strong consistency requirementsDocument: Hierarchical or evolving data where related information is accessed togetherGraph: Highly connected data with complex relationship queries and variable-depth traversals