Overview
Design a personal finance management system like Mint.com that automatically categorizes transactions, analyzes spending, and provides budget recommendations. This problem explores financial data integration, transaction processing, categorization algorithms, and real-time budget tracking.Step 1: Use Cases and Constraints
Use Cases
In Scope
- User connects to a financial account
- Service extracts transactions from the account
- Updates daily
- Categorizes transactions
- Allows manual category override by user
- No automatic re-categorization
- Analyzes monthly spending by category
- Service recommends a budget
- Allows users to manually set budget
- Sends notifications when approaching or exceeding budget
- Service has high availability
Out of Scope
- Additional logging and analytics
Constraints and Assumptions
Assumptions:- Traffic is not evenly distributed
- Automatic daily update applies only to users active in past 30 days
- Adding/removing financial accounts is relatively rare
- Budget notifications don’t need to be instant
- 10 million users
- 10 budget categories per user = 100 million budget items
- Example categories: Housing (200), Gas ($100)
- Sellers determine transaction category
- 50,000 sellers
- 30 million financial accounts
- 5 billion transactions per month
- 500 million read requests per month
- 10:1 write to read ratio (write-heavy)
Usage Calculations
Back-of-the-envelope calculations
Back-of-the-envelope calculations
Size per transaction:
user_id- 8 bytescreated_at- 5 bytesseller- 32 bytesamount- 5 bytes- Total: ~50 bytes
- 250 GB of new transaction content per month
- 50 bytes per transaction × 5 billion transactions per month
- 9 TB of new transaction content in 3 years
- 2,000 transactions per second on average
- 200 read requests per second on average
- 2.5 million seconds per month
- 1 request per second = 2.5 million requests per month
- 40 requests per second = 100 million requests per month
- 400 requests per second = 1 billion requests per month
Step 2: High Level Design
Step 3: Core Components
Use Case: User Connects Financial Account
Database Schema
Accounts table:id, user_id, and created_at for fast lookups and in-memory data.
REST API:
Use Case: Service Extracts Transactions
Extract transactions when:- User first links account
- User manually refreshes
- Automatically each day for active users (past 30 days)
Transaction Extraction Service processes
The service:
- Pulls job from Queue
- Extracts transactions from financial institution
- Stores raw logs in Object Store
- Uses Category Service to categorize transactions
- Uses Budget Service to calculate monthly spending
- Budget Service uses Notification Service if approaching/exceeding budget
- Updates SQL Database
transactionsandmonthly_spendingtables - Notifies user via Notification Service
Transaction Schema
Transactions table:id and user_id for both tables.
Category Service
Seed a seller-to-category dictionary with popular sellers:- 50,000 sellers × ~255 bytes = ~12 MB (fits in memory)
Category Service implementation
Category Service implementation
Use Case: Service Recommends Budget
Use generic budget templates based on income tiers. Only store user overrides inbudget_overrides table.
Budget Service implementation
Budget Service implementation
Generating Monthly Spending with MapReduce
Instead of SQL queries, run MapReduce on raw transaction files:MapReduce implementation
MapReduce implementation
Log file format (tab delimited):MapReduce implementation:
Step 4: Scale the Design
Additional Use Case: User Accesses Summaries
Check Memory Cache
Read API checks Memory Cache (Redis/Memcached) for:
- User sessions
- Aggregate stats by category
- Recent transactions
Scaling Components
DNS
Route users to nearest data center
CDN
Serve static content with low latency
Load Balancer
Distribute traffic across servers
Web Servers
Reverse proxies for horizontal scaling
API Servers
Separate Read/Write APIs
Memory Cache
Handle 200 average reads/second
- User sessions
- Aggregate stats
- Recent transactions
SQL Database
Master-Slave with read replicas
- 2,000 writes/second requires scaling
Analytics Database
Data warehouse (Redshift/BigQuery) for
monthly_spendingDatabase Scaling Strategies
SQL Scaling Patterns:- Federation - Split databases by function
- Sharding - Distribute data across databases
- Denormalization - Optimize read performance
- SQL Tuning - Optimize queries and indexes
- NoSQL - Move appropriate data to NoSQL
- Store only one month of transactions in SQL Database
- Archive rest in data warehouse or Object Store
- S3 easily handles 250 GB/month constraint
Implementation Reference
Python Implementation
View the complete Python implementation including categorization and MapReduce logic.
Related Topics
NoSQL Options
- Key-value store
- Document store
- Wide column store
- SQL vs NoSQL tradeoffs
Caching Strategies
- Cache-aside
- Write-through
- Write-behind
- Refresh ahead
Asynchronous Processing
- Message queues
- Task queues
- Back pressure
- Microservices
Communication Patterns
- REST for external APIs
- RPC for internal services
- Service discovery
Key Takeaways
- Asynchronous processing with queues handles transaction extraction
- Category Service uses in-memory dictionary + crowdsourcing
- MapReduce generates spending aggregates from raw logs
- Budget templates reduce storage (only store overrides)
- Memory Cache serves frequent reads (sessions, stats, transactions)
- Write-heavy workload (10:1) requires SQL scaling patterns
- Data warehousing for analytics (Redshift/BigQuery)
- Object Store archives old transactions
