Overview
The application generates two types of metrics reports:
- Weekly Metrics - Aggregated trip statistics by week
- Monthly Metrics by Rate Code - Service volumes by month, rate code, and day type
Weekly Metrics
Weekly metrics provide time-series analysis of trip characteristics aggregated by ISO week.
Implementation
main.py:74-94
Calculated Fields
Trip Time Metrics
Derived Column:
main.py:75-76
Aggregations:
- min_trip_time - Shortest trip duration in the week (seconds)
- max_trip_time - Longest trip duration in the week (seconds)
- mean_trip_time - Average trip duration for the week (seconds)
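A minimal sketch of how such a derived duration column and its weekly aggregations could be computed with pandas. The timestamp column names (tpep_pickup_datetime, tpep_dropoff_datetime) follow the public NYC yellow taxi schema and are an assumption here, as is the trip_time column name; the actual code in main.py may differ.

```python
import pandas as pd

# Hypothetical sample data; the timestamp column names are assumed,
# not taken from main.py.
trips = pd.DataFrame({
    "year_week": ["2022-001", "2022-001", "2022-002"],
    "tpep_pickup_datetime": pd.to_datetime(
        ["2022-01-03 08:00:00", "2022-01-04 09:00:00", "2022-01-10 10:00:00"]
    ),
    "tpep_dropoff_datetime": pd.to_datetime(
        ["2022-01-03 08:10:00", "2022-01-04 09:30:00", "2022-01-10 10:05:00"]
    ),
})

# Derived column: trip duration in seconds (the name trip_time is assumed).
trips["trip_time"] = (
    trips["tpep_dropoff_datetime"] - trips["tpep_pickup_datetime"]
).dt.total_seconds()

# Named aggregation produces one output column per statistic.
weekly_time = trips.groupby("year_week").agg(
    min_trip_time=("trip_time", "min"),
    max_trip_time=("trip_time", "max"),
    mean_trip_time=("trip_time", "mean"),
)
```

The same named-aggregation pattern would apply to trip_distance and total_amount by swapping the source column.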
Trip Distance Metrics
Source Column:
trip_distance (miles)
Aggregations:
- min_trip_distance - Shortest trip distance in the week
- max_trip_distance - Longest trip distance in the week
- mean_trip_distance - Average trip distance for the week
Trip Amount Metrics
Source Column:
total_amount (USD)
Aggregations:
- min_trip_amount - Lowest fare in the week
- max_trip_amount - Highest fare in the week
- mean_trip_amount - Average fare for the week
Service Volume
Aggregation:
total_services - Total number of trips completed in the week
main.py:88
Uses a count aggregation on the total_amount column (any non-null column would work, since the data is already cleaned).
Use Case: Track demand trends, measure service capacity utilization, identify growth patterns.
Percentage Variation
The week-over-week change in service volume:
main.py:91-93
For example:
- Week 1: 10,000 trips
- Week 2: 10,500 trips
- Percentage variation = ((10,500 - 10,000) / 10,000) × 100 = +5%
Interpretation
- Positive values indicate growth in service volume
- Negative values indicate decline in service volume
- The first week will have NaN (there is no previous week to compare against)
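The count and week-over-week calculation described above can be sketched with pandas as follows. This is an illustration under assumptions: the percentage_variation column name is hypothetical, and pct_change is one way to get the change, not necessarily the one used in main.py.

```python
import pandas as pd

# Hypothetical cleaned trips: 4 in the first week, 5 in the second.
trips = pd.DataFrame({
    "year_week": ["2022-001"] * 4 + ["2022-002"] * 5,
    "total_amount": [10.0, 12.5, 8.0, 20.0, 9.0, 11.0, 30.0, 7.5, 15.0],
})

# groupby sorts the keys, so weeks come out in chronological order
# as long as year_week is zero-padded.
weekly = trips.groupby("year_week").agg(
    total_services=("total_amount", "count"),
)

# (current - previous) / previous * 100; the first row has no
# predecessor, so it is NaN.
weekly["percentage_variation"] = weekly["total_services"].pct_change() * 100
```

With this sample, the second week shows +25% because volume rises from 4 to 5 trips.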
Grouping Key: year_week
Trips are grouped by ISO week format:
main.py:69-70
YYYY-WWW (e.g., 2022-012 for week 12 of 2022)
Week numbers are zero-padded to 3 digits for consistent sorting.
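As an illustration, a key in this format could be built from a date with the standard library. The helper name year_week is hypothetical, and pairing the week number with the ISO year (not the calendar year) is an assumption; main.py may construct the key differently.

```python
from datetime import date


def year_week(d: date) -> str:
    """Return a YYYY-WWW key, with the ISO week zero-padded to 3 digits."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-{iso_week:03d}"


key = year_week(date(2022, 3, 21))  # Monday of ISO week 12 in 2022
```

Because the padding is fixed-width, the keys sort correctly as plain strings. Under the ISO-year assumption, dates in early January can belong to the last week of the previous year: January 1, 2022 maps to 2021-052.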
Output Format
Exported to processed_data.csv with a pipe (|) delimiter:
main.py:135
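A small sketch of a pipe-delimited export with pandas. It writes to an in-memory buffer instead of processed_data.csv, and index=False is an assumption; the actual call in main.py may keep the index.

```python
import io

import pandas as pd

# Hypothetical weekly metrics with a single value column.
weekly_metrics = pd.DataFrame({
    "year_week": ["2022-001", "2022-002"],
    "total_services": [10000, 10500],
})

# sep="|" switches the delimiter from a comma to a pipe.
buffer = io.StringIO()
weekly_metrics.to_csv(buffer, sep="|", index=False)
csv_text = buffer.getvalue()

# Reading the file back requires the same separator.
round_trip = pd.read_csv(io.StringIO(csv_text), sep="|")
```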
Monthly Metrics by Rate Code
Monthly metrics segment trips by rate code type and day type (weekday vs. weekend).
Implementation
main.py:96-123
Day Type Classification
main.py:103
- 1 = Weekday (Monday through Friday)
- 2 = Weekend (Saturday and Sunday)
- dayofweek >= 5 means Saturday (5) or Sunday (6)
- dayofweek < 5 means Monday (0) through Friday (4)
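The classification rule above can be sketched with pandas like this. The variable names are illustrative, and mapping a boolean mask is just one way to express the rule; main.py may use a different construct.

```python
import pandas as pd

# Hypothetical pickup dates spanning a weekend.
pickups = pd.Series(pd.to_datetime([
    "2022-01-07",  # Friday
    "2022-01-08",  # Saturday
    "2022-01-09",  # Sunday
    "2022-01-10",  # Monday
]))

# dayofweek is 0 for Monday through 6 for Sunday.
is_weekend = pickups.dt.dayofweek >= 5

# 2 = weekend, 1 = weekday.
day_type = is_weekend.map({True: 2, False: 1})
```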
Why Segment by Day Type?
Weekday and weekend trips have different characteristics:
- Weekdays: More commuter/business trips, rush hour patterns
- Weekends: More leisure/tourist trips, different time distributions
Rate Code Segmentation
Three separate DataFrames are created:
Regular (RatecodeID = 1)
main.py:109-110
self.regular_df
Standard metered trips - the majority of yellow taxi services.
JFK Airport (RatecodeID = 2)
main.py:109-110
self.jfk_df
JFK airport flat-rate trips - analyzed separately due to unique pricing and distance patterns.
Other (RatecodeID ≠ 1 and ≠ 2)
main.py:111-112
self.other_df
All other rate codes: negotiated fares, Nassau/Westchester, group rides, etc.
Aggregated Metrics
For each rate code segment, trips are grouped by [year_month, day_type]:
main.py:116-120
- Trip count: Total number of trips for the month/day_type combination. Uses count on trip_distance (any non-null column works).
- Total distance: Total miles traveled for all trips in the month/day_type combination. Sum of all trip_distance values.
- Total passengers: Total passengers served for all trips in the month/day_type combination. Sum of all passenger_count values.
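The rate code segmentation and the per-segment aggregation described above can be sketched together as follows. The helper monthly_metrics and the output column names (services, distances, passengers) are hypothetical; only the source columns and the grouping keys come from the text.

```python
import pandas as pd

# Hypothetical cleaned trips with the grouping keys already derived.
trips = pd.DataFrame({
    "RatecodeID": [1, 1, 2, 5, 1],
    "year_month": ["2022-01"] * 5,
    "day_type": [1, 1, 1, 2, 2],
    "trip_distance": [2.0, 3.5, 17.2, 8.0, 1.5],
    "passenger_count": [1, 2, 3, 1, 4],
})

# Three segments: regular, JFK, and everything else.
regular_df = trips[trips["RatecodeID"] == 1]
jfk_df = trips[trips["RatecodeID"] == 2]
other_df = trips[~trips["RatecodeID"].isin([1, 2])]


def monthly_metrics(segment: pd.DataFrame) -> pd.DataFrame:
    """Aggregate one rate code segment by month and day type."""
    return segment.groupby(["year_month", "day_type"]).agg(
        services=("trip_distance", "count"),
        distances=("trip_distance", "sum"),
        passengers=("passenger_count", "sum"),
    )


regular_monthly = monthly_metrics(regular_df)
other_monthly = monthly_metrics(other_df)
```

Applying one helper to all three DataFrames keeps the metric definitions identical across segments, which makes the sheets directly comparable.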
Grouping Keys
year_month:
main.py:67
YYYY-MM (e.g., 2022-01)
day_type:
- 1 = Weekday
- 2 = Weekend
Output Format
Exported to Excel with separate sheets:
main.py:138-143
processed_data.xlsx
Sheets:
- JFK - JFK airport trips
- Regular - Standard rate trips
- Others - All other rate codes
Data Formatting
Before export, metrics are formatted:
main.py:126-131
- Weekly metrics rounded to 2 decimal places
- Monthly metrics indexes reset for clean export
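A brief sketch of these two formatting steps in pandas, using hypothetical sample frames and column names; the actual code in main.py is not shown here and may differ.

```python
import pandas as pd

# Hypothetical weekly metrics indexed by year_week.
weekly = pd.DataFrame(
    {"mean_trip_time": [812.34567], "mean_trip_amount": [4.98765]},
    index=pd.Index(["2022-012"], name="year_week"),
)
# Round every numeric column to 2 decimal places.
weekly_rounded = weekly.round(2)

# Hypothetical monthly metrics with the grouping keys in a MultiIndex.
monthly = pd.DataFrame(
    {"services": [120, 45]},
    index=pd.MultiIndex.from_tuples(
        [("2022-01", 1), ("2022-01", 2)], names=["year_month", "day_type"]
    ),
)
# reset_index turns the grouping keys back into ordinary columns,
# so they export as plain cells.
monthly_flat = monthly.reset_index()
```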
Analysis Use Cases
Weekly Metrics
- Trend analysis over time
- Week-over-week growth tracking
- Seasonal pattern detection
- Service quality monitoring (avg duration/distance/fare)
Monthly Metrics
- Rate code performance comparison
- Weekday vs. weekend demand analysis
- Passenger volume trends
- Distance-based service characterization
- Revenue segment analysis
Combining Both Metrics
Weekly metrics provide granular time-series data, while monthly metrics offer segmented business intelligence. Together, they enable comprehensive operational and strategic analysis.
Next Steps
- Learn about the Data Model structure
- Review Data Cleaning rules applied before metrics calculation