Monthly Metrics

Overview

The generate_month_metrics() method calculates monthly aggregated statistics segmented by rate code type (Regular, JFK, Other) and day type (weekday vs weekend). This provides insights into service patterns across different fare types and temporal categories.

How It Works

The method splits data into three categories based on RatecodeID, then aggregates by month and day type:

main.py

def generate_month_metrics(self):
    rate_code_id_dict = {
        'regular_df': 1,
        'jfk_df': 2,
        'other_df': -1
    }

    # Classify weekdays (1) vs weekends (2)
    self.data['day_type'] = np.where(
        self.data['tpep_dropoff_datetime'].dt.dayofweek >= 5, 
        2, 
        1
    )
    self.data['RatecodeID'] = self.data['RatecodeID'].astype(int)

    for rc_id in rate_code_id_dict.keys():
        attr = getattr(self, rc_id)

        if rate_code_id_dict[rc_id] in [1, 2]:
            df = self.data[self.data['RatecodeID'] == rate_code_id_dict[rc_id]]
        else:
            df = self.data[
                (self.data['RatecodeID'] != 1) & 
                (self.data['RatecodeID'] != 2)
            ]

        df = df[['year_month', 'day_type', 'trip_distance', 'passenger_count']]

        df = df.groupby(['year_month', 'day_type']).agg(
            services=('trip_distance', 'count'),
            distances=('trip_distance', 'sum'),
            passengers=('passenger_count', 'sum')
        ).reset_index()

        attr = pd.concat([attr, df])
        setattr(self, rc_id, attr)

Rate Code Categories

Regular (RatecodeID = 1)

Standard city trips with metered fares. This is the most common rate code, representing typical street hail and dispatch services within NYC. Stored in: self.regular_df

JFK (RatecodeID = 2)

Flat-fare trips to/from John F. Kennedy International Airport. These trips have a fixed $52 fare (plus tolls and tips) regardless of distance or time. Stored in: self.jfk_df

Other (RatecodeID ≠ 1 and ≠ 2)

All other rate codes including:

Newark trips (RatecodeID = 3)
Nassau/Westchester (RatecodeID = 4)
Negotiated fares (RatecodeID = 5)
Group rides (RatecodeID = 6)

Stored in: self.other_df

Day Type Classification

Trips are classified into two categories based on the dropoff datetime:

Weekday (day_type = 1): Monday through Friday
Weekend (day_type = 2): Saturday and Sunday

self.data['day_type'] = np.where(
    self.data['tpep_dropoff_datetime'].dt.dayofweek >= 5,
    2,  # Weekend (day 5=Saturday, 6=Sunday)
    1   # Weekday (days 0-4)
)

Day type is determined by the dropoff time, not pickup time. This ensures trips are categorized by when the service was completed.

Metrics Calculated

For each combination of rate code, month, and day type, three metrics are computed:

Services Count

services=('trip_distance', 'count')

Total number of completed trips in that category. This uses trip_distance for counting to ensure we only count trips with valid distance data.

Total Distances

distances=('trip_distance', 'sum')

Sum of all trip distances in miles for that category. Useful for understanding total service coverage and demand patterns.

Total Passengers

passengers=('passenger_count', 'sum')

Sum of all passengers transported in that category. Helps analyze capacity utilization and group travel patterns.

Output Organization

Results are organized into three separate DataFrames, each exported to its own Excel sheet:

Common Structure

All three DataFrames share the same column structure:

year_month | day_type | services | distances | passengers

Sheet Layout in Excel

The exported Excel file (processed_data.xlsx) contains three sheets:

JFK: Airport trips data
Regular: Standard city trips data
Others: All other rate codes data

main.py

def export_excel_data(self):
    common_columns = ['year_month', 'day_type', 'services', 'distances', 'passengers']
    with pd.ExcelWriter("processed_data.xlsx", engine="openpyxl") as writer:
        self.jfk_df[common_columns].to_excel(writer, sheet_name="JFK", index=False)
        self.regular_df[common_columns].to_excel(writer, sheet_name="Regular", index=False)
        self.other_df[common_columns].to_excel(writer, sheet_name="Others", index=False)

Example Output Data

Regular
JFK
Others

Standard metered trips - the largest volume of services:

year_month  day_type  services   distances    passengers
2022-01     1         2456789    8456234.56   3245678
2022-01     2         876543     2987654.32   1156789
2022-02     1         2389456    8234567.89   3189456
2022-02     2         845678     2876543.21   1123456
2022-03     1         2567891    8678945.67   3378912
2022-03     2         912345     3123456.78   1201234

Key Insights:

Weekday services are ~2.8x higher than weekends
Average trip distance: ~3.4 miles (weekday) vs ~3.4 miles (weekend)
Average passengers per trip: ~1.32 (relatively consistent)

Airport trips with flat-fare pricing:

year_month  day_type  services  distances    passengers
2022-01     1         45678     789456.78    56789
2022-01     2         38912     672345.67    48234
2022-02     1         43567     752891.23    54321
2022-02     2         36789     634567.89    45678
2022-03     1         47891     827345.12    59678
2022-03     2         40123     692834.56    49876

Key Insights:

More balanced weekday/weekend distribution than Regular
Average trip distance: ~17.3 miles (airport to Manhattan)
Slightly higher weekend passenger count (leisure travel)

All other rate codes (Newark, Nassau, negotiated, etc.):

year_month  day_type  services  distances    passengers
2022-01     1         12345     234567.89    15678
2022-01     2         8901      167890.12    11234
2022-02     1         11678     221234.56    14890
2022-02     2         8456      159876.34    10678
2022-03     1         13012     245678.91    16234
2022-03     2         9234      173456.78    11678

Key Insights:

Smallest volume category
Average trip distance: ~19 miles (longer trips to outer areas)
More common on weekdays (business travel)

Data Interpretation

Weekday vs Weekend Analysis

Compare day types to understand demand patterns:

import pandas as pd

# Load Regular sheet
regular_df = pd.read_excel('processed_data.xlsx', sheet_name='Regular')

# Calculate weekday vs weekend ratios
weekday_data = regular_df[regular_df['day_type'] == 1]
weekend_data = regular_df[regular_df['day_type'] == 2]

weekday_avg_services = weekday_data['services'].mean()
weekend_avg_services = weekend_data['services'].mean()

ratio = weekday_avg_services / weekend_avg_services
print(f"Weekday/Weekend Services Ratio: {ratio:.2f}x")

Rate Code Comparison

Compare service levels across rate codes:

import pandas as pd

# Load all sheets
regular_df = pd.read_excel('processed_data.xlsx', sheet_name='Regular')
jfk_df = pd.read_excel('processed_data.xlsx', sheet_name='JFK')
other_df = pd.read_excel('processed_data.xlsx', sheet_name='Others')

# Total services by rate code
regular_total = regular_df['services'].sum()
jfk_total = jfk_df['services'].sum()
other_total = other_df['services'].sum()

print(f"Regular: {regular_total:,} ({regular_total/(regular_total+jfk_total+other_total)*100:.1f}%)")
print(f"JFK: {jfk_total:,} ({jfk_total/(regular_total+jfk_total+other_total)*100:.1f}%)")
print(f"Other: {other_total:,} ({other_total/(regular_total+jfk_total+other_total)*100:.1f}%)")

Monthly Trends

Analyze growth patterns month-over-month:

import pandas as pd
import matplotlib.pyplot as plt

regular_df = pd.read_excel('processed_data.xlsx', sheet_name='Regular')

# Group by month (sum weekday + weekend)
monthly_services = regular_df.groupby('year_month')['services'].sum()

# Calculate month-over-month change
mom_change = monthly_services.pct_change() * 100

# Plot
plt.figure(figsize=(10, 6))
plt.bar(monthly_services.index, mom_change)
plt.xlabel('Month')
plt.ylabel('Month-over-Month Change (%)')
plt.title('Regular Rate Code - Monthly Service Growth')
plt.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
plt.tight_layout()
plt.show()

Use Cases

Airport Operations Planning

Analyze JFK trips to optimize airport taxi stand capacity:

jfk_df = pd.read_excel('processed_data.xlsx', sheet_name='JFK')

# Calculate average daily trips
jfk_weekday_daily = jfk_df[jfk_df['day_type'] == 1]['services'].mean() / 22  # ~22 weekdays/month
jfk_weekend_daily = jfk_df[jfk_df['day_type'] == 2]['services'].mean() / 8   # ~8 weekend days/month

print(f"Average weekday trips: {jfk_weekday_daily:.0f}/day")
print(f"Average weekend trips: {jfk_weekend_daily:.0f}/day")

Revenue Analysis

Estimate revenue by rate code category:

# Assumptions
AVG_REGULAR_FARE = 18.50
JFK_FLAT_FARE = 52.00
AVG_OTHER_FARE = 35.00

regular_revenue = regular_df['services'].sum() * AVG_REGULAR_FARE
jfk_revenue = jfk_df['services'].sum() * JFK_FLAT_FARE
other_revenue = other_df['services'].sum() * AVG_OTHER_FARE

total_revenue = regular_revenue + jfk_revenue + other_revenue

print(f"Estimated Total Revenue: ${total_revenue:,.2f}")
print(f"  Regular: ${regular_revenue:,.2f} ({regular_revenue/total_revenue*100:.1f}%)")
print(f"  JFK: ${jfk_revenue:,.2f} ({jfk_revenue/total_revenue*100:.1f}%)")
print(f"  Other: ${other_revenue:,.2f} ({other_revenue/total_revenue*100:.1f}%)")

Passenger Load Analysis

Calculate average passengers per trip by category:

for sheet_name in ['Regular', 'JFK', 'Others']:
    df = pd.read_excel('processed_data.xlsx', sheet_name=sheet_name)
    avg_passengers = df['passengers'].sum() / df['services'].sum()
    print(f"{sheet_name}: {avg_passengers:.2f} passengers/trip")

Best Practices

When analyzing monthly metrics, always consider both services count and total distances together. A month with fewer services but higher total distance indicates longer trips, which may have different operational and revenue implications.

Seasonality Considerations

Regular trips: Highest in fall/spring (commuter traffic)
JFK trips: Peak during summer and winter holidays
Other trips: More stable throughout the year

Data Quality Checks

Before analysis, validate:

# Check for missing months
expected_months = pd.date_range('2022-01', '2022-03', freq='MS').strftime('%Y-%m')
actual_months = regular_df['year_month'].unique()
missing = set(expected_months) - set(actual_months)

if missing:
    print(f"Warning: Missing months: {missing}")

# Check for missing day types (should have both 1 and 2 for each month)
for month in regular_df['year_month'].unique():
    day_types = regular_df[regular_df['year_month'] == month]['day_type'].unique()
    if len(day_types) < 2:
        print(f"Warning: {month} missing day type {set([1,2]) - set(day_types)}")

Next Steps

Learn about Exporting Results file formats
Review Weekly Metrics for time-series analysis
Return to the complete Processing Data workflow

Get Started

Core Concepts

User Guide

API Reference

Development

Overview

How It Works

Rate Code Categories

Regular (RatecodeID = 1)

JFK (RatecodeID = 2)

Other (RatecodeID ≠ 1 and ≠ 2)

Day Type Classification

Metrics Calculated

Services Count

Total Distances

Total Passengers

Output Organization

Common Structure

Sheet Layout in Excel

Example Output Data

Data Interpretation

Weekday vs Weekend Analysis

Rate Code Comparison

Monthly Trends

Use Cases

Airport Operations Planning

Revenue Analysis

Passenger Load Analysis

Best Practices

Seasonality Considerations

Data Quality Checks

Next Steps

Build docs developers (and LLMs) love

Get Started

Core Concepts

User Guide

API Reference

Development

​Overview

​How It Works

​Rate Code Categories

​Regular (RatecodeID = 1)

​JFK (RatecodeID = 2)

​Other (RatecodeID ≠ 1 and ≠ 2)

​Day Type Classification

​Metrics Calculated

​Services Count

​Total Distances

​Total Passengers

​Output Organization

​Common Structure

​Sheet Layout in Excel

​Example Output Data

​Data Interpretation

​Weekday vs Weekend Analysis

​Rate Code Comparison

​Monthly Trends

​Use Cases

​Airport Operations Planning

​Revenue Analysis

​Passenger Load Analysis

​Best Practices

​Seasonality Considerations

​Data Quality Checks

​Next Steps

Build docs developers (and LLMs) love

Overview

How It Works

Rate Code Categories

Regular (RatecodeID = 1)

JFK (RatecodeID = 2)

Other (RatecodeID ≠ 1 and ≠ 2)

Day Type Classification

Metrics Calculated

Services Count

Total Distances

Total Passengers

Output Organization

Common Structure

Sheet Layout in Excel

Example Output Data

Data Interpretation

Weekday vs Weekend Analysis

Rate Code Comparison

Monthly Trends

Use Cases

Airport Operations Planning

Revenue Analysis

Passenger Load Analysis

Best Practices

Seasonality Considerations

Data Quality Checks

Next Steps