export_data()

Exports all processed data to both CSV and Excel formats.
def export_data(self)

Description

Convenience method that calls both export_csv_data() and export_excel_data() to generate all output files in a single operation.

Parameters

No parameters required.

Returns

No return value.

Output Files

Generates two files in the current working directory:
  1. processed_data.csv - Weekly metrics with pipe delimiter
  2. processed_data.xlsx - Monthly metrics with three sheets (JFK, Regular, Others)

Implementation Details

From source/main.py:146-148
def export_data(self):
    self.export_csv_data()
    self.export_excel_data()

Usage Example

# After processing all data
yellow_taxi_data.export_data()

# Output:
# - processed_data.csv (weekly aggregated metrics)
# - processed_data.xlsx (monthly metrics by rate code)

Call generate_week_metrics(), generate_month_metrics(), and format_data() before exporting so that the data is complete and properly formatted.
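
Since export_data() returns nothing, one way to confirm that the export worked is to check that both files listed under Output Files exist afterwards. The sketch below is only an illustration: missing_outputs() is a hypothetical helper, not part of the documented class, and it assumes the files were written to the directory you pass in.

```python
from pathlib import Path

# File names taken from the "Output Files" list above.
EXPECTED_OUTPUTS = ("processed_data.csv", "processed_data.xlsx")


def missing_outputs(directory="."):
    """Return the expected output files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]


# Hypothetical check, run from the directory where export_data() was called.
print(missing_outputs())
```

An empty list means both files are present. A non-empty list names the files that were not written.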

export_csv_data()

Exports weekly aggregated metrics to a CSV file.
def export_csv_data(self)

Description

Writes the csv_df DataFrame containing weekly metrics to a pipe-delimited CSV file.

Parameters

No parameters required.

Returns

No return value.

Output File Details

Filename: processed_data.csv
Location: Current working directory
Delimiter: Pipe character (|)
Index: Not included in output
Columns:
year_week (str): Week identifier in ‘YYYY-WWW’ format
min_trip_time (float): Minimum trip time in seconds (rounded to 2 decimals)
max_trip_time (float): Maximum trip time in seconds (rounded to 2 decimals)
mean_trip_time (float): Average trip time in seconds (rounded to 2 decimals)
min_trip_distance (float): Minimum trip distance in miles (rounded to 2 decimals)
max_trip_distance (float): Maximum trip distance in miles (rounded to 2 decimals)
mean_trip_distance (float): Average trip distance in miles (rounded to 2 decimals)
min_trip_amount (float): Minimum trip amount in dollars (rounded to 2 decimals)
max_trip_amount (float): Maximum trip amount in dollars (rounded to 2 decimals)
mean_trip_amount (float): Average trip amount in dollars (rounded to 2 decimals)
total_services (int): Total number of trips for the week
percentage_variation (float): Week-over-week percentage change in total services (rounded to 2 decimals)
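
The source does not show how percentage_variation is computed. A minimal sketch, assuming the usual definition (current week minus previous week, divided by the previous week, times 100), reproduces the value for week 2022-002 in the example output. The function name is hypothetical.

```python
def percentage_variation(previous_total, current_total):
    """Week-over-week change in total services, in percent, rounded to 2 decimals."""
    # Assumed formula; the documented class may compute this differently.
    return round((current_total - previous_total) / previous_total * 100, 2)


# Totals taken from the example output for weeks 2022-001 and 2022-002.
print(percentage_variation(152340, 149872))  # -1.62
```

The first week has no previous week to compare against, which is consistent with its empty percentage_variation field in the example output.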

Implementation Details

From source/main.py:134-135
def export_csv_data(self):
    self.csv_df.to_csv('processed_data.csv', sep='|', index=False)

Example Output

year_week|min_trip_time|max_trip_time|mean_trip_time|min_trip_distance|max_trip_distance|mean_trip_distance|min_trip_amount|max_trip_amount|mean_trip_amount|total_services|percentage_variation
2022-001|60.0|7245.0|876.34|0.01|45.67|3.42|2.8|487.5|18.76|152340|
2022-002|60.0|6832.0|891.22|0.01|43.21|3.38|2.8|456.3|18.54|149872|-1.62

The pipe delimiter (|) is used instead of a comma to avoid conflicts with commas that may appear in data fields, including comma decimal separators used in some locales.
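
Any consumer of this file has to be told about the pipe delimiter. The sketch below reads the two example rows with Python's standard csv module. read_weekly_metrics() is a hypothetical helper, and the sample text stands in for the real processed_data.csv.

```python
import csv
import io

# Rows copied from the example output above; in practice, pass an open
# file handle for 'processed_data.csv' instead of a StringIO.
SAMPLE = """\
year_week|min_trip_time|max_trip_time|mean_trip_time|min_trip_distance|max_trip_distance|mean_trip_distance|min_trip_amount|max_trip_amount|mean_trip_amount|total_services|percentage_variation
2022-001|60.0|7245.0|876.34|0.01|45.67|3.42|2.8|487.5|18.76|152340|
2022-002|60.0|6832.0|891.22|0.01|43.21|3.38|2.8|456.3|18.54|149872|-1.62
"""


def read_weekly_metrics(text_stream):
    """Parse pipe-delimited rows into dicts; empty fields become None."""
    reader = csv.DictReader(text_stream, delimiter="|")
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


rows = read_weekly_metrics(io.StringIO(SAMPLE))
print(rows[0]["year_week"], rows[0]["percentage_variation"])  # 2022-001 None
print(rows[1]["total_services"])  # 149872
```

The csv module returns every value as a string, so numeric columns still need converting. With pandas, pd.read_csv('processed_data.csv', sep='|') reads the same file and infers numeric types.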

export_excel_data()

Exports monthly metrics to an Excel file with multiple sheets.
def export_excel_data(self)

Description

Writes monthly metrics segmented by rate code type to a multi-sheet Excel workbook. Each rate code category (JFK, Regular, Others) is exported to a separate sheet.

Parameters

No parameters required.

Returns

No return value.

Output File Details

Filename: processed_data.xlsx
Location: Current working directory
Engine: OpenPyXL
Index: Not included in output
Sheets:
  1. JFK - Metrics for JFK airport trips (RatecodeID = 2)
  2. Regular - Metrics for standard rate trips (RatecodeID = 1)
  3. Others - Metrics for all other rate codes
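
The mapping from rate code to sheet can be written as a small function. This is only a sketch of the rule listed above: sheet_for_rate_code() is a hypothetical name, and the source does not show how jfk_df, regular_df, and other_df are actually built.

```python
def sheet_for_rate_code(rate_code_id):
    """Map a RatecodeID to the sheet name listed above."""
    if rate_code_id == 2:
        return "JFK"
    if rate_code_id == 1:
        return "Regular"
    # Every other rate code, including missing values, goes to "Others".
    return "Others"


for code in (1, 2, 5):
    print(code, "->", sheet_for_rate_code(code))
```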

Sheet Structure

All sheets contain the same columns:
year_month (str): Month identifier in ‘YYYY-MM’ format
day_type (int): 1 for weekdays (Monday-Friday), 2 for weekends (Saturday-Sunday)
services (int): Total number of trips for the month and day type
distances (float): Sum of all trip distances in miles for the month and day type
passengers (int): Sum of all passengers transported for the month and day type
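
The year_month and day_type keys can be derived from a trip date as sketched below. The helper names are hypothetical, and the source does not say which date field (pickup or drop-off) the class uses, so treat this as an illustration of the column definitions only.

```python
from datetime import date


def day_type(day):
    """Return 1 for Monday-Friday and 2 for Saturday-Sunday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return 1 if day.weekday() < 5 else 2


def year_month(day):
    """Format a date as 'YYYY-MM'."""
    return day.strftime("%Y-%m")


# 2022-01-03 was a Monday and 2022-01-08 a Saturday.
print(year_month(date(2022, 1, 3)), day_type(date(2022, 1, 3)))  # 2022-01 1
print(year_month(date(2022, 1, 8)), day_type(date(2022, 1, 8)))  # 2022-01 2
```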

Implementation Details

From source/main.py:138-143
def export_excel_data(self):
    common_columns = ['year_month', 'day_type', 'services', 'distances', 'passengers']
    with pd.ExcelWriter("processed_data.xlsx", engine="openpyxl") as writer:
        self.jfk_df[common_columns].to_excel(writer, sheet_name="JFK", index=False)
        self.regular_df[common_columns].to_excel(writer, sheet_name="Regular", index=False)
        self.other_df[common_columns].to_excel(writer, sheet_name="Others", index=False)

Usage Example

# Export only Excel file
yellow_taxi_data.export_excel_data()

# Result: processed_data.xlsx with three sheets

Excel Sheet Preview

JFK Sheet Example:
year_month  day_type  services  distances  passengers
2022-01     1         45623     567823.45  78934
2022-01     2         12456     154321.67  21543
2022-02     1         47821     589234.12  82341
2022-02     2         13124     162345.89  22678

OpenPyXL must be installed in your environment. Install with: pip install openpyxl
Each sheet represents a different rate code category, allowing easy comparison of metrics across trip types and time periods.
