
Yellow Taxi NYC Data Analytics

Process and analyze NYC Yellow Taxi trip data, generate weekly and monthly metrics, and export the results to CSV and Excel.

Quick Start

Get up and running in minutes with your first data analysis

Installation Guide

Set up your Python environment and install dependencies

Data Model

Understand the NYC taxi trip data structure

API Reference

Explore the complete YellowTaxiData API

Key Features

Automated Import

Download and process parquet files directly from NYC TLC cloud storage
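The import step can be sketched as building one URL per month and reading each file with pandas. This is a minimal illustration, not the project's implementation. The CloudFront host and the `yellow_tripdata_YYYY-MM.parquet` naming are assumptions based on the public TLC trip-record page, and `monthly_urls`, `load_range`, and the `COLUMNS` subset are hypothetical names. Passing `columns=` lets pyarrow skip the parquet columns the metrics never use.

```python
from datetime import date
from typing import List

import pandas as pd

# ASSUMPTION: public host for the TLC monthly trip files; confirm it against
# the TLC trip record data page before relying on it.
BASE_URL = "https://d37ci6vzbxjzgv.cloudfront.net/trip-data"

# Hypothetical column subset: reading only what the metrics need lets
# pyarrow skip the remaining columns of each parquet file.
COLUMNS = [
    "tpep_pickup_datetime",
    "tpep_dropoff_datetime",
    "trip_distance",
    "RatecodeID",
    "total_amount",
]


def monthly_urls(start: date, end: date) -> List[str]:
    """Return one parquet URL per calendar month from start to end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(f"{BASE_URL}/yellow_tripdata_{year}-{month:02d}.parquet")
        # Step to the next month, rolling over into January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


def load_range(start: date, end: date) -> pd.DataFrame:
    """Read every monthly file in the range and stack them (needs network and pyarrow)."""
    frames = [
        pd.read_parquet(url, columns=COLUMNS, engine="pyarrow")
        for url in monthly_urls(start, end)
    ]
    return pd.concat(frames, ignore_index=True)
```

For example, `load_range(date(2022, 1, 1), date(2022, 3, 31))` would fetch the January, February, and March 2022 files.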

Data Cleaning

Validate trips against business rules for duration, distance, speed, and amount
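A rule-based filter of this kind can be sketched with boolean masks in pandas. The thresholds below are hypothetical placeholders, since the actual business rules are not listed here; the column names follow the TLC Yellow Taxi schema. Speed is derived from distance and duration, so a zero-duration trip produces `inf` or `NaN` and fails the speed check automatically.

```python
import pandas as pd

# HYPOTHETICAL thresholds; substitute the project's real business rules.
MIN_DURATION_MIN = 1.0
MAX_DURATION_MIN = 180.0
MAX_DISTANCE_MI = 100.0
MAX_SPEED_MPH = 80.0


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Drop trips that violate duration, distance, speed, or amount rules."""
    out = df.copy()
    duration = out["tpep_dropoff_datetime"] - out["tpep_pickup_datetime"]
    out["duration_min"] = duration.dt.total_seconds() / 60
    # Miles per hour; x/0 gives inf and 0/0 gives NaN, both fail the <= check.
    out["speed_mph"] = out["trip_distance"] / (out["duration_min"] / 60)
    valid = (
        out["duration_min"].between(MIN_DURATION_MIN, MAX_DURATION_MIN)
        & (out["trip_distance"] > 0)
        & (out["trip_distance"] <= MAX_DISTANCE_MI)
        & (out["speed_mph"] <= MAX_SPEED_MPH)
        & (out["total_amount"] > 0)
    )
    return out[valid].reset_index(drop=True)
```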

Weekly Metrics

Minimum, maximum, and mean trip time, distance, and amount for each week
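Weekly statistics map naturally onto a pandas `groupby` with named aggregations. The sketch below groups by the ISO year and week of the pickup time, which is an assumption about how the project defines a "week"; `weekly_metrics` and the output column names are hypothetical.

```python
import pandas as pd


def weekly_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, and mean trip time, distance, and amount per ISO week of pickup."""
    pickup = df["tpep_pickup_datetime"]
    iso = pickup.dt.isocalendar()  # DataFrame with year, week, day columns
    enriched = df.assign(
        duration_min=(df["tpep_dropoff_datetime"] - pickup).dt.total_seconds() / 60,
        year=iso["year"],
        week=iso["week"],
    )
    metrics = enriched.groupby(["year", "week"]).agg(
        min_time=("duration_min", "min"),
        max_time=("duration_min", "max"),
        mean_time=("duration_min", "mean"),
        min_distance=("trip_distance", "min"),
        max_distance=("trip_distance", "max"),
        mean_distance=("trip_distance", "mean"),
        min_amount=("total_amount", "min"),
        max_amount=("total_amount", "max"),
        mean_amount=("total_amount", "mean"),
        total_amount=("total_amount", "sum"),
        services=("total_amount", "count"),  # number of trips in the week
    )
    return metrics.reset_index()
```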

Monthly Metrics

Aggregated metrics by rate code type (Regular, JFK, Other) and day type (weekday/weekend)
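The monthly breakdown can be sketched as a three-key `groupby`: month, rate code type, and day type. The mapping of `RatecodeID` 1 to "Regular" and 2 to "JFK", with every other code treated as "Other", is an assumption based on the TLC data dictionary. `monthly_metrics` and its output columns are hypothetical.

```python
import pandas as pd

# ASSUMPTION: RatecodeID 1 = standard rate, 2 = JFK; any other code
# (including missing values) is grouped as "Other".
RATE_TYPES = {1: "Regular", 2: "JFK"}


def monthly_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Trip counts and amounts per month, rate code type, and weekday/weekend."""
    pickup = df["tpep_pickup_datetime"]
    month = pickup.dt.to_period("M").astype(str).rename("month")
    rate_type = df["RatecodeID"].map(RATE_TYPES).fillna("Other").rename("rate_type")
    # dayofweek counts Monday as 0, so 5 (Saturday) and 6 (Sunday) are the weekend.
    day_type = pickup.dt.dayofweek.map(
        lambda d: "Weekend" if d >= 5 else "Weekday"
    ).rename("day_type")
    metrics = df.groupby([month, rate_type, day_type]).agg(
        trips=("total_amount", "count"),
        mean_distance=("trip_distance", "mean"),
        mean_amount=("total_amount", "mean"),
        total_amount=("total_amount", "sum"),
    )
    return metrics.reset_index()
```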

Multi-Format Export

Export results to CSV and Excel with formatted sheets
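Both export formats can be sketched with standard pandas writers: `to_csv` with `sep="|"` for the pipe-delimited CSV, and `pd.ExcelWriter` for one sheet per rate code type. The helper names and the `rate_type` column are hypothetical, and writing `.xlsx` files assumes `openpyxl` or `XlsxWriter` is installed. Sheet names are truncated to 31 characters because Excel rejects longer ones; cell formatting is left out of this sketch.

```python
from typing import Dict, Optional

import pandas as pd


def export_csv(df: pd.DataFrame, path=None) -> Optional[str]:
    """Write a pipe-delimited CSV; with path=None the CSV text is returned instead."""
    return df.to_csv(path, sep="|", index=False)


def split_by_rate_code(df: pd.DataFrame, column: str = "rate_type") -> Dict[str, pd.DataFrame]:
    """Map each rate code type to its rows; names are cut to Excel's 31-char limit."""
    return {
        str(name)[:31]: group.reset_index(drop=True)
        for name, group in df.groupby(column)
    }


def export_excel(df: pd.DataFrame, path, column: str = "rate_type") -> None:
    """Write one worksheet per rate code type (requires openpyxl or XlsxWriter)."""
    with pd.ExcelWriter(path) as writer:
        for sheet, frame in split_by_rate_code(df, column).items():
            frame.to_excel(writer, sheet_name=sheet, index=False)
```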

High Performance

Built on pandas and pyarrow to process millions of records efficiently

What You Can Do

1. Import NYC Yellow Taxi Data

Automatically download trip data from the NYC TLC cloud for any date range in 2022

2. Clean and Validate

Apply business rules to filter invalid trips based on duration, speed, distance, and amounts

3. Generate Insights

Create weekly and monthly aggregated metrics with statistical analysis

4. Export Results

Export processed data to CSV (pipe-delimited) and Excel with separate sheets by rate code

Use Cases

  • Trip Analysis: Analyze taxi trip patterns, durations, and distances across different time periods
  • Revenue Insights: Track total amounts, variation percentages, and service counts by week
  • Rate Code Comparison: Compare Regular vs JFK vs Other rate codes for weekday/weekend patterns
  • Data Quality: Clean and validate large datasets with customizable business rules
  • Reporting: Generate formatted reports in CSV and Excel for stakeholder consumption
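The variation percentage mentioned under Revenue Insights is not defined here. One plausible reading is week-over-week percentage change, sketched below under that assumption. `add_variation_pct` is hypothetical and expects a weekly table with `year`, `week`, and amount columns. The first week has no predecessor, so its value is `NaN`.

```python
import pandas as pd


def add_variation_pct(weekly: pd.DataFrame, column: str = "total_amount") -> pd.DataFrame:
    """ASSUMPTION: variation = week-over-week % change of `column`; first week is NaN."""
    out = weekly.sort_values(["year", "week"]).reset_index(drop=True)
    previous = out[column].shift(1)  # value from the preceding week
    out[f"{column}_variation_pct"] = (out[column] / previous - 1) * 100
    return out
```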
