Overview
This page documents all required dependencies for running the ecommerce linear regression analysis. The project uses Python data science libraries for data manipulation, exploratory data analysis, visualization, and machine learning.
Python Version
This project requires Python 3.7 or higher.
Required Libraries
Core Data Science Libraries
Library Details
pandas
- Purpose: Data manipulation and analysis
- Usage: Loading the ecommerce customers CSV file, data exploration, and preprocessing
- Version: Latest stable version recommended
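The pandas steps above can be sketched roughly as follows: load the CSV, then print the shape, column types, summary statistics and missing-value counts. The function name `load_customers` and the path in the usage line are hypothetical placeholders, not taken from the analysis itself.

```python
import pandas as pd


def load_customers(csv_path: str) -> pd.DataFrame:
    """Load the ecommerce customers CSV and print a quick overview."""
    df = pd.read_csv(csv_path)

    # Rows x columns, then column names and dtypes: confirms the file parsed as expected.
    print(df.shape)
    df.info()

    # Count, mean, std, min/max and quartiles for the numeric columns.
    print(df.describe())

    # Number of missing values in each column.
    print(df.isna().sum())
    return df


# Usage (hypothetical path; point it at your copy of the dataset):
# customers = load_customers("path/to/customers.csv")
```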
ydata_profiling
- Purpose: Automated exploratory data analysis
- Usage: Generating a comprehensive ProfileReport for the dataset
- Key Feature: Creates detailed statistical summaries and visualizations
- Note: Previously known as pandas-profiling
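Below is a minimal sketch of generating the report. It assumes `df` is the DataFrame already loaded with pandas. `ProfileReport` and its `to_file` method are part of ydata-profiling's public API. The helper name, report title and output filename are illustrative placeholders. The import sits inside the function so that a script containing it still loads before ydata-profiling is installed; this is a choice made for the sketch, not a requirement.

```python
import pandas as pd


def write_profile_report(df: pd.DataFrame, output_path: str = "profile_report.html") -> None:
    """Generate an HTML exploratory-analysis report for ``df`` (hypothetical helper)."""
    # Deferred import: the module loads even if ydata-profiling is missing,
    # and the error only surfaces when a report is actually requested.
    from ydata_profiling import ProfileReport

    profile = ProfileReport(df, title="Ecommerce Customers Profiling Report")
    # Writes a single HTML file with per-column statistics, distributions,
    # correlations and missing-value summaries.
    profile.to_file(output_path)
```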
matplotlib.pyplot
- Purpose: Data visualization and plotting
- Usage: Creating custom visualizations for the analysis
- Version: Use a release compatible with your installed version of seaborn
seaborn
- Purpose: Statistical data visualization
- Usage: Enhanced visualizations built on matplotlib
- Features: Better default aesthetics and statistical plotting capabilities
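The two plotting libraries are normally used together. seaborn draws the statistical plot, and matplotlib supplies the underlying figure and file output. The sketch below plots Length of Membership against Yearly Amount Spent with a fitted regression line. The choice of columns, the helper name and the output file are illustrative assumptions, not necessarily the plots the analysis produces. The Agg backend is selected so the example also runs outside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_membership_vs_spend(df: pd.DataFrame, output_path: str) -> None:
    """Scatter plot with a fitted linear regression line, saved to a file."""
    sns.set_style("whitegrid")  # seaborn styling applied to matplotlib figures
    grid = sns.lmplot(data=df, x="Length of Membership", y="Yearly Amount Spent")
    grid.set_axis_labels("Length of Membership (years)", "Yearly Amount Spent")
    grid.savefig(output_path)
    plt.close("all")  # release the figure once it has been written
```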
scikit-learn
- Purpose: Machine learning library
- Components Used:
  - LinearRegression: Building the regression model
  - train_test_split: Splitting data into training and testing sets
  - mean_squared_error: Evaluating model performance
  - r2_score: Calculating the R-squared metric
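The four scikit-learn components fit together roughly as sketched below. Several details are assumptions for illustration, and the analysis itself may choose differently:

- The feature list: the four numeric usage columns, with Yearly Amount Spent as the target.
- The 30% test split.
- The random seed.
- The helper name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumed feature and target columns (the numeric columns of the dataset).
FEATURES = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
TARGET = "Yearly Amount Spent"


def fit_spend_model(df: pd.DataFrame, test_size: float = 0.3, random_state: int = 42):
    """Fit a linear regression and report metrics on the held-out test set."""
    X = df[FEATURES]
    y = df[TARGET]

    # Hold out part of the data so the metrics reflect unseen customers.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    model = LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    mse = mean_squared_error(y_test, predictions)
    metrics = {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # same units as the target
        "r2": r2_score(y_test, predictions),
    }
    return model, metrics
```

After fitting, `dict(zip(FEATURES, model.coef_))` pairs each assumed feature with its fitted coefficient.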
Installation
Using pip
Using conda
The ydata-profiling package is best installed via pip, even in conda environments.
Data File Requirements
Dataset Location
The analysis expects the dataset at:
Dataset Structure
The CSV file should contain the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Email | String | Customer’s email address |
| Address | String | Customer’s physical address |
| Avatar | String | Customer’s avatar color |
| Avg. Session Length | Float | Average session length in minutes |
| Time on App | Float | Time spent on mobile app in minutes |
| Time on Website | Float | Time spent on website in minutes |
| Length of Membership | Float | Membership duration in years |
| Yearly Amount Spent | Float | Annual spending by customer |
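Before modeling, it can help to check that a loaded file matches the structure above. The sketch below confirms that every expected column is present and that the five numeric columns parsed as numbers. `validate_columns` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Expected columns of the ecommerce customers CSV.
STRING_COLUMNS = ["Email", "Address", "Avatar"]
NUMERIC_COLUMNS = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
    "Yearly Amount Spent",
]


def validate_columns(df: pd.DataFrame) -> None:
    """Raise ValueError if expected columns are missing or not numeric."""
    expected = STRING_COLUMNS + NUMERIC_COLUMNS
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # A numeric column read as "object" usually means stray text in the file.
    non_numeric = [col for col in NUMERIC_COLUMNS if not is_numeric_dtype(df[col])]
    if non_numeric:
        raise ValueError(f"Columns expected to be numeric: {non_numeric}")
```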
Dataset Source
The dataset is available from Kaggle:
- URL: https://www.kaggle.com/datasets/leilaaliha/ecommerce-customers/data
- Records: 500 customers
- Format: CSV (Comma-Separated Values)
Verification
After installing the dependencies, verify the installation by running:
Troubleshooting
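When an import fails or versions clash, a short diagnostic script can show which library is at fault. The sketch below checks the interpreter against the Python 3.7 minimum. It then tries to import each library named on this page and reports its `__version__`. It uses only `importlib` and `sys`, so it also runs on 3.7. The function name is hypothetical.

```python
import importlib
import sys

# Import names of the libraries listed on this page.
MODULES = ["pandas", "ydata_profiling", "matplotlib", "seaborn", "sklearn"]


def check_environment() -> dict:
    """Map each module to its version string, or None if it fails to import."""
    if sys.version_info < (3, 7):
        print(f"Python 3.7+ required, found {sys.version.split()[0]}")

    report = {}
    for name in MODULES:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # version conflicts can raise more than ImportError
            print(f"{name}: NOT AVAILABLE ({exc})")
            report[name] = None
            continue
        version = getattr(module, "__version__", "unknown")
        print(f"{name}: {version}")
        report[name] = version
    return report


if __name__ == "__main__":
    check_environment()
```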
Import errors with ydata_profiling
If you encounter import errors, ensure you’re using the correct package name:
- Old name: pandas-profiling (imported as pandas_profiling)
- New name: ydata-profiling (imported as ydata_profiling)
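In code, the rename means the import changes from `pandas_profiling` to `ydata_profiling`. The sketch below reports which of the two packages is installed. It uses `importlib.util.find_spec`, which locates a package without importing it. The helper name is hypothetical.

```python
from importlib.util import find_spec


def profiling_package_status() -> str:
    """Describe which profiling package is importable in this environment."""
    if find_spec("ydata_profiling") is not None:
        return "ok: use 'from ydata_profiling import ProfileReport'"
    if find_spec("pandas_profiling") is not None:
        return ("only the old pandas_profiling package is installed; "
                "run 'pip install ydata-profiling' and update your imports")
    return "neither package found; run 'pip install ydata-profiling'"


print(profiling_package_status())
```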
Matplotlib display issues in Jupyter
If plots don’t display in Jupyter notebooks, add this magic command at the top of your notebook:
Version compatibility issues
If you experience compatibility issues between libraries, try creating a fresh virtual environment:
Next Steps
Once all dependencies are installed and verified:
- Download the dataset from Kaggle
- Place it in the data/ directory
- Proceed to the Code Walkthrough to understand the analysis implementation