Overview

This page documents all required dependencies for running the ecommerce linear regression analysis. The project uses Python data science libraries for data manipulation, exploratory data analysis, visualization, and machine learning.

Python Version

This project requires Python 3.7 or higher.
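
The version requirement can be checked from within Python before installing anything. The following is a minimal sketch using only the standard library; the helper name python_is_supported and the constant MIN_PYTHON are illustrative and not part of the project.

```python
import sys

# Minimum version stated on this page.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is supported.")
else:
    print("Python 3.7 or higher is required.")
```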

Required Libraries

Core Data Science Libraries

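import pandas as pd
from ydata_profiling import ProfileReport
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score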

Library Details

pandas

  • Purpose: Data manipulation and analysis
  • Usage: Loading the ecommerce customers CSV file, data exploration, and preprocessing
  • Version: Latest stable version recommended

ydata_profiling

  • Purpose: Automated exploratory data analysis
  • Usage: Generating a comprehensive ProfileReport for the dataset
  • Key Feature: Creates detailed statistical summaries and visualizations
  • Note: Previously known as pandas-profiling

matplotlib.pyplot

  • Purpose: Data visualization and plotting
  • Usage: Creating custom visualizations for the analysis
  • Version: Any release compatible with the installed seaborn version

seaborn

  • Purpose: Statistical data visualization
  • Usage: Enhanced visualizations built on matplotlib
  • Features: Better default aesthetics and statistical plotting capabilities

scikit-learn

  • Purpose: Machine learning library
  • Components Used:
    • LinearRegression: Building the regression model
    • train_test_split: Splitting data into training and testing sets
    • mean_squared_error: Evaluating model performance
    • r2_score: Calculating R-squared metric
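
As a rough illustration of how these four components fit together, the sketch below fits a model on synthetic data rather than the ecommerce dataset. The generated data, the 70/30 split, and the random_state value are assumptions chosen for the example, not settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3*x0 + 2*x1 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training rows, then predict on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate the predictions with the two metrics listed above.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```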

Installation

Using pip

pip install pandas ydata-profiling matplotlib seaborn scikit-learn

Using conda

conda install pandas matplotlib seaborn scikit-learn
pip install ydata-profiling
The ydata-profiling package is best installed via pip even in conda environments.

Data File Requirements

Dataset Location

The analysis expects the dataset at:
data/ecommerce_customers.csv

Dataset Structure

The CSV file should contain the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| Email | String | Customer’s email address |
| Address | String | Customer’s physical address |
| Avatar | String | Customer’s avatar color |
| Avg. Session Length | Float | Average session length in minutes |
| Time on App | Float | Time spent on mobile app in minutes |
| Time on Website | Float | Time spent on website in minutes |
| Length of Membership | Float | Membership duration in years |
| Yearly Amount Spent | Float | Annual spending by customer |
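
The column list above can be checked against a file's header row before running the analysis. The following is a minimal sketch using the standard library csv module; the helper missing_columns and the in-memory sample are illustrative assumptions, not part of the project code.

```python
import csv
import io

# Column names listed on this page.
EXPECTED_COLUMNS = [
    "Email",
    "Address",
    "Avatar",
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
    "Yearly Amount Spent",
]


def missing_columns(csv_file):
    """Return the expected column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Illustrative in-memory header that lacks the last expected column.
sample = io.StringIO(
    "Email,Address,Avatar,Avg. Session Length,"
    "Time on App,Time on Website,Length of Membership\n"
)
print(missing_columns(sample))  # ['Yearly Amount Spent']
```

To run the same check on the real file, open data/ecommerce_customers.csv with open(path, newline="") and pass the file object to the helper.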

Dataset Source

The dataset is available from Kaggle.

Verification

After installing the dependencies, verify the installation by running:
import pandas as pd
from ydata_profiling import ProfileReport
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import sklearn

print("All dependencies successfully imported!")
print(f"Pandas version: {pd.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
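
If the script above stops at the first failed import, a gentler check can list every missing module in one pass. This sketch uses importlib.util.find_spec from the standard library, which looks up a top-level module without importing it. The helper find_missing and the list REQUIRED_MODULES are hypothetical names introduced for illustration.

```python
import importlib.util

# Top-level import names of the libraries listed on this page.
REQUIRED_MODULES = ["pandas", "ydata_profiling", "matplotlib", "seaborn", "sklearn"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```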

Troubleshooting

If you encounter import errors, ensure you’re using the current package name (the import name is ydata_profiling, with an underscore):
  • Old name: pandas-profiling
  • New name: ydata-profiling
Update your installation:
pip uninstall pandas-profiling
pip install ydata-profiling
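To see which of the two distributions is currently installed before or after running these commands, a short check along these lines may help. It is a sketch that relies on importlib.metadata, which is in the standard library only from Python 3.8 onward; the helper installed_version is an illustrative name.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the old and new distribution names.
for name in ("pandas-profiling", "ydata-profiling"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```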
If plots don’t display in Jupyter notebooks, add this magic command at the top of your notebook:
%matplotlib inline
If you experience compatibility issues between libraries, try creating a fresh virtual environment:
python -m venv ecommerce_env
source ecommerce_env/bin/activate  # On Windows: ecommerce_env\Scripts\activate
pip install pandas ydata-profiling matplotlib seaborn scikit-learn

Next Steps

Once all dependencies are installed and verified:
  1. Download the dataset from Kaggle
  2. Place it in the data/ directory as ecommerce_customers.csv
  3. Proceed to the Code Walkthrough to understand the analysis implementation
