
Ecommerce Linear Regression Analysis

A data-driven approach to help clothing retailers make strategic decisions about mobile app vs. website development based on customer spending behavior.

Overview

This project applies linear regression analysis to real ecommerce customer data from a clothing store company. By analyzing customer behavior metrics, the model predicts yearly spending and reveals which platform (mobile app or website) drives more revenue.

R² of 0.9885

The model explains 98.85% of the variance in customers' yearly spending, so its predictions closely track actual spending

Mobile App Impact

The Time on App coefficient is about 138 times the Time on Website coefficient

500-Customer Dataset

Comprehensive analysis of customer behavior including session length, platform usage, and membership duration

Data-Driven Strategy

Clear recommendations for resource allocation between mobile and web development

Key Insights

Our analysis of ecommerce customer data revealed several critical insights:

Model Performance

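  • R² Score: 0.9885 - The model explains 98.85% of the variance in yearly spending
  • Mean Squared Error: 80.90 - Equivalent to a root mean squared error of about 8.99, so predictions typically miss actual yearly spending by roughly 9 (in the target's units)
  • Linear relationship - The high R² indicates that a linear combination of the four features captures nearly all of the variation in spending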
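A minimal sketch of how these three numbers relate, assuming scikit-learn's `r2_score` and `mean_squared_error` are used for scoring (the evaluation code is not shown in this section). The helper name `regression_report` and the toy arrays are hypothetical, chosen only to illustrate the calculation.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R², MSE and RMSE for a set of predictions (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        # Share of the target's variance explained by the predictions
        "r2": r2_score(y_true, y_pred),
        # Average squared error, in squared target units
        "mse": mse,
        # RMSE is in the same units as the target, so it is easier to read than MSE
        "rmse": float(np.sqrt(mse)),
    }


# Toy values for illustration only, not the project's data
y_true = [500.0, 450.0, 600.0, 520.0]
y_pred = [510.0, 440.0, 590.0, 530.0]
report = regression_report(y_true, y_pred)
print(report)
```

Applied to the reported MSE, √80.90 ≈ 8.99, which is the typical size of a prediction error expressed in the same units as Yearly Amount Spent.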

Business Impact

The regression coefficients reveal the relative impact of each factor on customer spending:

| Feature | Coefficient | Impact |
| --- | --- | --- |
| Length of Membership | ~61.30 | Strongest predictor - customer loyalty drives revenue |
| Time on App | ~38.81 | Second strongest - mobile engagement matters |
| Avg. Session Length | ~25.83 | Quality of engagement impacts spending |
| Time on Website | ~0.28 | Minimal impact compared to app |

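The Time on App coefficient (38.81) is approximately 138 times the Time on Website coefficient (0.28), since 38.81 / 0.28 ≈ 138.6. Because both features are measured in minutes, the comparison is direct: holding the other features constant, an extra minute on the mobile app is associated with far more additional yearly spending than an extra minute on the website.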
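The ratio and the per-minute reading of the coefficients can be checked with plain arithmetic. This sketch uses the rounded coefficients from the table, so the ratio may differ slightly from one computed on the unrounded model output. Note that Length of Membership is measured in years while the other features are in minutes, so only the app vs. website comparison is like-for-like.

```python
# Coefficients as reported in the table above (rounded to two decimals)
coefficients = {
    "Avg. Session Length": 25.83,
    "Time on App": 38.81,
    "Time on Website": 0.28,
    "Length of Membership": 61.30,
}

# Both features are in minutes, so their coefficients can be compared directly
ratio = coefficients["Time on App"] / coefficients["Time on Website"]
print(f"App / Website coefficient ratio: {ratio:.1f}")  # 138.6

# Holding the other features fixed, a linear model's prediction changes by
# coefficient * change in feature. Here: 10 extra minutes on each platform.
extra_minutes = 10
app_gain = extra_minutes * coefficients["Time on App"]
web_gain = extra_minutes * coefficients["Time on Website"]
print(f"+{extra_minutes} min on app:     {app_gain:.1f}")
print(f"+{extra_minutes} min on website: {web_gain:.1f}")
```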

Strategic Recommendations

Based on the analysis:
  1. Prioritize Mobile App Development - Customer engagement with the mobile app has a dramatically larger impact on spending than the website
  2. Invest in Customer Retention - Length of membership has the largest coefficient, highlighting the importance of loyalty programs
  3. Enhance Website Experience - While the current website impact is low, improving it could unlock additional revenue potential
  4. Focus on Engagement Quality - Average session length matters, suggesting that personalized experiences drive results

Dataset Overview

The analysis uses data from 500 customers with the following attributes:
  • Email & Address - Customer identification
  • Avg. Session Length - Duration of in-store style and clothing advice sessions (minutes)
  • Time on App - Time spent on the mobile application (minutes)
  • Time on Website - Time spent on the website (minutes)
  • Length of Membership - Customer membership duration (years)
  • Yearly Amount Spent - Annual spending (target variable)
Dataset source: Kaggle - Ecommerce Customers
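A minimal loading sketch with pandas. The column names come from the attribute list above; the function name `load_customers` is hypothetical, the location of the downloaded Kaggle file is left to the reader, and the two inline rows are invented placeholder values (not records from the dataset) used only so the snippet runs on its own.

```python
import io

import pandas as pd

# Column names as listed in the dataset overview
FEATURES = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]
TARGET = "Yearly Amount Spent"


def load_customers(source):
    """Read the customer CSV and split it into a feature matrix and a target vector."""
    df = pd.read_csv(source)
    X = df[FEATURES]  # identification columns (Email, Address) are dropped here
    y = df[TARGET]
    return X, y


# Placeholder rows for illustration; in practice `source` would be the path
# to the downloaded Kaggle file (path not specified here).
sample = io.StringIO(
    "Email,Address,Avg. Session Length,Time on App,Time on Website,"
    "Length of Membership,Yearly Amount Spent\n"
    "a@example.com,1 Main St,34.5,12.7,39.6,4.1,587.9\n"
    "b@example.com,2 Main St,31.9,11.1,37.3,2.7,392.2\n"
)
X, y = load_customers(sample)
print(X.shape, y.shape)  # (2, 4) (2,)
```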

Methodology

The project follows a systematic machine learning workflow:
  1. Data Loading & Exploration - Load the customer dataset and perform exploratory data analysis to understand distributions and relationships
  2. Data Preparation - Select the relevant features (session length, app time, website time, membership length) and prepare them for modeling
  3. Train-Test Split - Divide the data into a 70% training set and a 30% testing set, so the model is evaluated on customers it did not see during training
  4. Model Training - Train a linear regression model with scikit-learn on the prepared features
  5. Evaluation & Interpretation - Assess model performance and analyze the coefficients to derive business insights
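Steps 3 to 5 can be sketched with scikit-learn as follows. This is not the project's own code: `fit_and_evaluate` is a hypothetical helper, `random_state=42` is an assumed seed (no seed is stated here), and the data are synthetic, generated from known weights so you can see the fit recover them. To use it on the real dataset, pass in the feature matrix and target loaded from the Kaggle file instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

FEATURES = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]


def fit_and_evaluate(X, y, test_size=0.3, random_state=42):
    """70/30 split, fit ordinary least squares, score on the held-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # One coefficient per feature, labelled with the column names
    coefs = pd.Series(model.coef_, index=X.columns, name="Coefficient")
    return model, coefs, r2_score(y_test, y_pred), mean_squared_error(y_test, y_pred)


# Synthetic stand-in for the Kaggle data: random features and a target built
# from known weights plus noise, so the recovered coefficients can be checked.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=FEATURES)
true_w = np.array([25.0, 39.0, 0.3, 61.0])
y = X.to_numpy() @ true_w + 500 + rng.normal(scale=1.0, size=n)

model, coefs, r2, mse = fit_and_evaluate(X, y)
print(coefs.round(2))
print(f"R²={r2:.4f}  MSE={mse:.2f}")
```

Fixing `random_state` makes the 70/30 split reproducible; without it, R², MSE and the coefficients shift slightly from run to run.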

Next Steps

Ready to run the analysis yourself? Check out the Quickstart Guide for step-by-step instructions on setting up the environment and executing the analysis.

  • Quickstart - Get started with the analysis in minutes
  • View on GitHub - Explore the complete project repository

Author: Carolina Jiménez M | Portfolio
