What is Regression?
Regression is a supervised learning task where the goal is to predict a continuous numerical output based on input features.
Real-world applications:
- Predicting house prices based on size, location, and features
- Forecasting sales revenue for business planning
- Estimating customer lifetime value
- Predicting delivery times for logistics optimization
Module A6 Project: E-Commerce Sales Prediction
In this module, you’ll build regression models to predict total_sales for e-commerce orders using the Amazon sales dataset.
Dataset: 10,000 orders with 23 features including:
- Customer attributes (country, state, city)
- Product information (category, subcategory, brand)
- Order details (quantity, unit price, discount, shipping cost)
- Logistics data (order date, ship date, delivery date)
Target: total_sales, the total amount of the order
Linear Regression
The simplest and most interpretable regression model.
How it works
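Linear regression finds the best-fit line through your data (a hyperplane when there are several features):
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
where:
- y is the predicted value
- β₀ is the intercept
- β₁, β₂, ..., βₙ are the coefficients (weights)
- x₁, x₂, ..., xₙ are the features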
Implementation
Linear Regression with scikit-learn
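A minimal sketch of fitting a linear model with scikit-learn. The data here is synthetic and only stands in for the order data: the two columns (labeled quantity and unit_price in the comments) and the true coefficients 5, 2, and 3 are assumptions chosen for illustration, not values from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the real dataset):
# y = 5 + 2*quantity + 3*unit_price + small noise
rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=(200, 2))  # columns: quantity, unit_price
y = 5.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out 20% of the rows to check how well the fit generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print("Intercept (β₀):", model.intercept_)
print("Coefficients (β₁, β₂):", model.coef_)
print("R² on test set:", model.score(X_test, y_test))
```

Because the synthetic target really is linear in the inputs, the fitted intercept and coefficients should land close to the values used to generate it.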
Linear regression assumes:
- Linear relationship between features and target
- Features are independent (no multicollinearity)
- Residuals are normally distributed
- Constant variance of residuals (homoscedasticity)
Polynomial Regression
Capture non-linear relationships by creating polynomial features.
Creating polynomial features
Polynomial Features
- Original: [x₁, x₂]
- Polynomial: [1, x₁, x₂, x₁², x₁x₂, x₂²]
Regularized Regression
Prevent overfitting by penalizing large coefficients.
Ridge Regression (L2 Regularization)
Adds a penalty proportional to the sum of the squared coefficients.
Ridge Regression
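The sketch below compares ordinary least squares with Ridge on two nearly identical synthetic features, which is the situation where the L2 penalty helps most. The data and alpha=1.0 are illustrative assumptions; in practice alpha is usually tuned by cross-validation, and features on different scales are typically standardized first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost perfectly correlated features (synthetic, for illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha controls the penalty strength

# OLS coefficients tend to be unstable here; Ridge shares the weight across both
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Ridge keeps both features in the model but pulls their coefficients toward smaller, more stable values.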
Use Ridge when you:
- Have many correlated features
- Want to keep all features but reduce their impact
- Need to prevent overfitting
Lasso Regression (L1 Regularization)
Adds a penalty proportional to the sum of the absolute values of the coefficients. It can shrink some coefficients to exactly zero, which acts as feature selection.
Lasso Regression
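To show the feature-selection effect, this sketch fits Lasso on synthetic data where only the first two of ten features influence the target. The data, the true coefficients, and alpha=0.1 are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ten synthetic features; only the first two affect the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)  # larger alpha zeroes out more coefficients
lasso.fit(X, y)

print("Coefficients:", np.round(lasso.coef_, 3))
print("Indices of features kept:", np.flatnonzero(lasso.coef_))
```

The coefficients of the irrelevant features are driven to zero, while the two informative ones survive with slightly shrunken values.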
Use Lasso when you:
- Want automatic feature selection
- Have many irrelevant features
- Need a sparse model
Ensemble Methods
Combine multiple models for better predictions.
Gradient Boosting Regressor
Builds trees sequentially, with each new tree correcting the errors of the previous ones.
Gradient Boosting
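A minimal sketch of GradientBoostingRegressor on synthetic data where the target is the product of two inputs, a relationship a plain linear model cannot fully capture. The data and the hyperparameter values are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on the product of the two inputs
rng = np.random.default_rng(0)
quantity = rng.integers(1, 10, size=500)
unit_price = rng.uniform(5, 100, size=500)
X = np.column_stack([quantity, unit_price])
y = quantity * unit_price + rng.normal(0, 5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each of the 200 shallow trees is fit to the residual errors of the ensemble so far
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

# Linear baseline for comparison
linear = LinearRegression().fit(X_train, y_train)

print("Linear R²:           ", linear.score(X_test, y_test))
print("Gradient Boosting R²:", gbr.score(X_test, y_test))
```

n_estimators, learning_rate, and max_depth are the hyperparameters most often tuned together; a lower learning rate generally needs more trees.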
Advantages:
- Often best performance
- Handles non-linear relationships
- Provides feature importance
- Robust to outliers in the input features
Disadvantages:
- Slower to train
- More hyperparameters to tune
- Can overfit without proper tuning
Module A6 Project Results
From the e-commerce sales prediction project:

| Model | MAE ($) | RMSE ($) | R² |
|---|---|---|---|
| Linear Regression (baseline) | 45.20 | 58.30 | 0.7450 |
| Linear Regression (full) | 38.15 | 49.80 | 0.8120 |
| Polynomial (degree=2) | 35.60 | 46.20 | 0.8340 |
| Ridge (optimized) | 34.80 | 45.10 | 0.8420 |
| Gradient Boosting | 32.15 | 41.50 | 0.8823 |
The Gradient Boosting model achieved the best performance on all three metrics, with a mean absolute error of $32.15 and an R² of 0.8823, meaning it explains 88.23% of the variance in sales.
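For reference, the three metrics in the table can be computed with scikit-learn as sketched below. The four true and predicted values are made-up toy numbers, not data from the project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values for illustration only
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 190.0, 270.0])

mae = mean_absolute_error(y_true, y_pred)           # mean of |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more
r2 = r2_score(y_true, y_pred)                       # share of variance explained

print(f"MAE:  {mae:.2f}")   # 12.50
print(f"RMSE: {rmse:.2f}")  # 13.23
print(f"R²:   {r2:.3f}")    # 0.944
```

RMSE is never smaller than MAE; a large gap between the two suggests a few predictions are far off.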
Feature Importance
Top features influencing sales predictions:
- Unit price (importance: 0.45) - Strongest predictor
- Quantity (importance: 0.28) - Number of items ordered
- Shipping cost (importance: 0.12) - Logistics impact
- Discount (importance: 0.08) - Promotion effect
- Product category (importance: 0.07) - Category variations
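Importance scores like these can be read from a fitted tree ensemble through its feature_importances_ attribute. The sketch below uses synthetic data with hypothetical feature names, including a pure-noise column added to show that an uninformative feature ranks last.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical feature names; random_noise has no relation to the target
feature_names = ["unit_price", "quantity", "random_noise"]

rng = np.random.default_rng(0)
unit_price = rng.uniform(5, 100, size=300)
quantity = rng.integers(1, 10, size=300)
random_noise = rng.normal(size=300)
X = np.column_stack([unit_price, quantity, random_noise])
y = unit_price * quantity

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

These impurity-based scores describe how much the model relied on each feature, which is not the same as a causal effect on sales.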
Complete Pipeline with Preprocessing
Full ML Pipeline
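One way such a pipeline might look is sketched below, combining scaling, one-hot encoding, and a Gradient Boosting model in a single scikit-learn Pipeline. The column names and the synthetic DataFrame are assumptions standing in for the real dataset, and the formula used to generate total_sales is invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; adjust to match the real dataset
numeric_cols = ["quantity", "unit_price", "discount", "shipping_cost"]
categorical_cols = ["category", "country"]

# Small synthetic frame standing in for the real order data
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "quantity": rng.integers(1, 10, n),
    "unit_price": rng.uniform(5, 100, n),
    "discount": rng.uniform(0, 0.3, n),
    "shipping_cost": rng.uniform(2, 20, n),
    "category": rng.choice(["Electronics", "Books", "Home"], n),
    "country": rng.choice(["US", "CA", "UK"], n),
})
# Invented target formula, for illustration only
df["total_sales"] = (
    df["quantity"] * df["unit_price"] * (1 - df["discount"]) + df["shipping_cost"]
)

X = df[numeric_cols + categorical_cols]
y = df["total_sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale numeric columns and one-hot encode categorical ones
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", GradientBoostingRegressor(random_state=0)),
])

# Preprocessing is fit on the training split only, which avoids data leakage
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print("Test MAE:", mean_absolute_error(y_test, predictions))
```

Keeping preprocessing inside the pipeline means the same transformations are applied automatically at prediction time and during cross-validation.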
Best Practices
More complex doesn’t always mean better. Sometimes a well-tuned Ridge regression performs nearly as well as Gradient Boosting with much faster training and prediction.
Business Value
Predicting sales enables:
- Personalized marketing: Target high-value customers with custom offers
- Inventory management: Stock popular items, reduce slow movers
- Revenue forecasting: Accurate quarterly and annual projections
- Dynamic pricing: Adjust prices based on predicted demand
- Customer segmentation: Identify high-value vs. low-value customers
Next Steps
- Classification models: Learn to predict categories instead of numbers
- Model evaluation: Deep dive into metrics and model comparison
- Project walkthrough: Complete implementation guide for the e-commerce project