Prediction Process
Once the model is trained, we use it to make predictions on the test set that was held out during training.
Making Predictions
The predict() method:
- Takes the test features (X_test) as input
- Applies the learned linear equation
- Returns predicted values for Yearly Amount Spent
- Produces 150 predictions (one for each test sample)
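For a fitted scikit-learn `LinearRegression`, `predict()` computes each row's dot product with the learned `coef_` and then adds `intercept_`. The sketch below writes out that equation in plain Python so the arithmetic is visible. It uses the rounded intercept and coefficients reported later in this article. The `predict` helper and the example customer are hypothetical, and a real fitted model would use its unrounded parameters.

```python
# Minimal sketch of the linear equation that predict() applies.
# ASSUMPTION: parameters are the rounded values reported in this article;
# the helper and the customer below are hypothetical illustrations.
INTERCEPT = -1048.82
COEFFICIENTS = {
    "Avg. Session Length": 25.83,
    "Time on App": 38.81,
    "Time on Website": 0.28,
    "Length of Membership": 61.30,
}


def predict(rows):
    """Return one prediction per row: intercept + sum(coefficient * feature)."""
    predictions = []
    for row in rows:
        total = INTERCEPT
        for name, coef in COEFFICIENTS.items():
            total += coef * row[name]
        predictions.append(total)
    return predictions


# Hypothetical customer: session/app/website times in minutes, membership in years
customer = {
    "Avg. Session Length": 33.0,
    "Time on App": 12.0,
    "Time on Website": 37.0,
    "Length of Membership": 4.0,
}
print(round(predict([customer])[0], 2))  # 524.85
```

Passing 150 rows returns 150 predictions, one per test sample. Setting every feature to zero returns the intercept (-1048.82), which matches the interpretation of the intercept later in this article.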
Example Predictions
Here’s a sample of the first 5 predictions:
Evaluation Metrics
We use two primary metrics to assess model performance: Mean Squared Error (MSE) and R-squared (R²).
Computing Metrics
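Both metrics are simple enough to compute by hand, which makes their definitions concrete. This is a minimal sketch with hypothetical helper names (`compute_mse`, `compute_r2`) and toy values rather than the article's test set. In a scikit-learn workflow, `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` return the same quantities when called with the actual and predicted values.

```python
import math


def compute_mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def compute_r2(y_true, y_pred):
    """1 - SS_res / SS_tot, using the mean of y_true as the baseline."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values for illustration only (hypothetical, not the article's data)
y_true = [500.0, 450.0, 600.0, 550.0]
y_pred = [510.0, 440.0, 590.0, 560.0]

mse = compute_mse(y_true, y_pred)
rmse = math.sqrt(mse)  # back in the target's units (dollars)
r2 = compute_r2(y_true, y_pred)
print(mse, rmse, round(r2, 4))  # 100.0 10.0 0.968
```

A constant prediction equal to the mean of the actual values gives R² = 0. That is the baseline R² measures against.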
Results
Based on the model evaluation:
- MSE: 80.90
- R²: 0.9885
Understanding Mean Squared Error (MSE)
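MSE: 80.90
Mean Squared Error measures the average squared difference between actual and predicted values.
Formula
$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
where yᵢ is the actual value, ŷᵢ the predicted value, and n the number of test samples (150 here).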
Interpretation
- Lower is better: MSE of 80.90 is relatively low
- Units: MSE is in squared dollars (dollars²)
- Root MSE (RMSE): √80.90 ≈ 8.99, so a typical prediction is off by about $9
- Context: Given that yearly spending in this data runs as high as $765, an RMSE of about $9 represents excellent accuracy
An MSE of 80.90 indicates that the model’s predictions are very close to actual values, with a typical error (RMSE) of approximately $9 per prediction.
Understanding R-squared (R²)
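R²: 0.9885
R-squared measures the proportion of variance in the target variable explained by the model.
Formula
$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$
- SS_res: Sum of squared residuals, Σ(yᵢ − ŷᵢ)²
- SS_tot: Total sum of squares, Σ(yᵢ − ȳ)², where ȳ is the mean of the actual values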
Interpretation
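- Range: Typically 0 to 1 (higher is better); on held-out data R² can drop below 0 if the model does worse than always predicting the mean
- Result: 0.9885 means the model explains 98.85% of the variance
- Excellent Fit: In most applied settings, values above 0.95 indicate a very strong model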
- Implication: Only 1.15% of variance remains unexplained
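The reported MSE and R² can be cross-checked against each other. Assuming both were computed on the same 150 test samples (scikit-learn's `r2_score` uses the mean of the supplied actual values as its baseline), MSE equals SS_res / n and SS_tot / n is the variance of the actual test values. The rounded figures then imply:

$$
1 - R^2 = \frac{SS_{res}}{SS_{tot}} = \frac{\mathrm{MSE}}{\mathrm{Var}(y_{\text{test}})}
\quad\Rightarrow\quad
\mathrm{Var}(y_{\text{test}}) \approx \frac{80.90}{1 - 0.9885} = \frac{80.90}{0.0115} \approx 7035
$$

That corresponds to a standard deviation of roughly √7035 ≈ 84 dollars in yearly spending, against which a typical error of about 9 dollars is small. Because both inputs are rounded, treat these derived values as approximate.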
An R² of 0.9885 is exceptionally high, indicating that the four features (Session Length, Time on App, Time on Website, and Membership Length) are excellent predictors of yearly customer spending.
Model Coefficients
The coefficients reveal how each feature impacts the prediction:
Results
Coefficient Interpretation
| Feature | Coefficient | Impact (other features held constant) |
|---|---|---|
| Avg. Session Length | 25.83 | +$25.83 per additional minute |
| Time on App | 38.81 | +$38.81 per additional minute |
| Time on Website | 0.28 | +$0.28 per additional minute |
| Length of Membership | 61.30 | +$61.30 per additional year |
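The "per additional minute/year" reading in the table follows from the linear form of the model. Writing the fitted equation with intercept β₀ and coefficients β₁ through β₄, increasing one feature xⱼ by one unit while holding the others fixed changes the prediction by exactly βⱼ:

$$
\hat{y} = \beta_0 + \sum_{k=1}^{4} \beta_k x_k
\qquad\Rightarrow\qquad
\hat{y}(x_1, \dots, x_j + 1, \dots, x_4) - \hat{y}(x_1, \dots, x_j, \dots, x_4) = \beta_j
$$

For example, one extra year of membership raises the predicted yearly spending by 61.30 dollars, whatever the values of the other three features.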
Key Insights
- Length of Membership has the strongest impact ($61.30/year)
- Time on App is the second most important feature ($38.81/minute)
- Time on Website has minimal impact ($0.28/minute)
- Avg. Session Length has moderate impact ($25.83/minute)
The large gap between Time on App ($38.81) and Time on Website ($0.28) suggests that the mobile app is much more effective at driving customer spending than the website.
Interpreting the Intercept
Intercept: -1048.82
The intercept represents the predicted yearly spending when all features equal zero.
- Theoretical Value: Not practically meaningful (customers can’t have zero engagement)
- Mathematical Role: Adjusts the baseline of the prediction equation
- Don’t Overthink: Negative intercepts are common and don’t indicate a problem
How to Interpret Metrics
Is This a Good Model?
Yes! The model shows excellent performance:
✅ High R² (0.9885): Explains nearly 99% of variance
✅ Low MSE (80.90): Typical prediction error of ~$9
✅ Meaningful Coefficients: Features align with business intuition
✅ Interpretable: Clear insights for decision-making
Business Implications
Mobile App Priority
Time on App ($38.81 per minute) has roughly 138x the per-minute impact of Time on Website ($0.28 per minute), suggesting the company should prioritize mobile app improvements.
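The 138x figure is the ratio of the two per-minute coefficients from the table:

$$
\frac{38.81}{0.28} \approx 138.6
$$

So, holding the other features fixed, each extra minute on the app is associated with roughly 138 times the predicted spending increase of an extra minute on the website.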
Customer Retention
Length of Membership has the highest coefficient ($61.30), indicating that customer loyalty programs and retention strategies are crucial.
Complete Evaluation Code
Here’s the full code for model evaluation:
Summary
The linear regression model demonstrates excellent predictive performance:
- Accuracy: RMSE of ~$9 on yearly spending predictions
- Explanatory Power: Explains 98.85% of variance in customer spending
- Actionable Insights: Clear guidance on resource allocation (prioritize app over website)
- Customer Focus: Strong emphasis on membership length highlights importance of retention