Training Overview
The model training process involves preparing the data, splitting it into training and test sets, initializing the linear regression model, and fitting it to the training data.
Data Preparation
Before training, we prepare our features and target variable from the dataset.
Feature Selection
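Four numerical features are selected for the model:
- Avg. Session Length
- Time on App
- Time on Website
- Length of Membership
Splitting Features and Target
The four feature columns form the model inputs, and Yearly Amount Spent is the target.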
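A minimal sketch of separating features from the target, assuming the data lives in a pandas DataFrame. The name `customers`, the synthetic values, and the spread of 79.0 are illustrative assumptions; only the column names come from this page.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the customer dataset (assumption: the real data
# is a DataFrame with these column names; the values here are random).
rng = np.random.default_rng(0)
n = 500
customers = pd.DataFrame({
    "Avg. Session Length": rng.normal(33.05, 1.0, n),
    "Time on App": rng.normal(12.05, 1.0, n),
    "Time on Website": rng.normal(37.06, 1.0, n),
    "Length of Membership": rng.normal(3.53, 1.0, n),
    "Yearly Amount Spent": rng.normal(499.31, 79.0, n),  # arbitrary spread
})

feature_columns = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]
target_column = "Yearly Amount Spent"

X = customers[feature_columns]  # 2-D feature table, one column per feature
y = customers[target_column]    # 1-D target series

print(X.shape, y.shape)  # (500, 4) (500,)
```

Keeping the column names in one list makes it easy to reuse the same selection later, for example when pairing coefficients with feature names.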
Train-Test Split
The data is split into training and testing sets using a 70-30 ratio. This allows us to train the model on 70% of the data and evaluate its performance on the remaining 30%.
Split Parameters
- test_size=0.3: Allocates 30% of data for testing (150 samples)
- random_state=1: Ensures reproducible splits across runs
- Result: 350 training samples, 150 testing samples (from 500 total)
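The split described by these parameters can be sketched with scikit-learn's `train_test_split`. The arrays below are random placeholders with the same shape as the dataset (500 rows, 4 features), not the real data, and the variable names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays shaped like the dataset: 500 rows, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
print(X_train.shape, X_test.shape)  # (350, 4) (150, 4)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=1)
print(np.array_equal(X_train, X_train_again))  # True
```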
Setting random_state ensures that you get the same train-test split every time you run the code, making results reproducible.
Model Training Workflow
Initialize the Model
Create an instance of the LinearRegression class.
This creates an untrained linear regression model with default parameters.
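As a small illustration, assuming scikit-learn's `LinearRegression` and a variable named `model` (the name is an assumption), the untrained state can be inspected like this:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# With default parameters, an intercept term will be fitted.
print(model.get_params()["fit_intercept"])  # True

# No learned attributes exist until fit() is called.
print(hasattr(model, "coef_"))  # False
```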
Fit the Model
Train the model using the training data.
During this step, the model:
- Calculates optimal coefficients using Ordinary Least Squares
- Minimizes the sum of squared residuals
- Learns the relationship between features and target
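The fitting step can be sketched as follows. To make the result checkable, this example uses synthetic data generated from a known linear relationship rather than the real dataset; `true_coefs` and the intercept of 10.0 are arbitrary illustrative values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
true_coefs = np.array([25.0, 38.0, 0.5, 61.0])  # arbitrary illustrative values
y = X @ true_coefs + 10.0 + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

model = LinearRegression()
model.fit(X_train, y_train)  # ordinary least squares on the training rows only

print(model.coef_)       # close to true_coefs
print(model.intercept_)  # close to 10.0
```

Because the noise is small, the fitted coefficients land near the values used to generate the data, which is a quick sanity check that the least-squares fit behaves as expected.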
Training Data Specifications
Dataset Statistics
- Total samples: 500 customers
- Training samples: 350 customers (70%)
- Testing samples: 150 customers (30%)
- Features: 4 numerical variables
- Target: 1 continuous variable (Yearly Amount Spent)
Feature Ranges
Based on the complete dataset:
| Feature | Min | Max | Mean |
|---|---|---|---|
| Avg. Session Length | 29.53 min | 36.14 min | 33.05 min |
| Time on App | 8.51 min | 15.13 min | 12.05 min |
| Time on Website | 33.91 min | 40.01 min | 37.06 min |
| Length of Membership | 0.27 years | 6.92 years | 3.53 years |
| Yearly Amount Spent | $256.67 | $765.52 | $499.31 |
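A summary in the same Min/Max/Mean layout could be computed with pandas along these lines. The tiny frame below is made up for illustration and does not reproduce the figures in the table.

```python
import pandas as pd

# Small made-up frame standing in for the customer data.
customers = pd.DataFrame({
    "Time on App": [11.2, 12.6, 13.1, 10.9],
    "Length of Membership": [2.5, 4.1, 3.3, 5.0],
})

# One row per column, with min, max and mean side by side.
summary = customers.agg(["min", "max", "mean"]).T
summary.columns = ["Min", "Max", "Mean"]
print(summary.round(2))
```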
The fit() Method
The fit() method is where the actual training happens.
What Happens Inside?
- Matrix Operations: Converts data to matrix form
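- Normal Equation: Conceptually solves β = (X^T X)^(-1) X^T y, where X includes a column of ones for the intercept; in practice a least-squares solver is typically used rather than an explicit matrix inverse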
- Coefficient Calculation: Computes optimal coefficients
- Model Storage: Stores learned parameters in the model object
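The steps above can be sketched in plain NumPy. This is an illustration of the normal equation under the assumption of well-conditioned synthetic data, not a description of what any particular library does internally.

```python
import numpy as np

# Synthetic training data: 350 rows, 4 features, known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(350, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 4.0 + rng.normal(scale=0.1, size=350)

# Prepend a column of ones so the intercept is estimated as beta[0].
X_design = np.column_stack([np.ones(len(X)), X])

# Normal equation, solved as a linear system instead of forming the inverse.
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

# A least-squares solver gives the same answer and is more robust
# when the features are nearly collinear.
beta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))  # True
print(beta_normal)  # intercept first, then the four coefficients
```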
The fit() method modifies the model object in-place, storing the learned coefficients and intercept that will be used for future predictions.
Training Validation
After training, you can verify that the model has learned parameters by inspecting its stored coefficients and intercept.
Next Steps
Once the model is trained, you can:
- Make predictions on new data
- Evaluate model performance
- Examine coefficients to understand how each feature relates to the target (each coefficient is the change in the target per unit change in that feature)
- Use the model for business insights
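A brief sketch of the first three steps, again on synthetic data with arbitrary coefficients. The choice of R² and mean absolute error as metrics is an assumption for illustration; this page does not say which metrics are used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([25.0, 38.0, 0.5, 61.0]) + 10.0 + rng.normal(scale=1.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out rows, which the model has not seen.
predictions = model.predict(X_test)
print(predictions.shape)  # (150,)

# Evaluate on the test set.
r2 = r2_score(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)
print(r2, mae)

# Coefficients, in the same order as the feature columns.
print(model.coef_)
```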