Introduction
In the original version of linear regression, you had a single feature x (like house size) to predict y (house price). But what if you had multiple features? This would give you much more information with which to make accurate predictions. Multiple linear regression uses multiple input features to predict the output, making your model more powerful and flexible.

Motivating Example: Housing Prices
Originally, you had:
- Single feature: size of house (x)
- Model: f(x) = w * x + b
Now consider four features for each house:
| Size (sq ft) | Bedrooms | Floors | Age (years) | Price ($1000s) |
|---|---|---|---|---|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
With multiple features, you can capture more complex relationships. For example, the number of bedrooms, age of the house, and number of floors all influence the price.
Notation for Multiple Features
- Features: xⱼ denotes the j-th input feature
- Number of features: n (in the housing example, n = 4)
- Feature vector: x⁽ⁱ⁾ is the vector of all features of the i-th training example
- Specific feature: xⱼ⁽ⁱ⁾ is the value of feature j in the i-th training example
In the housing example:
x₁, x₂, x₃, x₄ = individual features
- x₁ = size in sq ft
- x₂ = number of bedrooms
- x₃ = number of floors
- x₄ = age in years
The superscript (i) refers to the training example number, NOT exponentiation. The subscript j refers to the feature number.
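As a sketch of how this notation maps onto code (the layout convention — one training example per row, one feature per column — is an assumption, though it is the usual one), the table above can be stored as a 2-D NumPy array:

```python
import numpy as np

# Rows are training examples, columns are features:
# [size, bedrooms, floors, age]
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852,  2, 1, 36],
])

m, n = X.shape      # m = number of training examples, n = number of features
x_2 = X[1]          # x^(2): feature vector of the 2nd training example
x_2_3 = X[1, 2]     # x_3^(2): 3rd feature (floors) of the 2nd example

print(m, n)     # 4 4
print(x_2)      # [1416    3    2   40]
print(x_2_3)    # 2
```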
Model for Multiple Linear Regression
Expanded Form
With 4 features, the model becomes:

f(x) = w₁x₁ + w₂x₂ + w₃x₃ + w₄x₄ + b

Interpretation Example
Suppose the model learns these parameters (prices in $1000s):
- Base price: $80,000 (the constant b = 80)
- Size: positive weight (each additional square foot adds to the price)
- Bedrooms: positive weight (each additional bedroom adds to the price)
- Floors: positive weight (each additional floor adds to the price)
- Age: negative weight (each additional year subtracts from the price, because older houses are cheaper)
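As an illustrative worked prediction (only b = 80 comes from the text above; the weights below are assumptions made for the sake of the arithmetic):

```python
import numpy as np

# Illustrative parameters, in $1000s. Only b = 80 is from the text;
# the weights are assumed values chosen to show the computation.
w = np.array([0.1, 4.0, 10.0, -2.0])   # size, bedrooms, floors, age
b = 80.0

x = np.array([2104, 5, 1, 45])         # first house in the table

f = np.dot(w, x) + b
print(f)   # 0.1*2104 + 4*5 + 10*1 - 2*45 + 80 ≈ 230.4, i.e. about $230,400
```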
General Form with n Features
For any number of features:

f(x) = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Vector Notation
To make the notation more compact, we use vectors.

Parameter Vector (w)
w = [w₁, w₂, w₃, …, wₙ]
A vector containing all the weights. Arrow notation (w⃗) is sometimes used to indicate that it is a vector.
Feature Vector (x)
x = [x₁, x₂, x₃, …, xₙ]
A vector containing all the features for one training example.
Dot Product Representation
Using vectors, the model becomes:

f(x) = w · x + b

The dot product notation makes the model expression much more compact and easier to work with, especially when you have many features.
Vectorization: Making Code Fast
Vectorization is a technique that makes your code both shorter and much faster to run.

Without Vectorization (Slow)
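A minimal sketch of the unvectorized version (the parameter values are illustrative): every term is written out by hand, which is tedious and error-prone when n is large.

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# One hand-written term per feature -- does not scale to large n.
f = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b
print(f)   # ≈ -35.0
```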
With Loop (Better, but still slow)
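A sketch of the loop version, under the same illustrative values: shorter to write, but Python still executes the multiplications one at a time.

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0
n = len(w)

# One loop iteration per feature -- compact, but still sequential in Python.
f = 0.0
for j in range(n):
    f = f + w[j] * x[j]
f = f + b
print(f)   # ≈ -35.0
```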
With Vectorization (Best!) ⚡
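A sketch of the vectorized version: np.dot computes all the products and the sum in a single optimized call.

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# One call; NumPy performs all multiplications in optimized native code.
f = np.dot(w, x) + b
print(f)   # ≈ -35.0
```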
Why Vectorization is Faster
Behind the scenes, NumPy uses:
- Parallel hardware (CPU or GPU)
- Optimized C/Fortran libraries
- SIMD instructions (Single Instruction, Multiple Data)
Implementing Multiple Linear Regression
Gradient Descent for Multiple Features
The gradient descent update rules extend naturally: with n features there is one weight update per feature. With α as the learning rate and m as the number of training examples, repeat until convergence:

wⱼ := wⱼ − α · (1/m) · Σᵢ (f(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾   (for j = 1, …, n)
b := b − α · (1/m) · Σᵢ (f(x⁽ⁱ⁾) − y⁽ⁱ⁾)

Key Takeaways
- Multiple features improve predictions: using more relevant features generally leads to more accurate models.
- Vectorization accelerates computation: NumPy’s vectorized operations are much faster than explicit loops.
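Putting the update rules and vectorization together, here is a minimal sketch of batch gradient descent for multiple features (the dataset and hyperparameters are illustrative, not from the text):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for multiple linear regression.

    X: (m, n) matrix of training examples, y: (m,) targets.
    Returns the learned weight vector w and intercept b.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        err = X @ w + b - y            # (m,) vector of prediction errors
        w -= alpha * (X.T @ err) / m   # w_j -= alpha * (1/m) * sum(err * x_j)
        b -= alpha * err.mean()        # b   -= alpha * (1/m) * sum(err)
    return w, b

# Illustrative noiseless data generated from y = 2*x1 + 3*x2 + 1
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 3.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

w, b = gradient_descent(X, y, alpha=0.1, iterations=5000)
print(w, b)   # w close to [2, 3], b close to 1
```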
Practical Considerations
Feature Scaling
When features have very different ranges (e.g., size: 100-5000, bedrooms: 1-5), gradient descent can be slow. Feature scaling normalizes features to similar ranges, speeding up convergence.
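A sketch of one common scaling method, z-score normalization (min-max scaling and mean normalization are alternatives); the sample values are illustrative:

```python
import numpy as np

# Z-score normalization: rescale each feature to mean 0, std 1.
X = np.array([
    [2104.0, 5.0],
    [1416.0, 3.0],
    [1534.0, 3.0],
    [852.0,  2.0],
])

mu = X.mean(axis=0)        # per-feature mean
sigma = X.std(axis=0)      # per-feature standard deviation
X_norm = (X - mu) / sigma

print(X_norm.mean(axis=0))   # ~[0. 0.]
print(X_norm.std(axis=0))    # ~[1. 1.]
```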
Learning Rate Selection
With multiple features, you may need a smaller learning rate than with one feature. If cost increases instead of decreases, try reducing α.
Feature Engineering
You can create new features by combining existing ones. For example, x₅ = x₁ * x₂ (size × bedrooms) might capture useful information.
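A minimal sketch of appending such an engineered column (the values are illustrative):

```python
import numpy as np

# Columns: [size, bedrooms]
X = np.array([
    [2104.0, 5.0],
    [1416.0, 3.0],
])

x5 = X[:, 0] * X[:, 1]             # engineered feature: size * bedrooms
X_aug = np.column_stack([X, x5])   # feature matrix now has 3 columns

print(X_aug.shape)   # (2, 3)
print(X_aug[0])      # size, bedrooms, size*bedrooms = 2104, 5, 10520
```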
What’s Next
Now that you understand multiple linear regression, explore:
- Feature scaling techniques to speed up gradient descent
- Feature engineering to create more powerful features
- Polynomial regression for capturing non-linear relationships
- Regularization to prevent overfitting with many features
