Introduction
Linear regression is probably the most widely used learning algorithm in the world today. It fits a straight line to your data to make predictions. As you become familiar with linear regression, many of the concepts you see here will also apply to other machine learning models.
Housing Price Prediction Example
Let’s predict the price of a house based on its size, using a dataset from Portland, a city in the United States.
In this example:
- Horizontal axis: Size of house in square feet
- Vertical axis: Price of house in thousands of dollars
- Each data point (cross) represents a house with its size and sale price
Building a Linear Regression Model
One approach is to build a linear regression model from the dataset. Your model will fit a straight line to the data. To predict the price of a house:
- Measure the house size: 1,250 square feet
- Find where this size intersects the best-fit line
- Trace across to the vertical axis to read the predicted price: approximately $220,000
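The steps above can be sketched in a few lines of code. The slope and intercept below are illustrative values (not fitted to the actual Portland data) chosen so that a 1,250 square foot house maps to roughly $220,000:

```python
# Sketch of reading a prediction off a fitted line.
# w and b are assumed, illustrative parameter values.
w = 0.176  # price increase ($1000s) per square foot (assumed)
b = 0.0    # base price in $1000s (assumed)

size_sqft = 1250
predicted_price = w * size_sqft + b  # in thousands of dollars
print(f"Predicted price: ${predicted_price * 1000:,.0f}")
```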
This is a supervised learning model because you first train it by giving it data with the right answers—both the size of the house and the price for each example.
What Makes This a Regression Model?
Linear regression is a particular type of supervised learning model called a regression model because it predicts numbers as output, like prices in dollars.
Regression vs Classification
The key difference:
- Regression: Infinitely many possible outputs (any number)
- Classification: Small, finite set of categories (e.g., cat vs dog, malignant vs benign)
Understanding the Training Dataset
You can visualize the data in two ways:
1. Scatter Plot (Graph)
Plots house size vs price, with each data point shown as a cross
2. Data Table
| Size (sq ft) | Price ($1000s) |
|---|---|
| 2,104 | 400 |
| 1,416 | 232 |
| 1,534 | 315 |
| … | … |
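The table above can be stored as two parallel lists, one for the inputs and one for the targets (only the three rows shown are included here):

```python
# Training set from the table above, as parallel lists.
x_train = [2104, 1416, 1534]  # size in square feet (input feature x)
y_train = [400, 232, 315]     # price in $1000s (output target y)

m = len(x_train)  # number of training examples
print(f"m = {m}")
print(f"First example: x = {x_train[0]}, y = {y_train[0]}")
```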
Machine Learning Notation
- Input Feature (x)
- Output Target (y)
- Training Example (x, y)
- Number of Examples (m)
x = input variable (also called feature or input feature)
For the first training example: x = 2,104 (square feet)
Indexing Training Examples
To refer to a specific training example, use superscript notation:
- x⁽ⁱ⁾, y⁽ⁱ⁾ = the i-th training example
- x⁽¹⁾ = 2,104 (first example’s input)
- y⁽¹⁾ = 400 (first example’s output)
The superscript (i) in parentheses is NOT exponentiation. x⁽²⁾ does not mean x squared—it refers to the second training example.
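One practical wrinkle: the notation counts examples from 1, while Python lists are 0-indexed, so x⁽ⁱ⁾ corresponds to `x_train[i - 1]` in code:

```python
x_train = [2104, 1416, 1534]
y_train = [400, 232, 315]

# The superscript i is 1-based, but list indices are 0-based,
# so x^(i) lives at x_train[i - 1].
i = 1
x_i = x_train[i - 1]  # x^(1) = 2104
y_i = y_train[i - 1]  # y^(1) = 400
print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
```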
The Cost Function
To measure how well our model fits the data, we need a cost function.
Model Representation
Our linear model is f(x) = w * x + b, where:
- w and b are parameters (also called coefficients or weights)
- w determines the slope
- b is the y-intercept
How Different Parameters Affect the Line
Example 1: w=0, b=1.5
f(x) = 1.5 for every x. This creates a horizontal line at height 1.5.
Example 2: w=0.5, b=0
When x = 2, f(x) = 1. This creates a line with slope 0.5 passing through the origin.
Example 3: w=0.5, b=1
When x = 2, f(x) = 2. This creates a line with slope 0.5 that crosses the vertical axis at 1.
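The three parameter settings above can be checked with a one-line model function:

```python
def f(x, w, b):
    """Linear model: f(x) = w * x + b."""
    return w * x + b

# Evaluate each example's parameters at x = 0 and x = 2.
for w, b in [(0, 1.5), (0.5, 0), (0.5, 1)]:
    print(f"w={w}, b={b}: f(0)={f(0, w, b)}, f(2)={f(2, w, b)}")
```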
Defining the Cost Function
How do we find the best values for w and b? We measure the error between predictions and actual values. For training example i:
- ŷ⁽ⁱ⁾ = prediction = f(x⁽ⁱ⁾) = w * x⁽ⁱ⁾ + b
- y⁽ⁱ⁾ = actual target value
- Error = ŷ⁽ⁱ⁾ - y⁽ⁱ⁾
- J(w, b) = cost function = (1 / (2m)) · Σ (ŷ⁽ⁱ⁾ - y⁽ⁱ⁾)²
- m = number of training examples
- Σ = sum over all training examples from i=1 to m
The cost function is divided by 2m (not just m) to make later calculations neater. This division by 2 doesn’t change which parameters are optimal.
Why Squared Error?
The squared error cost function:
- Penalizes larger errors more heavily
- Is always positive (no negative errors)
- Has nice mathematical properties for optimization
- Is the most commonly used cost function for regression problems
Implementation Example
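A minimal sketch of the squared error cost function, using the three training examples from the table above (the w value passed in is illustrative, not an optimal fit):

```python
def compute_cost(x, y, w, b):
    """Squared error cost: J(w, b) = (1 / (2m)) * sum((f(x_i) - y_i)^2)."""
    m = len(x)
    total = 0.0
    for i in range(m):
        f_xi = w * x[i] + b          # prediction y-hat for example i
        total += (f_xi - y[i]) ** 2  # squared error for example i
    return total / (2 * m)           # divide by 2m, as discussed above

x_train = [2104, 1416, 1534]  # size in sq ft
y_train = [400, 232, 315]     # price in $1000s
print(compute_cost(x_train, y_train, w=0.2, b=0))
```

Note that a perfect fit gives J = 0, and J grows as predictions drift away from the targets.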
Key Takeaways
- Linear regression fits a straight line to data
- The model f(x) = w * x + b predicts output y from input x
- The cost function measures prediction error
- J(w, b) quantifies how well the model fits the training data
