
Introduction

Linear regression is probably the most widely used learning algorithm in the world today. It fits a straight line to your data to make predictions. As you become familiar with linear regression, many concepts you see here will also apply to other machine learning models.

Housing Price Prediction Example

Let’s predict the price of a house based on its size using a dataset from Portland, a city in the United States.
In this example:
  • Horizontal axis: Size of house in square feet
  • Vertical axis: Price of house in thousands of dollars
  • Each data point (cross) represents a house with its size and sale price
Suppose you’re a real estate agent helping a client sell her house. She asks: “How much do you think I can get for this house?” You measure the house: 1,250 square feet. How much could it sell for?

Building a Linear Regression Model

One approach is to build a linear regression model from the dataset. Your model will fit a straight line to the data:
  1. Measure the house size: 1,250 square feet
  2. Find where this intersects the best fit line
  3. Trace to the vertical axis to read the predicted price: approximately $220,000
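The three steps above amount to evaluating the fitted line at the new size. A minimal sketch, using hypothetical parameters w and b (invented here so the numbers match the $220,000 example, not fitted to real data):

```python
# Hypothetical parameters of the best-fit line (illustrative only)
w = 0.125  # price increase in $1000s per square foot
b = 63.75  # base price in $1000s

size = 1250  # square feet
predicted_price = w * size + b  # evaluate the line at this size
print(predicted_price)  # 220.0 -> roughly $220,000
```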
This is a supervised learning model because you first train it on data containing the right answers: both the size of the house and its price for each example.

What Makes This a Regression Model?

Linear regression is a particular type of supervised learning model called a regression model because it predicts numbers as output, like prices in dollars.
Any supervised learning model that predicts a number such as 220,000 or 1.5 or -33.2 is addressing a regression problem.

Regression vs Classification

The key difference:
  • Regression: Infinitely many possible outputs (any number)
  • Classification: Small, finite set of categories (e.g., cat vs dog, malignant vs benign)

Understanding the Training Dataset

You can visualize the data in two ways:

1. Scatter Plot (Graph)

Plots house size vs price with each data point as a cross

2. Data Table

Size (sq ft) | Price ($1000s)
2,104        | 400
1,416        | 232
1,534        | 315
If you have 47 rows in the data table, there are 47 crosses on the plot, each corresponding to one row.

Machine Learning Notation

x = input variable (also called a feature or input feature)
y = output variable (also called the target variable)
For the first training example: x = 2,104 (square feet) and y = 400 ($1000s)

Indexing Training Examples

To refer to a specific training example, use superscript notation:
  • x⁽ⁱ⁾, y⁽ⁱ⁾ = the i-th training example
  • x⁽¹⁾ = 2,104 (first example’s input)
  • y⁽¹⁾ = 400 (first example’s output)
The superscript (i) in parentheses is NOT exponentiation. x⁽²⁾ does not mean x squared—it refers to the second training example.
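Note that the math notation is 1-indexed while arrays in code are 0-indexed, so x⁽ⁱ⁾ corresponds to x[i - 1]. A quick sketch using the values from the data table above:

```python
import numpy as np

# Training set from the data table (sizes in sq ft, prices in $1000s)
x = np.array([2104, 1416, 1534])
y = np.array([400, 232, 315])

# Math notation is 1-indexed; NumPy arrays are 0-indexed
x_1 = x[0]      # x^(1) = 2104, first example's input
y_1 = y[0]      # y^(1) = 400, first example's output
i = 2
x_i = x[i - 1]  # x^(2) = 1416, the second training example
```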

The Cost Function

To measure how well our model fits the data, we need a cost function.

Model Representation

Our linear model is:
f(x) = w * x + b
Where:
  • w and b are parameters (also called coefficients or weights)
  • w determines the slope
  • b is the y-intercept

How Different Parameters Affect the Line

f(x) = 0 * x + 1.5
This creates a horizontal line at y = 1.5. The prediction is always 1.5 regardless of x.
f(x) = 0.5 * x + 0
When x = 0, f(x) = 0
When x = 2, f(x) = 1
This creates a line with slope 0.5 passing through the origin.
f(x) = 0.5 * x + 1
The line has slope 0.5 and y-intercept at 1.
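The three parameter settings above can be checked directly by evaluating f(x) = w * x + b:

```python
def f(x, w, b):
    """Linear model prediction."""
    return w * x + b

# Horizontal line: w = 0, b = 1.5 -> prediction is always 1.5
print(f(0, 0, 1.5), f(2, 0, 1.5))  # 1.5 1.5

# Slope 0.5 through the origin: w = 0.5, b = 0
print(f(0, 0.5, 0), f(2, 0.5, 0))  # 0.0 1.0

# Slope 0.5, y-intercept 1
print(f(0, 0.5, 1), f(2, 0.5, 1))  # 1.0 2.0
```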

Defining the Cost Function

How do we find the best values for w and b? We measure the error between predictions and actual values. For training example i:
  • ŷ⁽ⁱ⁾ = prediction = f(x⁽ⁱ⁾) = w * x⁽ⁱ⁾ + b
  • y⁽ⁱ⁾ = actual target value
  • Error = ŷ⁽ⁱ⁾ - y⁽ⁱ⁾
The squared error cost function is:
J(w, b) = (1 / 2m) * Σ(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)²
Where:
  • J(w, b) = cost function
  • m = number of training examples
  • Σ = sum over all training examples from i=1 to m
The cost function is divided by 2m (not just m) to make later calculations neater. This division by 2 doesn’t change which parameters are optimal.
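To see the formula in action, here is J(w, b) worked out for a tiny dataset with m = 3 (the data and candidate parameters are invented for illustration):

```python
x = [1.0, 2.0, 3.0]  # sizes in 1000 sq ft
y = [300, 500, 700]  # prices in $1000s
w, b = 100, 100      # candidate parameters (deliberately not the best fit)

# Errors f(x) - y: (100*1+100)-300, (100*2+100)-500, (100*3+100)-700
#                = -100, -200, -300
squared = [((w * xi + b) - yi) ** 2 for xi, yi in zip(x, y)]  # 10000, 40000, 90000
J = sum(squared) / (2 * len(x))  # 140000 / 6
print(J)  # approximately 23333.33
```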

Why Squared Error?

The squared error cost function:
  1. Penalizes larger errors more heavily
  2. Is always positive (no negative errors)
  3. Has nice mathematical properties for optimization
  4. Is the most commonly used cost function for regression problems

Implementation Example

import numpy as np

# Training data
x_train = np.array([1.0, 2.0, 3.0])  # Size in 1000 sq ft
y_train = np.array([300, 500, 700])   # Price in $1000s

# Model parameters
w = 200
b = 100

# Make predictions
def predict(x, w, b):
    return w * x + b

# Calculate cost
def compute_cost(x, y, w, b):
    m = len(x)
    total_cost = 0
    
    for i in range(m):
        f_wb = w * x[i] + b
        cost = (f_wb - y[i]) ** 2
        total_cost += cost
    
    return total_cost / (2 * m)

# Example prediction
size = 1.25  # 1,250 sq ft
predicted_price = predict(size, w, b)
print(f"Predicted price: ${predicted_price * 1000:,.0f}")

# Calculate cost for current parameters
cost = compute_cost(x_train, y_train, w, b)
print(f"Cost: {cost}")
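The loop in compute_cost can also be written with NumPy array operations; this is a sketch of an equivalent vectorized version (same math, no explicit loop):

```python
import numpy as np

def compute_cost_vectorized(x, y, w, b):
    """Squared error cost J(w, b) using NumPy array operations."""
    m = len(x)
    errors = w * x + b - y  # all residuals f(x^(i)) - y^(i) at once
    return np.sum(errors ** 2) / (2 * m)

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([300, 500, 700])

# With w = 200, b = 100 the line passes through every point, so the cost is 0
print(compute_cost_vectorized(x_train, y_train, 200, 100))  # 0.0
```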

Key Takeaways

  1. Linear regression fits a straight line to data: the model f(x) = w * x + b predicts output y from input x.
  2. Parameters w and b define the line: w controls the slope, and b controls the y-intercept.
  3. The cost function measures prediction error: J(w, b) quantifies how well the model fits the training data.
  4. The goal is to minimize the cost function: find w and b that make J(w, b) as small as possible.

What’s Next

Now that you understand the linear regression model and cost function, the next step is to learn how to systematically find the optimal values for w and b. This is where gradient descent comes in—a powerful algorithm for minimizing the cost function and finding the best fit line for your data.
