Introduction
Logistic regression is probably the single most widely used classification algorithm in the world. Despite its name containing “regression,” it is used for classification problems, where the output is a category (0 or 1) rather than a continuous number. Logistic regression applies when the output variable y can take on only one of a small number of discrete values; for binary classification, y is either 0 or 1.
Why Not Linear Regression for Classification?
Linear regression is not suitable for classification problems. Here’s why:

The Problem with Linear Regression
Suppose you’re classifying tumors as malignant (1) or benign (0) based on tumor size:
Scenario 1: Linear regression seems to work
With a small dataset, linear regression might fit a line that, with a threshold of 0.5, classifies correctly:
- Predictions < 0.5 → Class 0 (benign)
- Predictions ≥ 0.5 → Class 1 (malignant)
Scenario 2: One new example breaks everything
But add one large tumor example on the right, and the line shifts. Now the classification boundary moves, causing previously correct predictions to become wrong. Linear regression’s predictions can be any number, not just 0 or 1.
Linear regression can output values less than 0 or greater than 1, which doesn’t make sense for classification where we want probabilities between 0 and 1.
The Sigmoid Function
Logistic regression uses the sigmoid function (also called the logistic function) to squash predictions between 0 and 1.

Mathematical Definition
The sigmoid function is:

g(z) = 1 / (1 + e^(−z))

where:
- e ≈ 2.718 (mathematical constant)
- z can be any real number (-∞ to +∞)
- g(z) is always between 0 and 1
Properties of the Sigmoid Function
- Large positive z: when z is very large (e.g., z = 100), e^(−z) ≈ 0, so g(z) ≈ 1/(1 + 0). The sigmoid approaches 1.
- Large negative z: when z is very negative (e.g., z = −100), e^(−z) becomes enormous, so g(z) approaches 0.
- Zero: e⁰ = 1, so g(0) = 1/(1 + 1) = 0.5.
Visualizing the Sigmoid
The sigmoid creates an S-shaped curve:
- Starts near 0 for large negative values
- Smoothly transitions through 0.5 at z = 0
- Approaches 1 for large positive values
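These properties are easy to verify numerically. A minimal sketch using NumPy (the `sigmoid` helper below is illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))     # 0.5 exactly at z = 0
print(sigmoid(100))   # approaches 1 for large positive z
print(sigmoid(-100))  # approaches 0 for large negative z
```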
The Logistic Regression Model
Logistic regression combines linear regression with the sigmoid function in two steps:

1. Compute a linear value: z = w·x + b
2. Pass it through the sigmoid: f(x) = g(z)

Complete Model
Combining these steps:

f(x) = g(w·x + b) = 1 / (1 + e^(−(w·x + b)))

Interpreting the Output
Probability Interpretation
The output f(x) represents the probability that the class is 1, given input x:

f(x) = P(y = 1 | x; w, b)

Example: Tumor Classification
Suppose a patient has a tumor of a certain size x, and the model outputs f(x) = 0.7. This means:
- 70% chance the tumor is malignant (y = 1)
- 30% chance the tumor is benign (y = 0)
Probabilities must sum to 1. If P(y=1) = 0.7, then P(y=0) = 1 - 0.7 = 0.3.
Implementation
Python Implementation
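A minimal from-scratch sketch of logistic regression with a single feature, trained by gradient descent. The toy tumor-size data, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy tumor-size data (illustrative, not real measurements):
# sizes in cm, labels 1 = malignant, 0 = benign
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   1,   1])

w, b = 0.0, 0.0          # parameters, initialized to zero
alpha, m = 0.1, len(x)   # learning rate, number of examples

for _ in range(10000):
    f = sigmoid(w * x + b)        # model predictions in (0, 1)
    error = f - y
    w -= alpha * (error @ x) / m  # gradient descent updates
    b -= alpha * error.sum() / m

# Small tumors should map below 0.5, large tumors above it
print(sigmoid(w * 1.0 + b))
print(sigmoid(w * 4.0 + b))
```

After training, the learned boundary −b/w sits between the two clusters, so a 1 cm tumor is predicted benign and a 4 cm tumor malignant.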
With Multiple Features
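With multiple features, z = w·x + b becomes a dot product. A vectorized sketch with hypothetical weights (fixed here rather than learned, for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(y = 1) for each row of X; X has shape (m, n)."""
    return sigmoid(X @ w + b)

# Hypothetical 2-feature examples (e.g., tumor size and patient age, scaled)
X = np.array([[0.5, 1.2],
              [2.0, 0.3],
              [1.1, 1.1]])
w = np.array([1.0, -0.5])  # example weights, not learned here
b = -0.4

probs = predict_proba(X, w, b)
print(probs)          # probabilities in (0, 1)
print(probs >= 0.5)   # class predictions at the 0.5 threshold
```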
Decision Boundary
The decision boundary is where the model switches between predicting class 0 and class 1.

Threshold at 0.5
Common decision rule:
- If f(x) ≥ 0.5, predict y = 1
- If f(x) < 0.5, predict y = 0
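This threshold rule can be sketched as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(z):
    """Apply the 0.5 threshold: 1 if f(x) >= 0.5, else 0."""
    return int(sigmoid(z) >= 0.5)

print(predict(2.0))   # f(x) ≈ 0.88 → 1
print(predict(-1.0))  # f(x) ≈ 0.27 → 0
print(predict(0.0))   # f(x) = 0.5  → 1 (boundary case)
```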
When is f(x) = 0.5?
Since sigmoid(0) = 0.5, we have f(x) = 0.5 exactly when:

z = w·x + b = 0

Linear Decision Boundary
With two features x₁ and x₂, the boundary is:

w₁x₁ + w₂x₂ + b = 0

This is a straight line separating the two classes.
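For instance, with hypothetical parameters w₁ = 1, w₂ = 1, b = −3, the boundary is the line x₁ + x₂ = 3. A quick numerical check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, 1.0])  # hypothetical weights
b = -3.0

def f(x1, x2):
    return sigmoid(w[0] * x1 + w[1] * x2 + b)

print(f(1.0, 1.0))  # x1 + x2 < 3 → below 0.5, predict class 0
print(f(2.0, 2.0))  # x1 + x2 > 3 → above 0.5, predict class 1
print(f(1.5, 1.5))  # exactly on the boundary → 0.5
```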
Non-linear Decision Boundary
With polynomial features like x₁² and x₂², the boundary can take a form such as:

w₁x₁² + w₂x₂² + b = 0

This creates a circular or elliptical boundary.
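For instance, with hypothetical parameters w₁ = w₂ = 1 and b = −1, the boundary is the unit circle x₁² + x₂² = 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    """Logistic model on polynomial features x1² and x2²."""
    return sigmoid(w1 * x1**2 + w2 * x2**2 + b)

print(f(0.0, 0.0))  # inside the unit circle → below 0.5 → class 0
print(f(2.0, 0.0))  # outside the unit circle → above 0.5 → class 1
```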
Cost Function for Logistic Regression
The squared error cost function doesn’t work well for logistic regression: combined with the sigmoid, it produces a non-convex cost surface with many local minima. Instead, we use the logistic loss, also called binary cross-entropy:

L(f(x), y) = −y log(f(x)) − (1 − y) log(1 − f(x))

This loss:
- Is convex (one global minimum)
- Heavily penalizes confident wrong predictions
- Works well with gradient descent
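A minimal sketch of this loss for a single example, assuming f is the predicted probability and y ∈ {0, 1}. Note how it explodes when a confident prediction is wrong:

```python
import numpy as np

def logistic_loss(f, y):
    """Binary cross-entropy for one prediction f = P(y = 1)."""
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

print(logistic_loss(0.9, 1))  # confident and correct → small loss
print(logistic_loss(0.9, 0))  # confident and wrong → large loss
print(logistic_loss(0.5, 1))  # unsure → moderate loss (≈ 0.693)
```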
Training with Gradient Descent
Gradient descent for logistic regression uses the same update rule as linear regression, but with the logistic model f(x):

wⱼ := wⱼ − α · (1/m) Σᵢ (f(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
b := b − α · (1/m) Σᵢ (f(x⁽ⁱ⁾) − y⁽ⁱ⁾)

where α is the learning rate and m is the number of training examples.

Key Takeaways
Classification, Not Regression
Despite its name, logistic regression is used for classification problems, predicting discrete categories rather than continuous values.
Outputs Probabilities
The sigmoid function ensures outputs are between 0 and 1, interpretable as probabilities that y = 1.
Decision Boundary
The decision boundary (where z = 0) separates regions where the model predicts different classes.
Widely Used
Logistic regression is one of the most commonly used algorithms in practice, powering applications from medical diagnosis to ad targeting.
Real-World Applications
Medical Diagnosis
Classifying whether a patient has a disease based on symptoms, test results, and medical history.
Email Spam Detection
Determining if an email is spam based on content, sender, subject line, and other features.
Credit Scoring
Predicting whether a loan applicant will default based on income, credit history, and other factors.
Customer Churn
Identifying which customers are likely to stop using a service based on usage patterns and demographics.
What’s Next
Now that you understand logistic regression, explore:
- Regularization to prevent overfitting in classification
- Multi-class classification for problems with more than 2 categories
- Advanced optimization algorithms beyond gradient descent
- Performance metrics like precision, recall, and F1-score
