Introduction to Unsupervised Learning

What is Unsupervised Learning

After supervised learning, the most widely used form of machine learning is unsupervised learning. Unlike supervised learning, where each example is associated with an output label y, unsupervised learning gives us data that isn’t associated with any output labels.

In unsupervised learning, we’re not asked to predict specific outputs. Instead, our job is to find structure, patterns, or anything else of interest in the data.

Supervised vs Unsupervised Learning

In supervised learning for classification problems, each example is labeled (e.g., benign or malignant tumors). In unsupervised learning, you might have the same data (patient tumor size and age), but without the labels indicating whether tumors are benign or malignant.

Supervised Learning

Data comes with both inputs x and output labels y

Unsupervised Learning

Data comes only with inputs x; the algorithm finds structure on its own
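To make the contrast concrete, here is a minimal sketch of the same dataset in both forms. The tumor sizes, ages, and variable names are made up for illustration and are not taken from any real dataset.

```python
# Hypothetical tumor data: each input x is (tumor size in cm, patient age in years).
# In the supervised setting, every input x is paired with an output label y.
supervised_examples = [
    ((1.2, 34), "benign"),
    ((4.8, 61), "malignant"),
    ((2.1, 45), "benign"),
]

# The unsupervised version of the same data keeps the inputs x and has no labels y.
unsupervised_inputs = [x for x, _y in supervised_examples]

print(unsupervised_inputs)
```

An unsupervised algorithm would receive only `unsupervised_inputs` and would have to find groupings or other structure without ever seeing the benign or malignant labels.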

Types of Unsupervised Learning

Unsupervised learning encompasses several powerful techniques for discovering patterns in data:

Clustering

Groups similar data points together into clusters. Used in market segmentation, document organization, and pattern discovery.

Anomaly Detection

Detects unusual events or outliers in data. Critical for fraud detection, quality control, and system monitoring.

Dimensionality Reduction

Compresses a large dataset into a smaller representation with fewer features while preserving important information. Useful for visualization and computational efficiency.
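As a rough illustration of dimensionality reduction, the sketch below compresses 2-D points to a single number each by projecting them onto their direction of greatest variance, which is the idea behind principal component analysis (PCA). It is a toy version under stated assumptions: the function names and data are hypothetical, it handles only two features, and it uses plain power iteration instead of a numerical library.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (mean, unit direction) of the top principal component of 2-D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [[p[0] - mean[0], p[1] - mean[1]] for p in points]
    # 2x2 covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(2)] for i in range(2)]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    direction = [1.0, 0.0]
    for _ in range(iterations):
        w = [
            cov[0][0] * direction[0] + cov[0][1] * direction[1],
            cov[1][0] * direction[0] + cov[1][1] * direction[1],
        ]
        norm = math.hypot(w[0], w[1])
        if norm == 0.0:
            break  # no variance along the current direction; keep the last estimate
        direction = [w[0] / norm, w[1] / norm]
    return mean, direction

def project(points, mean, direction):
    """Replace each 2-D point with its 1-D coordinate along the given direction."""
    return [
        (p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
        for p in points
    ]

# Made-up data lying on the line y = 2x, so one number per point is enough.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, direction = first_principal_component(points)
compressed = project(points, mean, direction)
print(direction)
print(compressed)
```

Because the example points lie exactly on a line, the 1-D coordinates lose no information here; on real data the projection keeps only the variation along the chosen direction.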

Clustering Applications

Clustering is a type of unsupervised learning algorithm that places unlabeled data into different clusters. This technique has numerous real-world applications:

Google News Article Grouping

Google News uses clustering to group related stories together. Every day, it processes hundreds of thousands of news articles and automatically finds articles that mention similar words (like “panda”, “twin”, “zoo”) and groups them into clusters.

The clustering algorithm figures out on its own which words suggest that certain articles belong in the same group; no human tells it what to look for. This is essential because news topics change daily and there are too many stories for manual categorization.
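The grouping of articles by shared words can be sketched with a small K-means implementation over word-count vectors. This is an illustrative toy, not a description of how Google News works: the headlines are invented, the helper names are hypothetical, and the initial centroids are picked by hand so the run is deterministic (real implementations typically start from random centroids).

```python
def word_counts(text, vocabulary):
    """Turn a headline into a vector of counts, one entry per vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, initial_centroids, iterations=10):
    """Return the cluster index assigned to each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

# Invented headlines; no labels or topics are provided.
headlines = [
    "panda twin born at zoo",
    "zoo welcomes panda twin cubs",
    "stock market falls as rates rise",
    "rates rise and market reacts",
]

# The vocabulary is built from the headlines themselves, not chosen by a person.
vocabulary = sorted({word for h in headlines for word in h.lower().split()})
vectors = [word_counts(h, vocabulary) for h in headlines]

# Assumption: start from the first and third headlines as initial centroids.
assignments = k_means(vectors, [vectors[0], vectors[2]])
print(assignments)
```

Nothing in the code says that “panda” and “zoo” belong together; the two zoo headlines end up in the same cluster only because their word-count vectors are close to each other.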

DNA Microarray Analysis

Researchers use clustering on genetic data to group individuals into different categories based on gene expression patterns. Each column in a DNA microarray represents one person’s genetic data, and each row represents a specific gene.

By running a clustering algorithm, researchers can automatically discover distinct types of people based on their genetic profiles, without being told in advance what these types are.
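The layout described above (genes in rows, people in columns) means the matrix has to be read column by column before people can be compared. The sketch below shows that step with invented expression values; the variable names are hypothetical, and Euclidean distance is just one possible measure of similarity between profiles.

```python
import math

# Hypothetical expression levels: each row is a gene, each column is a person.
expression = [
    [0.9, 0.8, 0.1],  # gene A
    [0.2, 0.1, 0.9],  # gene B
    [0.7, 0.9, 0.2],  # gene C
]

# Transpose so that each entry is one person's profile across all genes.
profiles = [list(column) for column in zip(*expression)]

def profile_distance(i, j):
    """Euclidean distance between the expression profiles of persons i and j."""
    return math.dist(profiles[i], profiles[j])

print(profile_distance(0, 1))
print(profile_distance(0, 2))
```

In this toy data, persons 0 and 1 have much closer profiles than persons 0 and 2, so a clustering algorithm run on `profiles` would tend to group the first two together.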

Market Segmentation

Companies use clustering to automatically group customers into different market segments. For example, analyzing the deep learning community revealed distinct groups:

Knowledge seekers: Primary motivation is growing their skills
Career developers: Looking for promotions or new job opportunities
Industry followers: Want to stay updated on AI’s impact on their field

Understanding these segments helps companies better serve their customers.

Anomaly Detection Applications

Anomaly detection algorithms look at a dataset of normal events and learn to detect unusual or anomalous events. Key applications include:

Fraud Detection

Detecting unusual transactions in financial systems that could indicate fraudulent activity

Manufacturing Quality Control

Identifying defective products (like aircraft engines) that behave differently from normal ones

System Monitoring

Flagging unusual system behavior that might indicate failures or security breaches

Healthcare

Detecting abnormal patient data that requires further investigation
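The idea of learning from a dataset of normal events and then flagging unusual ones can be sketched with a simple statistical model. The example below is one basic approach, assumed here for illustration: it fits a mean and standard deviation to made-up transaction amounts and flags values more than three standard deviations from the mean. The function names and the threshold are hypothetical choices, not a prescribed method.

```python
import statistics

def fit_normal_model(values):
    """Summarize normal events by their mean and population standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)

def is_anomaly(value, mean, std, z_threshold=3.0):
    """Flag a value that lies more than z_threshold standard deviations from the mean.

    Assumes std > 0, i.e. the normal events are not all identical.
    """
    return abs(value - mean) / std > z_threshold

# Made-up amounts for transactions considered normal.
normal_amounts = [20, 25, 22, 19, 24, 21, 23, 20, 22, 24]
mean, std = fit_normal_model(normal_amounts)

print(is_anomaly(23, mean, std))
print(is_anomaly(500, mean, std))
```

A typical amount such as 23 falls well inside the learned range, while 500 lies far outside it and is flagged for review. The model never sees an example labeled as fraud; it only learns what normal looks like.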

Key Characteristics

The fundamental difference between supervised and unsupervised learning:

Supervised learning: Algorithm is given “right answers” to learn from
Unsupervised learning: Algorithm discovers structure without being supervised or told what’s correct

Why Unsupervised Learning Matters

Unsupervised learning is particularly valuable when:

You have large amounts of unlabeled data
Manual labeling would be too expensive or time-consuming
You want to discover hidden patterns you didn’t know existed
The structure of the data may change over time
You need to process data automatically at scale

Real-World Impact

Unsupervised learning algorithms power some of the most commercially important applications of machine learning. They enable:

Automatic organization of massive datasets
Discovery of insights not obvious to human observers
Scalable processing of data that would be impossible to manually categorize
Adaptive systems that work without constant human supervision

Next Steps

Clustering

Learn about K-means and other clustering algorithms

Anomaly Detection

Discover how to detect unusual events in your data

While this guide focuses on clustering and anomaly detection, unsupervised learning also includes other powerful techniques like dimensionality reduction (PCA), association rules, and generative models.
