
What is Anomaly Detection?

Anomaly detection algorithms look at an unlabeled dataset of normal events and learn to detect unusual or anomalous events. This is one of the most commercially important applications of unsupervised learning.
Anomaly detection identifies data points that deviate significantly from normal patterns, enabling early detection of problems, fraud, defects, and other unusual events.

Real-World Example: Aircraft Engine Monitoring

Consider a company manufacturing aircraft engines. Engine failures have severe consequences, so detecting potential problems before deployment is critical.

The Scenario

After an aircraft engine rolls off the assembly line, you compute features:
  • x₁: Heat generated by the engine
  • x₂: Vibration intensity
  • Additional features as needed
1. Collect Normal Data

Gather data from m manufactured engines. Most engines are normal (non-defective), so this becomes your training set of typical behavior.
2. Build Probability Model

Create a model p(x) that estimates the probability of seeing any given feature vector in normal engines.
3. Test New Engines

When a new engine is manufactured with features x_test:
  • If p(x_test) is high → engine looks normal
  • If p(x_test) is very low → flag as potential anomaly for inspection

The Gaussian Distribution

Anomaly detection relies on the Gaussian (normal) distribution to model feature probabilities.

Understanding the Gaussian

If x is a random variable with a Gaussian distribution:
x ~ N(μ, σ²)
Where:
  • μ (mu): Mean - the center of the distribution
  • σ² (sigma squared): Variance - controls the width
  • σ: Standard deviation
The Gaussian distribution is also called the “bell-shaped curve” because of its characteristic shape, reminiscent of bells found in old towers (like the Liberty Bell).
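A quick numerical illustration (synthetic data, not from the source): samples drawn from N(5, 4) should cluster around the mean, with the spread set by the variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# NumPy's normal() takes the standard deviation (scale), so σ = 2 gives σ² = 4
samples = rng.normal(loc=5.0, scale=2.0, size=100_000)

print(samples.mean())  # ≈ 5 (μ)
print(samples.var())   # ≈ 4 (σ²)
```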

Gaussian Probability Formula

The probability density function is:
p(x) = (1 / √(2πσ²)) * exp(-(x - μ)² / (2σ²))
Implementing this in Python:
import numpy as np

def gaussian_probability(x, mu, sigma_squared):
    """
    Compute Gaussian probability density
    
    Parameters:
    -----------
    x : float or array
        Value(s) to compute probability for
    mu : float
        Mean of the distribution
    sigma_squared : float
        Variance of the distribution
    
    Returns:
    --------
    p : float or array
        Probability density at x
    """
    coefficient = 1 / np.sqrt(2 * np.pi * sigma_squared)
    exponent = np.exp(-(x - mu)**2 / (2 * sigma_squared))
    return coefficient * exponent

# Example
mu = 5
sigma_squared = 4  # sigma = 2
x = 7
prob = gaussian_probability(x, mu, sigma_squared)
print(f"p(x={x}) = {prob:.4f}")
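As a sanity check (assuming SciPy is available), the hand-computed density can be compared against scipy.stats.norm.pdf, which is parameterized by the standard deviation rather than the variance:

```python
import numpy as np
from scipy.stats import norm

mu, sigma_squared = 5.0, 4.0
x = 7.0

# Same formula as gaussian_probability above
manual = (1 / np.sqrt(2 * np.pi * sigma_squared)) * \
         np.exp(-(x - mu)**2 / (2 * sigma_squared))

# SciPy's scale parameter is σ, not σ²
reference = norm.pdf(x, loc=mu, scale=np.sqrt(sigma_squared))

print(manual, reference)  # both ≈ 0.1210
```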

Anomaly Detection Algorithm

Here’s the complete algorithm using Gaussian distributions for density estimation.

Step 1: Choose Features

Select features xⱼ that might indicate anomalies. For example:
  • CPU load and memory usage for servers
  • Transaction amount and location for fraud detection
  • Heat and vibration for engines

Step 2: Fit Parameters

For each feature j = 1 to n, estimate parameters from training data:
def estimate_gaussian(X):
    """
    Estimate Gaussian parameters for each feature
    
    Parameters:
    -----------
    X : array of shape (m, n)
        Training examples (m samples, n features)
    
    Returns:
    --------
    mu : array of shape (n,)
        Mean of each feature
    sigma_squared : array of shape (n,)
        Variance of each feature
    """
    m = X.shape[0]
    
    # Compute mean for each feature
    mu = np.mean(X, axis=0)
    
    # Compute variance for each feature
    sigma_squared = np.mean((X - mu)**2, axis=0)
    
    return mu, sigma_squared

# Example usage
X_train = np.array([
    [5.1, 3.2],
    [4.9, 2.8],
    [5.2, 3.5],
    [4.8, 3.1],
    [5.0, 3.0]
])

mu, sigma_squared = estimate_gaussian(X_train)
print(f"Mean (μ): {mu}")
print(f"Variance (σ²): {sigma_squared}")
Mathematical formulas:

μⱼ = (1/m) Σᵢ₌₁ᵐ xⱼ⁽ⁱ⁾

σⱼ² = (1/m) Σᵢ₌₁ᵐ (xⱼ⁽ⁱ⁾ - μⱼ)²

Step 3: Compute Probability for New Examples

For a new example x, compute:
p(x) = p(x₁; μ₁, σ₁²) × p(x₂; μ₂, σ₂²) × ... × p(xₙ; μₙ, σₙ²)
     = ∏ⱼ₌₁ⁿ p(xⱼ; μⱼ, σⱼ²)
This assumes features are statistically independent. While this assumption may not be perfectly true, the algorithm often works well in practice even when features are correlated.
def compute_probability(x, mu, sigma_squared):
    """
    Compute probability of x under the Gaussian model
    
    Parameters:
    -----------
    x : array of shape (n,)
        Feature vector to evaluate
    mu : array of shape (n,)
        Mean parameters for each feature
    sigma_squared : array of shape (n,)
        Variance parameters for each feature
    
    Returns:
    --------
    p : float
        Probability density (product of individual feature probabilities)
    """
    n = len(x)
    p = 1.0
    
    for j in range(n):
        p *= gaussian_probability(x[j], mu[j], sigma_squared[j])
    
    return p

# Test on a new example
x_test = np.array([5.0, 3.2])
p_test = compute_probability(x_test, mu, sigma_squared)
print(f"p(x_test) = {p_test:.6f}")
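The per-feature loop can also be written with NumPy broadcasting; here is a sketch of an equivalent vectorized version (same math, no explicit Python loop):

```python
import numpy as np

def compute_probability_vectorized(x, mu, sigma_squared):
    """Product of per-feature Gaussian densities, computed via broadcasting."""
    densities = (1 / np.sqrt(2 * np.pi * sigma_squared)) * \
                np.exp(-(x - mu)**2 / (2 * sigma_squared))
    return np.prod(densities)

# Same toy training data as above
X_train = np.array([[5.1, 3.2], [4.9, 2.8], [5.2, 3.5], [4.8, 3.1], [5.0, 3.0]])
mu = X_train.mean(axis=0)
sigma_squared = X_train.var(axis=0)  # population variance, matching estimate_gaussian

p = compute_probability_vectorized(np.array([5.0, 3.2]), mu, sigma_squared)
print(p)
```

Note that p(x) is a probability density, not a probability, so values greater than 1 are possible when the variances are small.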

Step 4: Flag Anomalies

Compare p(x) to a threshold ε (epsilon):
def detect_anomaly(x, mu, sigma_squared, epsilon):
    """
    Detect if x is an anomaly
    
    Parameters:
    -----------
    x : array
        Feature vector
    mu : array
        Mean parameters
    sigma_squared : array
        Variance parameters
    epsilon : float
        Anomaly threshold
    
    Returns:
    --------
    is_anomaly : bool
        True if x is flagged as anomaly
    """
    p = compute_probability(x, mu, sigma_squared)
    return p < epsilon

# Example
epsilon = 0.02
x_normal = np.array([5.0, 3.1])
x_anomalous = np.array([8.0, 0.5])

print(f"Normal example: {detect_anomaly(x_normal, mu, sigma_squared, epsilon)}")
print(f"Anomalous example: {detect_anomaly(x_anomalous, mu, sigma_squared, epsilon)}")
The algorithm flags examples as anomalous when one or more features are unusually large or small relative to the training distribution.

Complete Implementation

Here’s a full anomaly detection system:
class AnomalyDetector:
    def __init__(self, epsilon=0.02):
        self.epsilon = epsilon
        self.mu = None
        self.sigma_squared = None
    
    def fit(self, X_train):
        """
        Train the anomaly detector on normal data
        
        Parameters:
        -----------
        X_train : array of shape (m, n)
            Training examples (assumed to be mostly normal)
        """
        self.mu, self.sigma_squared = estimate_gaussian(X_train)
        return self
    
    def predict(self, X):
        """
        Predict anomalies in data
        
        Parameters:
        -----------
        X : array of shape (m, n)
            Examples to evaluate
        
        Returns:
        --------
        anomalies : array of shape (m,)
            Boolean array: True for anomalies, False for normal
        """
        m = X.shape[0]
        anomalies = np.zeros(m, dtype=bool)
        
        for i in range(m):
            p = compute_probability(X[i], self.mu, self.sigma_squared)
            anomalies[i] = (p < self.epsilon)
        
        return anomalies
    
    def get_probability(self, X):
        """
        Get probability scores for examples
        
        Parameters:
        -----------
        X : array of shape (m, n)
            Examples to evaluate
        
        Returns:
        --------
        probabilities : array of shape (m,)
            Probability score for each example
        """
        m = X.shape[0]
        probabilities = np.zeros(m)
        
        for i in range(m):
            probabilities[i] = compute_probability(X[i], self.mu, self.sigma_squared)
        
        return probabilities

# Usage example
detector = AnomalyDetector(epsilon=0.02)
detector.fit(X_train)

# Test on new data
X_test = np.array([
    [5.0, 3.0],   # Normal
    [8.5, 0.3],   # Likely anomaly
    [4.9, 3.2]    # Normal
])

anomalies = detector.predict(X_test)
probs = detector.get_probability(X_test)

for i, (is_anomaly, prob) in enumerate(zip(anomalies, probs)):
    status = "ANOMALY" if is_anomaly else "Normal"
    print(f"Example {i+1}: {status} (p={prob:.6f})")
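One practical caveat not shown above: with many features, the product of small densities can underflow to 0.0 in double precision. A common remedy (a sketch, not from the source) is to sum log-densities and compare the result against log ε instead:

```python
import numpy as np

def compute_log_probability(x, mu, sigma_squared):
    """Sum of per-feature log densities: the log of the product, computed stably."""
    log_densities = -0.5 * np.log(2 * np.pi * sigma_squared) \
                    - (x - mu)**2 / (2 * sigma_squared)
    return np.sum(log_densities)

# 500 features, each 3 standard deviations from its mean
n = 500
x = np.full(n, 3.0)
mu = np.zeros(n)
sigma_squared = np.ones(n)

log_p = compute_log_probability(x, mu, sigma_squared)
print(log_p)          # finite, ≈ -2709
print(np.exp(log_p))  # the plain product underflows to 0.0
```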

Developing and Evaluating the System

While anomaly detection uses unlabeled data, having some labeled examples helps tune parameters and evaluate performance.

Dataset Split

1. Training Set

Large set of unlabeled examples (x¹, x², …, xᵐ). Assume these are mostly normal (y=0).
2. Cross-Validation Set

Small set with labels including some anomalies. Use to tune ε and choose features.
3. Test Set

Separate labeled set with anomalies to evaluate final performance.
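As an illustrative sketch (the counts and synthetic data here are assumed, not from the source), a split for 10,000 normal examples and 20 labeled anomalies might put all anomalies into the validation and test sets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 10,000 normal examples, 20 labeled anomalies
X_normal = rng.normal(loc=[5.0, 3.0], scale=0.2, size=(10_000, 2))
X_anom = rng.normal(loc=[8.0, 0.5], scale=0.5, size=(20, 2))

idx = rng.permutation(len(X_normal))
X_train = X_normal[idx[:6000]]  # training set: unlabeled, assumed normal

# Validation and test each get 2,000 normals plus half the anomalies
X_val = np.vstack([X_normal[idx[6000:8000]], X_anom[:10]])
y_val = np.concatenate([np.zeros(2000), np.ones(10)])
X_test = np.vstack([X_normal[idx[8000:]], X_anom[10:]])
y_test = np.concatenate([np.zeros(2000), np.ones(10)])
```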

Evaluation Metrics

Because anomalies are rare (skewed classes), accuracy is misleading. Use:
from sklearn.metrics import precision_score, recall_score, f1_score

# Labels for X_test above: 0 = normal, 1 = anomaly
y_true = np.array([0, 1, 0])
y_pred = detector.predict(X_test).astype(int)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")
  • Precision: of the examples flagged as anomalies, what fraction are truly anomalous? High precision = few false alarms.
  • Recall: of all true anomalies, what fraction did we detect? High recall = catching most anomalies.
  • F1 = 2 × (precision × recall) / (precision + recall): the harmonic mean of precision and recall; a good single metric when you want to balance both.
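To make the definitions concrete, here is a tiny worked example computing the same metrics from raw counts (the labels are invented for illustration):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives: 2
fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms: 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # missed anomalies: 1

precision = tp / (tp + fp)  # 2/3
recall = tp / (tp + fn)     # 2/3
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```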

Choosing Epsilon

Tune ε on the cross-validation set:
def select_epsilon(X_val, y_val, mu, sigma_squared):
    """
    Select optimal epsilon threshold
    
    Parameters:
    -----------
    X_val : array
        Cross-validation examples
    y_val : array
        True labels (0=normal, 1=anomaly)
    mu, sigma_squared : arrays
        Gaussian parameters from training
    
    Returns:
    --------
    best_epsilon : float
        Epsilon with highest F1 score
    best_f1 : float
        Best F1 score achieved
    """
    # Get probabilities for all validation examples
    probabilities = np.array([compute_probability(x, mu, sigma_squared) 
                             for x in X_val])
    
    # Try different epsilon values
    best_epsilon = 0
    best_f1 = 0
    
    epsilons = np.linspace(probabilities.min(), probabilities.max(), 1000)
    
    for epsilon in epsilons:
        predictions = (probabilities < epsilon).astype(int)
        
        # Compute F1 score (zero_division=0 suppresses warnings when nothing is flagged)
        f1 = f1_score(y_val, predictions, zero_division=0)
        
        if f1 > best_f1:
            best_f1 = f1
            best_epsilon = epsilon
    
    return best_epsilon, best_f1

# Example usage
epsilon_best, f1_best = select_epsilon(X_val, y_val, mu, sigma_squared)
print(f"Best epsilon: {epsilon_best:.6f}")
print(f"Best F1 score: {f1_best:.3f}")

Anomaly Detection vs Supervised Learning

When should you use anomaly detection versus supervised learning?

Use Anomaly Detection

  • Very few positive examples (0-20)
  • Many types of anomalies
  • Future anomalies may look different
  • Examples: fraud, manufacturing defects, monitoring

Use Supervised Learning

  • Sufficient positive examples
  • Consistent types of positive examples
  • Future positives similar to training
  • Examples: spam, disease diagnosis

Key Decision Factors

How much labeled data you have — Anomaly detection: a very small number (0-20) of anomalous examples and a large number of normal examples. Supervised learning: enough positive and negative examples to learn patterns (typically 20+).
How positives behave — Anomaly detection: many different ways for things to go wrong, and tomorrow's anomalies may be unlike anything seen before (e.g., new types of engine failures). Supervised learning: future positive examples are likely similar to training examples (e.g., spam emails tend to have recognizable patterns).

Practical Applications

Fraud Detection

Monitor transactions for unusual patterns indicating fraudulent activity

Manufacturing Quality

Detect defective products based on sensor readings and measurements

System Monitoring

Identify unusual server behavior indicating failures or attacks

Cybersecurity

Detect intrusions and unusual network activity

Key Takeaways

Best Practices:
  1. Feature selection is crucial: Choose features that might take unusually large/small values for anomalies
  2. Train on mostly-normal data: A few anomalies slipping into the training set is usually okay
  3. Tune epsilon carefully: Use cross-validation set with labeled anomalies
  4. Use appropriate metrics: F1, precision, recall - not accuracy
  5. Consider transformations: Sometimes applying log transforms or other transformations makes features more Gaussian
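For point 5, a small before/after sketch (synthetic skewed data, assumed for illustration): a log transform often makes a heavy-tailed feature look far more Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # heavily right-skewed

x_log = np.log1p(x)  # log(1 + x); np.log(x + c) or x**0.5 are also common

# Rough symmetry check: mean ≈ median for approximately Gaussian data
print(abs(np.mean(x) - np.median(x)))          # large for the raw feature
print(abs(np.mean(x_log) - np.median(x_log)))  # much smaller after the transform
```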
When Features Aren’t Independent:The algorithm assumes feature independence, but often works well even when this isn’t true. For handling correlations explicitly, consider multivariate Gaussian models.
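For reference, a hedged sketch of that multivariate alternative using scipy.stats.multivariate_normal, which fits a full covariance matrix and so captures correlations between features:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same toy training data as earlier
X_train = np.array([[5.1, 3.2], [4.9, 2.8], [5.2, 3.5], [4.8, 3.1], [5.0, 3.0]])

mu = X_train.mean(axis=0)
Sigma = np.cov(X_train, rowvar=False, bias=True)  # full covariance (divide by m)

model = multivariate_normal(mean=mu, cov=Sigma)
p = model.pdf(np.array([5.0, 3.2]))
print(p)
```

The trade-off is that Σ must be invertible, which requires m > n and no redundant (linearly dependent) features.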

Next Steps

Anomaly detection is a powerful tool in your unsupervised learning toolkit. Combined with clustering, you now have techniques to find structure and identify unusual patterns in unlabeled data.
