Anomaly detection algorithms look at an unlabeled dataset of normal events and learn to detect unusual or anomalous events. This is one of the most commercially important applications of unsupervised learning.
Anomaly detection identifies data points that deviate significantly from normal patterns, enabling early detection of problems, fraud, defects, and other unusual events.
Consider a company manufacturing aircraft engines. Engine failures have severe consequences, so detecting potential problems before deployment is critical.
If x is a random variable with a Gaussian distribution:
x ~ N(μ, σ²)
Where:
μ (mu): Mean - the center of the distribution
σ² (sigma squared): Variance - controls the width
σ: Standard deviation
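Written out, the density that these parameters define is the standard Gaussian probability density function:

p(x; μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))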
The Gaussian distribution is also called the “bell-shaped curve” because of its characteristic shape, reminiscent of bells found in old towers (like the Liberty Bell).
```python
import numpy as np

def gaussian_probability(x, mu, sigma_squared):
    """
    Compute the Gaussian probability density.

    Parameters
    ----------
    x : float or array
        Value(s) to compute the probability for
    mu : float
        Mean of the distribution
    sigma_squared : float
        Variance of the distribution

    Returns
    -------
    p : float or array
        Probability density at x
    """
    coefficient = 1 / np.sqrt(2 * np.pi * sigma_squared)
    exponent = np.exp(-(x - mu)**2 / (2 * sigma_squared))
    return coefficient * exponent

# Example
mu = 5
sigma_squared = 4  # sigma = 2
x = 7
prob = gaussian_probability(x, mu, sigma_squared)
print(f"p(x={x}) = {prob:.4f}")
```
For each feature j = 1 to n, estimate parameters from training data:
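In symbols, with x⁽ⁱ⁾ denoting the i-th training example and the sums running over i = 1 to m:

μⱼ = (1/m) Σᵢ xⱼ⁽ⁱ⁾

σⱼ² = (1/m) Σᵢ (xⱼ⁽ⁱ⁾ − μⱼ)²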
```python
def estimate_gaussian(X):
    """
    Estimate Gaussian parameters for each feature.

    Parameters
    ----------
    X : array of shape (m, n)
        Training examples (m samples, n features)

    Returns
    -------
    mu : array of shape (n,)
        Mean of each feature
    sigma_squared : array of shape (n,)
        Variance of each feature
    """
    # Compute mean and variance for each feature (column)
    mu = np.mean(X, axis=0)
    sigma_squared = np.mean((X - mu)**2, axis=0)
    return mu, sigma_squared

# Example usage
X_train = np.array([
    [5.1, 3.2],
    [4.9, 2.8],
    [5.2, 3.5],
    [4.8, 3.1],
    [5.0, 3.0],
])
mu, sigma_squared = estimate_gaussian(X_train)
print(f"Mean (μ): {mu}")
print(f"Variance (σ²): {sigma_squared}")
```
This assumes features are statistically independent. While this assumption may not be perfectly true, the algorithm often works well in practice even when features are correlated.
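Concretely, the model scores an example x by multiplying the per-feature densities:

p(x) = p(x₁; μ₁, σ₁²) × p(x₂; μ₂, σ₂²) × … × p(xₙ; μₙ, σₙ²) = Πⱼ p(xⱼ; μⱼ, σⱼ²)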
```python
def compute_probability(x, mu, sigma_squared):
    """
    Compute the probability of x under the Gaussian model.

    Parameters
    ----------
    x : array of shape (n,)
        Feature vector to evaluate
    mu : array of shape (n,)
        Mean parameter for each feature
    sigma_squared : array of shape (n,)
        Variance parameter for each feature

    Returns
    -------
    p : float
        Probability density (product of the individual feature probabilities)
    """
    p = 1.0
    for j in range(len(x)):
        p *= gaussian_probability(x[j], mu[j], sigma_squared[j])
    return p

# Test on a new example
x_test = np.array([5.0, 3.2])
p_test = compute_probability(x_test, mu, sigma_squared)
print(f"p(x_test) = {p_test:.6f}")
```
```python
class AnomalyDetector:
    def __init__(self, epsilon=0.02):
        self.epsilon = epsilon
        self.mu = None
        self.sigma_squared = None

    def fit(self, X_train):
        """
        Train the anomaly detector on normal data.

        Parameters
        ----------
        X_train : array of shape (m, n)
            Training examples (assumed to be mostly normal)
        """
        self.mu, self.sigma_squared = estimate_gaussian(X_train)
        return self

    def predict(self, X):
        """
        Predict anomalies in data.

        Returns
        -------
        anomalies : array of shape (m,)
            Boolean array: True for anomalies, False for normal
        """
        m = X.shape[0]
        anomalies = np.zeros(m, dtype=bool)
        for i in range(m):
            p = compute_probability(X[i], self.mu, self.sigma_squared)
            anomalies[i] = (p < self.epsilon)
        return anomalies

    def get_probability(self, X):
        """
        Get probability scores for examples.

        Returns
        -------
        probabilities : array of shape (m,)
            Probability score for each example
        """
        m = X.shape[0]
        probabilities = np.zeros(m)
        for i in range(m):
            probabilities[i] = compute_probability(X[i], self.mu, self.sigma_squared)
        return probabilities

# Usage example
detector = AnomalyDetector(epsilon=0.02)
detector.fit(X_train)

# Test on new data
X_test = np.array([
    [5.0, 3.0],  # Normal
    [8.5, 0.3],  # Likely anomaly
    [4.9, 3.2],  # Normal
])
anomalies = detector.predict(X_test)
probs = detector.get_probability(X_test)
for i, (is_anomaly, prob) in enumerate(zip(anomalies, probs)):
    status = "ANOMALY" if is_anomaly else "Normal"
    print(f"Example {i+1}: {status} (p={prob:.6f})")
```
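One practical caveat the class above does not handle: with many features, the product of small densities can underflow to 0.0 in floating point. A common remedy (a sketch, not part of the algorithm as described above) is to sum log-densities instead of multiplying densities:

```python
import numpy as np

def log_probability(X, mu, sigma_squared):
    # Per-example sum of per-feature Gaussian log-densities (vectorized).
    # Mathematically equal to the log of the product form used above,
    # but immune to underflow when individual densities are tiny.
    log_p = (-0.5 * np.log(2 * np.pi * sigma_squared)
             - (X - mu)**2 / (2 * sigma_squared))
    return log_p.sum(axis=1)
```

An example is then flagged as anomalous when its log-probability falls below `np.log(epsilon)`.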
Anomaly Detection: Very small number (0-20) of anomalous examples. Large number of normal examples.
Supervised Learning: Enough positive and negative examples to learn patterns (typically 20+).
Nature of Positive Examples
Anomaly Detection: Many different ways for things to go wrong. Tomorrow’s anomalies may be unlike anything seen before (e.g., new types of engine failures).
Supervised Learning: Future positive examples likely similar to training examples (e.g., spam emails tend to have recognizable patterns).
Feature selection is crucial: Choose features that might take unusually large/small values for anomalies
Train on (mostly) normal data: The training set should consist of examples assumed to be normal (a few anomalies slipping into the training set is usually okay)
Tune epsilon carefully: Use a cross-validation set with labeled anomalies
Use appropriate metrics: F1 score, precision, and recall - not accuracy, which is misleadingly high when anomalies are rare
Consider transformations: Sometimes applying log transforms or other transformations makes features more Gaussian
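The epsilon-tuning and metrics tips above can be sketched as a simple threshold search. The helper name `select_epsilon` and the convention `y_cv = 1` for anomalies are illustrative assumptions:

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    # Scan candidate thresholds between the min and max probability scores
    # on the cross-validation set, keeping the one that maximizes F1.
    # p_cv: probability score per CV example; y_cv: 1 = anomaly, 0 = normal.
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = (p_cv < eps)                      # flag low-probability points
        tp = np.sum((preds == 1) & (y_cv == 1))   # true positives
        fp = np.sum((preds == 1) & (y_cv == 0))   # false positives
        fn = np.sum((preds == 0) & (y_cv == 1))   # false negatives
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_eps = f1, eps
    return best_eps, best_f1
```

The returned epsilon can then be passed to the detector in place of a hand-picked value.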
When Features Aren’t Independent: The algorithm assumes feature independence, but often works well even when this isn’t true. For handling correlations explicitly, consider multivariate Gaussian models.
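As a sketch of that multivariate option (the function name is illustrative; the formula is the standard full-covariance Gaussian density):

```python
import numpy as np

def multivariate_gaussian(X, mu, Sigma):
    # Full-covariance Gaussian density p(x; mu, Sigma).
    # Unlike the per-feature product model, the covariance matrix Sigma
    # lets the model capture correlations between features.
    n = mu.shape[0]
    diff = X - mu
    exponent = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    coeff = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return coeff * np.exp(exponent)
```

The parameters are typically estimated from the training set as `mu = X.mean(axis=0)` and `Sigma = np.cov(X, rowvar=False)`; with a diagonal Sigma this reduces to the per-feature product model.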
Anomaly detection is a powerful tool in your unsupervised learning toolkit. Combined with clustering, you now have techniques to find structure and identify unusual patterns in unlabeled data.