Overview
The metrics module provides functions for evaluating anomaly detection performance at both the image level (classification) and pixel level (segmentation).
Functions
compute_imagewise_retrieval_metrics
def compute_imagewise_retrieval_metrics(
    anomaly_prediction_weights,
    anomaly_ground_truth_labels
)
Computes image-level anomaly detection metrics including AUROC, FPR, TPR, and optimal thresholds.
Parameters
anomaly_prediction_weights
Array of anomaly scores for each image with shape [N]. Higher values indicate higher probability of being anomalous.
anomaly_ground_truth_labels
Binary ground truth labels with shape [N]. Values are 1 for anomalous images and 0 for normal images.
Returns
A dictionary with the following keys:
- auroc: Area Under the Receiver Operating Characteristic curve. Values range from 0 to 1, where 1 is perfect classification.
- fpr: False Positive Rate values at different thresholds.
- tpr: True Positive Rate values at different thresholds.
- threshold: Threshold values corresponding to each FPR/TPR pair.
Example
import numpy as np
from patchcore.metrics import compute_imagewise_retrieval_metrics
# Simulate predictions and ground truth
scores = np.array([0.1, 0.4, 0.9, 0.3, 0.8, 0.2]) # Predicted anomaly scores
labels = np.array([0, 0, 1, 0, 1, 0]) # Ground truth (0=normal, 1=anomaly)
metrics = compute_imagewise_retrieval_metrics(scores, labels)
print(f"Image-level AUROC: {metrics['auroc']:.4f}")
print(f"Number of threshold points: {len(metrics['threshold'])}")
Real-world Usage
from torch.utils.data import DataLoader
from patchcore.patchcore import PatchCore
from patchcore.metrics import compute_imagewise_retrieval_metrics
# Load model and test data
# (assumes device, nn_method, and test_dataset are already defined)
model = PatchCore(device)
model.load_from_path("./models/trained", device, nn_method)
test_loader = DataLoader(test_dataset, batch_size=32)
# Get predictions
scores, masks, labels_gt, masks_gt = model.predict(test_loader)
# Compute image-level metrics
image_metrics = compute_imagewise_retrieval_metrics(scores, labels_gt)
print(f"Image AUROC: {image_metrics['auroc']:.4f}")
AUROC Interpretation:
- 0.95-1.0: Excellent detection performance
- 0.90-0.95: Good detection performance
- 0.80-0.90: Fair detection performance
- Below 0.80: Poor detection performance
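Because the module builds on sklearn.metrics (see Dependencies), the image-level AUROC can be cross-checked against sklearn directly. In the toy example above, both anomalies (0.9 and 0.8) score above every normal sample, so the ranking is perfect and AUROC should be exactly 1.0:

```python
import numpy as np
from sklearn import metrics

# Same toy data as the example above: anomalies outrank all normal samples.
scores = np.array([0.1, 0.4, 0.9, 0.3, 0.8, 0.2])
labels = np.array([0, 0, 1, 0, 1, 0])

fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
auroc = metrics.roc_auc_score(labels, scores)
print(auroc)  # 1.0 (perfect separation)
```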
compute_pixelwise_retrieval_metrics
def compute_pixelwise_retrieval_metrics(
    anomaly_segmentations,
    ground_truth_masks
)
Computes pixel-level anomaly segmentation metrics including AUROC, optimal threshold, and false positive/negative rates.
Parameters
anomaly_segmentations
list of np.ndarray or np.ndarray
required
Predicted anomaly segmentation masks. Can be:
- A list of arrays, each with shape [H, W]
- A single array with shape [N, H, W]
Values should be continuous scores (not binary), typically in range [0, 1].
ground_truth_masks
list of np.ndarray or np.ndarray
required
Ground truth segmentation masks with the same shape as predictions. Binary values where 1 indicates anomalous pixels and 0 indicates normal pixels.
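The two accepted input forms describe the same data: a list of [H, W] maps can be stacked into a single [N, H, W] array with np.stack. A quick sketch:

```python
import numpy as np

# A list of three [H, W] score maps...
maps = [np.random.rand(64, 64) for _ in range(3)]

# ...is equivalent to one stacked [N, H, W] array.
stacked = np.stack(maps)
print(stacked.shape)  # (3, 64, 64)
```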
Returns
A dictionary with the following keys:
- auroc: Pixel-level Area Under the ROC curve.
- fpr: False Positive Rate values at different thresholds.
- tpr: True Positive Rate values at different thresholds.
- optimal_threshold: Threshold that maximizes the F1 score, balancing precision and recall.
- optimal_fpr: False Positive Rate at the optimal threshold.
- optimal_fnr: False Negative Rate at the optimal threshold.
Example
import numpy as np
from patchcore.metrics import compute_pixelwise_retrieval_metrics
# Simulate segmentation predictions and ground truth
predictions = [
    np.random.rand(224, 224),  # Continuous anomaly scores
    np.random.rand(224, 224),
    np.random.rand(224, 224)
]
ground_truth = [
    np.random.randint(0, 2, (224, 224)),  # Binary masks
    np.random.randint(0, 2, (224, 224)),
    np.random.randint(0, 2, (224, 224))
]
metrics = compute_pixelwise_retrieval_metrics(predictions, ground_truth)
print(f"Pixel-level AUROC: {metrics['auroc']:.4f}")
print(f"Optimal threshold: {metrics['optimal_threshold']:.4f}")
print(f"FPR at optimal threshold: {metrics['optimal_fpr']:.4f}")
print(f"FNR at optimal threshold: {metrics['optimal_fnr']:.4f}")
Real-world Usage
from torch.utils.data import DataLoader
from patchcore.patchcore import PatchCore
from patchcore.metrics import compute_pixelwise_retrieval_metrics
# Load model and test data
# (assumes device, nn_method, and test_dataset are already defined)
model = PatchCore(device)
model.load_from_path("./models/trained", device, nn_method)
test_loader = DataLoader(test_dataset, batch_size=32)
# Get predictions
scores, masks, labels_gt, masks_gt = model.predict(test_loader)
# Compute pixel-level metrics
pixel_metrics = compute_pixelwise_retrieval_metrics(masks, masks_gt)
print(f"Pixel AUROC: {pixel_metrics['auroc']:.4f}")
print(f"Optimal threshold: {pixel_metrics['optimal_threshold']:.4f}")
# Apply optimal threshold for binary segmentation
import numpy as np
threshold = pixel_metrics['optimal_threshold']
binary_masks = [(mask >= threshold).astype(int) for mask in masks]
Optimal Threshold: The returned optimal threshold maximizes the F1 score, which balances precision and recall. Use this threshold to convert continuous anomaly scores into binary segmentation masks.
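One way such an F1-maximizing threshold can be derived is via sklearn's precision_recall_curve: compute F1 at every candidate threshold and take the argmax. This is a sketch of the idea on flattened toy scores, not necessarily the module's exact implementation:

```python
import numpy as np
from sklearn import metrics

# Flattened pixel scores and binary ground truth (toy data).
scores = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])

precision, recall, thresholds = metrics.precision_recall_curve(labels, scores)
# precision/recall have one extra trailing entry relative to thresholds.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
optimal_threshold = thresholds[np.argmax(f1)]
print(optimal_threshold)  # 0.8 for this toy data: scores >= 0.8 give F1 = 1
```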
Complete Evaluation Example
import torch
from torch.utils.data import DataLoader
from patchcore.patchcore import PatchCore
from patchcore.metrics import (
    compute_imagewise_retrieval_metrics,
    compute_pixelwise_retrieval_metrics
)
import patchcore.common
# Initialize model
device = torch.device("cuda")
model = PatchCore(device)
# Load trained model
nn_method = patchcore.common.FaissNN(False, 4)
model.load_from_path("./models/bottle", device, nn_method)
# Prepare test data (assumes test_dataset is already defined)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
# Get predictions
print("Running inference...")
scores, masks, labels_gt, masks_gt = model.predict(test_loader)
# Compute image-level metrics
print("\nImage-level Metrics:")
image_metrics = compute_imagewise_retrieval_metrics(scores, labels_gt)
print(f" AUROC: {image_metrics['auroc']:.4f}")
# Compute pixel-level metrics
print("\nPixel-level Metrics:")
pixel_metrics = compute_pixelwise_retrieval_metrics(masks, masks_gt)
print(f" AUROC: {pixel_metrics['auroc']:.4f}")
print(f" Optimal Threshold: {pixel_metrics['optimal_threshold']:.4f}")
print(f" FPR at optimal: {pixel_metrics['optimal_fpr']:.4f}")
print(f" FNR at optimal: {pixel_metrics['optimal_fnr']:.4f}")
# Plot ROC curves (optional)
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# Image-level ROC
ax1.plot(image_metrics['fpr'], image_metrics['tpr'], linewidth=2)
ax1.plot([0, 1], [0, 1], 'k--', linewidth=1)
ax1.set_xlabel('False Positive Rate')
ax1.set_ylabel('True Positive Rate')
ax1.set_title(f'Image-level ROC (AUROC={image_metrics["auroc"]:.4f})')
ax1.grid(True, alpha=0.3)
# Pixel-level ROC
ax2.plot(pixel_metrics['fpr'], pixel_metrics['tpr'], linewidth=2)
ax2.plot([0, 1], [0, 1], 'k--', linewidth=1)
ax2.set_xlabel('False Positive Rate')
ax2.set_ylabel('True Positive Rate')
ax2.set_title(f'Pixel-level ROC (AUROC={pixel_metrics["auroc"]:.4f})')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('roc_curves.png', dpi=150)
plt.show()
Metric Definitions
AUROC (Area Under ROC Curve)
The area under the Receiver Operating Characteristic curve, which plots True Positive Rate vs False Positive Rate at various thresholds.
- Range: 0 to 1
- Perfect score: 1.0 (all anomalies ranked higher than normal samples)
- Random classifier: 0.5
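These three reference points are easy to verify with sklearn's roc_auc_score: a perfect ranking scores 1.0, a fully inverted ranking scores 0.0, and constant (uninformative) scores degenerate to 0.5:

```python
import numpy as np
from sklearn import metrics

labels = np.array([0, 0, 1, 1])
perfect = metrics.roc_auc_score(labels, np.array([0.1, 0.2, 0.8, 0.9]))
inverted = metrics.roc_auc_score(labels, np.array([0.9, 0.8, 0.2, 0.1]))
uninformative = metrics.roc_auc_score(labels, np.array([0.5, 0.5, 0.5, 0.5]))
print(perfect, inverted, uninformative)  # 1.0 0.0 0.5
```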
True Positive Rate (TPR)
Also called Recall or Sensitivity:
TPR = True Positives / (True Positives + False Negatives)
The proportion of actual anomalies correctly identified.
False Positive Rate (FPR)
FPR = False Positives / (False Positives + True Negatives)
The proportion of normal samples incorrectly classified as anomalies.
False Negative Rate (FNR)
FNR = False Negatives / (False Negatives + True Positives)
The proportion of anomalies that were missed (classified as normal).
F1 Score
Harmonic mean of precision and recall:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
The optimal threshold is chosen to maximize this metric.
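The rate definitions above can be checked directly on a small confusion matrix built from binarized predictions. A minimal sketch with made-up labels:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])  # e.g. scores thresholded at some value

# Confusion-matrix counts
tp = np.sum((y_pred == 1) & (y_true == 1))  # 2
fp = np.sum((y_pred == 1) & (y_true == 0))  # 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # 1
tn = np.sum((y_pred == 0) & (y_true == 0))  # 2

tpr = tp / (tp + fn)        # recall / sensitivity: 2/3
fpr = fp / (fp + tn)        # 1/3
fnr = fn / (fn + tp)        # 1/3, always equal to 1 - TPR
precision = tp / (tp + fp)  # 2/3
f1 = 2 * precision * tpr / (precision + tpr)  # 2/3
```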
Dependencies
The metrics module uses:
- numpy: For array operations
- sklearn.metrics: For ROC and precision-recall curve computation
import numpy as np
from sklearn import metrics