## Overview

The `load_and_evaluate_patchcore.py` script loads pre-trained PatchCore models and evaluates them on test datasets. This is useful for:
- Testing saved models without retraining
- Evaluating on new data
- Benchmarking model performance
- Generating segmentation visualizations
Unlike `run_patchcore.py`, this script only performs inference and evaluation; no training occurs.
## Command Structure

### Main Command
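A sketch of the overall invocation shape, chaining the main command with its two subcommands (angle-bracketed names are placeholders for the arguments documented below):

```shell
python load_and_evaluate_patchcore.py [OPTIONS] <results_path> \
    patch_core_loader [LOADER_OPTIONS] \
    dataset [DATASET_OPTIONS] <name> <data_path>
```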
#### Arguments

- `results_path`: Path where evaluation results and metrics will be saved

#### Options

- `--gpu`: GPU device ID(s) to use for evaluation. Can specify multiple GPUs by repeating the flag.
- `--seed`: Random seed for reproducibility
- `--save_segmentation_images`: Save visualization images showing anomaly segmentation results
Unlike `run_patchcore.py`, this script does not have `--log_group`, `--log_project`, or `--save_patchcore_model` options, since it only loads existing models.

### Subcommand: `patch_core_loader`
Loads pre-trained PatchCore models from disk.

#### Options
- `--patch_core_paths` (short flag: `-p`): Path(s) to saved PatchCore model directories. Each path should point to a directory containing `.faiss` and `.pkl` files.

The loader automatically detects:
- Single models: a directory with one `.faiss` file
- Ensemble models: a directory with multiple `.faiss` files named `Ensemble-{i}-{n}_*.faiss`
- `--faiss_on_gpu`: Use GPU-accelerated FAISS for nearest-neighbor search. Significantly speeds up evaluation.
- `--faiss_num_workers`: Number of CPU workers for FAISS operations
### Subcommand: `dataset`

Configures dataset loading for evaluation.

#### Arguments

- `name`: Dataset type. Currently supported: `mvtec`
- `data_path`: Path to the dataset root directory. Must exist.
#### Options

- `--subdatasets` (short flag: `-d`): Dataset categories to evaluate on. Should match the categories used during training. For MVTec AD: `bottle`, `cable`, `capsule`, `carpet`, `grid`, `hazelnut`, `leather`, `metal_nut`, `pill`, `screw`, `tile`, `toothbrush`, `transistor`, `wood`, `zipper`
- `--batch_size`: Batch size for data loading during evaluation. Default is 1 for evaluation.
- `--num_workers`: Number of worker processes for data loading
- `--resize`: Image resize dimension (before center cropping). Must match training configuration.
- `--imagesize`: Final image size after center cropping. Must match training configuration. Common values: `224` (standard ImageNet size) and `320` (higher-resolution models).
- `--augment`: Apply data augmentation during evaluation (typically not used)
## Examples

### Evaluate Single Model
Evaluate a single trained model on one category:
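A sketch of such an invocation; all paths and the model directory name are placeholders, and flag names are those listed above:

```shell
python load_and_evaluate_patchcore.py --gpu 0 --seed 0 results/eval_bottle \
    patch_core_loader -p models/mvtec_bottle --faiss_on_gpu \
    dataset -d bottle --resize 256 --imagesize 224 mvtec /path/to/mvtec
```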
### Evaluate Multiple Categories

Evaluate models for all 15 MVTec AD categories (from sample script):
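One way to sketch this, passing one `-p` flag per saved model and one `-d` flag per category (model directory names and paths are assumptions; adjust to your setup):

```shell
datasets=(bottle cable capsule carpet grid hazelnut leather metal_nut \
          pill screw tile toothbrush transistor wood zipper)

# Build repeated -p and -d flags, one pair per category.
model_flags=(); dataset_flags=()
for d in "${datasets[@]}"; do
  model_flags+=('-p' "models/mvtec_${d}")
  dataset_flags+=('-d' "$d")
done

python load_and_evaluate_patchcore.py --gpu 0 --seed 0 results/eval_all \
    patch_core_loader "${model_flags[@]}" --faiss_on_gpu \
    dataset "${dataset_flags[@]}" --resize 256 --imagesize 224 mvtec /path/to/mvtec
```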
### Evaluate with Segmentation Visualization

Generate visualization images during evaluation:
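This only requires adding the `--save_segmentation_images` flag to the main command; paths below are placeholders:

```shell
python load_and_evaluate_patchcore.py --gpu 0 --save_segmentation_images results/eval_seg \
    patch_core_loader -p models/mvtec_bottle \
    dataset -d bottle mvtec /path/to/mvtec
```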
### Evaluate IM224 Model

For models trained at 224x224 resolution:
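A sketch with the 224-pixel crop settings (model path is a placeholder; `--resize 256` is a typical pairing but must match how the model was trained):

```shell
python load_and_evaluate_patchcore.py --gpu 0 results/eval_im224 \
    patch_core_loader -p models/IM224_mvtec_bottle --faiss_on_gpu \
    dataset -d bottle --resize 256 --imagesize 224 mvtec /path/to/mvtec
```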
### Evaluate IM320 Model

For models trained at 320x320 resolution:
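A sketch with the 320-pixel crop settings; the model path is a placeholder, and the `--resize 366` value is an assumption based on a common pairing with 320 crops — use whatever resize the model was actually trained with:

```shell
python load_and_evaluate_patchcore.py --gpu 0 results/eval_im320 \
    patch_core_loader -p models/IM320_mvtec_bottle --faiss_on_gpu \
    dataset -d bottle --resize 366 --imagesize 320 mvtec /path/to/mvtec
```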
## Model Directory Structure

Expected structure for model directories:

### Single Model
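A sketch of a single-model directory; the exact file names are assumptions based on typical PatchCore save output, but the pattern of one `.faiss` index alongside a `.pkl` parameters file is what the loader expects:

```
model_dir/
├── nnscorer_search_index.faiss   # FAISS nearest-neighbor index
└── patchcore_params.pkl          # saved model parameters
```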
### Ensemble Model
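A sketch of an ensemble directory with n=4 members, following the `Ensemble-{i}-{n}_*.faiss` naming convention described above (base file names are assumptions):

```
model_dir/
├── Ensemble-1-4_nnscorer_search_index.faiss
├── Ensemble-1-4_patchcore_params.pkl
├── Ensemble-2-4_nnscorer_search_index.faiss
├── Ensemble-2-4_patchcore_params.pkl
├── Ensemble-3-4_nnscorer_search_index.faiss
├── Ensemble-3-4_patchcore_params.pkl
├── Ensemble-4-4_nnscorer_search_index.faiss
└── Ensemble-4-4_patchcore_params.pkl
```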
The script automatically detects ensemble models by counting `.faiss` files in the directory.
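That detection logic can be sketched in a few lines of Python; the function name and return values here are illustrative, not the script's actual API:

```python
import glob
import os


def detect_model_type(model_dir):
    """Classify a saved-model directory by counting its .faiss index files.

    Returns "single" for exactly one index and "ensemble" for several,
    mirroring the documented detection behavior.
    """
    faiss_files = sorted(glob.glob(os.path.join(model_dir, "*.faiss")))
    if not faiss_files:
        raise ValueError(f"No .faiss files found in {model_dir}")
    return "single" if len(faiss_files) == 1 else "ensemble"
```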
## Output Files

The script creates the following structure:
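A rough sketch of the output layout; `results.csv` is described below, while the segmentation folder name is an assumption and only appears when `--save_segmentation_images` is set:

```
<results_path>/
├── results.csv            # per-dataset and mean metrics
└── segmentation_images/   # anomaly segmentation visualizations (optional)
```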
## Metrics Computed

The script evaluates the following metrics:

- Instance AUROC: Image-level anomaly detection accuracy
- Full Pixel AUROC: Pixel-level segmentation accuracy (all images)
- Anomaly Pixel AUROC: Pixel-level segmentation accuracy (anomalous images only)
Results are saved to `results.csv` with per-dataset and mean scores.
## Comparison: `run_patchcore.py` vs `load_and_evaluate_patchcore.py`
| Feature | run_patchcore.py | load_and_evaluate_patchcore.py |
|---|---|---|
| Training | ✅ Yes | ❌ No |
| Evaluation | ✅ Yes | ✅ Yes |
| Save Models | ✅ Yes | ❌ No (loads only) |
| Subcommands | patch_core, sampler, dataset | patch_core_loader, dataset |
| Model Config | Configured via flags | Loaded from saved files |
| Use Case | Train new models | Evaluate existing models |
| Default Batch Size | 2 | 1 |
