
Overview

The load_and_evaluate_patchcore.py script loads pre-trained PatchCore models and evaluates them on test datasets. This is useful for:
  • Testing saved models without retraining
  • Evaluating on new data
  • Benchmarking model performance
  • Generating segmentation visualizations
Unlike run_patchcore.py, this script only performs inference and evaluation—no training occurs.

Command Structure

python bin/load_and_evaluate_patchcore.py [MAIN_OPTIONS] RESULTS_PATH \
  patch_core_loader [LOADER_OPTIONS] \
  dataset [DATASET_OPTIONS] DATASET_NAME DATA_PATH

Main Command

Arguments

results_path
string
required
Path where evaluation results and metrics will be saved

Options

--gpu
int
default:"0"
GPU device ID(s) to use for evaluation. Can specify multiple GPUs by repeating the flag.
--gpu 0  # Use GPU 0
--seed
int
default:"0"
Random seed for reproducibility
--save_segmentation_images
flag
Save visualization images showing anomaly segmentation results
Unlike run_patchcore.py, this script does not have --log_group, --log_project, or --save_patchcore_model options since it only loads existing models.

Subcommand: patch_core_loader

Loads pre-trained PatchCore models from disk.

Options

--patch_core_paths
string
required
Short flag: -p
Path(s) to saved PatchCore model directories. Each path should point to a directory containing .faiss and .pkl files. The loader automatically detects:
  • Single models: Directory with one .faiss file
  • Ensemble models: Directory with multiple .faiss files named Ensemble-{i}-{n}_*.faiss
# Single model
-p /path/to/models/mvtec_bottle

# Multiple categories (evaluated separately)
-p /path/to/models/mvtec_bottle \
-p /path/to/models/mvtec_cable \
-p /path/to/models/mvtec_capsule
--faiss_on_gpu
flag
Use GPU-accelerated FAISS for nearest neighbor search. Significantly speeds up evaluation.
--faiss_num_workers
int
default:"8"
Number of CPU workers for FAISS operations

Subcommand: dataset

Configures dataset loading for evaluation.

Arguments

name
string
required
Dataset type. Currently supported: mvtec
data_path
path
required
Path to the dataset root directory. Must exist.

Options

--subdatasets
string
required
Short flag: -d
Dataset categories to evaluate on. Should match the categories used during training.
For MVTec AD: bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper
-d bottle -d cable -d capsule
--batch_size
int
default:"1"
Batch size for data loading during evaluation
--num_workers
int
default:"8"
Number of worker processes for data loading
--resize
int
default:"256"
Image resize dimension (before center cropping). Must match training configuration.
--imagesize
int
default:"224"
Final image size after center cropping. Must match training configuration.
  • 224 - Standard ImageNet size
  • 320 - Higher resolution models
--augment
flag
Apply data augmentation during evaluation (typically not used)
The --resize and --imagesize parameters must match the values used during training, otherwise evaluation results will be incorrect.
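The two flags correspond to a resize-then-center-crop pipeline (torchvision-style semantics: the shorter edge is scaled to --resize, then a square of side --imagesize is cropped from the center). A minimal sketch of that preprocessing, assuming Pillow is available; the function name is illustrative:

```python
from PIL import Image

def preprocess(img, resize=256, imagesize=224):
    # Scale the shorter edge to `resize`, preserving aspect ratio
    # (matching torchvision's Resize(int) behavior).
    w, h = img.size
    scale = resize / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    # Center-crop a square of side `imagesize`.
    w, h = img.size
    left = (w - imagesize) // 2
    top = (h - imagesize) // 2
    return img.crop((left, top, left + imagesize, top + imagesize))

out = preprocess(Image.new("RGB", (1024, 1024)), resize=256, imagesize=224)
# out.size is (224, 224)
```

This is why both values must match training: a different resize/crop pair changes the spatial statistics the memory bank was built from.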

Examples

Evaluate Single Model

Evaluate a single trained model on one category:
python bin/load_and_evaluate_patchcore.py \
  --gpu 0 \
  --seed 0 \
  results/evaluation \
  patch_core_loader \
    -p /path/to/models/mvtec_bottle \
    --faiss_on_gpu \
  dataset \
    --resize 256 \
    --imagesize 224 \
    -d bottle \
    mvtec /path/to/mvtec

Evaluate Multiple Categories

Evaluate models for all 15 MVTec AD categories (from sample script):
# Define paths
DATAPATH=/path/to/mvtec
LOADPATH=/path/to/pretrained/models
MODELFOLDER=IM320_Ensemble_L2-3_P001_D1024-384_PS-3_AN-1

# Categories
DATASETS=(bottle cable capsule carpet grid hazelnut leather \
          metal_nut pill screw tile toothbrush transistor wood zipper)

# Build model paths
MODEL_FLAGS=()
for dataset in "${DATASETS[@]}"; do
  MODEL_FLAGS+=(-p "$LOADPATH/$MODELFOLDER/models/mvtec_$dataset")
done

# Build dataset flags
DATASET_FLAGS=()
for dataset in "${DATASETS[@]}"; do
  DATASET_FLAGS+=(-d "$dataset")
done

# Run evaluation
python bin/load_and_evaluate_patchcore.py \
  --gpu 0 \
  --seed 0 \
  results/evaluated_results/$MODELFOLDER \
  patch_core_loader "${MODEL_FLAGS[@]}" --faiss_on_gpu \
  dataset --resize 366 --imagesize 320 "${DATASET_FLAGS[@]}" mvtec $DATAPATH

Evaluate with Segmentation Visualization

Generate visualization images during evaluation:
python bin/load_and_evaluate_patchcore.py \
  --gpu 0 \
  --seed 0 \
  --save_segmentation_images \
  results/evaluation_with_images \
  patch_core_loader \
    -p /path/to/models/mvtec_bottle \
    --faiss_on_gpu \
  dataset \
    --resize 256 \
    --imagesize 224 \
    -d bottle \
    mvtec /path/to/mvtec

Evaluate IM224 Model

For models trained at 224x224 resolution:
python bin/load_and_evaluate_patchcore.py \
  --gpu 0 \
  --seed 0 \
  results/eval_im224 \
  patch_core_loader \
    -p /path/to/models/IM224/mvtec_bottle \
    --faiss_on_gpu \
  dataset \
    --resize 256 \
    --imagesize 224 \
    -d bottle \
    mvtec /path/to/mvtec

Evaluate IM320 Model

For models trained at 320x320 resolution:
python bin/load_and_evaluate_patchcore.py \
  --gpu 0 \
  --seed 0 \
  results/eval_im320 \
  patch_core_loader \
    -p /path/to/models/IM320/mvtec_bottle \
    --faiss_on_gpu \
  dataset \
    --resize 366 \
    --imagesize 320 \
    -d bottle \
    mvtec /path/to/mvtec

Model Directory Structure

Expected structure for model directories:

Single Model

mvtec_bottle/
├── *.faiss           # Nearest neighbor index
└── *.pkl             # Model parameters

Ensemble Model

mvtec_bottle/
├── Ensemble-1-3_*.faiss
├── Ensemble-1-3_*.pkl
├── Ensemble-2-3_*.faiss
├── Ensemble-2-3_*.pkl
├── Ensemble-3-3_*.faiss
└── Ensemble-3-3_*.pkl
The script automatically detects ensemble models by counting .faiss files in the directory.
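The detection logic can be sketched as counting index files, assuming the layout shown above; this helper is illustrative, not the script's actual code:

```python
import glob
import os

def detect_model_type(model_dir):
    # Count FAISS index files in the saved-model directory; ensembles
    # save one index per member (Ensemble-{i}-{n}_*.faiss), single
    # models save exactly one.
    faiss_files = sorted(glob.glob(os.path.join(model_dir, "*.faiss")))
    if not faiss_files:
        raise FileNotFoundError(f"No .faiss index found in {model_dir}")
    kind = "ensemble" if len(faiss_files) > 1 else "single"
    return kind, len(faiss_files)
```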

Output Files

The script creates the following structure:
results/
└── [results_path]/
    ├── segmentation_images/      # (if --save_segmentation_images)
    │   └── image_*.png
    └── results.csv               # Evaluation metrics
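For downstream analysis, the metrics file can be read with the standard csv module. A minimal sketch; the exact column headers depend on your results.csv, so inspect the file's header row rather than relying on the names here:

```python
import csv

def load_results(csv_path):
    # Read per-dataset metric rows from results.csv into a list of
    # dicts keyed by the file's own header row.
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

# Usage (path illustrative):
# for row in load_results("results/evaluation/results.csv"):
#     print(row)
```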

Metrics Computed

The script evaluates the following metrics:
  • Instance AUROC: Image-level anomaly detection accuracy
  • Full Pixel AUROC: Pixel-level segmentation accuracy (all images)
  • Anomaly Pixel AUROC: Pixel-level segmentation accuracy (anomalous images only)
Results are saved to results.csv with per-dataset and mean scores.
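All three metrics are ROC-AUC scores over different units (images vs. pixels). For intuition, here is a minimal pure-Python AUROC using the Mann-Whitney pair formulation (equivalent to sklearn.metrics.roc_auc_score); this is a sketch for understanding, not the script's implementation:

```python
def auroc(labels, scores):
    # Probability that a randomly chosen anomalous sample (label 1)
    # scores higher than a randomly chosen normal one (label 0),
    # counting ties as half a win.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```

Instance AUROC uses one score per image; the pixel variants apply the same computation to per-pixel anomaly maps.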

Comparison: run_patchcore.py vs load_and_evaluate_patchcore.py

Feature              | run_patchcore.py              | load_and_evaluate_patchcore.py
Training             | ✅ Yes                        | ❌ No
Evaluation           | ✅ Yes                        | ✅ Yes
Save Models          | ✅ Yes                        | ❌ No (loads only)
Subcommands          | patch_core, sampler, dataset  | patch_core_loader, dataset
Model Config         | Configured via flags          | Loaded from saved files
Use Case             | Train new models              | Evaluate existing models
Default Batch Size   | 2                             | 1

Tips

Match training config: Always use the same --resize and --imagesize values that were used during training
Batch evaluation: Use bash loops to evaluate multiple model directories efficiently (see example above)
GPU acceleration: Always use --faiss_on_gpu for faster evaluation on GPU-enabled systems
Model organization: Organize saved models by configuration (e.g., IM224_WR50_L2-3_P01) for easy identification

Common Issues

Missing model files - If model loading fails, the model files are likely missing or the path is incorrect. Verify that .faiss and .pkl files exist in the specified directory.
Mismatched dimensions - If you get dimension errors, ensure --imagesize matches the training configuration.
Number of models mismatch - The script requires either equal numbers of models and datasets, or a single model to evaluate across multiple datasets.
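That last constraint can be sketched as a simple count check, assuming the behavior described above; the function name is illustrative:

```python
def check_model_dataset_counts(n_models, n_datasets):
    # Either one model per dataset, or a single model shared across
    # all datasets; anything else is ambiguous and rejected.
    if n_models not in (1, n_datasets):
        raise ValueError(
            f"Got {n_models} models for {n_datasets} datasets; "
            "provide one model per dataset or a single shared model."
        )

check_model_dataset_counts(1, 15)   # single model, many datasets: OK
check_model_dataset_counts(15, 15)  # one model per dataset: OK
```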
