Overview
This repository provides pretrained PatchCore models that achieve state-of-the-art performance on MVTec AD, with up to 99.6% image-level AUROC, 98.4% pixel-level AUROC, and >95% PRO score. Each model is trained on one of the 15 MVTec AD subdatasets and can be directly loaded for inference without retraining.

Available Model Configurations
WideResNet50 Baseline
Standard single-backbone model optimized for 224x224 images
- Image AUROC: 99.2%
- Pixel AUROC: 98.1%
- PRO Score: 94.4%
Ensemble Model
Multi-backbone ensemble combining WideResNet-101, ResNeXt-101, and DenseNet-201
- Image AUROC: 99.6%
- Pixel AUROC: 98.2%
- PRO Score: 94.9%
Model Structure
Pretrained models are organized into one directory per MVTec AD category.

Model Files
Each category model contains two files:
- nnscorer_search_index.faiss - FAISS index for nearest neighbor search
- patchcore_params.pkl - Serialized PatchCore parameters and memory bank
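In the released models the coreset memory bank is stored inside the FAISS index (loaded with faiss.read_index and queried with index.search). The sketch below shows the equivalent 1-nearest-neighbor patch scoring in plain numpy, with a random memory bank standing in for the shipped files; it illustrates the scoring logic only, not the repository's loading API:

```python
import numpy as np

# In practice: index = faiss.read_index(".../nnscorer_search_index.faiss")
# and params = pickle.load(open(".../patchcore_params.pkl", "rb")).
# Here a random matrix stands in for the coreset stored in the index.
rng = np.random.default_rng(0)
memory_bank = rng.standard_normal((1000, 1024)).astype("float32")

def patch_score(feature: np.ndarray) -> float:
    """Anomaly score of one patch = distance to its nearest memory-bank entry (k=1)."""
    dists = np.linalg.norm(memory_bank - feature, axis=1)
    return float(dists.min())

query = rng.standard_normal(1024).astype("float32")  # stand-in patch feature
print("patch anomaly score:", patch_score(query))
```

An image-level score is then derived from the per-patch scores (in PatchCore, from the maximal patch distance).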
MVTec AD Categories
Pretrained models are available for all 15 MVTec AD object and texture categories:

Object Categories:
- bottle
- cable
- capsule
- hazelnut
- metal_nut
- pill
- screw
- toothbrush
- transistor
- zipper

Texture Categories:
- carpet
- grid
- leather
- tile
- wood
Loading Pretrained Models
Evaluation Script
Use the provided evaluation script to load and test pretrained models.

Loading Individual Models
To load a single category model, read its FAISS index and parameter file from the corresponding category directory.

Model Configurations
WideResNet50 Baseline (IM224)
Configuration:
- Backbone: Wide ResNet-50
- Feature layers: layer2, layer3
- Input size: 224x224 (resized from 256x256)
- Coreset sampling: 10%
- Embedding dimensions: 1024 → 1024
- Patch size: 3
- Nearest neighbors: 1
Model identifier: IM224_WR50_L2-3_P01_D1024-1024_PS-3_AN-1
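The identifier string encodes the hyperparameters listed above. The parser below is a hedged illustration whose field meanings are inferred from the configurations on this page (input size, backbone, feature layers, coreset fraction, embedding dimensions, patch size, neighbor count), not an official naming specification:

```python
def parse_model_name(name: str) -> dict:
    """Unpack an identifier such as IM224_WR50_L2-3_P01_D1024-1024_PS-3_AN-1.

    Field meanings inferred from this page: IM = input size, then backbone,
    L = feature layers, P = coreset fraction, D = embedding dims,
    PS = patch size, AN = number of nearest neighbors.
    """
    im, backbone, layers, pct, dims, ps, an = name.split("_")
    digits = pct[1:]  # "01" -> 0.1 (10%), "001" -> 0.01 (1%): leading-zero decimal
    return {
        "input_size": int(im[2:]),                                # IM224 -> 224
        "backbone": backbone,                                     # WR50, Ensemble, ...
        "layers": layers[1:].split("-"),                          # L2-3 -> ["2", "3"]
        "coreset_fraction": int(digits) / 10 ** (len(digits) - 1),
        "embed_dims": [int(d) for d in dims[1:].split("-")],      # D1024-1024
        "patch_size": int(ps.split("-")[1]),                      # PS-3 -> 3
        "num_neighbors": int(an.split("-")[1]),                   # AN-1 -> 1
    }

print(parse_model_name("IM224_WR50_L2-3_P01_D1024-1024_PS-3_AN-1"))
```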
Ensemble Model (IM224)
Configuration:
- Backbones: Wide ResNet-101, ResNeXt-101, DenseNet-201
- Feature layers:
- WR101: layer2, layer3
- ResNeXt101: layer2, layer3
- DenseNet201: denseblock2, denseblock3
- Input size: 224x224
- Coreset sampling: 1%
- Embedding dimensions: 1024 → 384
- Patch size: 3
- Nearest neighbors: 1
Model identifier: IM224_Ensemble_L2-3_P001_D1024-384_PS-3_AN-1
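Conceptually, the ensemble aggregates patch features from the three backbones into a common 1024-dimensional space and reduces them to the 384-dimensional embedding from the configuration above. The numpy sketch below illustrates only the shape bookkeeping of that aggregation; the channel counts and the random linear maps are stand-ins for the model's actual pooled preprocessing, not its real parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches = 784  # e.g. a 28x28 grid of patch locations

# Stand-in patch features from the three backbones (channel counts illustrative).
feats = [
    rng.standard_normal((n_patches, 512)),   # WR101 layer2/3, pooled
    rng.standard_normal((n_patches, 512)),   # ResNeXt101 layer2/3, pooled
    rng.standard_normal((n_patches, 1024)),  # DenseNet201 denseblock2/3, pooled
]

# Map each backbone's features to the common 1024-dim space, aggregate,
# then reduce to the final 384-dim embedding (random maps for illustration).
common = [f @ rng.standard_normal((f.shape[1], 1024)) for f in feats]
aggregated = np.mean(common, axis=0)          # (784, 1024)
embedding = aggregated @ rng.standard_normal((1024, 384))  # (784, 384)
print(embedding.shape)
```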
Higher Resolution Models (IM320)
Models trained on 320x320 images achieve similar or better performance:
- WR50 (IM320): 99.3% Image AUROC, 97.8% Pixel AUROC
- Ensemble (IM320): 99.6% Image AUROC, 98.2% Pixel AUROC
Higher resolution models (320x320) require more GPU memory but can provide better localization accuracy.
Inference Requirements
Hardware
- GPU Memory: 11GB recommended (varies with image size)
- GPU: CUDA-compatible GPU for FAISS acceleration
- CPU: Any modern multi-core processor
Software Dependencies
See requirements.txt for the complete dependency list.
Model Performance
For detailed per-category performance metrics, see the Performance Benchmarks page.

Training Custom Models
To train your own PatchCore model with the same configuration, use the training scripts provided in the repository.

Output Format
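As a conceptual sketch only (not the repository's API), building a PatchCore memory bank amounts to extracting patch features from nominal training images, subsampling them with a greedy coreset (10% in the WideResNet50 baseline configuration above), and indexing the result for nearest-neighbor search. Random features stand in for real backbone activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for patch features extracted from nominal training images.
train_features = rng.standard_normal((2000, 64)).astype("float32")

def greedy_coreset(features: np.ndarray, fraction: float) -> np.ndarray:
    """Greedy k-center coreset: repeatedly keep the point farthest from the
    current selection, so the memory bank covers the feature space."""
    n_keep = max(1, int(len(features) * fraction))
    selected = [0]  # seed with an arbitrary first point
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < n_keep:
        idx = int(dists.argmax())  # farthest remaining point
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return features[selected]

# 10% coreset, as in the WideResNet50 baseline configuration.
memory_bank = greedy_coreset(train_features, fraction=0.10)
print(memory_bank.shape)
```

In the released models, the resulting memory bank is what the nnscorer_search_index.faiss file stores.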
After evaluation, results are saved to results.csv, which contains the following columns:
- instance_auroc - Image-level AUROC
- full_pixel_auroc - Pixel-level AUROC
- full_pro - PRO score
- anomaly_pixel_auroc - AUROC on anomalous pixels only
- anomaly_pro - PRO on anomalous regions only
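The metric columns can be read back with the standard csv module. The snippet below uses an in-memory stand-in for results.csv with illustrative values (the column names come from the list above):

```python
import csv
import io

# Stand-in for a results.csv produced by evaluation (values are illustrative).
raw = (
    "instance_auroc,full_pixel_auroc,full_pro,anomaly_pixel_auroc,anomaly_pro\n"
    "0.992,0.981,0.944,0.975,0.940\n"
)

rows = list(csv.DictReader(io.StringIO(raw)))
metrics = {key: float(value) for key, value in rows[0].items()}
print(metrics["instance_auroc"])  # image-level AUROC
```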
All pretrained models use ImageNet-pretrained backbone weights for feature extraction.
