What is PatchCore Training?
PatchCore doesn't use traditional neural network training with backpropagation. Instead, it extracts and stores a memory bank of feature representations from normal (defect-free) training images.

Training in PatchCore refers to:
- Extracting features from training images using a pretrained CNN backbone
- Performing local aggregation on patch features
- Subsampling the feature set using coreset selection
- Building a nearest-neighbor search index for anomaly detection
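The local-aggregation step above can be sketched in plain NumPy. This is an illustrative stand-in (the real implementation applies adaptive average pooling to CNN feature maps from the backbone, e.g. WideResNet50): each patch feature is averaged over its 3x3 spatial neighborhood, which makes every stored feature summarize local context rather than a single position.

```python
import numpy as np

def local_aggregate(feat, k=3):
    """Average each patch feature over its k x k spatial neighborhood.

    Simplified sketch of PatchCore's local aggregation: edge patches
    average only their valid neighbors. feat has shape (H, W, C).
    """
    H, W, C = feat.shape
    r = k // 2
    out = np.empty_like(feat)
    for i in range(H):
        for j in range(W):
            # Clip the window at the feature-map borders.
            window = feat[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            out[i, j] = window.mean(axis=(0, 1))
    return out

# Placeholder for a backbone feature map (e.g. 28x28 spatial, 512 channels).
feat = np.random.default_rng(0).normal(size=(28, 28, 512)).astype(np.float32)
agg = local_aggregate(feat)
print(agg.shape)  # (28, 28, 512)
```

The aggregated map keeps the same spatial shape; each position is then treated as one patch-level feature vector for the memory bank.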
Training Workflow
Prepare MVTec AD Dataset
Download and organize the MVTec AD dataset with the proper directory structure. See MVTec Setup for detailed instructions.
Choose Model Configuration
Select backbone network, layers to extract features from, and sampling parameters.
- Single model: Use one backbone (e.g., WideResNet50) for faster training
- Ensemble: Use multiple backbones for better performance
Run Training
Execute the training command. PatchCore will:
- Load training images (only “good” samples)
- Extract features from selected backbone layers
- Apply coreset subsampling to reduce memory
- Build FAISS nearest-neighbor index
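At inference time, the index built in the last step is queried to score test patches by their distance to the nearest stored normal feature. The project uses FAISS for this; to keep the sketch dependency-free, the same scoring is shown here with a brute-force NumPy nearest-neighbor search (the memory bank and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical memory bank of coreset-subsampled normal patch features (N, C).
memory_bank = rng.normal(size=(1000, 64)).astype(np.float32)

def anomaly_scores(patches, bank):
    """Score each test patch by its L2 distance to the nearest bank entry.

    Brute-force stand-in for a FAISS flat-L2 index search; the image-level
    anomaly score is typically the maximum patch score.
    """
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = ((patches ** 2).sum(1)[:, None]
          - 2.0 * patches @ bank.T
          + (bank ** 2).sum(1)[None, :])
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))

# Patches near the bank score low; patches far from it score high.
normal_patches = memory_bank[:50] + 0.01 * rng.normal(size=(50, 64)).astype(np.float32)
odd_patches = memory_bank[:50] + 5.0  # shifted far from all normal features
print(anomaly_scores(normal_patches, memory_bank).max()
      < anomaly_scores(odd_patches, memory_bank).min())  # True
```

Because only "good" samples populate the bank, any patch that looks unlike all of them gets a large nearest-neighbor distance, which is exactly the anomaly signal.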
Performance Characteristics
Training Time
PatchCore training is fast compared to traditional deep learning:
- Single model (WideResNet50): ~5-10 minutes per MVTec category on GPU
- Ensemble (3 backbones): ~15-30 minutes per category on GPU
- Time scales with:
- Image resolution (224x224 vs 320x320)
- Number of training samples
- Coreset sampling percentage
Memory Requirements
GPU Memory
- 11GB recommended for most experiments
- 16GB required for:
- Large image sizes (320x320 or higher)
- Ensemble models with 3+ backbones
- Higher embedding dimensions
Disk Storage
- Per model: 50-500 MB depending on:
- Coreset sampling percentage (1% vs 10%)
- Embedding dimensions
- Number of training samples
- 15 MVTec categories: 1-7 GB total
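The per-model footprint can be estimated from the memory-bank size: stored patches times embedding dimension times 4 bytes (float32). A back-of-the-envelope calculation, with all numbers below being illustrative assumptions rather than measured values (real models also store index structures and may use multiple layers):

```python
# Rough memory-bank size estimate (all inputs are illustrative assumptions).
patches_per_image = 28 * 28   # e.g. a 28x28 feature map per image
n_images = 250                # rough MVTec category size (varies by category)
dim = 1024                    # embedding dimension
coreset_p = 0.01              # 1% coreset sampling

n_stored = int(patches_per_image * n_images * coreset_p)
size_mb = n_stored * dim * 4 / 1e6  # float32 = 4 bytes
print(f"{n_stored} stored patches, ~{size_mb:.1f} MB")
```

Raising the coreset percentage from 1% to 10% scales the stored-feature count, and hence this estimate, by 10x, which is why the sampling percentage dominates disk usage.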
Hardware Recommendations
Minimum Requirements
Recommended for Production
Key Configuration Decisions
Backbone Selection
The choice of backbone affects both performance and speed:

| Backbone | Performance | Speed | Memory |
|---|---|---|---|
| WideResNet50 | Good | Fast | Moderate |
| WideResNet101 | Better | Slower | Higher |
| Ensemble (3 models) | Best | Slowest | Highest |
Coreset Sampling Percentage
Controls the trade-off between memory usage and performance:
- 10% (-p 0.1): Good for development and testing
- 1% (-p 0.01): Recommended for production (minimal performance loss)
- Lower percentages = less memory, faster inference, minimal accuracy impact
In the original paper, 1% coreset sampling achieved 99.2% AUROC on MVTec AD with WideResNet50.
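Conceptually, coreset selection picks a small subset of patch features that still covers the full feature set. A minimal greedy k-center sketch is below; note this is a simplification, as the paper's variant first applies a random projection to make the distance computations cheaper:

```python
import numpy as np

def greedy_coreset(features, p=0.01, seed=0):
    """Greedy k-center coreset selection.

    Repeatedly adds the feature farthest from the current selection,
    until p * N features are chosen. Simplified sketch of the
    minimax-facility-location selection used by PatchCore.
    """
    n = len(features)
    m = max(1, int(n * p))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    # Distance from every feature to its nearest selected feature.
    min_d = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(m - 1):
        idx = int(min_d.argmax())          # farthest-from-coverage point
        selected.append(idx)
        min_d = np.minimum(min_d, np.linalg.norm(features - features[idx], axis=1))
    return np.array(selected)

feats = np.random.default_rng(1).normal(size=(500, 32))
idx = greedy_coreset(feats, p=0.1)
print(len(idx))  # 50
```

Because each new point is the one farthest from the current selection, the subset spreads out to cover the feature space, which is why aggressive subsampling (down to 1%) loses so little accuracy compared to random subsampling.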
Image Resolution
Higher resolution improves localization but increases compute:
- 224x224: Baseline resolution, fastest training
- 320x320: Better localization, ~1.5x slower
- Higher: Possible but requires more GPU memory
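The compute cost tracks the number of patches, which grows with the square of the input resolution (quick illustrative arithmetic; observed wall-clock slowdown is typically smaller than the patch ratio thanks to GPU parallelism):

```python
# Patch count scales with the spatial area of the feature map.
base, hi = 224, 320
scale = (hi / base) ** 2
print(f"{scale:.2f}x more patches at {hi}x{hi} vs {base}x{base}")  # 2.04x
```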
Expected Results
Using the recommended configurations from single-model and ensemble-models, performance metrics are provided for:
- WideResNet50 Baseline (224x224)
- Ensemble Model (224x224)
- Ensemble Model (320x320)
Next Steps
Setup MVTec Dataset
Download and organize the MVTec AD benchmark dataset
Train Single Model
Train your first PatchCore model with WideResNet50
Configuration Guide
Deep dive into all training parameters
Ensemble Models
Combine multiple backbones for maximum performance
