What is PatchCore Training?
PatchCore doesn't use traditional neural network training with backpropagation. Instead, it extracts and stores a memory bank of feature representations from normal (defect-free) training images.

Training in PatchCore refers to:
- Extracting features from training images using a pretrained CNN backbone
- Performing local aggregation on patch features
- Subsampling the feature set using coreset selection
- Building a nearest-neighbor search index for anomaly detection
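The local-aggregation step above can be sketched in plain NumPy. This is an illustrative stand-in (the real implementation applies adaptive average pooling to CNN feature maps from the backbone, e.g. WideResNet50): each patch feature is averaged over its 3x3 spatial neighborhood, which makes every stored feature summarize local context rather than a single position.

```python
import numpy as np

def local_aggregate(feat, k=3):
    """Average each patch feature over its k x k spatial neighborhood.

    Simplified sketch of PatchCore's local aggregation: edge patches
    average only their valid neighbors. feat has shape (H, W, C).
    """
    H, W, C = feat.shape
    r = k // 2
    out = np.empty_like(feat)
    for i in range(H):
        for j in range(W):
            # Clip the window at the feature-map borders.
            window = feat[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            out[i, j] = window.mean(axis=(0, 1))
    return out

# Placeholder for a backbone feature map (e.g. 28x28 spatial, 512 channels).
feat = np.random.default_rng(0).normal(size=(28, 28, 512)).astype(np.float32)
agg = local_aggregate(feat)
print(agg.shape)  # (28, 28, 512)
```

The aggregated map keeps the same spatial shape; each position is then treated as one patch-level feature vector for the memory bank.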
Training Workflow
Prepare MVTec AD Dataset
Download and organize the MVTec AD dataset with the proper directory structure. See MVTec Setup for detailed instructions.
Choose Model Configuration
Select backbone network, layers to extract features from, and sampling parameters.
- Single model: Use one backbone (e.g., WideResNet50) for faster training
- Ensemble: Use multiple backbones for better performance
Run Training
Execute the training command. PatchCore will:
- Load training images (only “good” samples)
- Extract features from selected backbone layers
- Apply coreset subsampling to reduce memory
- Build FAISS nearest-neighbor index
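At inference time, the index built in the last step is queried to score test patches by their distance to the nearest stored normal feature. The project uses FAISS for this; to keep the sketch dependency-free, the same scoring is shown here with a brute-force NumPy nearest-neighbor search (the memory bank and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical memory bank of coreset-subsampled normal patch features (N, C).
memory_bank = rng.normal(size=(1000, 64)).astype(np.float32)

def anomaly_scores(patches, bank):
    """Score each test patch by its L2 distance to the nearest bank entry.

    Brute-force stand-in for a FAISS flat-L2 index search; the image-level
    anomaly score is typically the maximum patch score.
    """
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = ((patches ** 2).sum(1)[:, None]
          - 2.0 * patches @ bank.T
          + (bank ** 2).sum(1)[None, :])
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))

# Patches near the bank score low; patches far from it score high.
normal_patches = memory_bank[:50] + 0.01 * rng.normal(size=(50, 64)).astype(np.float32)
odd_patches = memory_bank[:50] + 5.0  # shifted far from all normal features
print(anomaly_scores(normal_patches, memory_bank).max()
      < anomaly_scores(odd_patches, memory_bank).min())  # True
```

Because only "good" samples populate the bank, any patch that looks unlike all of them gets a large nearest-neighbor distance, which is exactly the anomaly signal.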
Performance Characteristics
Training Time
PatchCore training is fast compared to traditional deep learning:
- Single model (WideResNet50): ~5-10 minutes per MVTec category on GPU
- Ensemble (3 backbones): ~15-30 minutes per category on GPU
- Time scales with:
- Image resolution (224x224 vs 320x320)
- Number of training samples
- Coreset sampling percentage
Memory Requirements
GPU Memory
- 11GB recommended for most experiments
- 16GB required for:
- Large image sizes (320x320 or higher)
- Ensemble models with 3+ backbones
- Higher embedding dimensions
Disk Storage
- Per model: 50-500 MB depending on:
- Coreset sampling percentage (1% vs 10%)
- Embedding dimensions
- Number of training samples
- 15 MVTec categories: 1-7 GB total
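The per-model footprint can be estimated from the memory-bank size: stored patches times embedding dimension times 4 bytes (float32). A back-of-the-envelope calculation, with all numbers below being illustrative assumptions rather than measured values (real models also store index structures and may use multiple layers):

```python
# Rough memory-bank size estimate (all inputs are illustrative assumptions).
patches_per_image = 28 * 28   # e.g. a 28x28 feature map per image
n_images = 250                # rough MVTec category size (varies by category)
dim = 1024                    # embedding dimension
coreset_p = 0.01              # 1% coreset sampling

n_stored = int(patches_per_image * n_images * coreset_p)
size_mb = n_stored * dim * 4 / 1e6  # float32 = 4 bytes
print(f"{n_stored} stored patches, ~{size_mb:.1f} MB")
```

Raising the coreset percentage from 1% to 10% scales the stored-feature count, and hence this estimate, by 10x, which is why the sampling percentage dominates disk usage.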
Hardware Recommendations
Minimum Requirements
Recommended for Production
Key Configuration Decisions
Backbone Selection
The choice of backbone affects both performance and speed:

| Backbone | Performance | Speed | Memory |
|---|---|---|---|
| WideResNet50 | Good | Fast | Moderate |
| WideResNet101 | Better | Slower | Higher |
| Ensemble (3 models) | Best | Slowest | Highest |
Coreset Sampling Percentage
Controls the trade-off between memory usage and performance:
- 10% (-p 0.1): Good for development and testing
- 1% (-p 0.01): Recommended for production (minimal performance loss)
- Lower percentages = less memory, faster inference, minimal accuracy impact
In the original paper, 1% coreset sampling achieved 99.2% AUROC on MVTec AD with WideResNet50.
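Conceptually, coreset selection picks a small subset of patch features that still covers the full feature set. A minimal greedy k-center sketch is below; note this is a simplification, as the paper's variant first applies a random projection to make the distance computations cheaper:

```python
import numpy as np

def greedy_coreset(features, p=0.01, seed=0):
    """Greedy k-center coreset selection.

    Repeatedly adds the feature farthest from the current selection,
    until p * N features are chosen. Simplified sketch of the
    minimax-facility-location selection used by PatchCore.
    """
    n = len(features)
    m = max(1, int(n * p))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    # Distance from every feature to its nearest selected feature.
    min_d = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(m - 1):
        idx = int(min_d.argmax())          # farthest-from-coverage point
        selected.append(idx)
        min_d = np.minimum(min_d, np.linalg.norm(features - features[idx], axis=1))
    return np.array(selected)

feats = np.random.default_rng(1).normal(size=(500, 32))
idx = greedy_coreset(feats, p=0.1)
print(len(idx))  # 50
```

Because each new point is the one farthest from the current selection, the subset spreads out to cover the feature space, which is why aggressive subsampling (down to 1%) loses so little accuracy compared to random subsampling.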
Image Resolution
Higher resolution improves localization but increases compute:
- 224x224: Baseline resolution, fastest training
- 320x320: Better localization, ~1.5x slower
- Higher: Possible but requires more GPU memory
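The compute cost tracks the number of patches, which grows with the square of the input resolution (quick illustrative arithmetic; observed wall-clock slowdown is typically smaller than the patch ratio thanks to GPU parallelism):

```python
# Patch count scales with the spatial area of the feature map.
base, hi = 224, 320
scale = (hi / base) ** 2
print(f"{scale:.2f}x more patches at {hi}x{hi} vs {base}x{base}")  # 2.04x
```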
Expected Results
Using the recommended configurations from single-model and ensemble-models, performance metrics are provided for:
- WideResNet50 Baseline (224x224)
- Ensemble Model (224x224)
- Ensemble Model (320x320)
Next Steps
Setup MVTec Dataset
Download and organize the MVTec AD benchmark dataset
Train Single Model
Train your first PatchCore model with WideResNet50
Configuration Guide
Deep dive into all training parameters
Ensemble Models
Combine multiple backbones for maximum performance
