Overview
PatchCore training is configured through command-line arguments organized into four main groups:
- Global arguments: GPU, seed, logging, output paths
- PatchCore arguments: Backbone, layers, dimensions, patch settings
- Sampler arguments: Coreset sampling method and percentage
- Dataset arguments: Data loading, image preprocessing, categories
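These groups are passed as nested sub-commands on a single command line. A sketch of the overall shape (flag spellings such as --gpu, -b, -le, and -d are assumptions based on the argument descriptions in this page; verify against run_patchcore.py):

```shell
# Global args first, then one block per sub-command group (paths are placeholders).
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b wideresnet50 -le layer2 -le layer3 \
  sampler -p 0.01 approx_greedy_coreset \
  dataset -d bottle mvtec /path/to/mvtec
```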
Global Arguments
GPU and Device Settings
GPU device ID(s) to use for training. A single GPU is standard; multiple GPUs are not fully supported.
Location: run_patchcore.py:24
How GPU selection works
The specified GPU is used for:
- Feature extraction from backbone
- FAISS nearest-neighbor search (if --faiss_on_gpu is enabled)
- Coreset sampling operations
Check available GPU IDs with nvidia-smi.

Random seed for reproducibility.
Location: run_patchcore.py:25
Using the same seed with the same configuration should produce identical results; however, FAISS operations may introduce minor non-determinism.
Model Saving
Save trained PatchCore model(s) to disk.
Location: run_patchcore.py:29
Saves:
- nnscorer_search_index.faiss: Nearest-neighbor search index
- patchcore_params.pkl: Model configuration and parameters
Save visualization images showing anomaly segmentation results.
Location: run_patchcore.py:28
Creates overlay images showing:
- Original image
- Predicted anomaly heatmap
- Ground truth mask
- Anomaly score
Logging and Output
Base directory for storing results.
Location: run_patchcore.py:23
Creates the results directory structure under this path.

Project name for organizing experiments.
Location: run_patchcore.py:27

Experiment group/run name.
Location: run_patchcore.py:26
Use descriptive names encoding key parameters. Example format:
- IM224: Image size 224
- WR50: WideResNet50
- L2-3: Layers 2 and 3
- P01: 1% sampling
- D1024-1024: Dimensions
- S0: Seed 0
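Under this scheme a run name can be assembled mechanically. A small sketch (the naming scheme is the one described above; the variable names are illustrative):

```shell
# Compose a descriptive experiment-group name from the key parameters.
imagesize=224; backbone=WR50; layers=L2-3
sampling=P01; dims=D1024-1024; seed=S0
run_name="IM${imagesize}_${backbone}_${layers}_${sampling}_${dims}_${seed}"
echo "$run_name"   # -> IM224_WR50_L2-3_P01_D1024-1024_S0
```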
Enable Weights & Biases logging (requires a W&B account).
Location: Mentioned in README.md:102,143
Set your W&B API key in the script before using.

PatchCore Arguments
All arguments under the patch_core command control the model architecture and feature extraction.
Backbone Configuration
Pretrained backbone network(s) for feature extraction. A single backbone or multiple backbones (an ensemble) may be specified.
Location: run_patchcore.py:242
Available backbones (from backbones.py:4-47):
- ResNet Family
- DenseNet
- EfficientNet
- Vision Transformers
- Other
- resnet50: ResNet-50
- resnet101: ResNet-101
- resnet200: ResNet-200 (via timm)
- wideresnet50: Wide ResNet-50-2 (recommended)
- wideresnet101: Wide ResNet-101-2
- resnext101: ResNeXt-101-32x8d
- resnest50: ResNeSt-50
All backbones are loaded with ImageNet pretrained weights.
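As an illustration of backbone and layer selection, the flags might look like this (the -b/-le spellings and the index-prefix convention are assumptions based on the descriptions in this page; check run_patchcore.py):

```shell
# Single backbone with multi-scale layers:
python run_patchcore.py ... patch_core -b wideresnet50 -le layer2 -le layer3 ...

# Ensemble: multiple -b flags; each layer is prefixed with its backbone index.
python run_patchcore.py ... patch_core -b wideresnet101 -b resnext101 \
  -le 0.layer2 -le 0.layer3 -le 1.layer2 -le 1.layer3 ...
```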
Layer names to extract features from. For a single backbone, use the plain layer name; for multiple backbones, prefix each layer with the backbone index.
Location: run_patchcore.py:243
Layer naming by architecture:

ResNet / WideResNet / ResNeXt
Available layers:
- layer1: 56×56 spatial resolution
- layer2: 28×28 (recommended)
- layer3: 14×14 (recommended)
- layer4: 7×7
Recommended: layer2 + layer3 for multi-scale features.

DenseNet
Available layers:
- features.denseblock1: Early features
- features.denseblock2: Mid-level (recommended)
- features.denseblock3: High-level (recommended)
- features.denseblock4: Final features
Note the features. prefix on DenseNet layer names.

EfficientNet
Available layers (example for EfficientNet-B7):
- blocks.2: Early features
- blocks.4: Mid features
- blocks.6: High features (recommended)
- blocks.8: Final features (recommended)
Vision Transformers
Available layers:
- blocks.{N}: Transformer blocks (0 to 11 for ViT-Base)
Use late blocks (e.g., blocks.9, blocks.11).

Feature Dimensions
Dimensionality of features extracted from backbone layers.
Location: run_patchcore.py:245
This should match the combined output dimension of the selected layers. For WideResNet50 with layer2 + layer3:
- layer2: 512 channels
- layer3: 1024 channels
- Combined via preprocessing: 1024 dimensions
Final embedding dimension after aggregation.
Location: run_patchcore.py:246
Trade-offs:
- Higher (1024): Better feature representation, more memory
- Lower (384, 512): Less memory, faster inference, minimal accuracy loss
Patch Settings
Local neighborhood aggregation size.
Location: run_patchcore.py:252
From the PatchMaker class (patchcore.py:278-290):
- Creates patches via an unfold operation
- Padding: (patchsize - 1) / 2
- Aggregates local features into patch representations
Recommendations:
- Detection focus (image-level AUROC): Use 3
- Segmentation focus (pixel-level AUROC): Use 5
- Larger patches smooth predictions, better for segmentation
Method for aggregating patch scores into an image-level score.
Location: run_patchcore.py:253
Typically use max (default): the most anomalous patch determines the image score.

Overlap between patches during extraction.
Location: run_patchcore.py:254
Usually left at 0.0. Non-zero values increase computation.

Preprocessing and Aggregation
Method for preprocessing multi-layer features.
Location: run_patchcore.py:247
Options: mean, conv. mean is standard and recommended.

Method for aggregating features across layers.
Location: run_patchcore.py:248
Options: mean, mlp. mean is standard (no learnable parameters).

Anomaly Scoring
Number of nearest neighbors for anomaly score computation.
Location: run_patchcore.py:250
From NearestNeighbourScorer (referenced in patchcore.py:69-71):
- Computes the distance to the k nearest neighbors in the memory bank
- Anomaly score = average distance to k-NN
- 1-NN: Best for detection, fastest
- 3-NN: Balanced, more robust to outliers
- 5-NN: Better segmentation, smoother scores
FAISS Configuration
Use the GPU for FAISS nearest-neighbor search.
Location: run_patchcore.py:257
Benefits:
- 5-10x faster inference
- Essential for real-time applications
Requirements:
- Additional GPU memory (~500 MB per model)
- FAISS GPU support installed
Number of CPU threads for FAISS operations.
Location: run_patchcore.py:258
Only relevant when not using --faiss_on_gpu.

Sampler Arguments
Controls coreset subsampling of the feature memory bank.

Coreset sampling algorithm.
Location: run_patchcore.py:319
Options:
- approx_greedy_coreset
- greedy_coreset
- identity
Approximate Greedy Coreset Sampling (recommended)
From ApproximateGreedyCoresetSampler (sampler.py:118-171):
- Uses an approximate distance matrix (10 starting points)
- Much faster than exact greedy
- Minimal performance loss
Percentage of features to keep after sampling.
Location: run_patchcore.py:320
Trade-offs:
| Percentage | Model Size | Performance | Inference Speed |
|---|---|---|---|
| 10% (0.1) | ~500 MB | Excellent | Slower |
| 1% (0.01) | ~50 MB | Excellent (-0.1%) | Fast |
| 0.1% (0.001) | ~5 MB | Good (-0.5%) | Very fast |
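The model sizes in the table follow from linear scaling: the 10% row implies a full (100%) memory bank of roughly 5 GB. A quick check (the 5 GB figure is inferred from the table, not measured):

```shell
# Model size scales roughly linearly with the kept fraction of the memory bank.
full_bank_mb=5000   # inferred: 10% -> ~500 MB
for p in 0.1 0.01 0.001; do
  awk -v f=$full_bank_mb -v p=$p 'BEGIN { printf "-p %s -> ~%.0f MB\n", p, f*p }'
done
```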
Recommendation: Use 1% (-p 0.01) for production - an excellent balance.

Dataset Arguments
Controls data loading and preprocessing.

Dataset type.
Location: run_patchcore.py:334
Currently only mvtec is supported. Custom datasets need their own implementation.

Path to the dataset root directory.
Location: run_patchcore.py:335
Must contain subdirectories for each category (see MVTec Setup).

MVTec category names to train on. One or more categories may be specified.
Location: run_patchcore.py:336
Valid categories (from mvtec.py:8-24):
bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper
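A sketch of the dataset group with several categories (flag spellings assumed from the descriptions in this page; one -d per category; paths are placeholders):

```shell
python run_patchcore.py ... \
  dataset --resize 256 --imagesize 224 \
  -d bottle -d cable -d capsule mvtec /path/to/mvtec
```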
Image Preprocessing
Initial resize dimension (before center crop).
Location: run_patchcore.py:340
From MVTecDataset (mvtec.py:74-87):
- Load image
- transforms.Resize(resize) → resize to (resize, resize)
- transforms.CenterCrop(imagesize) → crop to (imagesize, imagesize)
Common pairings:
- Resize 256 → Crop 224
- Resize 366 → Crop 320
Final image size after the center crop (input to the backbone).
Location: run_patchcore.py:341
Impact:
- 224: Standard ImageNet size, fastest
- 320: Better localization, 1.5-2x slower
- Higher: Requires more GPU memory
Enable data augmentation during training.
Location: run_patchcore.py:342

Data Loading
Number of images per batch.
Location: run_patchcore.py:338
Larger batches can speed up training but require more GPU memory.

Number of CPU workers for data loading.
Location: run_patchcore.py:339
More workers mean faster data loading but higher CPU/RAM usage.

Fraction of training data to use for training (vs. validation).
Location: run_patchcore.py:337
Typically 1.0 (all data for training, no validation split).

Configuration Examples
Here are complete, working configurations for common scenarios.

Minimal Configuration
The simplest possible training command uses all defaults and trains a single category.
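Such a minimal run might look like the following sketch (flag spellings assumed from the argument descriptions in this page; the dataset path is a placeholder):

```shell
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b wideresnet50 \
  sampler approx_greedy_coreset \
  dataset -d bottle mvtec /path/to/mvtec
```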
Production Baseline
Recommended production configuration (99.2% AUROC):
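A sketch consistent with the recommendations on this page (WideResNet50, layers 2+3, 1% coreset, 224 crop; flag spellings assumed, verify against run_patchcore.py):

```shell
python run_patchcore.py --gpu 0 --seed 0 --save_patchcore_model \
  --log_group IM224_WR50_L2-3_P01_D1024-1024_S0 results \
  patch_core -b wideresnet50 -le layer2 -le layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
    --anomaly_scorer_num_nn 1 --patchsize 3 --faiss_on_gpu \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --resize 256 --imagesize 224 -d bottle mvtec /path/to/mvtec
```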
Maximum Performance
Best performance configuration (99.6% AUROC):
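A sketch of an ensemble run at 320px (the backbone choices are illustrative; layer prefixes follow the multi-backbone index convention described earlier; flag spellings assumed):

```shell
python run_patchcore.py --gpu 0 --seed 0 --save_patchcore_model results \
  patch_core -b wideresnet101 -b resnext101 \
    -le 0.layer2 -le 0.layer3 -le 1.layer2 -le 1.layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
    --anomaly_scorer_num_nn 1 --patchsize 3 --faiss_on_gpu \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --resize 366 --imagesize 320 -d bottle mvtec /path/to/mvtec
```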
Memory-Constrained
Configuration for limited GPU memory (under 11 GB). Changes:
- Smaller backbone (ResNet50 vs WideResNet50)
- Lower target dimension (512 vs 1024)
- Batch size 1
- No --faiss_on_gpu
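Putting those changes together (flag spellings assumed; verify against run_patchcore.py):

```shell
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b resnet50 -le layer2 -le layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 512 \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --batch_size 1 --resize 256 --imagesize 224 \
    -d bottle mvtec /path/to/mvtec
```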
Parameter Interactions
Memory Usage Factors
GPU memory usage is primarily determined by the backbone, batch size, image size, embedding dimensions, and whether the FAISS index lives on the GPU.

To reduce memory usage:
- Lower --batch_size
- Use a smaller --imagesize
- Reduce --target_embed_dimension
- Remove --faiss_on_gpu
- Use a smaller backbone
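As a rough stand-in for an approximate memory formula, a back-of-envelope estimate whose constants are assumptions, not measurements:

```shell
# Rough GPU memory estimate: backbone + activations + optional FAISS index.
batch_size=2; imagesize=224; faiss_on_gpu=1
backbone_mb=800                       # assumed: WideResNet50 weights + CUDA context
faiss_mb=$(( faiss_on_gpu * 500 ))    # ~500 MB per model when the index is on GPU
# Activations scale roughly linearly with batch size and image area.
act_mb=$(awk -v b=$batch_size -v s=$imagesize \
  'BEGIN { printf "%d", b * (s/224)^2 * 1500 }')
total_mb=$(( backbone_mb + faiss_mb + act_mb ))
echo "~${total_mb} MB"
```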
Training Time Factors
Training time per category depends mainly on the backbone, image size, dataset size, and coreset sampling settings.

To speed up training:
- Lower the sampling percentage (-p 0.01 vs. -p 0.1)
- Use approx_greedy_coreset (not greedy_coreset)
- Enable --faiss_on_gpu
- Use a smaller backbone (ResNet50 vs. ResNet101)
Accuracy vs. Efficiency
| Configuration | AUROC | Speed | Memory | Use Case |
|---|---|---|---|---|
| WR50, 224, 10% | 99.2% | Fast | 8 GB | Development |
| WR50, 224, 1% | 99.2% | Fast | 8 GB | Production |
| WR50, 320, 1% | 99.3% | Medium | 11 GB | High quality |
| Ensemble, 224, 1% | 99.3% | Slow | 12 GB | Balanced |
| Ensemble, 320, 1% | 99.6% | Slowest | 16 GB | Maximum |
Next Steps
Train Single Model
Apply these configurations to train your first model
Ensemble Training
Combine multiple backbones for best performance
