
Overview

PatchCore training is configured through command-line arguments organized into four main groups:
  1. Global arguments: GPU, seed, logging, output paths
  2. PatchCore arguments: Backbone, layers, dimensions, patch settings
  3. Sampler arguments: Coreset sampling method and percentage
  4. Dataset arguments: Data loading, image preprocessing, categories

Global Arguments

GPU and Device Settings

--gpu
int
default:"0"
required
GPU device ID(s) to use for training.
Single GPU
--gpu 0  # Use first GPU
Multiple GPUs (not fully supported)
--gpu 0 --gpu 1  # Specify multiple devices
Location: run_patchcore.py:24
The specified GPU is used for:
  • Feature extraction from backbone
  • FAISS nearest-neighbor search (if --faiss_on_gpu enabled)
  • Coreset sampling operations
Check available GPUs with nvidia-smi.
--seed
int
default:"0"
Random seed for reproducibility.
--seed 0   # Default seed
--seed 42  # Custom seed
Location: run_patchcore.py:25
Using the same seed with the same configuration should produce identical results. However, FAISS operations may have minor non-determinism.
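The seeding pattern can be sketched as follows (a minimal version covering only the Python and NumPy generators; the actual script also seeds torch's CPU and CUDA generators):

```python
import random

import numpy as np

def fix_seeds(seed=0):
    """Seed the Python and NumPy RNGs. The repo additionally seeds
    torch's CPU and CUDA generators for full reproducibility."""
    random.seed(seed)
    np.random.seed(seed)

fix_seeds(42)
a = np.random.rand(3)
fix_seeds(42)
b = np.random.rand(3)
print(np.allclose(a, b))  # True -- identical draws after reseeding
```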

Model Saving

--save_patchcore_model
flag
Save trained PatchCore model(s) to disk.
--save_patchcore_model  # Enable model saving
Location: run_patchcore.py:29
Saves:
  • nnscorer_search_index.faiss: Nearest neighbor search index
  • patchcore_params.pkl: Model configuration and parameters
Storage: 50-500 MB per category depending on sampling percentage.
Without this flag, trained models are discarded after evaluation. Always use this flag if you plan to deploy the model.
--save_segmentation_images
flag
Save visualization images showing anomaly segmentation results.
--save_segmentation_images
Location: run_patchcore.py:28
Creates overlay images showing:
  • Original image
  • Predicted anomaly heatmap
  • Ground truth mask
  • Anomaly score

Logging and Output

results_path
string
required
Base directory for storing results.
python bin/run_patchcore.py results [...]  # Save to ./results/
python bin/run_patchcore.py /path/to/output [...]  # Custom path
Location: run_patchcore.py:23
Creates structure:
results/
└── {log_project}/
    └── {log_group}_0/
        ├── models/
        └── results.csv
--log_project
string
default:"project"
Project name for organizing experiments.
--log_project MVTecAD_Results
Location: run_patchcore.py:27
--log_group
string
default:"group"
Experiment group/run name.
--log_group IM224_WR50_L2-3_P01_D1024-1024_PS-3_AN-1_S0
Location: run_patchcore.py:26
Use descriptive names encoding key parameters. Example format:
  • IM224: Image size 224
  • WR50: WideResNet50
  • L2-3: Layers 2 and 3
  • P01: 1% sampling
  • D1024-1024: Dimensions
  • S0: Seed 0
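A hypothetical helper (not part of the repo) can generate such names consistently from the run's hyperparameters:

```python
def log_group_name(imagesize, backbone, layers, pct, dims, patchsize, num_nn, seed):
    """Build a descriptive --log_group string encoding key hyperparameters."""
    return "_".join([
        f"IM{imagesize}",
        backbone,
        "L" + "-".join(str(l) for l in layers),
        f"P{int(pct * 100):02d}",              # sampling percentage
        "D" + "-".join(str(d) for d in dims),  # embed dimensions
        f"PS-{patchsize}",                     # patch size
        f"AN-{num_nn}",                        # anomaly scorer num NN
        f"S{seed}",
    ])

name = log_group_name(224, "WR50", [2, 3], 0.01, [1024, 1024], 3, 1, 0)
print(name)  # IM224_WR50_L2-3_P01_D1024-1024_PS-3_AN-1_S0
```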
--log_online
flag
Enable Weights & Biases logging (requires W&B account).
--log_online  # Upload metrics to W&B
Location: mentioned in README.md:102,143
Set your W&B API key in the script before using.

PatchCore Arguments

All arguments under the patch_core command control the model architecture and feature extraction.

Backbone Configuration

-b, --backbone_names
string
required
Pretrained backbone network(s) for feature extraction.
Single backbone
patch_core -b wideresnet50 [...]
Multiple backbones (ensemble)
patch_core -b wideresnet101 -b resnext101 -b densenet201 [...]
Location: run_patchcore.py:242
Available backbones (from backbones.py:4-47):
  • resnet50: ResNet-50
  • resnet101: ResNet-101
  • resnet200: ResNet-200 (via timm)
  • wideresnet50: Wide ResNet-50-2 (recommended)
  • wideresnet101: Wide ResNet-101-2
  • resnext101: ResNeXt-101-32x8d
  • resnest50: ResNeSt-50
All backbones are loaded with ImageNet pretrained weights.
-le, --layers_to_extract_from
string
required
Layer names to extract features from.
Single backbone
patch_core -b wideresnet50 -le layer2 -le layer3 [...]
Multiple backbones (prefix with index)
patch_core \
  -b wideresnet101 -b densenet201 \
  -le 0.layer2 -le 0.layer3 \
  -le 1.features.denseblock2 -le 1.features.denseblock3 [...]
Location: run_patchcore.py:243
Layer naming by architecture:
ResNet family (resnet50, wideresnet50, resnext101, ...):
  • layer1: 56×56 spatial resolution
  • layer2: 28×28 (recommended)
  • layer3: 14×14 (recommended)
  • layer4: 7×7
Recommended: layer2 + layer3 for multi-scale features
DenseNet (e.g. densenet201):
  • features.denseblock1: Early features
  • features.denseblock2: Mid-level (recommended)
  • features.denseblock3: High-level (recommended)
  • features.denseblock4: Final features
Note: Must include the features. prefix
EfficientNet (example for EfficientNet-B7):
  • blocks.2: Early features
  • blocks.4: Mid features
  • blocks.6: High features (recommended)
  • blocks.8: Final features (recommended)
Layer indices vary by EfficientNet variant.
Vision Transformer (ViT):
  • blocks.{N}: Transformer blocks (0 to 11 for ViT-Base)
Recommended: Last few blocks (e.g., blocks.9, blocks.11)
ViT support is experimental. Performance may vary.

Feature Dimensions

--pretrain_embed_dimension
int
default:"1024"
Dimensionality of features extracted from backbone layers.
--pretrain_embed_dimension 1024  # Standard
Location: run_patchcore.py:245
This should match the combined output dimension of selected layers. For WideResNet50 layer2+layer3:
  • layer2: 512 channels
  • layer3: 1024 channels
  • Combined via preprocessing: 1024 dimensions
--target_embed_dimension
int
default:"1024"
Final embedding dimension after aggregation.
--target_embed_dimension 1024  # Single model
--target_embed_dimension 384   # Ensemble (lower per model)
Location: run_patchcore.py:246
Trade-offs:
  • Higher (1024): Better feature representation, more memory
  • Lower (384, 512): Less memory, faster inference, minimal accuracy loss
For ensembles, use lower dimensions (384) to save memory across multiple models.

Patch Settings

--patchsize
int
default:"3"
Local neighborhood aggregation size.
--patchsize 3  # 3x3 neighborhood (detection)
--patchsize 5  # 5x5 neighborhood (segmentation)
Location: run_patchcore.py:252
From PatchMaker class (patchcore.py:278-290):
  • Creates patches via unfold operation
  • Padding: (patchsize - 1) / 2
  • Aggregates local features into patch representations
Recommendations:
  • Detection focus (image-level AUROC): Use 3
  • Segmentation focus (pixel-level AUROC): Use 5
  • Larger patches smooth predictions, better for segmentation
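The neighborhood logic can be illustrated with a small NumPy sketch (the real implementation uses torch's unfold on batched multi-channel feature maps; this is a simplified single-channel version):

```python
import numpy as np

def extract_patches(fmap, patchsize=3):
    """Collect the patchsize x patchsize neighborhood around every spatial
    location; zero padding of (patchsize - 1) // 2 keeps the output grid
    the same H x W as the input feature map."""
    pad = (patchsize - 1) // 2
    padded = np.pad(fmap, pad)  # zero padding, as with torch unfold
    h, w = fmap.shape
    patches = np.empty((h, w, patchsize, patchsize))
    for i in range(h):
        for j in range(w):
            patches[i, j] = padded[i:i + patchsize, j:j + patchsize]
    return patches

fmap = np.arange(16.0).reshape(4, 4)
patches = extract_patches(fmap, patchsize=3)
print(patches.shape)  # (4, 4, 3, 3): one neighborhood per location
```

Each location's patch is then aggregated into a single feature vector, which is why larger patch sizes produce smoother, more segmentation-friendly score maps.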
--patchscore
string
default:"max"
Method for aggregating patch scores into image-level score.
--patchscore max  # Maximum patch score
Location: run_patchcore.py:253
Typically use max (the default): the most anomalous patch determines the image score.
--patchoverlap
float
default:"0.0"
Overlap between patches during extraction.
--patchoverlap 0.0  # No overlap
Location: run_patchcore.py:254
Usually left at 0.0. Non-zero values increase computation.

Preprocessing and Aggregation

--preprocessing
choice
default:"mean"
Method for preprocessing multi-layer features.
--preprocessing mean  # Average pooling (default)
--preprocessing conv  # Convolutional projection
Location: run_patchcore.py:247
Options: mean, conv. mean is standard and recommended.
--aggregation
choice
default:"mean"
Method for aggregating features across layers.
--aggregation mean  # Simple averaging
--aggregation mlp   # Learnable MLP aggregation
Location: run_patchcore.py:248
Options: mean, mlp. mean is standard (no learnable parameters).

Anomaly Scoring

--anomaly_scorer_num_nn
int
default:"5"
Number of nearest neighbors for anomaly score computation.
--anomaly_scorer_num_nn 1  # Single NN (detection)
--anomaly_scorer_num_nn 3  # 3-NN (balanced)
--anomaly_scorer_num_nn 5  # 5-NN (segmentation)
Location: run_patchcore.py:250
From NearestNeighbourScorer (referenced in patchcore.py:69-71):
  • Computes distance to k nearest neighbors in memory bank
  • Anomaly score = average distance to k-NN
Recommendations:
  • 1-NN: Best for detection, fastest
  • 3-NN: Balanced, more robust to outliers
  • 5-NN: Better segmentation, smoother scores
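The scoring rule itself is simple; a brute-force NumPy sketch (the repo performs the same computation with a FAISS index for speed):

```python
import numpy as np

def anomaly_score(query, memory_bank, k=1):
    """Anomaly score = mean Euclidean distance from a query patch feature
    to its k nearest neighbors in the coreset memory bank."""
    dists = np.linalg.norm(memory_bank - query, axis=1)
    return np.sort(dists)[:k].mean()

bank = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
normal = np.array([0.1, 0.0])     # close to the bank -> low score
anomalous = np.array([5.0, 5.0])  # far from the bank -> high score
print(anomaly_score(normal, bank, k=1) < anomaly_score(anomalous, bank, k=1))
```

With k > 1 the score averages over several neighbors, which smooths out the influence of any single outlier feature in the bank.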

FAISS Configuration

--faiss_on_gpu
flag
Use GPU for FAISS nearest neighbor search.
--faiss_on_gpu  # Enable GPU acceleration
Location: run_patchcore.py:257
Benefits:
  • 5-10x faster inference
  • Essential for real-time applications
Requirements:
  • Additional GPU memory (~500 MB per model)
  • FAISS GPU support installed
--faiss_num_workers
int
default:"8"
Number of CPU threads for FAISS operations.
--faiss_num_workers 8  # 8 threads
Location: run_patchcore.py:258
Only relevant when not using --faiss_on_gpu.

Sampler Arguments

Controls coreset subsampling of the feature memory bank.
name
string
required
Coreset sampling algorithm.
sampler [...] approx_greedy_coreset
Location: run_patchcore.py:319
Options:
Approximate Greedy Coreset Sampling (recommended)
sampler -p 0.01 approx_greedy_coreset
From ApproximateGreedyCoresetSampler (sampler.py:118-171):
  • Uses approximate distance matrix (10 starting points)
  • Much faster than exact greedy
  • Minimal performance loss
Speed: ~1-2 min for 1% of ~17k features
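The exact greedy variant is easy to sketch in NumPy; it repeatedly adds the feature farthest from the current coreset (the approximate sampler cuts cost by working with a reduced distance matrix, which this simplified version does not do):

```python
import numpy as np

def greedy_coreset(features, percentage=0.1, seed=0):
    """Exact greedy coreset: iteratively pick the point whose distance
    to the current coreset is largest, starting from a random point."""
    rng = np.random.default_rng(seed)
    n = len(features)
    n_keep = max(1, int(n * percentage))
    selected = [int(rng.integers(n))]
    # min distance from every point to the current coreset
    min_d = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < n_keep:
        idx = int(np.argmax(min_d))
        selected.append(idx)
        d = np.linalg.norm(features - features[idx], axis=1)
        min_d = np.minimum(min_d, d)
    return np.array(selected)

feats = np.random.default_rng(0).normal(size=(1000, 8))
coreset = greedy_coreset(feats, percentage=0.01)
print(len(coreset))  # 10 indices, spread to cover the feature space
```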
-p, --percentage
float
default:"0.1"
Percentage of features to keep after sampling.
sampler -p 0.1 approx_greedy_coreset   # 10%
sampler -p 0.01 approx_greedy_coreset  # 1% (recommended)
sampler -p 0.001 approx_greedy_coreset # 0.1% (extreme)
Location: run_patchcore.py:320
Trade-offs:
Percentage    Model Size  Performance        Inference Speed
10% (0.1)     ~500 MB     Excellent          Slower
1% (0.01)     ~50 MB      Excellent (-0.1%)  Fast
0.1% (0.001)  ~5 MB       Good (-0.5%)       Very fast
Recommendation: Use 1% (-p 0.01) for production; it offers an excellent balance.
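The model-size column follows directly from the number of retained float32 features. A rough estimate, assuming 1024-dimensional features and a hypothetical bank of ~1.2M patch features per category:

```python
def memory_bank_mb(num_features, percentage, dim=1024, dtype_bytes=4):
    """Approximate coreset size in MB: kept features x dim x 4 bytes."""
    return num_features * percentage * dim * dtype_bytes / 1e6

print(memory_bank_mb(1_200_000, 0.10))  # 491.52 MB at 10%
print(memory_bank_mb(1_200_000, 0.01))  # 49.152 MB at 1%
```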

Dataset Arguments

Controls data loading and preprocessing.
name
string
required
Dataset type.
dataset [...] mvtec /path/to/data
Location: run_patchcore.py:334
Currently only mvtec is supported. Custom datasets need implementation.
data_path
path
required
Path to dataset root directory.
dataset [...] mvtec /path/to/mvtec
Location: run_patchcore.py:335
Must contain subdirectories for each category (see MVTec Setup).
-d, --subdatasets
string
required
MVTec category names to train on.
Single category
dataset [...] -d bottle mvtec /path/to/mvtec
Multiple categories
dataset [...] -d bottle -d cable -d capsule mvtec /path/to/mvtec
Location: run_patchcore.py:336
Valid categories (from mvtec.py:8-24):
  • bottle, cable, capsule, carpet, grid
  • hazelnut, leather, metal_nut, pill, screw
  • tile, toothbrush, transistor, wood, zipper

Image Preprocessing

--resize
int
default:"256"
Initial resize dimension (before center crop).
--resize 256  # For 224x224 final images
--resize 366  # For 320x320 final images
Location: run_patchcore.py:340
From MVTecDataset (mvtec.py:74-87):
  1. Load image
  2. transforms.Resize(resize) → Resize to (resize, resize)
  3. transforms.CenterCrop(imagesize) → Crop to (imagesize, imagesize)
Common combinations:
  • Resize 256 → Crop 224
  • Resize 366 → Crop 320
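These pairs keep the center crop at roughly the same fraction of the resized image (the standard ImageNet 224/256 ≈ 0.875 ratio), which is presumably why 366 is paired with 320:

```python
def crop_fraction(resize, imagesize):
    """Fraction of the resized image's side length kept by the center crop."""
    return imagesize / resize

print(crop_fraction(256, 224))            # 0.875
print(round(crop_fraction(366, 320), 3))  # 0.874 -- nearly the same ratio
```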
--imagesize
int
default:"224"
Final image size after center crop (input to backbone).
--imagesize 224  # Standard
--imagesize 320  # Higher resolution
Location: run_patchcore.py:341
Impact:
  • 224: Standard ImageNet size, fastest
  • 320: Better localization, 1.5-2x slower
  • Higher: Requires more GPU memory
--augment
flag
Enable data augmentation during training.
--augment  # Enable augmentation
Location: run_patchcore.py:342
Data augmentation is typically not used with PatchCore, as the method assumes consistent image distributions. Only use for experimentation.

Data Loading

--batch_size
int
default:"2"
Number of images per batch.
--batch_size 2  # Default
--batch_size 1  # Reduce for OOM issues
--batch_size 8  # Increase if memory allows
Location: run_patchcore.py:338
Larger batches can speed up training but require more GPU memory.
--num_workers
int
default:"8"
Number of CPU workers for data loading.
--num_workers 8   # 8 parallel workers
--num_workers 4   # Fewer workers if CPU-constrained
Location: run_patchcore.py:339
More workers = faster data loading, but uses more CPU/RAM.
--train_val_split
float
default:"1.0"
Fraction of training data to use for training (vs. validation).
--train_val_split 1.0   # Use all training data
--train_val_split 0.8   # 80% train, 20% validation
Location: run_patchcore.py:337
Typically use 1.0 (all data for training, no validation split).

Configuration Examples

Here are complete, working configurations for common scenarios:
Simplest possible training command:
python bin/run_patchcore.py results \
patch_core -b wideresnet50 -le layer2 -le layer3 \
  --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
  --anomaly_scorer_num_nn 1 --patchsize 3 \
sampler -p 0.1 approx_greedy_coreset \
dataset --resize 256 --imagesize 224 -d bottle mvtec /path/to/mvtec
Uses all defaults, trains single category.
Recommended production configuration (99.2% AUROC):
python bin/run_patchcore.py \
  --gpu 0 --seed 0 --save_patchcore_model \
  --log_group IM224_WR50_Production --log_project MVTecAD \
  results \
patch_core \
  -b wideresnet50 -le layer2 -le layer3 --faiss_on_gpu \
  --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
  --anomaly_scorer_num_nn 1 --patchsize 3 \
sampler -p 0.01 approx_greedy_coreset \
dataset --resize 256 --imagesize 224 --batch_size 2 --num_workers 8 \
  -d bottle -d cable [...all categories...] mvtec /path/to/mvtec
Best performance configuration (99.6% AUROC):
python bin/run_patchcore.py \
  --gpu 0 --seed 40 --save_patchcore_model \
  --log_group IM320_Ensemble_Best --log_project MVTecAD \
  results \
patch_core \
  -b wideresnet101 -b resnext101 -b densenet201 \
  -le 0.layer2 -le 0.layer3 \
  -le 1.layer2 -le 1.layer3 \
  -le 2.features.denseblock2 -le 2.features.denseblock3 \
  --faiss_on_gpu \
  --pretrain_embed_dimension 1024 --target_embed_dimension 384 \
  --anomaly_scorer_num_nn 1 --patchsize 3 \
sampler -p 0.01 approx_greedy_coreset \
dataset --resize 366 --imagesize 320 --batch_size 2 --num_workers 8 \
  -d bottle -d cable [...all categories...] mvtec /path/to/mvtec
Configuration for limited GPU memory (under 11 GB):
python bin/run_patchcore.py \
  --gpu 0 --seed 0 --save_patchcore_model \
  results \
patch_core \
  -b resnet50 -le layer2 -le layer3 \
  --pretrain_embed_dimension 1024 --target_embed_dimension 512 \
  --anomaly_scorer_num_nn 1 --patchsize 3 \
sampler -p 0.01 approx_greedy_coreset \
dataset --resize 256 --imagesize 224 --batch_size 1 --num_workers 4 \
  -d bottle mvtec /path/to/mvtec
Changes:
  • Smaller backbone (ResNet50 vs WideResNet50)
  • Lower target dimension (512 vs 1024)
  • Batch size 1
  • No --faiss_on_gpu

Parameter Interactions

Memory Usage Factors

GPU memory usage is primarily determined by:
Memory Formula (approximate)
GPU_memory = (
    backbone_memory +           # 1-3 GB depending on architecture
    batch_size * imagesize^2 * channels * 4 bytes +  # Input images
    feature_map_memory +        # Depends on layers and dimensions
    faiss_index_memory          # If --faiss_on_gpu (0.5-2 GB)
)
To reduce memory:
  1. Lower --batch_size
  2. Smaller --imagesize
  3. Reduce --target_embed_dimension
  4. Remove --faiss_on_gpu
  5. Use smaller backbone
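Plugging numbers into the input-image term of the formula above shows that the raw input batch itself is tiny; the backbone weights, intermediate activations, and FAISS index dominate (a rough float32 estimate):

```python
def input_batch_mb(batch_size, imagesize, channels=3, dtype_bytes=4):
    """Rough size of one float32 RGB input batch in MB:
    batch_size * imagesize^2 * channels * 4 bytes."""
    return batch_size * imagesize**2 * channels * dtype_bytes / 1e6

print(input_batch_mb(2, 224))  # ~1.2 MB
print(input_batch_mb(8, 320))  # ~9.8 MB -- still small vs. 1-3 GB backbone
```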

Training Time Factors

Training time per category:
Time Formula (approximate)
time_per_category = (
    num_train_images * feature_extraction_time +  # ~0.1s per image
    num_features * sampling_percentage * sampling_time  # ~0.001s per feature
)

# Multiplied by number of backbones for ensemble
To reduce time:
  1. Lower sampling percentage (-p 0.01 vs -p 0.1)
  2. Use approx_greedy_coreset (not greedy_coreset)
  3. Enable --faiss_on_gpu
  4. Smaller backbone (ResNet50 vs ResNet101)
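Using the rough per-image and per-feature constants from the formula above, a single-backbone run on one category with ~17k features (illustrative numbers only) can be estimated as:

```python
def train_time_s(num_images, num_features, pct,
                 feat_time=0.1, sample_time=0.001, num_backbones=1):
    """Per-category time estimate from the approximate formula above."""
    return num_backbones * (num_images * feat_time
                            + num_features * pct * sample_time)

print(round(train_time_s(220, 17_000, 0.01), 2))  # 22.17 s, extraction-dominated
```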

Accuracy vs. Efficiency

Configuration      AUROC  Speed    Memory  Use Case
WR50, 224, 10%     99.2%  Fast     8 GB    Development
WR50, 224, 1%      99.2%  Fast     8 GB    Production
WR50, 320, 1%      99.3%  Medium   11 GB   High quality
Ensemble, 224, 1%  99.3%  Slow     12 GB   Balanced
Ensemble, 320, 1%  99.6%  Slowest  16 GB   Maximum

Next Steps

Train Single Model

Apply these configurations to train your first model

Ensemble Training

Combine multiple backbones for best performance
