Overview
PatchCore training is configured through command-line arguments organized into four main groups:
- Global arguments: GPU, seed, logging, output paths
- PatchCore arguments: Backbone, layers, dimensions, patch settings
- Sampler arguments: Coreset sampling method and percentage
- Dataset arguments: Data loading, image preprocessing, categories
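These groups are passed as nested sub-commands on a single command line. A sketch of the overall shape (flag spellings such as --gpu, -b, -le, and -d are assumptions based on the argument descriptions in this page; verify against run_patchcore.py):

```shell
# Global args first, then one block per sub-command group (paths are placeholders).
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b wideresnet50 -le layer2 -le layer3 \
  sampler -p 0.01 approx_greedy_coreset \
  dataset -d bottle mvtec /path/to/mvtec
```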
Global Arguments
GPU and Device Settings
GPU device ID(s) to use for training. A single GPU is standard; multiple GPUs are not fully supported.
Location: run_patchcore.py:24
How GPU selection works
The specified GPU is used for:
- Feature extraction from backbone
- FAISS nearest-neighbor search (if --faiss_on_gpu is enabled)
- Coreset sampling operations
Check available GPU IDs with nvidia-smi.

Random seed for reproducibility.
Location: run_patchcore.py:25
Using the same seed with the same configuration should produce identical results; however, FAISS operations may introduce minor non-determinism.
Model Saving
Save trained PatchCore model(s) to disk.
Location: run_patchcore.py:29
Saves:
- nnscorer_search_index.faiss: Nearest-neighbor search index
- patchcore_params.pkl: Model configuration and parameters
Save visualization images showing anomaly segmentation results.
Location: run_patchcore.py:28
Creates overlay images showing:
- Original image
- Predicted anomaly heatmap
- Ground truth mask
- Anomaly score
Logging and Output
Base directory for storing results.
Location: run_patchcore.py:23
Creates the results directory structure under this path.

Project name for organizing experiments.
Location: run_patchcore.py:27

Experiment group/run name.
Location: run_patchcore.py:26
Use descriptive names encoding key parameters. Example format:
- IM224: Image size 224
- WR50: WideResNet50
- L2-3: Layers 2 and 3
- P01: 1% sampling
- D1024-1024: Dimensions
- S0: Seed 0
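Under this scheme a run name can be assembled mechanically. A small sketch (the naming scheme is the one described above; the variable names are illustrative):

```shell
# Compose a descriptive experiment-group name from the key parameters.
imagesize=224; backbone=WR50; layers=L2-3
sampling=P01; dims=D1024-1024; seed=S0
run_name="IM${imagesize}_${backbone}_${layers}_${sampling}_${dims}_${seed}"
echo "$run_name"   # -> IM224_WR50_L2-3_P01_D1024-1024_S0
```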
Enable Weights & Biases logging (requires a W&B account).
Location: Mentioned in README.md:102,143
Set your W&B API key in the script before using.

PatchCore Arguments
All arguments under the patch_core command control the model architecture and feature extraction.
Backbone Configuration
Pretrained backbone network(s) for feature extraction. A single backbone or multiple backbones (an ensemble) may be specified.
Location: run_patchcore.py:242
Available backbones (from backbones.py:4-47):
- ResNet Family
- DenseNet
- EfficientNet
- Vision Transformers
- Other
- resnet50: ResNet-50
- resnet101: ResNet-101
- resnet200: ResNet-200 (via timm)
- wideresnet50: Wide ResNet-50-2 (recommended)
- wideresnet101: Wide ResNet-101-2
- resnext101: ResNeXt-101-32x8d
- resnest50: ResNeSt-50
All backbones are loaded with ImageNet pretrained weights.
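As an illustration of backbone and layer selection, the flags might look like this (the -b/-le spellings and the index-prefix convention are assumptions based on the descriptions in this page; check run_patchcore.py):

```shell
# Single backbone with multi-scale layers:
python run_patchcore.py ... patch_core -b wideresnet50 -le layer2 -le layer3 ...

# Ensemble: multiple -b flags; each layer is prefixed with its backbone index.
python run_patchcore.py ... patch_core -b wideresnet101 -b resnext101 \
  -le 0.layer2 -le 0.layer3 -le 1.layer2 -le 1.layer3 ...
```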
Layer names to extract features from. For a single backbone, use the plain layer name; for multiple backbones, prefix each layer with the backbone index.
Location: run_patchcore.py:243
Layer naming by architecture:

ResNet / WideResNet / ResNeXt
Available layers:
- layer1: 56×56 spatial resolution
- layer2: 28×28 (recommended)
- layer3: 14×14 (recommended)
- layer4: 7×7
Recommended: layer2 + layer3 for multi-scale features.

DenseNet
Available layers:
- features.denseblock1: Early features
- features.denseblock2: Mid-level (recommended)
- features.denseblock3: High-level (recommended)
- features.denseblock4: Final features
Note the features. prefix on DenseNet layer names.

EfficientNet
Available layers (example for EfficientNet-B7):
- blocks.2: Early features
- blocks.4: Mid features
- blocks.6: High features (recommended)
- blocks.8: Final features (recommended)
Vision Transformers
Available layers:
- blocks.{N}: Transformer blocks (0 to 11 for ViT-Base)
Use late blocks (e.g., blocks.9, blocks.11).

Feature Dimensions
Dimensionality of features extracted from backbone layers.
Location: run_patchcore.py:245
This should match the combined output dimension of the selected layers. For WideResNet50 with layer2 + layer3:
- layer2: 512 channels
- layer3: 1024 channels
- Combined via preprocessing: 1024 dimensions
Final embedding dimension after aggregation.
Location: run_patchcore.py:246
Trade-offs:
- Higher (1024): Better feature representation, more memory
- Lower (384, 512): Less memory, faster inference, minimal accuracy loss
Patch Settings
Local neighborhood aggregation size.
Location: run_patchcore.py:252
From the PatchMaker class (patchcore.py:278-290):
- Creates patches via an unfold operation
- Padding: (patchsize - 1) / 2
- Aggregates local features into patch representations
Recommendations:
- Detection focus (image-level AUROC): Use 3
- Segmentation focus (pixel-level AUROC): Use 5
- Larger patches smooth predictions, better for segmentation
Method for aggregating patch scores into an image-level score.
Location: run_patchcore.py:253
Typically use max (default): the most anomalous patch determines the image score.

Overlap between patches during extraction.
Location: run_patchcore.py:254
Usually left at 0.0. Non-zero values increase computation.

Preprocessing and Aggregation
Method for preprocessing multi-layer features.
Location: run_patchcore.py:247
Options: mean, conv. mean is standard and recommended.

Method for aggregating features across layers.
Location: run_patchcore.py:248
Options: mean, mlp. mean is standard (no learnable parameters).

Anomaly Scoring
Number of nearest neighbors for anomaly score computation.
Location: run_patchcore.py:250
From NearestNeighbourScorer (referenced in patchcore.py:69-71):
- Computes the distance to the k nearest neighbors in the memory bank
- Anomaly score = average distance to k-NN
- 1-NN: Best for detection, fastest
- 3-NN: Balanced, more robust to outliers
- 5-NN: Better segmentation, smoother scores
FAISS Configuration
Use the GPU for FAISS nearest-neighbor search.
Location: run_patchcore.py:257
Benefits:
- 5-10x faster inference
- Essential for real-time applications
Requirements:
- Additional GPU memory (~500 MB per model)
- FAISS GPU support installed
Number of CPU threads for FAISS operations.
Location: run_patchcore.py:258
Only relevant when not using --faiss_on_gpu.

Sampler Arguments
Controls coreset subsampling of the feature memory bank.

Coreset sampling algorithm.
Location: run_patchcore.py:319
Options:
- approx_greedy_coreset
- greedy_coreset
- identity
Approximate Greedy Coreset Sampling (recommended)
From ApproximateGreedyCoresetSampler (sampler.py:118-171):
- Uses an approximate distance matrix (10 starting points)
- Much faster than exact greedy
- Minimal performance loss
Percentage of features to keep after sampling.
Location: run_patchcore.py:320
Trade-offs:
| Percentage | Model Size | Performance | Inference Speed |
|---|---|---|---|
| 10% (0.1) | ~500 MB | Excellent | Slower |
| 1% (0.01) | ~50 MB | Excellent (-0.1%) | Fast |
| 0.1% (0.001) | ~5 MB | Good (-0.5%) | Very fast |
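The model sizes in the table follow from linear scaling: the 10% row implies a full (100%) memory bank of roughly 5 GB. A quick check (the 5 GB figure is inferred from the table, not measured):

```shell
# Model size scales roughly linearly with the kept fraction of the memory bank.
full_bank_mb=5000   # inferred: 10% -> ~500 MB
for p in 0.1 0.01 0.001; do
  awk -v f=$full_bank_mb -v p=$p 'BEGIN { printf "-p %s -> ~%.0f MB\n", p, f*p }'
done
```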
Recommendation: Use 1% (-p 0.01) for production - an excellent balance.

Dataset Arguments
Controls data loading and preprocessing.

Dataset type.
Location: run_patchcore.py:334
Currently only mvtec is supported. Custom datasets need their own implementation.

Path to the dataset root directory.
Location: run_patchcore.py:335
Must contain subdirectories for each category (see MVTec Setup).

MVTec category names to train on. One or more categories may be specified.
Location: run_patchcore.py:336
Valid categories (from mvtec.py:8-24):
bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper
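A sketch of the dataset group with several categories (flag spellings assumed from the descriptions in this page; one -d per category; paths are placeholders):

```shell
python run_patchcore.py ... \
  dataset --resize 256 --imagesize 224 \
  -d bottle -d cable -d capsule mvtec /path/to/mvtec
```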
Image Preprocessing
Initial resize dimension (before center crop).
Location: run_patchcore.py:340
From MVTecDataset (mvtec.py:74-87):
- Load image
- transforms.Resize(resize) → resize to (resize, resize)
- transforms.CenterCrop(imagesize) → crop to (imagesize, imagesize)
Common pairings:
- Resize 256 → Crop 224
- Resize 366 → Crop 320
Final image size after the center crop (input to the backbone).
Location: run_patchcore.py:341
Impact:
- 224: Standard ImageNet size, fastest
- 320: Better localization, 1.5-2x slower
- Higher: Requires more GPU memory
Enable data augmentation during training.
Location: run_patchcore.py:342

Data Loading
Number of images per batch.
Location: run_patchcore.py:338
Larger batches can speed up training but require more GPU memory.

Number of CPU workers for data loading.
Location: run_patchcore.py:339
More workers mean faster data loading but higher CPU/RAM usage.

Fraction of training data to use for training (vs. validation).
Location: run_patchcore.py:337
Typically 1.0 (all data for training, no validation split).

Configuration Examples
Here are complete, working configurations for common scenarios.

Minimal Configuration
The simplest possible training command uses all defaults and trains a single category.
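Such a minimal run might look like the following sketch (flag spellings assumed from the argument descriptions in this page; the dataset path is a placeholder):

```shell
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b wideresnet50 \
  sampler approx_greedy_coreset \
  dataset -d bottle mvtec /path/to/mvtec
```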
Production Baseline
Recommended production configuration (99.2% AUROC):
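A sketch consistent with the recommendations on this page (WideResNet50, layers 2+3, 1% coreset, 224 crop; flag spellings assumed, verify against run_patchcore.py):

```shell
python run_patchcore.py --gpu 0 --seed 0 --save_patchcore_model \
  --log_group IM224_WR50_L2-3_P01_D1024-1024_S0 results \
  patch_core -b wideresnet50 -le layer2 -le layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
    --anomaly_scorer_num_nn 1 --patchsize 3 --faiss_on_gpu \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --resize 256 --imagesize 224 -d bottle mvtec /path/to/mvtec
```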
Maximum Performance
Best performance configuration (99.6% AUROC):
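A sketch of an ensemble run at 320px (the backbone choices are illustrative; layer prefixes follow the multi-backbone index convention described earlier; flag spellings assumed):

```shell
python run_patchcore.py --gpu 0 --seed 0 --save_patchcore_model results \
  patch_core -b wideresnet101 -b resnext101 \
    -le 0.layer2 -le 0.layer3 -le 1.layer2 -le 1.layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
    --anomaly_scorer_num_nn 1 --patchsize 3 --faiss_on_gpu \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --resize 366 --imagesize 320 -d bottle mvtec /path/to/mvtec
```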
Memory-Constrained
Configuration for limited GPU memory (under 11 GB). Changes:
- Smaller backbone (ResNet50 vs WideResNet50)
- Lower target dimension (512 vs 1024)
- Batch size 1
- No --faiss_on_gpu
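Putting those changes together (flag spellings assumed; verify against run_patchcore.py):

```shell
python run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b resnet50 -le layer2 -le layer3 \
    --pretrain_embed_dimension 1024 --target_embed_dimension 512 \
  sampler -p 0.01 approx_greedy_coreset \
  dataset --batch_size 1 --resize 256 --imagesize 224 \
    -d bottle mvtec /path/to/mvtec
```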
Parameter Interactions
Memory Usage Factors
GPU memory usage is primarily determined by the backbone, batch size, image size, embedding dimensions, and whether the FAISS index lives on the GPU.

To reduce memory usage:
- Lower --batch_size
- Use a smaller --imagesize
- Reduce --target_embed_dimension
- Remove --faiss_on_gpu
- Use a smaller backbone
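As a rough stand-in for an approximate memory formula, a back-of-envelope estimate whose constants are assumptions, not measurements:

```shell
# Rough GPU memory estimate: backbone + activations + optional FAISS index.
batch_size=2; imagesize=224; faiss_on_gpu=1
backbone_mb=800                       # assumed: WideResNet50 weights + CUDA context
faiss_mb=$(( faiss_on_gpu * 500 ))    # ~500 MB per model when the index is on GPU
# Activations scale roughly linearly with batch size and image area.
act_mb=$(awk -v b=$batch_size -v s=$imagesize \
  'BEGIN { printf "%d", b * (s/224)^2 * 1500 }')
total_mb=$(( backbone_mb + faiss_mb + act_mb ))
echo "~${total_mb} MB"
```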
Training Time Factors
Training time per category depends mainly on the backbone, image size, dataset size, and coreset sampling settings.

To speed up training:
- Lower the sampling percentage (-p 0.01 vs. -p 0.1)
- Use approx_greedy_coreset (not greedy_coreset)
- Enable --faiss_on_gpu
- Use a smaller backbone (ResNet50 vs. ResNet101)
Accuracy vs. Efficiency
| Configuration | AUROC | Speed | Memory | Use Case |
|---|---|---|---|---|
| WR50, 224, 10% | 99.2% | Fast | 8 GB | Development |
| WR50, 224, 1% | 99.2% | Fast | 8 GB | Production |
| WR50, 320, 1% | 99.3% | Medium | 11 GB | High quality |
| Ensemble, 224, 1% | 99.3% | Slow | 12 GB | Balanced |
| Ensemble, 320, 1% | 99.6% | Slowest | 16 GB | Maximum |
Next Steps
Train Single Model
Apply these configurations to train your first model
Ensemble Training
Combine multiple backbones for best performance
