MVTec AD Dataset Setup

About MVTec AD

The MVTec Anomaly Detection (MVTec AD) dataset is a widely used benchmark for industrial anomaly detection. It contains high-resolution images from 15 object and texture categories, covering a variety of defect types.

Dataset Statistics

15 product categories
5,354 images total (3,629 training + 1,725 testing)
Training images: only defect-free (“good”) samples
Test images: defect-free and various anomaly types
Ground truth masks for pixel-level evaluation

Download the Dataset

Visit MVTec Website

Go to the official MVTec AD dataset page:

MVTec AD Download: https://www.mvtec.com/company/research/datasets/mvtec-ad

Fill out the download form to access the dataset.

Download and Extract

Download the complete dataset (approximately 4.9 GB compressed).

Extract Dataset

# Extract the downloaded archive into a dedicated mvtec/ directory
mkdir -p mvtec
tar -xvf mvtec_anomaly_detection.tar.xz -C mvtec
# or
unzip mvtec_anomaly_detection.zip -d mvtec
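
If you would rather script the extraction, the following is a minimal sketch using Python's standard tarfile module. It assumes the archive places the category folders at its top level (worth confirming with tar -tf before extracting), and the function name extract_archive is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a .tar.xz archive into target_dir and return the top-level names found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # "r:xz" opens an LZMA-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:xz") as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

Calling extract_archive("mvtec_anomaly_detection.tar.xz", "mvtec") returns the entries now present in mvtec/, which you can compare against the category list in the next section.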

Verify Directory Structure

Ensure the extracted folder follows the correct structure (see below).

Required Directory Structure

The dataset must be organized with the following structure for PatchCore to load it correctly:

Directory Tree

mvtec/
├── bottle/
│   ├── ground_truth/
│   │   ├── broken_large/
│   │   ├── broken_small/
│   │   ├── contamination/
│   │   └── ...
│   ├── test/
│   │   ├── good/
│   │   ├── broken_large/
│   │   ├── broken_small/
│   │   ├── contamination/
│   │   └── ...
│   └── train/
│       └── good/
├── cable/
│   ├── ground_truth/
│   ├── test/
│   └── train/
├── capsule/
├── carpet/
├── grid/
├── hazelnut/
├── leather/
├── metal_nut/
├── pill/
├── screw/
├── tile/
├── toothbrush/
├── transistor/
├── wood/
└── zipper/

The directory structure must match this layout exactly. PatchCore expects:

train/good/ for training images
test/ with subdirectories for each defect type
ground_truth/ with pixel-level masks for defects
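
The three expectations above can be checked per category with a short script. This is a sketch, not part of PatchCore; the names REQUIRED_SUBDIRS and missing_subdirs are hypothetical.

```python
from pathlib import Path

# Sub-directories every category is expected to contain.
REQUIRED_SUBDIRS = ("train/good", "test", "ground_truth")


def missing_subdirs(data_path, category):
    """Return the required sub-directories that are absent for one category."""
    root = Path(data_path) / category
    return [sub for sub in REQUIRED_SUBDIRS if not (root / sub).is_dir()]
```

An empty list from missing_subdirs("/path/to/mvtec", "bottle") means the bottle category has all three required folders.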

Dataset Categories

MVTec AD includes 15 categories across objects and textures:


Objects (rigid items with consistent appearance):

bottle - Glass bottles
cable - Power cables
capsule - Pharmaceutical capsules
hazelnut - Hazelnuts
metal_nut - Metal nuts
pill - Pharmaceutical pills
screw - Screws
toothbrush - Toothbrushes
transistor - Electronic transistors
zipper - Zippers

Textures (materials with varying patterns):

carpet - Carpet fabric
grid - Grid patterns
leather - Leather material
tile - Floor tiles
wood - Wood surface

Verify Installation

Check that your dataset is properly structured:

# Navigate to dataset directory
cd /path/to/mvtec

# Count category directories (should be 15); ignores any loose files
ls -d */ | wc -l

# Check a specific category structure
ls -R bottle/
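
For a more thorough check than the shell commands above, the sketch below reports which of the 15 category folders are missing and counts the PNG images per split. It is an illustration only; summarize is a hypothetical helper and assumes the images are stored as .png files.

```python
from pathlib import Path

CATEGORIES = [
    "bottle", "cable", "capsule", "carpet", "grid", "hazelnut", "leather",
    "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor",
    "wood", "zipper",
]


def summarize(data_path):
    """Return (missing categories, {category: (n_train, n_test)})."""
    root = Path(data_path)
    missing = [c for c in CATEGORIES if not (root / c).is_dir()]
    counts = {}
    for category in CATEGORIES:
        if category in missing:
            continue
        # Count PNG files anywhere below train/ and test/ for this category.
        n_train = len(list((root / category / "train").rglob("*.png")))
        n_test = len(list((root / category / "test").rglob("*.png")))
        counts[category] = (n_train, n_test)
    return missing, counts
```

For a complete dataset, the missing list is empty and the per-category counts should add up to the totals given under Dataset Statistics (3,629 training and 1,725 test images).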

Configure Dataset Path

Once the dataset is ready, you’ll reference it in training commands:

# Set environment variable
export MVTEC_DATA=/path/to/mvtec

# Use in training commands
python bin/run_patchcore.py [...] dataset [...] mvtec "$MVTEC_DATA"
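
If you launch training from your own Python scripts, you may want to fail early when the variable is unset or points nowhere. A minimal sketch, where resolve_data_path is a hypothetical helper and not part of PatchCore:

```python
import os
from pathlib import Path


def resolve_data_path(env=None):
    """Read MVTEC_DATA from the environment and check that it is an existing directory."""
    env = os.environ if env is None else env
    value = env.get("MVTEC_DATA")
    if not value:
        raise RuntimeError("MVTEC_DATA is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"MVTEC_DATA does not point to a directory: {path}")
    return path
```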

Dataset Loading Process

When you run training, PatchCore loads the dataset as follows:

Locate Category

For each specified category (e.g., bottle), PatchCore looks for:

{data_path}/bottle/train/good/*.png
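
The same lookup can be reproduced outside PatchCore to see exactly which files that pattern matches. This sketch mirrors the pattern above and is not PatchCore's own loader code; list_train_images is a hypothetical name.

```python
from pathlib import Path


def list_train_images(data_path, category):
    """Return the sorted PNG paths under {data_path}/{category}/train/good/."""
    train_dir = Path(data_path) / category / "train" / "good"
    return sorted(train_dir.glob("*.png"))
```

An empty result for a category usually means the path or folder layout is wrong; see Common Issues.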

Apply Transformations

Each image is:

Resized to --resize (default: 256)
Center-cropped to --imagesize (default: 224)
Converted to tensor
Normalized with ImageNet mean/std

Normalization

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
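
To make these numbers concrete, the sketch below works through the geometry of the resize and center-crop steps and normalizes a single RGB pixel. It is plain-Python arithmetic for illustration, not PatchCore's code (in practice such steps are usually done with an image library). It assumes that resizing scales the shorter side to the --resize value while keeping the aspect ratio; all three function names are hypothetical.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def resized_size(width, height, resize=256):
    """Scale so the shorter side equals `resize`, keeping the aspect ratio (assumed behavior)."""
    if width <= height:
        return resize, int(resize * height / width)
    return int(resize * width / height), resize


def center_crop_box(width, height, imagesize=224):
    """Return (left, top, right, bottom) of a centered imagesize x imagesize crop."""
    left = (width - imagesize) // 2
    top = (height - imagesize) // 2
    return left, top, left + imagesize, top + imagesize


def normalize_pixel(rgb):
    """Scale 0-255 values to [0, 1], then subtract the mean and divide by the std per channel."""
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]
```

With the defaults, a square 1024 x 1024 input becomes 256 x 256 after resizing, and the crop box is (16, 16, 240, 240), so 16 pixels are trimmed from each edge.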

Create DataLoader

Images are batched with:

batch_size: Default 2 (configurable)
num_workers: Default 8 (parallel loading)
shuffle: False (deterministic order)
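
What batch_size and shuffle: False mean for the order in which images are consumed can be illustrated in a few lines. This is a simplified stand-in for a real data loader, which additionally loads batches in parallel worker processes; make_batches is a hypothetical helper.

```python
def make_batches(items, batch_size=2):
    """Split items into consecutive batches, preserving the input order (no shuffling)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With five images and the default batch size of 2, this yields three batches, the last one holding a single image.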

Training on Subset of Categories

You don’t have to train on all 15 categories. Specify only the ones you need:

Train Single Category

# Train only on bottle category
python bin/run_patchcore.py --gpu 0 --seed 0 results \
  patch_core -b wideresnet50 -le layer2 -le layer3 \
  --pretrain_embed_dimension 1024 --target_embed_dimension 1024 \
  --anomaly_scorer_num_nn 1 --patchsize 3 --faiss_on_gpu \
  sampler -p 0.1 approx_greedy_coreset \
  dataset --resize 256 --imagesize 224 -d bottle mvtec /path/to/mvtec

Train Multiple Categories

# Train on specific categories
python bin/run_patchcore.py --gpu 0 --seed 0 results \
  patch_core [...] sampler [...] \
  dataset --resize 256 --imagesize 224 \
    -d bottle -d cable -d capsule \
    mvtec /path/to/mvtec
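
When the category list is generated programmatically, it can be easier to assemble the command in Python. The sketch below reuses only the flags from the single-category command above and adds one -d flag per category; build_command is a hypothetical helper, not part of PatchCore.

```python
def build_command(data_path, categories, results_dir="results"):
    """Assemble the run_patchcore.py argument list with one -d flag per category."""
    cmd = [
        "python", "bin/run_patchcore.py", "--gpu", "0", "--seed", "0", results_dir,
        "patch_core", "-b", "wideresnet50", "-le", "layer2", "-le", "layer3",
        "--pretrain_embed_dimension", "1024", "--target_embed_dimension", "1024",
        "--anomaly_scorer_num_nn", "1", "--patchsize", "3", "--faiss_on_gpu",
        "sampler", "-p", "0.1", "approx_greedy_coreset",
        "dataset", "--resize", "256", "--imagesize", "224",
    ]
    # Each category gets its own -d flag, placed before the dataset name and path.
    for category in categories:
        cmd += ["-d", category]
    cmd += ["mvtec", str(data_path)]
    return cmd
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the repository root.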

Common Issues

FileNotFoundError: No such file or directory

Problem: Dataset path is incorrect or structure doesn’t match expected format.

Solution:

Verify the path exists: ls /path/to/mvtec
Check category exists: ls /path/to/mvtec/bottle
Verify train folder: ls /path/to/mvtec/bottle/train/good

Empty dataset or zero training images

Problem: Training images not in correct location.

Solution:

All training images must be in {category}/train/good/
Check for .png or .jpg files in that directory
Verify file permissions (readable)
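
A common cause is images sitting directly in train/ or in some other sub-folder. The following sketch lists image files below {category}/train/ that are not directly inside train/good/; misplaced_train_images is a hypothetical helper.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg"}


def misplaced_train_images(data_path, category):
    """Return image files under {category}/train/ that are not directly inside train/good/."""
    train_dir = Path(data_path) / category / "train"
    good_dir = train_dir / "good"
    return sorted(
        p for p in train_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES and p.parent != good_dir
    )
```

Any path it returns should be moved into train/good/ if it is a defect-free training image.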

Ground truth masks not found during testing

Problem: Mask files missing or in wrong location.

Solution:

Masks should be in {category}/ground_truth/{defect_type}/
Mask filenames must correspond to test image filenames; in MVTec AD the mask carries a _mask suffix (e.g., 000_mask.png for 000.png)
Only defect images have masks (good samples don’t need masks)
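
To find which test images lack a mask, a check along the following lines can help. It assumes MVTec AD's naming, where the mask for test/{defect_type}/000.png is ground_truth/{defect_type}/000_mask.png; images_without_masks is a hypothetical helper.

```python
from pathlib import Path


def images_without_masks(data_path, category):
    """Return defect test images with no matching <stem>_mask.png under ground_truth/."""
    root = Path(data_path) / category
    missing = []
    for defect_dir in sorted((root / "test").iterdir()):
        # "good" samples have no masks, so skip that folder.
        if not defect_dir.is_dir() or defect_dir.name == "good":
            continue
        for image in sorted(defect_dir.glob("*.png")):
            mask = root / "ground_truth" / defect_dir.name / f"{image.stem}_mask.png"
            if not mask.is_file():
                missing.append(image)
    return missing
```

An empty list means every defect image in that category has a mask in the expected place.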

Next Steps

Train Single Model

Start training with WideResNet50 backbone

Configuration Guide

Learn about all training parameters
