Overview
Samplers reduce the size of the feature memory bank by selecting a representative subset of features from the training data. This improves inference speed and memory efficiency while maintaining detection accuracy.
IdentitySampler
A no-op sampler that returns all features without any subsampling. Use this when you want to keep all training features in the memory bank.
Methods
run
```python
def run(
    self,
    features: Union[torch.Tensor, np.ndarray]
) -> Union[torch.Tensor, np.ndarray]
```
Returns the input features unchanged.
- `features` (torch.Tensor or np.ndarray, required): Input features with shape `[N, D]`, where N is the number of samples and D is the feature dimension.
Returns
- `features` (torch.Tensor or np.ndarray): The same features that were passed in, with no modification.
Example
```python
from patchcore.sampler import IdentitySampler
import numpy as np

sampler = IdentitySampler()
features = np.random.randn(1000, 512)
sampled_features = sampler.run(features)
# sampled_features.shape == (1000, 512) - no change
```
BaseSampler
```python
class BaseSampler(abc.ABC)
```
Abstract base class for all sampling strategies that perform subsampling.
Constructor
```python
def __init__(self, percentage: float)
```
- `percentage` (float, required): Percentage of features to keep; must be in the range (0, 1). For example, 0.1 means keep 10% of the features.
Methods
run
```python
@abc.abstractmethod
def run(
    self,
    features: Union[torch.Tensor, np.ndarray]
) -> Union[torch.Tensor, np.ndarray]
```
Abstract method that must be implemented by subclasses.
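Custom strategies plug in by subclassing `BaseSampler` and implementing `run`. As a sketch of the contract (the `StridedSampler` below is a hypothetical example, not part of the library, and the sketch only handles NumPy arrays for brevity):

```python
import abc

import numpy as np


class BaseSampler(abc.ABC):
    """Mirrors the documented interface; percentage must lie in (0, 1)."""

    def __init__(self, percentage: float):
        if not 0 < percentage < 1:
            raise ValueError("percentage must be in (0, 1).")
        self.percentage = percentage

    @abc.abstractmethod
    def run(self, features: np.ndarray) -> np.ndarray:
        ...


class StridedSampler(BaseSampler):
    """Hypothetical subclass for illustration: keeps every k-th feature."""

    def run(self, features: np.ndarray) -> np.ndarray:
        stride = int(round(1 / self.percentage))
        return features[::stride]


sampler = StridedSampler(percentage=0.1)
features = np.random.randn(1000, 512)
print(sampler.run(features).shape)  # (100, 512)
```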
GreedyCoresetSampler
```python
class GreedyCoresetSampler(BaseSampler)
```
Greedy coreset subsampling algorithm that iteratively selects the most representative features. This produces high-quality subsamples but requires computing the full N×N distance matrix.
Constructor
```python
def __init__(
    self,
    percentage: float,
    device: torch.device,
    dimension_to_project_features_to: int = 128
)
```
- `percentage` (float, required): Percentage of features to keep (0 < percentage < 1). For example, 0.1 keeps 10% of the features.
- `device` (torch.device, required): Device on which to perform the coreset computation (CPU or CUDA).
- `dimension_to_project_features_to` (int, default 128): Features are projected to this dimension before distances are computed, to speed up the computation.
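To picture what the projection parameter does: a random linear map down to 128 dimensions roughly preserves pairwise distances (the Johnson-Lindenstrauss effect), so each distance computation touches far fewer values. A NumPy sketch of the idea, not the library's actual projection code:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((10000, 1024))

# Hypothetical random projection matrix: pairwise distances are roughly
# preserved, but each distance computation now touches 128 values
# instead of 1024.
projection = rng.standard_normal((1024, 128)) / np.sqrt(128)
projected = features @ projection
print(projected.shape)  # (10000, 128)
```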
Methods
run
```python
def run(
    self,
    features: Union[torch.Tensor, np.ndarray]
) -> Union[torch.Tensor, np.ndarray]
```
Subsamples features using the greedy coreset algorithm.
- `features` (torch.Tensor or np.ndarray, required): Input features with shape `[N, D]`.
Returns
- `sampled_features` (torch.Tensor or np.ndarray): Subsampled features with shape `[N * percentage, D]`.
Example
```python
import torch
from patchcore.sampler import GreedyCoresetSampler

device = torch.device("cuda")
sampler = GreedyCoresetSampler(
    percentage=0.1,  # Keep 10% of features
    device=device,
    dimension_to_project_features_to=128
)

# Sample from 10,000 features
features = torch.randn(10000, 1024).to(device)
sampled_features = sampler.run(features)
print(sampled_features.shape)  # torch.Size([1000, 1024])
```
Memory Usage: This sampler computes a full N×N distance matrix, which requires O(N²) memory. For very large feature sets (>50,000 samples), consider using ApproximateGreedyCoresetSampler instead.
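For intuition, the greedy selection itself can be sketched in a few lines of NumPy. This is an illustrative farthest-point implementation, not the library's exact code; it deliberately materialises the full distance matrix to show where the O(N²) memory cost comes from:

```python
import numpy as np


def greedy_coreset_indices(features: np.ndarray, percentage: float) -> np.ndarray:
    """Illustrative greedy (farthest-point) coreset selection."""
    n = features.shape[0]
    n_select = int(n * percentage)

    # Full pairwise Euclidean distance matrix: this is the O(N^2) memory cost.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)

    selected = [0]  # arbitrary starting point
    # Distance from every point to its nearest already-selected point.
    min_dist = dist[0].copy()
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))  # point farthest from the current coreset
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return np.asarray(selected)


features = np.random.randn(500, 32)
idx = greedy_coreset_indices(features, percentage=0.1)
print(features[idx].shape)  # (50, 32)
```

Each iteration picks the point farthest from everything selected so far, which is what keeps the subsample spread evenly over the feature distribution.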
ApproximateGreedyCoresetSampler
```python
class ApproximateGreedyCoresetSampler(GreedyCoresetSampler)
```
Approximate greedy coreset subsampling that avoids computing the full N×N distance matrix. This is more memory-efficient but slower than the exact version.
Constructor
```python
def __init__(
    self,
    percentage: float,
    device: torch.device,
    number_of_starting_points: int = 10,
    dimension_to_project_features_to: int = 128
)
```
- `percentage` (float, required): Percentage of features to keep (0 < percentage < 1).
- `device` (torch.device, required): Device on which to perform the coreset computation.
- `number_of_starting_points` (int, default 10): Number of random starting points used for the distance approximation. Higher values improve accuracy but increase computation time.
- `dimension_to_project_features_to` (int, default 128): Target dimension for feature projection before distance computation.
Methods
run
```python
def run(
    self,
    features: Union[torch.Tensor, np.ndarray]
) -> Union[torch.Tensor, np.ndarray]
```
Subsamples features using approximate greedy coreset algorithm.
- `features` (torch.Tensor or np.ndarray, required): Input features with shape `[N, D]`.
Returns
- `sampled_features` (torch.Tensor or np.ndarray): Subsampled features with shape `[N * percentage, D]`.
Example
```python
import torch
from patchcore.sampler import ApproximateGreedyCoresetSampler

device = torch.device("cuda")
sampler = ApproximateGreedyCoresetSampler(
    percentage=0.1,
    device=device,
    number_of_starting_points=10,
    dimension_to_project_features_to=128
)

# Sample from a large feature set
features = torch.randn(100000, 1024).to(device)
sampled_features = sampler.run(features)
print(sampled_features.shape)  # torch.Size([10000, 1024])
```
Performance Trade-off: This sampler uses less memory than GreedyCoresetSampler but takes longer to compute. It’s ideal for datasets with >50,000 training images where memory is a constraint.
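One way such an approximation can work, sketched here for intuition rather than as the library's exact implementation: seed the distance estimates from a handful of random starting points, then update a single N-vector of minimum distances incrementally, so the N×N matrix is never built:

```python
import numpy as np


def approx_greedy_coreset_indices(
    features: np.ndarray, percentage: float, number_of_starting_points: int = 10
) -> np.ndarray:
    """Illustrative approximate greedy coreset selection (a sketch)."""
    rng = np.random.default_rng(0)
    n = features.shape[0]
    n_select = int(n * percentage)

    # Seed the coverage estimate with a few random starting points:
    # mean distance to them stands in for the full distance matrix.
    starts = rng.choice(n, size=number_of_starting_points, replace=False)
    min_dist = np.linalg.norm(
        features[:, None, :] - features[starts][None, :, :], axis=-1
    ).mean(axis=1)

    selected = []
    for _ in range(n_select):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Incremental update: only the distances to the newly selected
        # point are computed, keeping memory at O(N) per step.
        d_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, d_new)
    return np.asarray(selected)


features = np.random.randn(2000, 64)
idx = approx_greedy_coreset_indices(features, percentage=0.1)
print(features[idx].shape)  # (200, 64)
```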
RandomSampler
```python
class RandomSampler(BaseSampler)
```
Randomly samples a subset of features. This is the fastest sampling method but may not preserve the feature distribution as well as coreset methods.
Constructor
```python
def __init__(self, percentage: float)
```
- `percentage` (float, required): Percentage of features to keep (0 < percentage < 1).
Methods
run
```python
def run(
    self,
    features: Union[torch.Tensor, np.ndarray]
) -> Union[torch.Tensor, np.ndarray]
```
Randomly samples features without replacement.
- `features` (torch.Tensor or np.ndarray, required): Input features with shape `[N, D]`.
Returns
- `sampled_features` (torch.Tensor or np.ndarray): Randomly sampled features with shape `[N * percentage, D]`.
Example
```python
import numpy as np
from patchcore.sampler import RandomSampler

sampler = RandomSampler(percentage=0.1)
features = np.random.randn(10000, 1024)
sampled_features = sampler.run(features)
print(sampled_features.shape)  # (1000, 1024)
```
Comparison
| Sampler | Speed | Memory | Quality | Use Case |
|---|---|---|---|---|
| IdentitySampler | Instant | High | Perfect | Small datasets, high accuracy requirements |
| RandomSampler | Fast | Low | Good | Quick experiments, large datasets |
| GreedyCoresetSampler | Medium | High | Excellent | Medium datasets (under 50k samples) |
| ApproximateGreedyCoresetSampler | Slow | Low | Excellent | Large datasets (over 50k samples) |
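The "High" vs "Low" memory entries above come down to the distance matrix: a full float32 N×N matrix grows quadratically with N, which is why roughly 50,000 samples is quoted as the crossover point. A quick back-of-the-envelope check:

```python
# Memory for a full float32 N x N distance matrix: N * N * 4 bytes.
for n in (10_000, 50_000, 100_000):
    gib = n * n * 4 / 2**30
    print(f"N = {n:>7,}: {gib:6.1f} GiB")
# -> roughly 0.4, 9.3, and 37.3 GiB respectively
```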
Usage with PatchCore
```python
import torch
from patchcore.patchcore import PatchCore
from patchcore.sampler import ApproximateGreedyCoresetSampler
import patchcore.backbones

device = torch.device("cuda")
model = PatchCore(device)

# Create sampler to keep 10% of features
sampler = ApproximateGreedyCoresetSampler(
    percentage=0.1,
    device=device
)

# Load model with sampler
backbone = patchcore.backbones.load("wideresnet50")
model.load(
    backbone=backbone,
    layers_to_extract_from=["layer2", "layer3"],
    device=device,
    input_shape=(3, 224, 224),
    pretrain_embed_dimension=1024,
    target_embed_dimension=1024,
    featuresampler=sampler  # Use coreset sampler
)

# Train as usual
model.fit(train_loader)
```
Recommendation: For most industrial anomaly detection tasks, use ApproximateGreedyCoresetSampler with percentage=0.1 (10%). This provides the best balance of speed, memory, and accuracy.