Coreset sampling is a critical component of PatchCore: it reduces the memory bank from potentially millions of patch features to a manageable subset while preserving representation quality. This enables efficient storage and fast inference without sacrificing detection performance.
Why Coreset Sampling?
Without subsampling, the memory requirements would be prohibitive:
Example: for a dataset with 200 training images at 224×224 resolution:

- Patches per image: 56 × 56 = 3,136
- Total patches: 200 × 3,136 = 627,200
- With 1024-dim float32 features: ~2.4 GB of memory

After 10% coreset sampling:

- Sampled patches: ~62,720
- Memory: ~240 MB (a 10× reduction)
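The figures above can be verified with a quick back-of-envelope calculation, assuming float32 features (4 bytes per value):

```python
# Sanity check of the memory estimates above (float32 = 4 bytes per value).
patches_per_image = 56 * 56                       # 3,136 patches per image
total_patches = 200 * patches_per_image           # 627,200 patches total
full_bytes = total_patches * 1024 * 4             # 1024-dim float32 features
coreset_bytes = int(total_patches * 0.10) * 1024 * 4

print(f"full bank:   {full_bytes / 2**30:.2f} GiB")   # ~2.4 GiB
print(f"10% coreset: {coreset_bytes / 2**20:.0f} MiB")
```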
Sampling Strategies
PatchCore provides three sampling strategies:
- IdentitySampler: no sampling; uses all features
- RandomSampler: random subset selection
- GreedyCoresetSampler: intelligent diversity-based selection
IdentitySampler
The simplest sampler that returns all features unchanged:
```python
class IdentitySampler:
    def run(
        self, features: Union[torch.Tensor, np.ndarray]
    ) -> Union[torch.Tensor, np.ndarray]:
        return features
```
Use IdentitySampler for small datasets or when memory is not a constraint.
RandomSampler
Randomly selects a percentage of features:
```python
class RandomSampler(BaseSampler):
    def __init__(self, percentage: float):
        super().__init__(percentage)

    def run(
        self, features: Union[torch.Tensor, np.ndarray]
    ) -> Union[torch.Tensor, np.ndarray]:
        """Randomly samples input feature collection.

        Args:
            features: [N x D]
        """
        num_random_samples = int(len(features) * self.percentage)
        subset_indices = np.random.choice(
            len(features), num_random_samples, replace=False
        )
        subset_indices = np.array(subset_indices)
        return features[subset_indices]
```
Random sampling is fast but may miss important feature diversity. Use it as a baseline comparison.
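For intuition, the random baseline boils down to a single `np.random.choice` call; a minimal NumPy-only sketch on dummy features (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))   # dummy [N x D] feature bank

# Same logic as RandomSampler.run with percentage = 0.1.
percentage = 0.10
num_samples = int(len(features) * percentage)
subset_indices = rng.choice(len(features), num_samples, replace=False)
subset = features[subset_indices]

print(subset.shape)  # (100, 64)
```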
GreedyCoresetSampler
The recommended sampling strategy that selects a diverse subset using a greedy farthest-point algorithm:
```python
class GreedyCoresetSampler(BaseSampler):
    def __init__(
        self,
        percentage: float,
        device: torch.device,
        dimension_to_project_features_to=128,
    ):
        """Greedy Coreset sampling base class."""
        super().__init__(percentage)
        self.device = device
        self.dimension_to_project_features_to = dimension_to_project_features_to
```
Dimensionality Reduction for Speed
To accelerate distance computations, features are projected to a lower dimension:
```python
def _reduce_features(self, features):
    if features.shape[1] == self.dimension_to_project_features_to:
        return features
    mapper = torch.nn.Linear(
        features.shape[1], self.dimension_to_project_features_to, bias=False
    )
    _ = mapper.to(self.device)
    features = features.to(self.device)
    return mapper(features)
```
The random projection is only used for distance computation during sampling. The original high-dimensional features are returned for the memory bank.
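A quick NumPy check (not PatchCore code) of why a random projection suffices here: pairwise distances are roughly preserved, and the greedy selection only ever compares distances, so absolute scale is irrelevant:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1024)).astype(np.float32)

# Random linear map to 128 dims, scaled so expected norms are preserved.
projection = rng.normal(size=(1024, 128)).astype(np.float32) / np.sqrt(128)
reduced = features @ projection

# Compare distances between random pairs in both spaces.
ia = rng.integers(0, len(features), 200)
ib = rng.integers(0, len(features), 200)
mask = ia != ib
d_high = np.linalg.norm(features[ia] - features[ib], axis=1)[mask]
d_low = np.linalg.norm(reduced[ia] - reduced[ib], axis=1)[mask]

ratio = d_low / d_high
print(ratio.mean())  # close to 1.0
```

The actual `_reduce_features` uses an unscaled `torch.nn.Linear`, which changes absolute distances, but argmax/min comparisons in the selection loop are unaffected.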
Core Algorithm
The greedy coreset selection algorithm:
```python
def _compute_greedy_coreset_indices(self, features: torch.Tensor) -> np.ndarray:
    """Runs iterative greedy coreset selection.

    Args:
        features: [NxD] input feature bank to sample.
    """
    distance_matrix = self._compute_batchwise_differences(features, features)
    coreset_anchor_distances = torch.norm(distance_matrix, dim=1)

    coreset_indices = []
    num_coreset_samples = int(len(features) * self.percentage)

    for _ in range(num_coreset_samples):
        select_idx = torch.argmax(coreset_anchor_distances).item()
        coreset_indices.append(select_idx)

        coreset_select_distance = distance_matrix[:, select_idx : select_idx + 1]
        coreset_anchor_distances = torch.cat(
            [coreset_anchor_distances.unsqueeze(-1), coreset_select_distance], dim=1
        )
        coreset_anchor_distances = torch.min(coreset_anchor_distances, dim=1).values

    return np.array(coreset_indices)
```
Algorithm Breakdown
1. Initialize: compute the full pairwise distance matrix between all features.
2. Select farthest point: pick the point with the maximum distance to the already selected points (initially, the largest row norm of the distance matrix).
3. Update distances: for each remaining point, compute its distance to the newly selected point and keep the minimum over all selected points.
4. Repeat: continue until the desired number of samples is reached.
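The four steps above can be sketched in a few lines of NumPy (toy data, not the PatchCore implementation):

```python
import numpy as np

def greedy_coreset(features: np.ndarray, n_samples: int) -> list:
    # 1. Initialize: full pairwise distance matrix.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Initial anchor distances: norm of each row, as in the PyTorch code.
    anchor = np.linalg.norm(dist, axis=1)
    chosen = []
    for _ in range(n_samples):
        # 2. Select the point farthest from the current coreset.
        idx = int(np.argmax(anchor))
        chosen.append(idx)
        # 3. Update: each point keeps its distance to the nearest selected point.
        anchor = np.minimum(anchor, dist[:, idx])
    # 4. Repeat until n_samples are chosen.
    return chosen

# Three well-separated clusters of 20 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.concatenate([c + 0.1 * rng.normal(size=(20, 2)) for c in centers])

picks = greedy_coreset(points, 3)
print(sorted(p // 20 for p in picks))  # [0, 1, 2]: one pick per cluster
```

With only three samples, farthest-point selection lands one pick in each cluster, which is exactly the coverage property the memory bank needs.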
Distance Computation
Efficient batched Euclidean distance calculation:
```python
@staticmethod
def _compute_batchwise_differences(
    matrix_a: torch.Tensor, matrix_b: torch.Tensor
) -> torch.Tensor:
    """Computes batchwise Euclidean distances using PyTorch."""
    a_times_a = matrix_a.unsqueeze(1).bmm(matrix_a.unsqueeze(2)).reshape(-1, 1)
    b_times_b = matrix_b.unsqueeze(1).bmm(matrix_b.unsqueeze(2)).reshape(1, -1)
    a_times_b = matrix_a.mm(matrix_b.T)
    return (-2 * a_times_b + a_times_a + b_times_b).clamp(0, None).sqrt()
```
This uses the identity ||a - b||² = ||a||² + ||b||² - 2⟨a, b⟩, so the full distance matrix is built from matrix products; the `clamp(0, None)` guards against small negative values caused by floating-point rounding.
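A quick NumPy sanity check of the same expansion (equivalent math, different library) against the direct computation:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))
b = rng.normal(size=(7, 8))

a_sq = (a * a).sum(axis=1, keepdims=True)    # ||a||^2 as a (5, 1) column
b_sq = (b * b).sum(axis=1, keepdims=True).T  # ||b||^2 as a (1, 7) row
fast = np.sqrt(np.clip(a_sq + b_sq - 2 * (a @ b.T), 0, None))

# Direct (slow) pairwise computation for comparison.
slow = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(np.allclose(fast, slow))  # True
```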
ApproximateGreedyCoresetSampler
For very large feature sets, the approximate version trades some optimality for memory efficiency:
```python
class ApproximateGreedyCoresetSampler(GreedyCoresetSampler):
    def __init__(
        self,
        percentage: float,
        device: torch.device,
        number_of_starting_points: int = 10,
        dimension_to_project_features_to: int = 128,
    ):
        """Approximate Greedy Coreset sampling base class."""
        self.number_of_starting_points = number_of_starting_points
        super().__init__(percentage, device, dimension_to_project_features_to)
```
Approximate Algorithm
```python
def _compute_greedy_coreset_indices(self, features: torch.Tensor) -> np.ndarray:
    """Runs approximate iterative greedy coreset selection.

    This greedy coreset implementation does not require computation of the
    full N x N distance matrix and thus requires a lot less memory, however
    at the cost of increased sampling times.

    Args:
        features: [NxD] input feature bank to sample.
    """
    number_of_starting_points = np.clip(
        self.number_of_starting_points, None, len(features)
    )
    start_points = np.random.choice(
        len(features), number_of_starting_points, replace=False
    ).tolist()

    approximate_distance_matrix = self._compute_batchwise_differences(
        features, features[start_points]
    )
    approximate_coreset_anchor_distances = torch.mean(
        approximate_distance_matrix, axis=-1
    ).reshape(-1, 1)

    coreset_indices = []
    num_coreset_samples = int(len(features) * self.percentage)

    with torch.no_grad():
        for _ in tqdm.tqdm(range(num_coreset_samples), desc="Subsampling..."):
            select_idx = torch.argmax(approximate_coreset_anchor_distances).item()
            coreset_indices.append(select_idx)

            coreset_select_distance = self._compute_batchwise_differences(
                features, features[select_idx : select_idx + 1]
            )
            approximate_coreset_anchor_distances = torch.cat(
                [approximate_coreset_anchor_distances, coreset_select_distance],
                dim=-1,
            )
            approximate_coreset_anchor_distances = torch.min(
                approximate_coreset_anchor_distances, dim=1
            ).values.reshape(-1, 1)

    return np.array(coreset_indices)
```
Key Differences
Instead of computing the full N×N distance matrix, it:

- Initializes with the mean distance to a small random set of starting points
- Computes one new N-vector of distances per iteration as each point is selected
- Keeps only the running minimum distance per point: O(N) working memory instead of O(N²)

Cost comparison (N features, k selected samples):

- GreedyCoresetSampler: one O(N²) distance computation up front, then cheap O(N) updates per selection
- ApproximateGreedyCoresetSampler: k incremental O(N) distance computations, i.e. O(N × k) total

As the docstring notes, the approximate version can take longer in wall-clock time (one distance computation per iteration rather than a single batched one), but it never materializes the O(N²) matrix.
The approximate version uses mean distances to random starting points for initialization, which may not find the globally optimal coreset but works well in practice.
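The incremental bookkeeping can be sketched in NumPy (toy code; names are illustrative, not PatchCore's):

```python
import numpy as np

def approx_greedy_coreset(features, n_samples, n_start=10, seed=0):
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), min(n_start, len(features)), replace=False)
    # Initialize with the mean distance to a few random starting points:
    # an (N, n_start) matrix, never the full N x N one.
    anchor = np.linalg.norm(
        features[:, None, :] - features[start][None, :, :], axis=-1
    ).mean(axis=1)
    chosen = []
    for _ in range(n_samples):
        idx = int(np.argmax(anchor))
        chosen.append(idx)
        # One N-vector of distances per iteration: O(N) working memory.
        d_new = np.linalg.norm(features - features[idx], axis=1)
        anchor = np.minimum(anchor, d_new)
    return chosen

rng = np.random.default_rng(1)
points = rng.normal(size=(200, 16))
picks = approx_greedy_coreset(points, 20)
print(len(set(picks)))  # 20 distinct indices
```

A selected point's anchor distance drops to zero after its own update, so the loop never re-picks it while unselected points remain.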
Usage in PatchCore
Coreset sampling is applied during the training phase:
```python
def _fill_memory_bank(self, input_data):
    """Computes and sets the support features for SPADE."""
    _ = self.forward_modules.eval()

    def _image_to_features(input_image):
        with torch.no_grad():
            input_image = input_image.to(torch.float).to(self.device)
            return self._embed(input_image)

    features = []
    with tqdm.tqdm(
        input_data, desc="Computing support features...", position=1, leave=False
    ) as data_iterator:
        for image in data_iterator:
            if isinstance(image, dict):
                image = image["image"]
            features.append(_image_to_features(image))

    features = np.concatenate(features, axis=0)
    features = self.featuresampler.run(features)  # <- Coreset sampling happens here
    self.anomaly_scorer.fit(detection_features=[features])
```
Configuration Examples
Standard (Recommended)

```shell
python bin/run_patchcore.py \
    patch_core ... \
    sampler -p 0.1 approx_greedy_coreset \
    dataset ...
```

- Percentage: 10% (0.1)
- Method: approximate greedy coreset
- Best for: most use cases with >50k patches

High Performance

```shell
python bin/run_patchcore.py \
    patch_core ... \
    sampler -p 0.01 approx_greedy_coreset \
    dataset ...
```

- Percentage: 1% (0.01)
- Method: approximate greedy coreset
- Best for: very large datasets, fastest inference

No Sampling

```python
featuresampler = patchcore.sampler.IdentitySampler()
```

- Percentage: 100%
- Method: identity (no sampling)
- Best for: small datasets (under 10k patches)

Random Baseline

```python
featuresampler = patchcore.sampler.RandomSampler(percentage=0.1)
```

- Percentage: 10%
- Method: random sampling
- Best for: baseline comparisons
| Sampling %         | Memory | Inference Speed | AUROC (typical) |
|--------------------|--------|-----------------|-----------------|
| 100% (no sampling) | 2.4 GB | 1×              | 99.2%           |
| 10% (greedy)       | 240 MB | 8×              | 99.1%           |
| 1% (greedy)        | 24 MB  | 50×             | 98.5%           |
| 10% (random)       | 240 MB | 8×              | 97.8%           |
Finding: greedy coreset at 10% typically costs only about 0.1% AUROC while cutting memory 10× and speeding up inference roughly 8×.
Visualization: Coreset Coverage
The greedy algorithm ensures good coverage of the feature space:
Feature Space (t-SNE projection):

```
○ = all training patches    ● = selected coreset samples

● ○ ○ ○ ● ○
○ ○ ○ ○ ○ ○
○ ● ○ ○ ● ○
○ ○ ○ ● ○ ○
○ ○ ○ ○ ○ ●
○ ● ○ ○ ○ ○
○ ○ ● ○ ○ ○
○ ○ ○ ●
```

Note: coreset samples (●) are spread across the feature space rather than concentrated in one region.
Implementation Tips
GPU Acceleration: for datasets with >100k patches, ensure coreset sampling runs on the GPU by passing the correct device:

```python
sampler = ApproximateGreedyCoresetSampler(
    percentage=0.1,
    device=torch.device("cuda"),  # run on GPU
    dimension_to_project_features_to=128,
)
```

Percentage Validation: the BaseSampler enforces 0 < percentage < 1:

```python
if not 0 < percentage < 1:
    raise ValueError("Percentage value not in (0, 1).")
```

Use decimals (0.1 for 10%), not percentages (10).
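To see the guard in action, here is a hypothetical minimal `BaseSampler` containing only the validation shown above (the real class has more to it):

```python
class BaseSampler:
    def __init__(self, percentage: float):
        # Reject values outside the open interval (0, 1).
        if not 0 < percentage < 1:
            raise ValueError("Percentage value not in (0, 1).")
        self.percentage = percentage

BaseSampler(0.1)         # OK: 10% as a fraction
try:
    BaseSampler(10)      # common mistake: passing a percentage, not a fraction
except ValueError as err:
    print(err)           # Percentage value not in (0, 1).
```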
Next Steps
- Anomaly Scoring: learn how the coreset memory bank is used for scoring
- PatchCore Algorithm: see how coreset sampling fits into the full pipeline