Coreset sampling is a critical component of PatchCore: it reduces the memory bank from potentially millions of patch features to a manageable subset while preserving representation quality. This enables efficient storage and fast inference without sacrificing detection performance.

Why Coreset Sampling?

Without subsampling, the memory requirements would be prohibitive:
Example: For a dataset with 200 training images at 224×224 resolution:
  • Patches per image: 56 × 56 = 3,136
  • Total patches: 200 × 3,136 = 627,200 patches
  • With 1024-dim features: ~2.4 GB of memory
After 10% coreset sampling:
  • Sampled patches: ~62,720 patches
  • Memory: ~240 MB (10× reduction)
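The arithmetic above is easy to verify (assuming float32 features; the helper name is illustrative, not from the codebase):

```python
# Back-of-envelope memory estimate for the example above (float32 features).
def memory_bank_bytes(num_images: int, patches_per_side: int, feature_dim: int):
    """Return (patch count, bytes) needed to store every patch feature."""
    patches = num_images * patches_per_side ** 2
    return patches, patches * feature_dim * 4  # 4 bytes per float32

patches, full_bytes = memory_bank_bytes(200, 56, 1024)
print(patches)                     # 627200 patches
print(full_bytes / 2 ** 30)        # ~2.4 GiB for the full bank
print(full_bytes * 0.1 / 2 ** 20)  # ~245 MiB after 10% coreset sampling
```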

Sampling Strategies

PatchCore provides three sampling strategies:
  • IdentitySampler: no sampling; uses all features
  • RandomSampler: uniform random subset selection
  • GreedyCoresetSampler: diversity-based selection (recommended)

IdentitySampler

The simplest sampler that returns all features unchanged:
sampler.py
from typing import Union

import numpy as np
import torch


class IdentitySampler:
    def run(
        self, features: Union[torch.Tensor, np.ndarray]
    ) -> Union[torch.Tensor, np.ndarray]:
        """Return the input features unchanged (no sampling)."""
        return features
Use IdentitySampler for small datasets or when memory is not a constraint.

RandomSampler

Randomly selects a percentage of features:
sampler.py
class RandomSampler(BaseSampler):
    def __init__(self, percentage: float):
        super().__init__(percentage)
    
    def run(
        self, features: Union[torch.Tensor, np.ndarray]
    ) -> Union[torch.Tensor, np.ndarray]:
        """Randomly samples input feature collection.
        
        Args:
            features: [N x D]
        """
        num_random_samples = int(len(features) * self.percentage)
        subset_indices = np.random.choice(
            len(features), num_random_samples, replace=False
        )
        subset_indices = np.array(subset_indices)
        return features[subset_indices]
Random sampling is fast but may miss important feature diversity. Use it as a baseline comparison.
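For reference, the same sampling step can be reproduced standalone with plain NumPy (the seeded generator and array shapes are only for this demo):

```python
import numpy as np

# Standalone version of the random-sampling step, without the BaseSampler class.
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 128)).astype(np.float32)

percentage = 0.1
subset_indices = rng.choice(len(features), int(len(features) * percentage), replace=False)
subset = features[subset_indices]
print(subset.shape)  # (100, 128)
```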

GreedyCoresetSampler

The recommended sampling strategy that selects a diverse subset using a greedy farthest-point algorithm:
sampler.py
class GreedyCoresetSampler(BaseSampler):
    def __init__(
        self,
        percentage: float,
        device: torch.device,
        dimension_to_project_features_to=128,
    ):
        """Greedy Coreset sampling base class."""
        super().__init__(percentage)
        
        self.device = device
        self.dimension_to_project_features_to = dimension_to_project_features_to

Dimensionality Reduction for Speed

To accelerate distance computations, features are projected to a lower dimension:
sampler.py
def _reduce_features(self, features):
    if features.shape[1] == self.dimension_to_project_features_to:
        return features
    mapper = torch.nn.Linear(
        features.shape[1], self.dimension_to_project_features_to, bias=False
    )
    _ = mapper.to(self.device)
    features = features.to(self.device)
    return mapper(features)
The random projection is only used for distance computation during sampling. The original high-dimensional features are returned for the memory bank.
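The intuition here is the Johnson-Lindenstrauss effect: a random linear map to ~128 dimensions roughly preserves pairwise distances, so the farthest-point rankings used during sampling stay largely intact. A quick NumPy illustration (the explicit 1/sqrt(d) scaling is for this demo; PatchCore's `nn.Linear` relies on its default initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1024)).astype(np.float32)

# Random projection to 128 dims, scaled so norms are roughly preserved.
P = rng.standard_normal((1024, 128)).astype(np.float32) / np.sqrt(128)
Y = X @ P

d_full = np.linalg.norm(X[0] - X[1])  # distance in 1024-dim space
d_proj = np.linalg.norm(Y[0] - Y[1])  # distance after projection
print(d_proj / d_full)  # close to 1.0
```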

Core Algorithm

The greedy coreset selection algorithm:
sampler.py
def _compute_greedy_coreset_indices(self, features: torch.Tensor) -> np.ndarray:
    """Runs iterative greedy coreset selection.
    
    Args:
        features: [NxD] input feature bank to sample.
    """
    distance_matrix = self._compute_batchwise_differences(features, features)
    coreset_anchor_distances = torch.norm(distance_matrix, dim=1)
    
    coreset_indices = []
    num_coreset_samples = int(len(features) * self.percentage)
    
    for _ in range(num_coreset_samples):
        select_idx = torch.argmax(coreset_anchor_distances).item()
        coreset_indices.append(select_idx)
        
        coreset_select_distance = distance_matrix[
            :, select_idx : select_idx + 1
        ]
        coreset_anchor_distances = torch.cat(
            [coreset_anchor_distances.unsqueeze(-1), coreset_select_distance], dim=1
        )
        coreset_anchor_distances = torch.min(coreset_anchor_distances, dim=1).values
    
    return np.array(coreset_indices)

Algorithm Breakdown

1. Initialize: compute the full pairwise distance matrix between all features.
2. Select farthest point: choose the point with the maximum distance to the already selected points (initially, the maximum row norm).
3. Update distances: for each remaining point, compute its distance to the newly selected point and keep the minimum.
4. Repeat: continue until the desired number of samples is reached.
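The steps above can be sketched in a few lines of NumPy (this mirrors the torch implementation's logic; it is not the library code itself):

```python
import numpy as np

def greedy_coreset(features: np.ndarray, percentage: float) -> np.ndarray:
    # Step 1: full pairwise Euclidean distance matrix.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    anchor = np.linalg.norm(dist, axis=1)  # initial score: row norms

    indices = []
    for _ in range(int(len(features) * percentage)):
        # Step 2: pick the point farthest from the current coreset.
        idx = int(np.argmax(anchor))
        indices.append(idx)
        # Step 3: each point keeps its distance to its nearest coreset member.
        anchor = np.minimum(anchor, dist[:, idx])
    # Step 4: stop once the desired number of samples is reached.
    return np.array(indices)

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 8)).astype(np.float32)
selected = greedy_coreset(feats, 0.1)
print(selected.shape)  # (10,)
```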

Distance Computation

Efficient batched Euclidean distance calculation:
sampler.py
@staticmethod
def _compute_batchwise_differences(
    matrix_a: torch.Tensor, matrix_b: torch.Tensor
) -> torch.Tensor:
    """Computes batchwise Euclidean distances using PyTorch."""
    a_times_a = matrix_a.unsqueeze(1).bmm(matrix_a.unsqueeze(2)).reshape(-1, 1)
    b_times_b = matrix_b.unsqueeze(1).bmm(matrix_b.unsqueeze(2)).reshape(1, -1)
    a_times_b = matrix_a.mm(matrix_b.T)
    
    return (-2 * a_times_b + a_times_a + b_times_b).clamp(0, None).sqrt()
This uses the identity ||a − b||² = ||a||² + ||b||² − 2⟨a, b⟩, which turns pairwise distance computation into a single matrix product.
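The identity is easy to check numerically (NumPy here for brevity; the algebra is the same as in the torch code above):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 16))
b = rng.standard_normal((7, 16))

# ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 <a_i, b_j>
aa = np.sum(a * a, axis=1)[:, None]  # [5, 1] squared norms of a
bb = np.sum(b * b, axis=1)[None, :]  # [1, 7] squared norms of b
dist = np.sqrt(np.clip(aa + bb - 2 * a @ b.T, 0, None))

# Reference: explicit pairwise differences.
ref = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(np.allclose(dist, ref, atol=1e-6))  # True
```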

ApproximateGreedyCoresetSampler

For very large feature sets, the approximate version trades some optimality for memory efficiency:
sampler.py
class ApproximateGreedyCoresetSampler(GreedyCoresetSampler):
    def __init__(
        self,
        percentage: float,
        device: torch.device,
        number_of_starting_points: int = 10,
        dimension_to_project_features_to: int = 128,
    ):
        """Approximate Greedy Coreset sampling base class."""
        self.number_of_starting_points = number_of_starting_points
        super().__init__(percentage, device, dimension_to_project_features_to)

Approximate Algorithm

sampler.py
def _compute_greedy_coreset_indices(self, features: torch.Tensor) -> np.ndarray:
    """Runs approximate iterative greedy coreset selection.
    
    This greedy coreset implementation does not require computation of the
    full N x N distance matrix and thus requires a lot less memory, however
    at the cost of increased sampling times.
    
    Args:
        features: [NxD] input feature bank to sample.
    """
    number_of_starting_points = np.clip(
        self.number_of_starting_points, None, len(features)
    )
    start_points = np.random.choice(
        len(features), number_of_starting_points, replace=False
    ).tolist()
    
    approximate_distance_matrix = self._compute_batchwise_differences(
        features, features[start_points]
    )
    approximate_coreset_anchor_distances = torch.mean(
        approximate_distance_matrix, axis=-1
    ).reshape(-1, 1)
    coreset_indices = []
    num_coreset_samples = int(len(features) * self.percentage)
    
    with torch.no_grad():
        for _ in tqdm.tqdm(range(num_coreset_samples), desc="Subsampling..."):
            select_idx = torch.argmax(approximate_coreset_anchor_distances).item()
            coreset_indices.append(select_idx)
            coreset_select_distance = self._compute_batchwise_differences(
                features, features[select_idx : select_idx + 1]
            )
            approximate_coreset_anchor_distances = torch.cat(
                [approximate_coreset_anchor_distances, coreset_select_distance],
                dim=-1,
            )
            approximate_coreset_anchor_distances = torch.min(
                approximate_coreset_anchor_distances, dim=1
            ).values.reshape(-1, 1)
    
    return np.array(coreset_indices)

Key Differences

Instead of computing the full N×N distance matrix, it:
  1. Initializes each point's score with its mean distance to a small random set of starting points
  2. Recomputes distances to each newly selected point on the fly, keeping only an N×1 vector of minimum distances
  3. Memory: O(N) instead of O(N²)
Complexity comparison (k = number of selected samples):
  • GreedyCoresetSampler: O(N²) memory and distance computation, plus O(N × k) selection
  • ApproximateGreedyCoresetSampler: O(N × k) incremental distance computations with O(N) memory
The approximate version is slower because distances are recomputed at every iteration, but it scales to feature banks where the full matrix cannot fit.
The approximate version uses mean distances to random starting points for initialization, which may not find the globally optimal coreset but works well in practice.
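Plugging in the numbers from the example at the top of the page shows the gap (float32 distance entries; sizes approximate):

```python
# N patches; 4 bytes per float32 distance entry.
N = 627_200

full_matrix_gb = N * N * 4 / 2 ** 30  # exact greedy materializes N x N once
per_step_mb = N * 4 / 2 ** 20         # approximate greedy: one N-vector per step

print(full_matrix_gb)  # ~1465 GB -> infeasible to materialize
print(per_step_mb)     # ~2.4 MB per iteration -> trivially small
```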

Usage in PatchCore

Coreset sampling is applied during the training phase:
patchcore.py
def _fill_memory_bank(self, input_data):
    """Computes and sets the support features for SPADE."""
    _ = self.forward_modules.eval()
    
    def _image_to_features(input_image):
        with torch.no_grad():
            input_image = input_image.to(torch.float).to(self.device)
            return self._embed(input_image)
    
    features = []
    with tqdm.tqdm(
        input_data, desc="Computing support features...", position=1, leave=False
    ) as data_iterator:
        for image in data_iterator:
            if isinstance(image, dict):
                image = image["image"]
            features.append(_image_to_features(image))
    
    features = np.concatenate(features, axis=0)
    features = self.featuresampler.run(features)  # <- Coreset sampling happens here
    
    self.anomaly_scorer.fit(detection_features=[features])

Configuration Examples
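A sketch of typical configurations for each sampler. Constructor arguments follow the class definitions shown on this page, but the import path is an assumption and may differ in your checkout:

```python
import torch

# Assumed import path; adjust to match your project layout.
# from patchcore.sampler import (
#     IdentitySampler, RandomSampler,
#     GreedyCoresetSampler, ApproximateGreedyCoresetSampler,
# )

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small datasets / memory not a concern: keep every feature.
sampler = IdentitySampler()

# Fast baseline: uniform random 10% subset.
sampler = RandomSampler(percentage=0.1)

# Recommended default: exact greedy coreset at 10%.
sampler = GreedyCoresetSampler(percentage=0.1, device=device)

# Very large feature banks: approximate greedy avoids the N x N matrix.
sampler = ApproximateGreedyCoresetSampler(
    percentage=0.1,
    device=device,
    number_of_starting_points=10,
    dimension_to_project_features_to=128,
)
```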

Performance vs. Sampling Rate

Sampling %           Memory   Inference Speed   AUROC (typical)
100% (no sampling)   2.4 GB   1×                99.2%
10% (greedy)         240 MB   10×               99.1%
1% (greedy)          24 MB    50×               98.5%
10% (random)         240 MB   10×               97.8%
Finding: greedy coreset at 10% typically costs only ~0.1 AUROC points while providing a 10× speedup.

Visualization: Coreset Coverage

The greedy algorithm ensures good coverage of the feature space:
Feature Space (t-SNE projection):

        ●  ○ ○     ○  ●        ○: All training patches
     ○    ○   ○  ○     ○      ●: Selected coreset samples
   ○  ●    ○○    ●  ○   ○
  ○    ○  ○  ● ○     ○        Note: Coreset samples are
    ○   ○ ○  ○   ○  ●         spread across the space
  ○  ●   ○    ○    ○   ○
    ○   ○  ●    ○  ○  ○
      ○     ○  ○    ●

Implementation Tips

GPU Acceleration: For datasets with >100k patches, ensure coreset sampling runs on GPU by passing the correct device:
sampler = ApproximateGreedyCoresetSampler(
    percentage=0.1,
    device=torch.device('cuda'),  # Run on GPU
    dimension_to_project_features_to=128
)
Percentage Validation: The BaseSampler enforces 0 < percentage < 1:
if not 0 < percentage < 1:
    raise ValueError("Percentage value not in (0, 1).")
Use decimals (0.1 for 10%), not percentages (10).

Next Steps

• Anomaly Scoring: learn how the coreset memory bank is used for scoring.
• PatchCore Algorithm: see how coreset sampling fits into the full pipeline.
