The sample module provides utilities for creating representative subsets of annotation datasets, supporting both random and stratified sampling strategies.
Main Function
sample_dataset
Sample a dataset according to specified options.
- Random uniform sampling
- Stratified sampling (category-aware)
- Category filtering
- Deterministic sampling with seed
Arguments:
- dataset - The dataset to sample from
- opts - Sampling options (strategy, size, categories, seed)
Returns:
A Dataset containing the sampled subset
Errors:
- InvalidSampleParams - Invalid sampling parameters (e.g., both n and fraction specified)
- SampleFailed - Sampling failed (e.g., no images remain after filtering)
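The following is a minimal sketch of how seeded, uniform sampling without replacement can be made reproducible. It is not the module's implementation: `SplitMix64` and `sample_indices` are hypothetical names, the generator is a stand-in chosen so the sketch needs no external crates, and clamping `n` to the dataset size is an assumption of this sketch.

```rust
/// Small SplitMix64 generator (a stand-in, not the module's actual RNG).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in `0..bound` (bound must be > 0). Modulo bias is ignored for brevity.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Hypothetical helper: pick `n` distinct image indices out of `0..total`
/// with a partial Fisher-Yates shuffle driven by `seed`.
fn sample_indices(total: usize, n: usize, seed: u64) -> Vec<usize> {
    let n = n.min(total); // assumption: never ask for more than exists
    let mut rng = SplitMix64(seed);
    let mut indices: Vec<usize> = (0..total).collect();
    for i in 0..n {
        // Swap a uniformly chosen not-yet-selected index into position i.
        let j = i + rng.below(total - i);
        indices.swap(i, j);
    }
    indices.truncate(n);
    indices.sort_unstable(); // keep the subset in original dataset order
    indices
}
```

Calling `sample_indices(100, 10, 42)` twice yields the same ten indices, which is the property a seed is meant to guarantee; a different seed generally yields a different subset.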
Types
SampleStrategy
Image sampling strategy.
- Random: Selects images uniformly at random. Each image has equal probability of selection.
- Stratified: Attempts to maintain category distribution from the original dataset. Useful for preserving class balance in the subset.
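One common way to preserve a category distribution is proportional allocation with the largest-remainder method. The sketch below illustrates that idea only; it is an assumption about how stratification could work, not a description of this module's algorithm. `proportional_quotas` is a hypothetical helper, and it treats each image as belonging to a single stratum, which real datasets with multi-category images do not guarantee.

```rust
/// Hypothetical helper: split a target of `n` samples across strata in
/// proportion to their sizes, using the largest-remainder method.
fn proportional_quotas(counts: &[usize], n: usize) -> Vec<usize> {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return vec![0; counts.len()];
    }
    let n = n.min(total); // assumption: cannot sample more than exists

    // Floor of each stratum's exact share c * n / total.
    let mut quotas: Vec<usize> = counts.iter().map(|&c| c * n / total).collect();

    // Remainder of each division, paired with the stratum index.
    let mut remainders: Vec<(usize, usize)> = counts
        .iter()
        .enumerate()
        .map(|(i, &c)| (c * n % total, i))
        .collect();

    // Leftover slots go to the largest remainders first; ties go to the
    // lower index so the result is deterministic.
    remainders.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    let assigned: usize = quotas.iter().sum();
    for &(_, i) in remainders.iter().take(n - assigned) {
        quotas[i] += 1;
    }
    quotas
}
```

For strata of sizes 60, 30, and 10 and a target of 10 images, this yields quotas of 6, 3, and 1, matching the original 60/30/10 split.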
CategoryMode
Category filtering behavior.
- Images: Filter at the image level. If an image contains at least one annotation from the selected categories, keep the entire image (with all its annotations).
- Annotations: Filter at the annotation level. Only keep annotations matching the selected categories, and drop images that have no remaining annotations.
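The difference between the two modes is easiest to see on a tiny dataset. The sketch below uses simplified, hypothetical `Image` and `Annotation` structs and a hypothetical `filter_by_category` function rather than the module's real IR types; only the two behaviors described above are taken from the documentation.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Annotation {
    category: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Image {
    name: String,
    annotations: Vec<Annotation>,
}

enum CategoryMode {
    Images,
    Annotations,
}

/// Hypothetical helper illustrating the two filtering modes.
fn filter_by_category(images: &[Image], selected: &[&str], mode: CategoryMode) -> Vec<Image> {
    // An empty category list means "no filtering".
    if selected.is_empty() {
        return images.to_vec();
    }
    let matches = |a: &Annotation| selected.contains(&a.category.as_str());
    match mode {
        // Keep whole images (all annotations) that have at least one match.
        CategoryMode::Images => images
            .iter()
            .filter(|img| img.annotations.iter().any(|a| matches(a)))
            .cloned()
            .collect(),
        // Keep only matching annotations; drop images left with none.
        CategoryMode::Annotations => images
            .iter()
            .filter_map(|img| {
                let kept: Vec<Annotation> = img
                    .annotations
                    .iter()
                    .filter(|a| matches(*a))
                    .cloned()
                    .collect();
                if kept.is_empty() {
                    None
                } else {
                    Some(Image { name: img.name.clone(), annotations: kept })
                }
            })
            .collect(),
    }
}
```

Given one image annotated with "person" and "dog" and another with only "dog", selecting "person" keeps the first image in both modes, but the Images mode retains its "dog" annotation while the Annotations mode removes it.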
SampleOptions
Sampling configuration.
- n - Exact number of images to sample (mutually exclusive with fraction)
- fraction - Fraction of images to sample, in range (0.0, 1.0] (mutually exclusive with n)
- seed - Optional random seed for reproducible sampling
- strategy - Sampling strategy (Random or Stratified)
- categories - Optional category filter (empty = no filtering)
- category_mode - How to apply category filtering
Constraints:
- Exactly one of n or fraction must be set (not both, not neither)
- If n is set, it must be > 0
- If fraction is set, it must be in range (0.0, 1.0]
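These rules can be expressed as a single match over the two size fields. The sketch below is illustrative only: the trimmed-down `SampleOptions` struct, the `SampleError` enum, and the `check_options` function are hypothetical stand-ins that model just `n` and `fraction`, and the error messages are invented for the example.

```rust
#[derive(Debug, PartialEq)]
enum SampleError {
    InvalidSampleParams(String),
}

/// Hypothetical, reduced options struct: only the size fields are modeled.
struct SampleOptions {
    n: Option<usize>,
    fraction: Option<f64>,
}

fn invalid(msg: &str) -> SampleError {
    SampleError::InvalidSampleParams(msg.to_string())
}

/// Hypothetical check mirroring the three documented rules.
fn check_options(opts: &SampleOptions) -> Result<(), SampleError> {
    match (opts.n, opts.fraction) {
        (Some(_), Some(_)) => Err(invalid("n and fraction are mutually exclusive")),
        (None, None) => Err(invalid("one of n or fraction must be set")),
        (Some(0), None) => Err(invalid("n must be > 0")),
        // The negated form also rejects NaN, which fails every comparison.
        (None, Some(f)) if !(f > 0.0 && f <= 1.0) => {
            Err(invalid("fraction must be in range (0.0, 1.0]"))
        }
        _ => Ok(()),
    }
}
```

Under these assumptions, `SampleOptions { n: Some(50), fraction: None }` passes, while setting both fields, neither field, `n` to 0, or `fraction` to 0.0 returns `InvalidSampleParams`.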
Validation
validate_sample_options
Validate sampling options before running sample_dataset.
Examples
Random Sampling (50 images)
Stratified Sampling (10% of dataset)
Category Filtering (Image-level)
Sample only images that contain “person” or “car” annotations.
Category Filtering (Annotation-level)
Keep only “person” annotations and drop images with no persons.
Reproducible Sampling
Use a seed for deterministic results.
Related
- sample command - CLI interface for sampling
- Dataset type - IR Dataset structure