The sample command creates a subset of a dataset using one of several sampling strategies, with optional category filtering and format conversion.
Usage
Parameters
Input (short form: -i)
Path to the source dataset.

Output (short form: -o)
Path for the sampled dataset.

--from
Source format (or auto-detect).
Supported values: auto, ir-json, coco, cvat, label-studio, tfod, yolo, voc

Target format
Target format for the output.
Supported values: ir-json, coco, cvat, label-studio, tfod, yolo, voc
Behavior:
- If omitted and --from is explicit, uses the same format as the input
- If omitted and --from is auto, defaults to ir-json

--n (short form: -n)
Number of images to sample (absolute count).
Note: Exactly one of --n or --fraction is required.

--fraction
Fraction of images to sample (0.0 to 1.0).
Examples:
- 0.1 - Sample 10% of images
- 0.5 - Sample 50% of images
Note: Exactly one of --n or --fraction is required.

--seed
Random seed for deterministic sampling. Using the same seed with the same input always produces the same sample, which is useful for reproducible experiments.

--strategy
Sampling strategy to use.
Options:
- random - Uniform random sampling
- stratified - Category-aware stratified sampling (maintains category distribution)

Category list
Comma-separated list of category names to filter on.
Example: person,car,bicycle
When specified, only images containing at least one of these categories are considered for sampling (see --category-mode).

--category-mode
How to handle category filtering.
Options:
- images - Keep whole images that contain at least one selected category (all annotations preserved)
- annotations - Keep only annotations of selected categories (other annotations removed)

Lossy conversion flag
Allow lossy format conversions that may drop information. Without this flag, the command fails if the target format cannot preserve all information from the source.
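The two parameter rules above (the fallback when the target format is omitted, and the requirement that exactly one of --n or --fraction is given) can be sketched in Python. This is an illustrative sketch, not Panlabel's implementation: both function names are hypothetical, and rounding a fraction down and capping an oversized count are assumptions the source does not specify.

```python
from typing import Optional


def resolve_target_format(source_format: str, target_format: Optional[str]) -> str:
    """Pick the output format when the target format may be omitted."""
    if target_format is not None:
        return target_format
    # Omitted target: reuse an explicit source format, else fall back to ir-json.
    return "ir-json" if source_format == "auto" else source_format


def resolve_sample_size(total: int, n: Optional[int], fraction: Optional[float]) -> int:
    """Turn --n / --fraction into an image count (hypothetical helper)."""
    if (n is None) == (fraction is None):
        raise ValueError("exactly one of --n or --fraction is required")
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("--fraction must be between 0.0 and 1.0")
        # Assumption: round down; the real tool may round differently.
        return int(total * fraction)
    # Assumption: a count larger than the dataset is capped at the dataset size.
    return min(n, total)
```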
Sampling Strategies
Random Sampling (--strategy random)
Selects images uniformly at random from the dataset.
Characteristics:
- Fast and simple
- May not preserve category distribution
- Good for general-purpose sampling
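As a rough illustration of uniform sampling with a fixed seed, the Python sketch below draws k image IDs with a private random generator. The function name is hypothetical and the code is not taken from Panlabel; it only shows why the same seed and the same input yield the same sample.

```python
import random
from typing import List


def sample_random(image_ids: List[int], k: int, seed: int) -> List[int]:
    """Pick k image IDs uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    # Sort first so the result does not depend on the order of the input list.
    return sorted(rng.sample(sorted(image_ids), k))
```

Sorting the IDs before drawing is a design choice of this sketch: it makes the result depend only on the set of IDs and the seed, not on how the dataset happened to be ordered on disk.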
Stratified Sampling (--strategy stratified)
Selects images while attempting to maintain the original category distribution.
Characteristics:
- Preserves category proportions
- Better for training/validation splits
- Ensures rare categories are represented
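One way to picture stratified sampling is proportional allocation: each stratum receives a share of the sample matching its share of the dataset. The Python sketch below assumes every image has already been assigned to a single stratum (for example, one representative category), which is a simplification; how Panlabel handles images with several categories is not described here. The function name is hypothetical, and this sketch does not implement any guarantee that very rare strata receive at least one image.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def sample_stratified(strata: Dict[int, Hashable], k: int, seed: int) -> List[int]:
    """Sample k image IDs so each stratum keeps roughly its original share."""
    if not 0 <= k <= len(strata):
        raise ValueError("k must be between 0 and the number of images")
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for image_id in sorted(strata):
        groups[strata[image_id]].append(image_id)

    total = len(strata)
    # Ideal (fractional) quota per stratum, then its floor.
    quotas = {key: k * len(ids) / total for key, ids in groups.items()}
    counts = {key: int(q) for key, q in quotas.items()}
    # Hand leftover slots to the strata with the largest fractional parts.
    leftover = k - sum(counts.values())
    by_remainder = sorted(groups, key=lambda key: quotas[key] - counts[key], reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1

    rng = random.Random(seed)
    picked: List[int] = []
    for key in groups:  # iteration order is deterministic because IDs were sorted
        picked.extend(rng.sample(groups[key], counts[key]))
    return sorted(picked)
```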
Category Filtering
Image Mode (--category-mode images)
Keeps entire images that contain at least one annotation of the specified categories. All annotations on those images are preserved, even if they’re not in the category list.
Use case: Training a detector for specific objects while keeping scene context.
Example:
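The behavior of image mode can be sketched in Python on a COCO-style dictionary with images, annotations, and categories lists. That layout and the function name are assumptions made for illustration; they are not Panlabel's internal representation.

```python
from typing import Any, Dict, Set

Dataset = Dict[str, Any]


def filter_images_mode(dataset: Dataset, wanted: Set[str]) -> Dataset:
    """Keep whole images that have at least one annotation in a wanted category."""
    wanted_ids = {c["id"] for c in dataset["categories"] if c["name"] in wanted}
    keep = {a["image_id"] for a in dataset["annotations"] if a["category_id"] in wanted_ids}
    return {
        "images": [img for img in dataset["images"] if img["id"] in keep],
        # Every annotation on a kept image survives, whatever its category.
        "annotations": [a for a in dataset["annotations"] if a["image_id"] in keep],
        "categories": dataset["categories"],
    }
```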
Annotation Mode (--category-mode annotations)
Keeps only the annotations that match the specified categories. Images may end up with fewer annotations than the original.
Use case: Creating a dataset for a specific subset of classes.
Example:
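Annotation mode can be sketched the same way in Python, again on an assumed COCO-style dictionary with a hypothetical function name. The sketch keeps every image and drops only the non-matching annotations; it does not model which images are eligible for sampling.

```python
from typing import Any, Dict, Set

Dataset = Dict[str, Any]


def filter_annotations_mode(dataset: Dataset, wanted: Set[str]) -> Dataset:
    """Keep only annotations whose category is in the wanted set."""
    wanted_ids = {c["id"] for c in dataset["categories"] if c["name"] in wanted}
    return {
        # Images are not removed here, even if none of their annotations remain.
        "images": list(dataset["images"]),
        "annotations": [a for a in dataset["annotations"] if a["category_id"] in wanted_ids],
        "categories": dataset["categories"],
    }
```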
With --category-mode annotations, images that lose all annotations are still kept in the output (as images without annotations).
Reproducibility
Use the --seed parameter to ensure reproducible sampling. This is useful for:
- Creating consistent train/validation splits
- Reproducing experiments
- Sharing sampling configurations with collaborators
Examples
Sample 20% Randomly
Sample 1000 Images with Stratified Sampling
Filter for Specific Categories
Sample and Convert Format
Create Validation Split
Category-Specific Subset with Annotation Filtering
Auto-Detect and Sample YOLO Dataset
Important Notes
IDs are preserved: Sampled images and annotations retain their original IDs from the source dataset.
All categories preserved: The output dataset includes all category definitions from the source, even if some categories have no annotations after sampling.
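Both notes can be illustrated with a small Python sketch that builds the output from a list of sampled image IDs. The COCO-style dictionary layout and the function name are assumptions for illustration only, not Panlabel's actual code.

```python
from typing import Any, Dict, Iterable

Dataset = Dict[str, Any]


def build_subset(dataset: Dataset, sampled_image_ids: Iterable[int]) -> Dataset:
    """Assemble the sampled dataset without renumbering anything."""
    keep = set(sampled_image_ids)
    return {
        # Records are copied as-is, so image and annotation IDs are unchanged.
        "images": [img for img in dataset["images"] if img["id"] in keep],
        "annotations": [a for a in dataset["annotations"] if a["image_id"] in keep],
        # All category definitions are carried over, even ones no longer used.
        "categories": list(dataset["categories"]),
    }
```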
Output
After sampling, Panlabel prints a summary.
Common Workflows
Create Train/Val/Test Splits
Sample for Quick Experimentation
Balance Rare Classes
Exit Codes
- 0 - Sampling successful
- 1 - Error occurred (invalid parameters, lossy conversion blocked, etc.)
See Also
- Convert Command: Convert without sampling
- Stats Command: Analyze category distribution before sampling
- Validate Command: Validate sampled output