Sample Command

The sample command creates a subset of a dataset using random or stratified sampling, with optional category filtering and format conversion.

Usage

panlabel sample -i <INPUT> -o <OUTPUT> [OPTIONS]

Parameters

--input

path

required

Input path to the source dataset. Short form: -i

--output

path

required

Output path for the sampled dataset. Short form: -o

--from

string

default:"auto"

Source format (or auto-detect).

Supported values: auto, ir-json, coco, cvat, label-studio, tfod, yolo, voc

--to

string

Target format for the output.

Behavior:

If omitted and --from is explicit, uses the same format as the input
If omitted and --from is auto, defaults to ir-json

Supported values: ir-json, coco, cvat, label-studio, tfod, yolo, voc
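
The defaulting rule for --to can be read as a small decision function. The following is a minimal sketch of that rule as documented above; `resolve_target_format` is a hypothetical helper name used for illustration and is not part of panlabel.

```shell
# Hypothetical helper (not a panlabel feature): mirrors the documented
# defaulting rule for --to.
# $1 = value given to --from (or "auto"), $2 = value given to --to (may be empty)
resolve_target_format() {
  local from="${1:-auto}" to="${2:-}"
  if [ -n "$to" ]; then
    printf '%s\n' "$to"        # an explicit --to always wins
  elif [ "$from" != "auto" ]; then
    printf '%s\n' "$from"      # explicit --from: keep the input format
  else
    printf '%s\n' "ir-json"    # auto-detected input: default to ir-json
  fi
}
```

For example, `resolve_target_format coco ""` prints `coco`, while `resolve_target_format auto ""` prints `ir-json`.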

--n

integer

Number of images to sample (absolute count). Short form: -n

Note: Exactly one of --n or --fraction is required.

--fraction

float

Fraction of images to sample (0.0 to 1.0).

Examples:

0.1 - Sample 10% of images
0.5 - Sample 50% of images

Note: Exactly one of --n or --fraction is required.
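
Wrapper scripts can check the "exactly one of --n or --fraction" rule before invoking the command. This is a minimal sketch; `check_size_args` is a hypothetical helper, not part of panlabel, and its range check simply restates the documented 0.0 to 1.0 bounds.

```shell
# Hypothetical pre-flight check for a wrapper script (not a panlabel feature).
# $1 = intended -n value, $2 = intended --fraction value; pass "" for unset.
check_size_args() {
  local n="$1" fraction="$2"
  if [ -n "$n" ] && [ -n "$fraction" ]; then
    echo "error: pass either -n or --fraction, not both" >&2
    return 1
  fi
  if [ -z "$n" ] && [ -z "$fraction" ]; then
    echo "error: one of -n or --fraction is required" >&2
    return 1
  fi
  if [ -n "$fraction" ]; then
    # awk exits 0 only when the fraction lies in the documented range
    awk -v f="$fraction" 'BEGIN { exit !(f >= 0.0 && f <= 1.0) }' || {
      echo "error: --fraction must be between 0.0 and 1.0" >&2
      return 1
    }
  fi
  return 0
}
```

A script might call `check_size_args "$N" "$FRACTION" || exit 1` before building the panlabel command line.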

--seed

integer

Random seed for deterministic sampling.

Using the same seed with the same input will always produce the same sample. Useful for reproducible experiments.

--strategy

string

default:"random"

Sampling strategy to use.

Options:

random - Uniform random sampling
stratified - Category-aware stratified sampling (maintains category distribution)

--categories

string

Comma-separated list of category names to filter on. Example: person,car,bicycle

When specified, only images containing at least one of these categories are considered for sampling (see --category-mode).

--category-mode

string

default:"images"

How to handle category filtering.

Options:

images - Keep whole images that contain at least one selected category (all annotations preserved)
annotations - Keep only annotations of selected categories (other annotations removed)

--allow-lossy

flag

default:"false"

Allow lossy format conversions that may drop information.

Without this flag, the command fails if the target format cannot preserve all information from the source.

Sampling Strategies

Random Sampling (`--strategy random`)

Selects images uniformly at random from the dataset.

Characteristics:

Fast and simple
May not preserve category distribution
Good for general-purpose sampling

Example:

panlabel sample -i dataset.json -o sample.json -n 100 --strategy random

Stratified Sampling (`--strategy stratified`)

Selects images while attempting to maintain the original category distribution.

Characteristics:

Aims to preserve category proportions
Better suited to training/validation splits
Helps keep rare categories represented

Example:

panlabel sample -i dataset.json -o sample.json --fraction 0.2 --strategy stratified

Category Filtering

Image Mode (`--category-mode images`)

Keeps entire images that contain at least one annotation of the specified categories. All annotations on those images are preserved, even if they’re not in the category list.

Use case: Training a detector for specific objects while keeping scene context.

Example:

# Keep all images with people or cars (and all their annotations)
panlabel sample -i dataset.json -o filtered.json \
  --categories person,car \
  --category-mode images \
  -n 500

Annotation Mode (`--category-mode annotations`)

Keeps only the annotations that match the specified categories. Images may end up with fewer annotations than the original.

Use case: Creating a dataset for a specific subset of classes.

Example:

# Keep only person and car annotations
panlabel sample -i dataset.json -o filtered.json \
  --categories person,car \
  --category-mode annotations \
  --fraction 1.0

With --category-mode annotations, images that lose all annotations are still kept in the output (as images without annotations).

Reproducibility

Use the --seed parameter to ensure reproducible sampling:

# Always produces the same sample
panlabel sample -i dataset.json -o sample.json -n 100 --seed 42

This is essential for:

Creating consistent train/validation splits
Reproducing experiments
Sharing sampling configurations with collaborators
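
One way to confirm reproducibility on your own data is to run the same command twice and compare the results. This is a minimal sketch; `check_reproducible` is a hypothetical helper, and it assumes that identical samples are also written as byte-identical files, which this page does not state.

```shell
# Hypothetical helper (not a panlabel feature): run the same seeded sample
# twice and compare the two output files.
# $1 = input dataset, $2 = seed, $3 = number of images
check_reproducible() {
  local input="$1" seed="$2" count="$3" tmpdir
  tmpdir="$(mktemp -d)" || return 2
  panlabel sample -i "$input" -o "$tmpdir/a.json" -n "$count" --seed "$seed" \
    || { rm -rf "$tmpdir"; return 2; }
  panlabel sample -i "$input" -o "$tmpdir/b.json" -n "$count" --seed "$seed" \
    || { rm -rf "$tmpdir"; return 2; }
  # Assumption: the same sample serializes to the same bytes.
  if cmp -s "$tmpdir/a.json" "$tmpdir/b.json"; then
    echo "reproducible"
  else
    echo "outputs differ"
  fi
  rm -rf "$tmpdir"
}
```

Usage: `check_reproducible dataset.json 42 100`. If the files differ only in ordering or whitespace, compare image IDs instead of raw bytes.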

Examples

Sample 20% Randomly

panlabel sample -i large_dataset.json -o small_dataset.json --fraction 0.2

Sample 1000 Images with Stratified Sampling

panlabel sample -i training_data.json -o sample.json \
  -n 1000 \
  --strategy stratified \
  --seed 42

Filter for Specific Categories

panlabel sample -i dataset.json -o people_only.json \
  --categories person,pedestrian \
  --category-mode images \
  --fraction 1.0

Sample and Convert Format

panlabel sample -i coco_dataset.json -o yolo_sample/ \
  --from coco \
  --to yolo \
  -n 500 \
  --allow-lossy

Create Validation Split

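# Stratified 80% and 20% samples. Each command samples independently from
# full_dataset.json, so the two outputs are not guaranteed to be disjoint.
panlabel sample -i full_dataset.json -o train.json \
  --fraction 0.8 \
  --strategy stratified \
  --seed 42

panlabel sample -i full_dataset.json -o val.json \
  --fraction 0.2 \
  --strategy stratified \
  --seed 43  # Different seed gives a different, possibly overlapping, sample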
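
Two independent sample runs can select some of the same images, whatever seeds they use. Because image IDs are preserved in the output, overlap can be measured by comparing ID lists. This is a minimal sketch; `count_shared_ids` is a hypothetical helper that expects one ID per line in each file, and the `jq` filter in the comment assumes COCO-style JSON with an `images` array.

```shell
# Hypothetical helper (not a panlabel feature): count IDs present in both lists.
# $1, $2 = text files with one image ID per line.
# Assumption: for COCO-style output the lists could be produced with
#   jq -r '.images[].id' train.json > train_ids.txt
count_shared_ids() {
  local a_sorted b_sorted
  a_sorted="$(mktemp)" && b_sorted="$(mktemp)" || return 1
  sort -u "$1" > "$a_sorted"
  sort -u "$2" > "$b_sorted"
  # comm -12 keeps only the lines common to both sorted inputs
  comm -12 "$a_sorted" "$b_sorted" | wc -l | tr -d ' '
  rm -f "$a_sorted" "$b_sorted"
}
```

A result of 0 means the two samples share no images; any other value is the number of images that appear in both.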

Category-Specific Subset with Annotation Filtering

panlabel sample -i multi_class.json -o binary.json \
  --categories positive_class,negative_class \
  --category-mode annotations \
  --fraction 1.0

Auto-Detect and Sample YOLO Dataset

panlabel sample -i /data/yolo_full/ -o /data/yolo_sample/ \
  --from auto \
  -n 200 \
  --seed 42

Important Notes

IDs are preserved: Sampled images and annotations retain their original IDs from the source dataset.

All categories preserved: The output dataset includes all category definitions from the source, even if some categories have no annotations after sampling.

When using --fraction with small datasets, the actual number of images may be less than expected due to rounding. Use -n for precise control.
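
The effect of rounding on small datasets is easy to estimate ahead of time. This page does not state which rounding rule is applied, so the sketch below prints both the floor and the nearest integer; `fraction_to_count` is a hypothetical helper, not part of panlabel.

```shell
# Hypothetical helper (not a panlabel feature): estimate how many images a
# given --fraction selects. The actual rounding rule is not documented here,
# so both common candidates are shown.
# $1 = total number of images, $2 = fraction
fraction_to_count() {
  awk -v total="$1" -v frac="$2" 'BEGIN {
    exact = total * frac
    printf "exact=%.2f floor=%d nearest=%d\n", exact, int(exact), int(exact + 0.5)
  }'
}
```

With 7 images and a fraction of 0.2 the exact value is 1.4, so either rule yields 1 image; with 9 images the two rules disagree (1 versus 2), which is the case where -n gives more predictable results.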

Output

After sampling, Panlabel prints a summary:

Sampled 1000 images -> 200 images: dataset.json (coco) -> sample.json (coco)

Conversion Report:
  ✓ Lossless conversion
  Images: 200
  Annotations: 1,456
  Categories: 8

Common Workflows

Create Train/Val/Test Splits

#!/bin/bash
SEED=42

# Each run samples independently from full.json, so the three outputs may
# share images. Compare image IDs across outputs if disjoint splits are required.

# 70% train
panlabel sample -i full.json -o train.json \
  --fraction 0.7 --strategy stratified --seed "$SEED"

# 20% val
panlabel sample -i full.json -o val.json \
  --fraction 0.2 --strategy stratified --seed "$((SEED + 1))"

# 10% test
panlabel sample -i full.json -o test.json \
  --fraction 0.1 --strategy stratified --seed "$((SEED + 2))"

Sample for Quick Experimentation

# Quick 100-image sample for testing code
panlabel sample -i huge_dataset.json -o quick_test.json -n 100

Balance Rare Classes

# First, get all images with rare class
panlabel sample -i dataset.json -o rare_class_images.json \
  --categories rare_class \
  --category-mode images \
  --fraction 1.0

# Then sample from that filtered set
panlabel sample -i rare_class_images.json -o balanced_sample.json \
  -n 500 --strategy random

Exit Codes

0 - Sampling successful
1 - Error occurred (invalid parameters, lossy conversion blocked, etc.)
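
In scripts, the exit code can be used to decide whether to continue a pipeline. This is a minimal sketch; `sample_or_report` is a hypothetical wrapper, and the messages are illustrative rather than panlabel output.

```shell
# Hypothetical wrapper (not a panlabel feature): run the sample command and
# branch on the documented exit codes (0 = success, 1 = error).
sample_or_report() {
  local status
  panlabel sample "$@"
  status=$?
  case "$status" in
    0) echo "sample written" ;;
    1) echo "sample failed: check parameters, or whether a lossy conversion was blocked" >&2 ;;
    *) echo "unexpected exit code: $status" >&2 ;;
  esac
  return "$status"
}
```

Usage: `sample_or_report -i dataset.json -o sample.json -n 100 || exit 1` stops the calling script when sampling fails.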

See Also

Convert Command - Convert without sampling
Stats Command - Analyze category distribution before sampling
Validate Command - Validate sampled output
