This guide will walk you through the most common Panlabel workflows. You’ll learn how to convert datasets, validate them, generate statistics, and more.
Make sure you have installed Panlabel before following this guide.

Basic Conversion

1. Convert COCO to YOLO

The most common use case is converting COCO annotations to YOLO format:
panlabel convert --from auto --to yolo -i annotations.json -o ./yolo_out --allow-lossy
The --from auto flag automatically detects the input format. The --allow-lossy flag is required when converting to formats that don’t preserve all data.
Output:
✓ Detected format: coco
✓ Loaded 1000 images, 5000 annotations
⚠ Conversion is lossy:
  - Segmentation masks will be dropped
  - Image metadata will be lost
✓ Wrote YOLO dataset to ./yolo_out
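To make the box arithmetic behind this conversion concrete, here is a minimal Python sketch. It is not Panlabel's implementation. It assumes the usual conventions: a COCO box is `[x_min, y_min, width, height]` in pixels, and a YOLO label line holds a class index followed by the box center and size, each divided by the image dimensions. The function name `coco_box_to_yolo` is hypothetical.

```python
def coco_box_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, w, h] pixel box to YOLO (cx, cy, w, h) in 0..1."""
    x_min, y_min, w, h = bbox
    cx = (x_min + w / 2) / img_w   # box center as a fraction of image width
    cy = (y_min + h / 2) / img_h   # box center as a fraction of image height
    return (cx, cy, w / img_w, h / img_h)


# A 200x100 px box with its top-left corner at (100, 50) in a 400x200 image.
cx, cy, w, h = coco_box_to_yolo([100, 50, 200, 100], img_w=400, img_h=200)
# Class index 0 is a placeholder for illustration.
print(f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")  # prints: 0 0.500000 0.500000 0.500000 0.500000
```

A label line of this shape has room only for a class and a box, which is consistent with the warnings above about segmentation masks and image metadata being dropped.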
2. Convert YOLO to COCO

Going the other direction:
panlabel convert -f yolo -t coco -i ./my_dataset -o coco_output.json
Output:
✓ Loaded YOLO dataset from ./my_dataset
✓ Found 800 images, 3500 annotations
✓ Wrote COCO JSON to coco_output.json
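The reverse direction needs each image's pixel dimensions, because YOLO coordinates are normalized. The sketch below shows that arithmetic under the same assumed conventions (normalized center and size in, COCO `[x_min, y_min, width, height]` pixels out). The helper name `yolo_box_to_coco` is hypothetical and this is not Panlabel's code.

```python
def yolo_box_to_coco(cx, cy, w, h, img_w, img_h):
    """Convert normalized YOLO (cx, cy, w, h) to a COCO [x_min, y_min, w, h] pixel box."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2   # move from the center to the left edge
    y_min = cy * img_h - box_h / 2   # move from the center to the top edge
    return [x_min, y_min, box_w, box_h]


print(yolo_box_to_coco(0.5, 0.5, 0.5, 0.5, img_w=400, img_h=200))
# prints: [100.0, 50.0, 200.0, 100.0]
```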
3. Try other format conversions

Panlabel supports many formats. For example, to convert a Pascal VOC dataset to COCO:
panlabel convert -f voc -t coco -i ./voc_dataset -o coco_output.json

Validate Your Dataset

1. Check for common problems

Before training a model, validate your dataset:
panlabel validate --format coco annotations.json
Output:
✓ Validating COCO dataset...
✓ Found 1000 images, 5000 annotations, 10 categories

Issues found:
⚠ Warning: 3 images have no annotations
⚠ Warning: 2 annotations have zero area
✗ Error: Duplicate annotation ID: 42
✗ Error: Annotation 123 references non-existent image ID: 999

Summary: 2 errors, 5 warnings
Use validation before training to catch data issues early. Many training failures are due to malformed annotations.
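The kinds of checks reported above can be sketched in a few lines of Python. This is an illustration of the idea, not Panlabel's validator. It assumes a COCO-style dictionary with `images` and `annotations` lists, where each annotation has `id`, `image_id`, and `bbox`. The function name `find_issues` and the message wording are hypothetical.

```python
from collections import Counter


def find_issues(coco):
    """Return (errors, warnings) for a COCO-style dict with 'images' and 'annotations'."""
    errors, warnings = [], []
    image_ids = {img["id"] for img in coco["images"]}

    # Error: the same annotation ID appears more than once.
    id_counts = Counter(ann["id"] for ann in coco["annotations"])
    for ann_id, count in sorted(id_counts.items()):
        if count > 1:
            errors.append(f"Duplicate annotation ID: {ann_id}")

    annotated = set()
    for ann in coco["annotations"]:
        # Error: the annotation points at an image that is not in the dataset.
        if ann["image_id"] not in image_ids:
            errors.append(
                f"Annotation {ann['id']} references non-existent image ID: {ann['image_id']}"
            )
        else:
            annotated.add(ann["image_id"])
        # Warning: a box whose width or height is zero covers no pixels.
        _, _, w, h = ann["bbox"]
        if w * h == 0:
            warnings.append(f"Annotation {ann['id']} has zero area")

    # Warning: images that no annotation refers to.
    for image_id in sorted(image_ids - annotated):
        warnings.append(f"Image {image_id} has no annotations")
    return errors, warnings


dataset = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 42, "image_id": 1, "bbox": [0, 0, 10, 10]},
        {"id": 42, "image_id": 1, "bbox": [5, 5, 0, 8]},
        {"id": 123, "image_id": 999, "bbox": [1, 1, 4, 4]},
    ],
}
errors, warnings = find_issues(dataset)
print(errors)
print(warnings)
```

Errors here are conditions that make the dataset internally inconsistent, while warnings flag data that is well-formed but probably unintended.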

Get Dataset Statistics

1. Generate a statistical overview

Understand your dataset composition:
panlabel stats --format coco annotations.json
Output:
Dataset Statistics
==================

Images: 1,000
Annotations: 5,000
Categories: 10

Annotations per image:
  Mean: 5.0
  Median: 4.0
  Min: 0
  Max: 25

Category distribution:
  person: 2,000 (40%)
  car: 1,500 (30%)
  dog: 800 (16%)
  cat: 700 (14%)
  ...

Bounding box sizes:
  Mean area: 12,500 px²
  Median area: 8,000 px²
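For readers who want to see how figures like these are derived, the following Python sketch computes the per-image counts and category shares from a COCO-style dictionary. It assumes a valid dataset (every `image_id` and `category_id` exists) and is not Panlabel's implementation. The name `summarize` and the returned keys are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def summarize(coco):
    """Compute per-image annotation counts and category shares for a COCO-style dict."""
    # Start every image at 0 so images without annotations still count.
    counts = {img["id"]: 0 for img in coco["images"]}
    for ann in coco["annotations"]:
        counts[ann["image_id"]] += 1
    per_image = list(counts.values())

    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    by_category = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    total = len(coco["annotations"])
    return {
        "mean": mean(per_image),
        "median": median(per_image),
        "min": min(per_image),
        "max": max(per_image),
        # Share of all annotations per category, most frequent first.
        "distribution": {name: n / total for name, n in by_category.most_common()},
    }


toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 1},
        {"id": 3, "image_id": 1, "category_id": 2},
        {"id": 4, "image_id": 2, "category_id": 1},
    ],
}
print(summarize(toy))
```

Image 3 has no annotations, which is why the minimum is 0 and the mean is lower than it would be if only annotated images were counted.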
2. Export as JSON or HTML

Get machine-readable stats or a visual report. For example, to write the statistics to a file as JSON:
panlabel stats --format coco annotations.json --output json > stats.json

Compare Datasets

1. Find differences between datasets

Compare two versions of your dataset:
panlabel diff --format-a auto --format-b auto old.json new.json
Output:
Dataset Comparison
==================

Images:
  Added: 50
  Removed: 10
  Modified: 5

Annotations:
  Added: 200
  Removed: 30
  Modified: 15

Categories:
  Added: person, car
  Removed: bicycle
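The added, removed, and modified buckets can be thought of as set operations over record identifiers. The Python sketch below illustrates that idea. How Panlabel actually matches records between two datasets is not stated in this guide, so matching on the `id` field is an assumption, and `diff_by_id` is a hypothetical name.

```python
def diff_by_id(old_items, new_items):
    """Classify records as added, removed, or modified, matching them on their 'id' field."""
    old = {item["id"]: item for item in old_items}
    new = {item["id"]: item for item in new_items}
    added = sorted(new.keys() - old.keys())        # only in the new dataset
    removed = sorted(old.keys() - new.keys())      # only in the old dataset
    # Present in both, but with different contents.
    modified = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return added, removed, modified


old_images = [
    {"id": 1, "file_name": "a.jpg"},
    {"id": 2, "file_name": "b.jpg"},
    {"id": 3, "file_name": "c.jpg"},
]
new_images = [
    {"id": 2, "file_name": "b.jpg"},
    {"id": 3, "file_name": "c_fixed.jpg"},
    {"id": 4, "file_name": "d.jpg"},
]
print(diff_by_id(old_images, new_images))  # prints: ([4], [1], [3])
```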

Sample a Subset

1. Create a smaller dataset for testing

Extract a random subset of your data:
panlabel sample -i annotations.json -o sample.ir.json --from auto --to ir-json -n 100 --seed 42
This creates a dataset with exactly 100 images, randomly selected with a fixed seed for reproducibility.
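The role of the seed can be illustrated with Python's standard `random` module. This is a sketch of the general technique, not Panlabel's sampler, and the same seed value will not select the same images as the CLI. It assumes a COCO-style dictionary. The function name `sample_dataset` is hypothetical.

```python
import random


def sample_dataset(coco, n, seed):
    """Keep n randomly chosen images and only the annotations that belong to them."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    # Sorting first makes the result independent of the input's image order.
    keep = set(rng.sample(sorted(img["id"] for img in coco["images"]), n))
    return {
        **coco,
        "images": [img for img in coco["images"] if img["id"] in keep],
        "annotations": [ann for ann in coco["annotations"] if ann["image_id"] in keep],
    }


full = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": k, "image_id": k // 2} for k in range(20)],  # 2 per image
}
subset = sample_dataset(full, n=4, seed=42)
print(len(subset["images"]), len(subset["annotations"]))  # prints: 4 8
```

Calling the function again with the same seed returns the same subset, which is what makes a sampled test set reproducible across machines and runs.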
2. Use stratified sampling

Maintain category distribution in your sample:
panlabel sample -i annotations.json -o sample.json --from auto --to coco -n 100 --strategy stratified
Stratified sampling ensures that rare categories are represented proportionally in your subset.
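As a rough illustration of proportional allocation, the Python sketch below splits a sample size across categories by their share of the data, then draws randomly within each category. It makes a simplifying assumption that each image has exactly one category, which real detection datasets often violate. Panlabel's actual stratification strategy is not described in this guide, and `stratified_sample` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n, seed):
    """Sample n keys from {image_id: category}, keeping each category's share roughly intact."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for image_id, category in sorted(labels.items()):
        groups[category].append(image_id)

    total = len(labels)
    # Ideal (fractional) quota per category, rounded down to a whole number.
    quotas = {cat: n * len(ids) / total for cat, ids in groups.items()}
    take = {cat: int(q) for cat, q in quotas.items()}
    # Give the leftover slots to the categories with the largest fractional remainders.
    leftover = n - sum(take.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - take[c], reverse=True)
    for cat in by_remainder[:leftover]:
        take[cat] += 1

    picked = []
    for cat, ids in groups.items():
        picked.extend(rng.sample(ids, take[cat]))
    return picked


# 100 images: 70 "person", 20 "car", 10 "dog".
labels = {i: ("person" if i < 70 else "car" if i < 90 else "dog") for i in range(100)}
picked = stratified_sample(labels, n=10, seed=42)
print(Counter(labels[i] for i in picked))
```

With a plain random draw of 10 images, the 10% "dog" category could easily be missed entirely. The quota step is what keeps it in the sample.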

List Supported Formats

1. View format capabilities

See which formats Panlabel supports:
panlabel list-formats
Output:
Supported Formats
=================

ir-json
  Read: ✓  Write: ✓  Lossless: ✓
  Description: Panlabel's intermediate representation

coco
  Read: ✓  Write: ✓  Lossless: Conditional
  Description: COCO object detection format

yolo
  Read: ✓  Write: ✓  Lossless: ✗
  Description: Ultralytics YOLO format

...

Common Workflows

Quality Control Pipeline:
  1. Validate your dataset: panlabel validate --format coco data.json
  2. Check statistics: panlabel stats --format coco data.json
  3. Create a test subset: panlabel sample -i data.json -o test.json -n 100
  4. Convert to training format: panlabel convert -f coco -t yolo -i data.json -o ./yolo_train --allow-lossy
Experiment with Different Formats:
  1. Convert to IR JSON for lossless storage: panlabel convert -f coco -t ir-json -i data.json -o data.ir.json
  2. Generate multiple format versions from IR:
    • panlabel convert -f ir-json -t yolo -i data.ir.json -o ./yolo --allow-lossy
    • panlabel convert -f ir-json -t voc -i data.ir.json -o ./voc --allow-lossy
Version Control Your Datasets:
  1. Export baseline: panlabel convert -f auto -t ir-json -i baseline.json -o v1.ir.json
  2. Make changes and export: panlabel convert -f auto -t ir-json -i updated.json -o v2.ir.json
  3. Compare versions: panlabel diff --format-a ir-json --format-b ir-json v1.ir.json v2.ir.json
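A workflow like the Quality Control Pipeline can be scripted so that a failing step stops the run. The Python sketch below builds the four commands from that list and runs them in order with `subprocess.run`. The convert step passes `--allow-lossy`, which the conversion section describes as required for lossy targets such as YOLO. Whether `panlabel validate` exits with a non-zero status when it reports errors is an assumption to verify before relying on this. The function names `qc_commands` and `run_pipeline` are hypothetical.

```python
import subprocess


def qc_commands(dataset="data.json"):
    """Return the four Quality Control Pipeline steps as argument lists."""
    return [
        ["panlabel", "validate", "--format", "coco", dataset],
        ["panlabel", "stats", "--format", "coco", dataset],
        ["panlabel", "sample", "-i", dataset, "-o", "test.json", "-n", "100"],
        ["panlabel", "convert", "-f", "coco", "-t", "yolo",
         "-i", dataset, "-o", "./yolo_train", "--allow-lossy"],
    ]


def run_pipeline(commands):
    """Run each step in order and stop at the first one that exits non-zero."""
    for cmd in commands:
        print("$", " ".join(cmd))
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)


# Print the plan without executing anything.
for cmd in qc_commands():
    print(" ".join(cmd))
```

Once `panlabel` is installed and on your `PATH`, calling `run_pipeline(qc_commands())` executes the steps. Passing argument lists rather than a single shell string avoids quoting problems with file names that contain spaces.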

Next Steps

CLI Reference

Explore all available commands and options

Format Reference

Learn about format-specific details and limitations

Conversion & Lossiness

Understand what data gets lost in conversions

Contributing

Learn how to contribute to Panlabel
