Skip to main content
The diff command performs semantic comparison between two annotation datasets, identifying differences in images, annotations, and categories.

Usage

panlabel diff [OPTIONS] <INPUT_A> <INPUT_B>

Parameters

input_a
path
required
Path to the first dataset.
input_b
path
required
Path to the second dataset.
--format-a
string
default:"auto"
Format of the first input dataset.Supported values: auto, ir-json, coco, cvat, label-studio, tfod, yolo, voc
--format-b
string
default:"auto"
Format of the second input dataset.Supported values: auto, ir-json, coco, cvat, label-studio, tfod, yolo, voc
--match-by
string
default:"id"
Strategy for matching annotations between datasets.Options:
  • id - Match annotations by their ID field
  • iou - Match annotations by Intersection over Union (spatial overlap)
Use iou when annotation IDs are not consistent between datasets.
--iou-threshold
float
default:"0.5"
IoU threshold for matching when using --match-by iou.Must be in the range (0.0, 1.0]. Higher values require more overlap to consider annotations as matching.Common values:
  • 0.5 - Standard object detection threshold
  • 0.75 - Stricter matching requirement
  • 0.9 - Very strict, nearly exact overlap required
--detail
flag
default:"false"
Include item-level details in the output.When enabled, shows specific differences for each image and annotation rather than just summary counts.
--output
string
default:"text"
Output format for the diff report.Options:
  • text - Human-readable report
  • json - Machine-readable JSON

Matching Strategies

ID Matching (--match-by id)

Matches annotations based on their unique ID field. Best when:
  • Comparing versions of the same dataset
  • IDs are stable across versions
  • You want to track changes to specific annotations
Requirements:
  • Datasets must have unique image file names
  • Annotation IDs should be consistent

IoU Matching (--match-by iou)

Matches annotations based on spatial overlap (Intersection over Union). Best when:
  • Comparing annotations from different sources
  • IDs are not consistent or meaningful
  • You care about spatial agreement rather than ID correspondence
How it works:
  1. For each annotation in dataset A, find the best-matching annotation in dataset B (same image, same category)
  2. If IoU ≥ threshold, consider them matching
  3. Report unmatched, matched, and modified annotations

Examples

Basic Diff with ID Matching

panlabel diff dataset_v1.json dataset_v2.json --match-by id
Compares two versions of a dataset using annotation IDs.

IoU-Based Comparison

panlabel diff original.json annotated.json \
  --match-by iou \
  --iou-threshold 0.75
Compares datasets using spatial overlap with 75% IoU threshold.

Detailed Diff Report

panlabel diff a.json b.json --match-by id --detail
Shows item-level differences for each image and annotation.

Cross-Format Comparison

panlabel diff coco_dataset.json yolo_dataset/ \
  --format-a coco \
  --format-b yolo \
  --match-by iou
Compares datasets in different formats.

JSON Output for Automation

panlabel diff old.json new.json \
  --match-by id \
  --detail \
  --output json > diff_report.json
Generates a machine-readable diff report.

Auto-Detection

panlabel diff dataset1/ dataset2/
Auto-detects both formats (works with YOLO, VOC, CVAT directories).

Output Examples

Text Report (Summary)

Dataset Diff: dataset_v1.json vs dataset_v2.json

Summary:
  Match strategy: id
  
Images:
  In both:     145
  Only in A:   5
  Only in B:   8
  Total:       158

Annotations:
  Matched:     1,184
  Modified:    23
  Only in A:   42
  Only in B:   67
  Total:       1,316

Categories:
  In both:     8
  Only in A:   1 ("old_class")
  Only in B:   2 ("new_class_1", "new_class_2")

Text Report (Detailed)

Dataset Diff: dataset_v1.json vs dataset_v2.json

... Summary (as above) ...

Image Details:
  Only in A:
    - deleted_001.jpg
    - deleted_002.jpg
    - deleted_003.jpg
    - deleted_004.jpg
    - deleted_005.jpg
    
  Only in B:
    - new_001.jpg
    - new_002.jpg
    - new_003.jpg
    - new_004.jpg
    - new_005.jpg
    ... and 3 more

Annotation Details (showing first 20):
  Modified:
    - image_042.jpg / annotation #123:
      Category: person → pedestrian
      BBox: [100, 50, 200, 150] → [102, 51, 198, 149]
    
    - image_089.jpg / annotation #456:
      BBox: [300, 200, 400, 350] → [305, 205, 395, 345]
    
    ... and 21 more
  
  Only in A:
    - image_010.jpg / annotation #789 (person)
    - image_015.jpg / annotation #790 (car)
    ... and 40 more
  
  Only in B:
    - image_020.jpg / annotation #1001 (bicycle)
    - image_025.jpg / annotation #1002 (motorcycle)
    ... and 65 more

JSON Report Structure

{
  "summary": {
    "match_strategy": "id",
    "iou_threshold": null,
    "images": {
      "in_both": 145,
      "only_a": 5,
      "only_b": 8,
      "total": 158
    },
    "annotations": {
      "matched": 1184,
      "modified": 23,
      "only_a": 42,
      "only_b": 67,
      "total": 1316
    },
    "categories": {
      "in_both": ["person", "car", "bicycle", ...],
      "only_a": ["old_class"],
      "only_b": ["new_class_1", "new_class_2"]
    }
  },
  "details": {
    "images_only_a": ["deleted_001.jpg", ...],
    "images_only_b": ["new_001.jpg", ...],
    "modified_annotations": [
      {
        "image": "image_042.jpg",
        "annotation_id": 123,
        "changes": {
          "category": {"from": "person", "to": "pedestrian"},
          "bbox": {
            "from": [100, 50, 200, 150],
            "to": [102, 51, 198, 149]
          }
        }
      }
    ]
  }
}

Use Cases

Track Annotation Changes

panlabel diff v1.0.json v2.0.json --match-by id --detail > changes.txt
Document what changed between dataset versions.

Compare Annotators

panlabel diff annotator1.json annotator2.json --match-by iou --iou-threshold 0.5
Measure inter-annotator agreement.

Verify Conversion Accuracy

# Convert and compare
panlabel convert -f coco -t yolo -i original.json -o ./yolo_out/
panlabel convert -f yolo -t coco -i ./yolo_out/ -o roundtrip.json
panlabel diff original.json roundtrip.json --match-by iou
Check for information loss during format conversion.

CI/CD Quality Gates

#!/bin/bash
# Fail if too many annotations changed
panlabel diff baseline.json current.json --output json > diff.json
modified=$(jq '.summary.annotations.modified' diff.json)
if [ "$modified" -gt 100 ]; then
  echo "Too many changes: $modified annotations modified"
  exit 1
fi

Dataset Merge Validation

panlabel diff dataset_a.json merged.json --match-by id --detail
panlabel diff dataset_b.json merged.json --match-by id --detail
Verify that merged dataset contains all original annotations.

Important Notes

Image file names must be unique within each dataset when using diff. If duplicates exist, the command will fail with an error message.
The --iou-threshold parameter only applies when using --match-by iou. It is ignored when using --match-by id.

Exit Codes

  • 0 - Diff completed successfully (differences may or may not exist)
  • 1 - Error occurred (format detection failed, invalid threshold, duplicate file names, etc.)

See Also

Stats Command

Analyze individual datasets

Validate Command

Validate datasets before comparison

Build docs developers (and LLMs) love