The diff module compares two annotation datasets and reports the differences in their images, categories, and annotations.

Main Function

diff_datasets

Compute a semantic diff between two datasets.
pub fn diff_datasets(
    a: &Dataset,
    b: &Dataset,
    opts: &DiffOptions
) -> DiffReport
Compares two datasets and produces a structured report showing:
  • Images present in both datasets vs. unique to each
  • Categories shared vs. unique to each dataset
  • Annotation differences (only in a, only in b, modified)
Parameters:
  • a - First dataset to compare
  • b - Second dataset to compare
  • opts - Comparison options (matching strategy, IoU threshold, detail level)
Returns: A DiffReport containing comparison results

Types

MatchBy

Annotation matching strategy.
pub enum MatchBy {
    /// Match annotations by annotation ID (within shared images)
    Id,
    /// Match annotations greedily by IoU within shared image + category
    Iou,
}
  • Id: Match annotations by their IDs. Requires both datasets to have consistent ID schemes.
  • Iou: Match annotations greedily by bounding box overlap (Intersection over Union) within each shared image and category. Useful when IDs differ between datasets.
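To make the Iou strategy concrete, here is a minimal sketch of IoU scoring and greedy matching for the boxes of a single image and category. It does not use panlabel's code: the `BoxXyxy` type, the `iou` and `greedy_match` functions, the corner-based box layout, and the highest-score-first ordering are all assumptions made for illustration, and the library's own matching may differ in its details.

```rust
use std::cmp::Ordering;

/// Axis-aligned box given by its corners (layout assumed for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BoxXyxy {
    pub x_min: f64,
    pub y_min: f64,
    pub x_max: f64,
    pub y_max: f64,
}

impl BoxXyxy {
    /// Area of the box; degenerate boxes count as zero.
    fn area(&self) -> f64 {
        (self.x_max - self.x_min).max(0.0) * (self.y_max - self.y_min).max(0.0)
    }
}

/// Intersection over Union of two boxes, in the range [0, 1].
pub fn iou(a: &BoxXyxy, b: &BoxXyxy) -> f64 {
    let w = (a.x_max.min(b.x_max) - a.x_min.max(b.x_min)).max(0.0);
    let h = (a.y_max.min(b.y_max) - a.y_min.max(b.y_min)).max(0.0);
    let inter = w * h;
    let union = a.area() + b.area() - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Hypothetical greedy matcher: score every pair, keep pairs at or above
/// the threshold, then accept them best-first while both sides are unused.
/// Returns (index in a, index in b) pairs.
pub fn greedy_match(a: &[BoxXyxy], b: &[BoxXyxy], threshold: f64) -> Vec<(usize, usize)> {
    let mut candidates: Vec<(f64, usize, usize)> = Vec::new();
    for (i, box_a) in a.iter().enumerate() {
        for (j, box_b) in b.iter().enumerate() {
            let score = iou(box_a, box_b);
            if score >= threshold {
                candidates.push((score, i, j));
            }
        }
    }
    // Highest IoU first; NaN scores (not expected here) compare as equal.
    candidates.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap_or(Ordering::Equal));

    let mut used_a = vec![false; a.len()];
    let mut used_b = vec![false; b.len()];
    let mut pairs = Vec::new();
    for (_, i, j) in candidates {
        if !used_a[i] && !used_b[j] {
            used_a[i] = true;
            used_b[j] = true;
            pairs.push((i, j));
        }
    }
    pairs
}
```

Under this scheme, boxes left unmatched on either side would correspond to annotations that exist in only one dataset, which is why the threshold directly affects the only-in-a and only-in-b counts.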

DiffOptions

Configuration for dataset comparison.
pub struct DiffOptions {
    pub match_by: MatchBy,
    pub iou_threshold: f64,
    pub detail: bool,
    pub max_items: usize,
    pub bbox_eps: f64,
}
Fields:
  • match_by - Annotation matching strategy (MatchBy::Id or MatchBy::Iou)
  • iou_threshold - IoU threshold for matching (used when match_by is Iou). Default: 0.5
  • detail - Include item-level details in the report. Default: false
  • max_items - Maximum number of detail items to include. Default: 20
  • bbox_eps - Epsilon for floating-point bbox comparisons. Default: 1e-6
Default values:
impl Default for DiffOptions {
    fn default() -> Self {
        Self {
            match_by: MatchBy::Id,
            iou_threshold: 0.5,
            detail: false,
            max_items: 20,
            bbox_eps: 1e-6,
        }
    }
}
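As an illustration of what an epsilon such as bbox_eps guards against, the sketch below compares two boxes coordinate by coordinate and treats differences at or below the epsilon as equal. The `bbox_changed` function and the four-element array layout are hypothetical, and whether the library applies bbox_eps in exactly this way is an assumption.

```rust
/// Hypothetical helper: returns true if any coordinate of the two boxes
/// differs by more than `eps`. The [f64; 4] layout is assumed for this sketch.
pub fn bbox_changed(a: [f64; 4], b: [f64; 4], eps: f64) -> bool {
    a.iter().zip(b.iter()).any(|(x, y)| (x - y).abs() > eps)
}
```

With an epsilon of 1e-6, a difference of 1e-9 introduced by a format round trip would not count as a change, while a shift of 0.001 would.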

DiffReport

Structured diff results.
pub struct DiffReport {
    pub images: DiffCounts,
    pub categories: DiffCounts,
    pub annotations: DiffAnnotationCounts,
    pub detail: Option<DiffDetail>,
}

DiffCounts

Count of shared and unique items.
pub struct DiffCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
}
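The three counts can be read as a set comparison. The following sketch derives them from two lists of integer keys; it defines a local mirror of DiffCounts so that it compiles without the crate, and both the `count_keys` function and the choice of `u64` keys are assumptions rather than the library's implementation.

```rust
use std::collections::HashSet;

/// Local mirror of the documented struct, so the sketch is self-contained.
#[derive(Debug, PartialEq, Eq)]
pub struct DiffCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
}

/// Hypothetical helper: count keys present in both inputs and in only one.
/// Duplicate keys within one input are counted once.
pub fn count_keys(a: &[u64], b: &[u64]) -> DiffCounts {
    let set_a: HashSet<u64> = a.iter().copied().collect();
    let set_b: HashSet<u64> = b.iter().copied().collect();
    DiffCounts {
        shared: set_a.intersection(&set_b).count(),
        only_in_a: set_a.difference(&set_b).count(),
        only_in_b: set_b.difference(&set_a).count(),
    }
}
```

For keys [1, 2, 3] in a and [2, 3, 4, 5] in b, this yields 2 shared, 1 only in a, and 2 only in b.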

DiffAnnotationCounts

Annotation-specific diff counts.
pub struct DiffAnnotationCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
    pub modified: usize,
}
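A common use of these counts is a quick check that two datasets are equivalent, for example in a CI step. The sketch below uses local mirrors of the documented structs so that it compiles on its own; the `is_unchanged` function is hypothetical and not part of the crate.

```rust
/// Local mirrors of the documented structs, so the sketch is self-contained.
pub struct DiffCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
}

pub struct DiffAnnotationCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
    pub modified: usize,
}

/// Hypothetical helper: true when nothing is unique to either side
/// and no annotation is reported as modified.
pub fn is_unchanged(
    images: &DiffCounts,
    categories: &DiffCounts,
    annotations: &DiffAnnotationCounts,
) -> bool {
    let no_unique = |c: &DiffCounts| c.only_in_a == 0 && c.only_in_b == 0;
    no_unique(images)
        && no_unique(categories)
        && annotations.only_in_a == 0
        && annotations.only_in_b == 0
        && annotations.modified == 0
}
```

With the real crate, the same check could be applied to the images, categories, and annotations fields of a DiffReport.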

Example

use panlabel::diff::{diff_datasets, DiffOptions, MatchBy};
use panlabel::ir;

// The `?` operator needs a function that returns a Result. A boxed error is
// used here on the assumption that the reader's error type implements
// std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load two versions of a dataset
    let dataset_v1 = ir::io_coco_json::read_coco_json("v1.json")?;
    let dataset_v2 = ir::io_coco_json::read_coco_json("v2.json")?;

    // Compare using ID matching and request item-level details
    let opts = DiffOptions {
        match_by: MatchBy::Id,
        detail: true,
        ..Default::default()
    };

    let report = diff_datasets(&dataset_v1, &dataset_v2, &opts);

    println!(
        "Images: {} shared, {} only in v1, {} only in v2",
        report.images.shared, report.images.only_in_a, report.images.only_in_b
    );

    println!("Annotations: {} modified", report.annotations.modified);

    Ok(())
}

IoU-Based Matching Example

When comparing datasets with different ID schemes:
use panlabel::diff::{diff_datasets, DiffOptions, MatchBy};

let opts = DiffOptions {
    match_by: MatchBy::Iou,
    iou_threshold: 0.5,  // Consider boxes matching if IoU >= 0.5
    ..Default::default()
};

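// dataset_a and dataset_b are two already-loaded Dataset values,
// read the same way as in the previous example.
let report = diff_datasets(&dataset_a, &dataset_b, &opts);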
