Dataset Diff

The diff module compares two annotation datasets and reports differences in their images, categories, and annotations.

Main Function

`diff_datasets`

Compute a semantic diff between two datasets.

pub fn diff_datasets(
    a: &Dataset,
    b: &Dataset,
    opts: &DiffOptions
) -> DiffReport

Compares two datasets and produces a structured report showing:

Images present in both datasets vs. unique to each
Categories shared vs. unique to each dataset
Annotation differences (added, removed, modified)

Parameters:

a - First dataset to compare
b - Second dataset to compare
opts - Comparison options (matching strategy, IoU threshold, detail level)

Returns: A DiffReport with image, category, and annotation counts, plus optional item-level detail

Types

`MatchBy`

Annotation matching strategy.

pub enum MatchBy {
    /// Match annotations by annotation ID (within shared images)
    Id,
    /// Match annotations greedily by IoU within shared image + category
    Iou,
}

Id: Match annotations by annotation ID within shared images. Requires both datasets to use a consistent ID scheme.
Iou: Match annotations greedily by bounding-box overlap (Intersection over Union) within the same shared image and category. Useful when IDs differ between datasets.
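
To make the Iou strategy concrete, here is a minimal, self-contained sketch of greedy IoU matching. It is not panlabel's implementation: the `BBox` layout, `iou`, and `greedy_match` are hypothetical names introduced for illustration, and the inputs are assumed to be already restricted to one shared image and one category.

```rust
/// Axis-aligned box as [x_min, y_min, x_max, y_max].
/// This layout is an assumption for the sketch, not the library's type.
type BBox = [f64; 4];

/// Intersection over Union of two boxes; returns 0.0 when the union is empty.
fn iou(a: &BBox, b: &BBox) -> f64 {
    let iw = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let ih = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = iw * ih;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    let union = area_a + area_b - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy matching: score every pair, keep those with IoU >= threshold,
/// sort by IoU descending, and accept a pair only if neither box is used yet.
/// Returns (index_in_a, index_in_b) pairs.
fn greedy_match(a: &[BBox], b: &[BBox], threshold: f64) -> Vec<(usize, usize)> {
    let mut candidates: Vec<(f64, usize, usize)> = Vec::new();
    for (i, box_a) in a.iter().enumerate() {
        for (j, box_b) in b.iter().enumerate() {
            let score = iou(box_a, box_b);
            if score >= threshold {
                candidates.push((score, i, j));
            }
        }
    }
    // Highest-overlap pairs are considered first.
    candidates.sort_by(|x, y| y.0.total_cmp(&x.0));

    let mut used_a = vec![false; a.len()];
    let mut used_b = vec![false; b.len()];
    let mut matches = Vec::new();
    for (_, i, j) in candidates {
        if !used_a[i] && !used_b[j] {
            used_a[i] = true;
            used_b[j] = true;
            matches.push((i, j));
        }
    }
    matches
}
```

Under this sketch, boxes left unmatched in the first slice would count as only in a, and boxes left unmatched in the second as only in b. Greedy matching is simple and fast but does not guarantee a globally optimal assignment.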

`DiffOptions`

Configuration for dataset comparison.

pub struct DiffOptions {
    pub match_by: MatchBy,
    pub iou_threshold: f64,
    pub detail: bool,
    pub max_items: usize,
    pub bbox_eps: f64,
}

Fields:

match_by - Annotation matching strategy (MatchBy::Id or MatchBy::Iou). Default: MatchBy::Id
iou_threshold - Minimum IoU for two annotations to be matched (used when match_by is MatchBy::Iou). Default: 0.5
detail - Include item-level details in the report. Default: false
max_items - Maximum number of detail items to include. Default: 20
bbox_eps - Epsilon for floating-point bbox comparisons. Default: 1e-6

Default values:

impl Default for DiffOptions {
    fn default() -> Self {
        Self {
            match_by: MatchBy::Id,
            iou_threshold: 0.5,
            detail: false,
            max_items: 20,
            bbox_eps: 1e-6,
        }
    }
}
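
The bbox_eps field suggests that bounding boxes are compared with a tolerance rather than with exact floating-point equality. The sketch below shows one plausible form of such a check. The function name `bbox_approx_eq` and the four-element array layout are assumptions for illustration; the source does not show how the library applies bbox_eps internally.

```rust
/// Returns true when every coordinate of the two boxes differs by at most `eps`.
/// Hypothetical helper; the [f64; 4] layout is assumed for this sketch.
fn bbox_approx_eq(a: &[f64; 4], b: &[f64; 4], eps: f64) -> bool {
    a.iter().zip(b.iter()).all(|(x, y)| (x - y).abs() <= eps)
}
```

With the default epsilon of 1e-6, a coordinate that drifts by 1e-9 after a format round-trip would still compare as equal, while a change of half a pixel would not.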

`DiffReport`

Structured diff results.

pub struct DiffReport {
    pub images: DiffCounts,
    pub categories: DiffCounts,
    pub annotations: DiffAnnotationCounts,
    pub detail: Option<DiffDetail>,
}

`DiffCounts`

Count of shared and unique items.

pub struct DiffCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
}
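
As an illustration of what these three counts mean, the following sketch derives them from two sets of identifiers. It mirrors the struct locally so that it compiles on its own; `count_by_id` and the use of `u64` keys are assumptions, and the library may key images and categories differently (for example by file name or category name).

```rust
use std::collections::HashSet;

/// Local mirror of the documented struct, so the sketch is self-contained.
#[derive(Debug, PartialEq)]
struct DiffCounts {
    shared: usize,
    only_in_a: usize,
    only_in_b: usize,
}

/// Counts identifiers present in both sets and those unique to each.
/// Using u64 identifiers as keys is an assumption for illustration.
fn count_by_id(a: &HashSet<u64>, b: &HashSet<u64>) -> DiffCounts {
    let shared = a.intersection(b).count();
    DiffCounts {
        shared,
        only_in_a: a.len() - shared,
        only_in_b: b.len() - shared,
    }
}
```

In this sketch the three counts always satisfy shared + only_in_a = size of a and shared + only_in_b = size of b.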

`DiffAnnotationCounts`

Annotation-specific diff counts.

pub struct DiffAnnotationCounts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
    pub modified: usize,
}
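
The sketch below shows one way these four counts could be produced under ID matching. It is an assumption-laden illustration, not the library's algorithm: annotations are reduced to an id-to-bbox map, `count_annotations_by_id` is a hypothetical name, and a matched pair is counted as either shared (unchanged within the epsilon) or modified, never both. The library may define the relationship between shared and modified differently.

```rust
use std::collections::HashMap;

/// Local mirror of the documented struct, so the sketch is self-contained.
#[derive(Debug, PartialEq)]
struct DiffAnnotationCounts {
    shared: usize,
    only_in_a: usize,
    only_in_b: usize,
    modified: usize,
}

/// Hypothetical ID-based comparison over annotation id -> bbox maps.
/// A pair with the same id is "modified" when any coordinate differs
/// by more than `bbox_eps`; otherwise it is "shared" (assumed semantics).
fn count_annotations_by_id(
    a: &HashMap<u64, [f64; 4]>,
    b: &HashMap<u64, [f64; 4]>,
    bbox_eps: f64,
) -> DiffAnnotationCounts {
    let mut counts = DiffAnnotationCounts {
        shared: 0,
        only_in_a: 0,
        only_in_b: 0,
        modified: 0,
    };
    for (id, box_a) in a {
        match b.get(id) {
            None => counts.only_in_a += 1,
            Some(box_b) => {
                let same = box_a
                    .iter()
                    .zip(box_b.iter())
                    .all(|(x, y)| (x - y).abs() <= bbox_eps);
                if same {
                    counts.shared += 1;
                } else {
                    counts.modified += 1;
                }
            }
        }
    }
    // Anything in b whose id never appeared in a is unique to b.
    counts.only_in_b = b.keys().filter(|id| !a.contains_key(*id)).count();
    counts
}
```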

Example

use panlabel::diff::{diff_datasets, DiffOptions, MatchBy};
use panlabel::ir;

// The `?` operator needs a function that returns Result; this wrapper assumes
// the reader's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load two versions of a dataset
    let dataset_v1 = ir::io_coco_json::read_coco_json("v1.json")?;
    let dataset_v2 = ir::io_coco_json::read_coco_json("v2.json")?;

    // Compare using ID matching and request item-level detail
    let opts = DiffOptions {
        match_by: MatchBy::Id,
        detail: true,
        ..Default::default()
    };

    let report = diff_datasets(&dataset_v1, &dataset_v2, &opts);

    println!("Images: {} shared, {} only in v1, {} only in v2",
        report.images.shared,
        report.images.only_in_a,
        report.images.only_in_b
    );

    println!("Annotations: {} modified",
        report.annotations.modified
    );

    Ok(())
}

IoU-Based Matching Example

When the two datasets use different annotation ID schemes, match annotations by IoU instead:

use panlabel::diff::{diff_datasets, DiffOptions, MatchBy};

let opts = DiffOptions {
    match_by: MatchBy::Iou,
    iou_threshold: 0.5,  // Consider boxes matching if IoU >= 0.5
    ..Default::default()
};

// dataset_a and dataset_b are Datasets loaded as in the previous example
let report = diff_datasets(&dataset_a, &dataset_b, &opts);

Related

diff command - CLI interface for dataset comparison
Dataset type - IR Dataset structure
