The diff module provides functionality for comparing two annotation datasets and identifying differences in images, categories, and annotations.
Main Function
diff_datasets
Compute a semantic diff between two datasets. The report covers:
- Images present in both datasets vs. unique to each
- Categories shared vs. unique to each dataset
- Annotation differences (added, removed, modified)
Parameters:
- a - First dataset to compare
- b - Second dataset to compare
- opts - Comparison options (matching strategy, IoU threshold, detail level)
Returns a DiffReport containing the comparison results.
Types
MatchBy
Annotation matching strategy.
- Id: Match annotations by their IDs. Requires both datasets to have consistent ID schemes.
- Iou: Match annotations based on bounding box overlap (Intersection over Union). Useful when IDs differ between datasets.
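To make the Iou strategy concrete, here is a minimal, self-contained sketch of how overlap-based matching can work. It is not the module's implementation: the `BBox` type, the `[x, y, width, height]` layout, the `iou` and `match_by_iou` helpers, and the greedy pairing rule are all assumptions made for illustration.

```rust
/// Axis-aligned box stored as x, y, width, height.
/// This layout is an assumption for the sketch, not taken from the library.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

impl BBox {
    /// Area of the box; negative extents are treated as empty.
    pub fn area(&self) -> f64 {
        self.w.max(0.0) * self.h.max(0.0)
    }
}

/// Intersection over Union of two boxes, in the range [0, 1].
pub fn iou(a: &BBox, b: &BBox) -> f64 {
    let left = a.x.max(b.x);
    let top = a.y.max(b.y);
    let right = (a.x + a.w).min(b.x + b.w);
    let bottom = (a.y + a.h).min(b.y + b.h);

    // Clamp to zero so disjoint boxes yield an empty intersection.
    let inter = (right - left).max(0.0) * (bottom - top).max(0.0);
    let union = a.area() + b.area() - inter;
    if union <= 0.0 {
        0.0
    } else {
        inter / union
    }
}

/// Greedily pairs each box in `old` with the best still-unmatched box in `new`
/// whose IoU is at least `threshold`. Returns (old_index, new_index) pairs.
/// (Hypothetical helper; the real matcher may use a different assignment rule.)
pub fn match_by_iou(old: &[BBox], new: &[BBox], threshold: f64) -> Vec<(usize, usize)> {
    let mut taken = vec![false; new.len()];
    let mut pairs = Vec::new();
    for (i, a) in old.iter().enumerate() {
        let mut best: Option<(usize, f64)> = None;
        for (j, b) in new.iter().enumerate() {
            if taken[j] {
                continue;
            }
            let score = iou(a, b);
            if score >= threshold && best.map_or(true, |(_, s)| score > s) {
                best = Some((j, score));
            }
        }
        if let Some((j, _)) = best {
            taken[j] = true;
            pairs.push((i, j));
        }
    }
    pairs
}
```

In this sketch, boxes in `old` left without a pair would count as removed and unpaired boxes in `new` as added. A greedy pass is simple but order-dependent; an optimal assignment (for example the Hungarian algorithm) is an alternative design when boxes overlap heavily.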
DiffOptions
Configuration for dataset comparison.
- match_by - Annotation matching strategy (MatchBy::Id or MatchBy::Iou)
- iou_threshold - IoU threshold for matching (used when match_by is Iou). Default: 0.5
- detail - Include item-level details in the report. Default: false
- max_items - Maximum number of detail items to include. Default: 20
- bbox_eps - Epsilon for floating-point bbox comparisons. Default: 1e-6
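The sketch below mirrors these options with local stand-in types, to show how the documented defaults and a partial override might look. The field types (`f64`, `usize`, `bool`), the `Default` implementation, and the default value of `match_by` are assumptions; only the field names and the other default values come from the list above. The `iou_options` helper is hypothetical.

```rust
/// Local stand-in for the matching strategy (not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchBy {
    Id,
    Iou,
}

/// Local stand-in for the comparison options (field types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct DiffOptions {
    pub match_by: MatchBy,
    pub iou_threshold: f64,
    pub detail: bool,
    pub max_items: usize,
    pub bbox_eps: f64,
}

impl Default for DiffOptions {
    fn default() -> Self {
        Self {
            // Assumed default: the documentation does not state one for match_by.
            match_by: MatchBy::Id,
            iou_threshold: 0.5,
            detail: false,
            max_items: 20,
            bbox_eps: 1e-6,
        }
    }
}

/// Hypothetical helper: options for comparing datasets whose IDs differ,
/// overriding only the fields that change and keeping the other defaults.
pub fn iou_options(threshold: f64) -> DiffOptions {
    DiffOptions {
        match_by: MatchBy::Iou,
        iou_threshold: threshold,
        detail: true,
        ..DiffOptions::default()
    }
}
```

Struct update syntax (`..DiffOptions::default()`) keeps call sites short and leaves unspecified fields at their defaults, assuming the real type offers a comparable constructor.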
DiffReport
Structured diff results.
DiffCounts
Count of shared and unique items.
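As an illustration of what shared and unique counts mean, the following sketch derives them from two lists of IDs using set operations. The `Counts` struct, its field names, and the `count_ids` helper are hypothetical; the actual fields of DiffCounts are not shown in this page.

```rust
use std::collections::HashSet;

/// Hypothetical counts structure; field names are assumptions.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub shared: usize,
    pub only_in_a: usize,
    pub only_in_b: usize,
}

/// Counts IDs present in both inputs and IDs unique to each side.
/// Duplicate IDs within one input are counted once.
pub fn count_ids(a: &[u64], b: &[u64]) -> Counts {
    let set_a: HashSet<u64> = a.iter().copied().collect();
    let set_b: HashSet<u64> = b.iter().copied().collect();
    Counts {
        shared: set_a.intersection(&set_b).count(),
        only_in_a: set_a.difference(&set_b).count(),
        only_in_b: set_b.difference(&set_a).count(),
    }
}
```

The same shape applies to images and categories alike, provided each item has a stable key to compare on.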
DiffAnnotationCounts
Annotation-specific diff counts.
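A rough sketch of how added, removed, and modified annotations could be counted under ID-based matching, with an epsilon tolerance on bounding box coordinates. The `AnnotationCounts` struct, its fields, the `diff_by_id` helper, and the `[f64; 4]` box representation are assumptions for illustration, not the fields of DiffAnnotationCounts.

```rust
use std::collections::HashMap;

/// Hypothetical annotation counts; field names are assumptions.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AnnotationCounts {
    pub added: usize,
    pub removed: usize,
    pub modified: usize,
    pub unchanged: usize,
}

/// Compares annotations keyed by ID. A box counts as modified when any
/// coordinate differs by more than `eps`.
pub fn diff_by_id(
    a: &HashMap<u64, [f64; 4]>,
    b: &HashMap<u64, [f64; 4]>,
    eps: f64,
) -> AnnotationCounts {
    let mut counts = AnnotationCounts::default();
    for (id, box_a) in a {
        match b.get(id) {
            // Present in a but missing from b.
            None => counts.removed += 1,
            Some(box_b) => {
                let differs = box_a
                    .iter()
                    .zip(box_b.iter())
                    .any(|(x, y)| (x - y).abs() > eps);
                if differs {
                    counts.modified += 1
                } else {
                    counts.unchanged += 1
                }
            }
        }
    }
    // Present in b but missing from a.
    counts.added = b.keys().filter(|id| !a.contains_key(*id)).count();
    counts
}
```

The tolerance avoids reporting a modification when two boxes differ only by floating-point round-off, which is the role the bbox_eps option appears to play.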
Example
IoU-Based Matching Example
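When comparing datasets with different ID schemes, set match_by to MatchBy::Iou so that annotations are paired by bounding box overlap rather than by ID.
Related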
- diff command - CLI interface for dataset comparison
- Dataset type - IR Dataset structure