Overview
The cgF1 (Comprehensive Grounding F1) metric evaluates segmentation models in realistic downstream application settings. It combines instance-level matching with image-level binary classification.

Metric Definition
cgF1 is computed from two components:
- positive_micro_F1: Micro-averaged F1 score across images with ground truth objects
- IL_MCC: Image-level Matthews Correlation Coefficient
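A worked sketch of the combination, assuming the two components combine as a product (cgF1 = positive_micro_F1 × IL_MCC); the confusion counts and the positive_micro_F1 value below are made up for illustration:

```python
import math

def il_mcc(tp, fp, fn, tn):
    """Image-level Matthews Correlation Coefficient from image-level confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Example: 80 positive images correctly flagged, 5 empty images with spurious
# detections, 10 positive images missed, 40 empty images correctly left empty.
mcc = il_mcc(tp=80, fp=5, fn=10, tn=40)
positive_micro_f1 = 0.72  # made-up instance-level score for illustration
cgf1 = positive_micro_f1 * mcc  # assumed product form of cgF1
print(round(cgf1, 3))
```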
CGF1Evaluator
Class Initialization
Parameters
- Path(s) to ground truth COCO JSON file(s). Multiple paths enable oracle evaluation.
- Type of IoU evaluation: "segm" for masks or "bbox" for boxes.
- Whether to print detailed evaluation progress.
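Putting the parameters together, instantiation might look like the following sketch; the import path and keyword names (`iou_type`, `verbose`) are assumptions based on the descriptions above, not confirmed API:

```python
from cgf1_eval import CGF1Evaluator  # hypothetical import path

# Single ground-truth file, mask-based matching:
evaluator = CGF1Evaluator("annotations/val_gt.json", iou_type="segm", verbose=False)

# Multiple ground-truth files enable oracle evaluation:
oracle_evaluator = CGF1Evaluator(
    ["annotations/gt_annotator1.json", "annotations/gt_annotator2.json"],
    iou_type="segm",
)
```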
evaluate
Run cgF1 evaluation on predictions.

- Path to predictions COCO JSON file.
Dictionary of metric values with keys such as:
- cgF1_eval_segm_cgF1: Main cgF1 score
- cgF1_eval_segm_precision: Precision
- cgF1_eval_segm_recall: Recall
- cgF1_eval_segm_F1: Standard F1
- cgF1_eval_segm_IL_F1: Image-level F1
- cgF1_eval_segm_IL_MCC: Image-level MCC
- Additional metrics at IoU 0.5 and 0.75
Metrics Explained
Instance-Level Metrics
These metrics are computed at the instance (object) level:
- cgF1 - Main metric combining instance- and image-level performance
- precision - True Positives / (True Positives + False Positives)
- recall - True Positives / (True Positives + False Negatives)
- F1 - Harmonic mean of precision and recall
- positive_micro_F1 - Micro-averaged F1 on images with ground truth (excludes true negatives)
- positive_micro_precision - Micro-averaged precision on positive images
- positive_macro_F1 - Macro-averaged F1 across positive images

Image-Level Metrics
These treat each image as a binary classification (has objects vs. no objects):
- IL_precision - Image-level precision
- IL_recall - Image-level recall
- IL_F1 - Image-level F1 score
- IL_FPR - Image-level false positive rate
- IL_MCC - Image-level Matthews Correlation Coefficient

IoU Thresholds
Metrics are reported at different IoU thresholds:
- No suffix: Averaged over IoU ∈ [0.5, 0.95] (step 0.05)
- @0.5: IoU threshold = 0.5
- @0.75: IoU threshold = 0.75
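For example, assuming the `@`-suffix key convention (inferred from the metric names above, not confirmed), the same metric appears under several keys:

```python
# Made-up values; illustrates only the assumed key-naming convention.
results = {
    "cgF1_eval_segm_cgF1": 0.41,       # averaged over IoU 0.5:0.95
    "cgF1_eval_segm_cgF1@0.5": 0.55,   # IoU threshold fixed at 0.5
    "cgF1_eval_segm_cgF1@0.75": 0.44,  # IoU threshold fixed at 0.75
}
print(results["cgF1_eval_segm_cgF1@0.5"])
```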
Example Usage
Basic Evaluation
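A minimal end-to-end sketch; the import path and argument names are assumptions based on the parameter descriptions above:

```python
from cgf1_eval import CGF1Evaluator  # hypothetical import path

evaluator = CGF1Evaluator("annotations/val_gt.json", iou_type="segm", verbose=True)
results = evaluator.evaluate("predictions/model_preds.json")
print(f"cgF1: {results['cgF1_eval_segm_cgF1']:.4f}")
```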
Oracle Evaluation
Oracle evaluation uses multiple ground truth annotations and picks the best match per image.

All Metrics
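With the hypothetical API sketched above, iterating over the returned dictionary surfaces every metric at once; for an oracle run, pass multiple ground-truth paths to the constructor:

```python
# Sketch only; constructor and argument names are assumptions, not confirmed API.
evaluator = CGF1Evaluator(
    ["annotations/gt_annotator1.json", "annotations/gt_annotator2.json"],  # oracle: best match kept per image
    iou_type="segm",
)
results = evaluator.evaluate("predictions/model_preds.json")
for name, value in sorted(results.items()):
    print(f"{name}: {value:.4f}")
```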
Ground Truth Format
Ground truth file must be in COCO format with an additional field, "is_instance_exhaustive". Only images marked "is_instance_exhaustive": true are evaluated.
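A minimal sketch of such a ground-truth file (standard COCO fields abbreviated; the flag is shown per image, as implied by the note elsewhere on this page that non-exhaustively annotated images are excluded):

```json
{
  "images": [
    {"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480,
     "is_instance_exhaustive": true}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 3, "segmentation": [[...]],
     "bbox": [100.0, 50.0, 80.0, 60.0], "area": 4800.0, "iscrowd": 0}
  ],
  "categories": [{"id": 3, "name": "dog"}]
}
```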
Predictions Format
Predictions must be in the standard COCO result format: a JSON list of detection records, each with an image_id, category_id, score, and a segmentation mask or bbox.

Understanding cgF1
Why cgF1?
Traditional metrics like COCO AP have limitations:
- Insensitive to false positives: High AP even with many false detections
- No penalty for hallucinations: Predicting objects on empty images is not penalized
- Not application-focused: Doesn't reflect downstream task performance

cgF1 addresses these issues by:
- Penalizing false positives through precision
- Evaluating image-level binary classification (object present or not)
- Combining both aspects into a single metric
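To see why the image-level term matters, compare two models with identical instance-level quality where one also hallucinates detections on empty images; this sketch assumes cgF1 = positive_micro_F1 × IL_MCC, with all counts made up for illustration:

```python
import math

def il_mcc(tp, fp, fn, tn):
    """Image-level MCC from image-level confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

pm_f1 = 0.70  # same instance-level score for both models (made-up value)
clean = pm_f1 * il_mcc(tp=90, fp=0, fn=10, tn=50)           # leaves empty images empty
hallucinating = pm_f1 * il_mcc(tp=90, fp=40, fn=10, tn=10)  # fires on 40 of 50 empty images
print(round(clean, 3), round(hallucinating, 3))
```

Even with identical instance-level quality, the hallucinating model's cgF1 collapses because its image-level MCC does.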
When to Use cgF1
Use cgF1 when:
- Evaluating open-vocabulary or referring expression models
- False positives are costly in your application
- You need to detect when objects are absent
- Comparing models on realistic downstream tasks
Consider standard metrics (e.g., COCO AP) instead for:
- Traditional object detection evaluation
- All images contain objects (no negatives)
- Focusing purely on localization quality
Advanced Usage
Custom IoU Threshold
The evaluator computes metrics across IoU thresholds [0.5, 0.95] by default. Access specific thresholds via the @0.5 and @0.75 metric keys.

Box Evaluation
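A sketch of both points, reusing the hypothetical constructor and the key-naming convention assumed earlier on this page:

```python
# Sketch only; argument and key names are assumptions, not confirmed API.
evaluator = CGF1Evaluator("annotations/val_gt.json", iou_type="bbox")  # boxes instead of masks
results = evaluator.evaluate("predictions/model_preds.json")
print(results["cgF1_eval_bbox_cgF1"])      # averaged over IoU 0.5:0.95
print(results["cgF1_eval_bbox_cgF1@0.5"])  # fixed IoU threshold
```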
Implementation Notes
- Uses Hungarian matching for instance assignment
- Confidence threshold is fixed at 0.5 (can be modified in the CGF1Eval class)
- Image-level metrics use binary classification (any object vs. no object)
- Excludes images marked as not exhaustively annotated
- For oracle evaluation, selects best F1 among multiple ground truths per image
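The Hungarian matching step can be illustrated with SciPy's `linear_sum_assignment` on a prediction-to-ground-truth IoU matrix; this is an illustration of the technique, not the evaluator's actual code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: predictions, columns: ground-truth instances; entries are IoU values (made up).
iou = np.array([
    [0.85, 0.10, 0.00],
    [0.05, 0.60, 0.20],
    [0.00, 0.55, 0.05],
])

# Hungarian matching maximizes total IoU (hence the negation for this min-cost solver).
pred_idx, gt_idx = linear_sum_assignment(-iou)

# Matched pairs above the IoU threshold count as true positives.
iou_thresh = 0.5
matches = [(int(p), int(g)) for p, g in zip(pred_idx, gt_idx) if iou[p, g] >= iou_thresh]
print(matches)
```

Note that a greedy highest-IoU-first assignment could pick a worse overall matching here; the Hungarian solver finds the globally optimal one.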
See Also
- COCO Evaluation - Standard COCO metrics
- SAM 3 paper for detailed cgF1 definition and motivation