Overview
The CocoEvaluator class provides standard COCO evaluation metrics (AP, AR) for detection and segmentation tasks, with support for distributed training.
CocoEvaluator
Class Initialization
Parameters
- COCO API object(s) containing the ground truth annotations. Can be a single COCO object or a list for oracle evaluation.
- Types of IoU to evaluate: ["segm"] for masks, ["bbox"] for boxes, or both.
- Whether to use categories for evaluation (useCats). Set False for open-vocabulary tasks.
- Directory to dump predictions to. If None, predictions are not saved.
- Postprocessor module to convert model outputs to COCO format.
- Whether to compute AP separately for different object rarity buckets and average.
- Whether object areas are normalized by image area; affects the size bucket definitions.
- Maximum number of detections to evaluate per image.
- Whether to restrict evaluation to exhaustively annotated images only.
- Whether to require all ground truth sources to be exhaustive (for oracle evaluation).
Methods
update
Update the evaluator with model outputs.

synchronize_between_processes
Synchronize predictions across distributed processes.

accumulate
Accumulate evaluation results.

summarize
Compute and print summary metrics. Returns a dictionary containing COCO metrics:
- coco_eval_masks_AP: Mask AP (averaged over IoU thresholds)
- coco_eval_masks_AP_50: Mask AP @ IoU=0.5
- coco_eval_masks_AP_75: Mask AP @ IoU=0.75
- coco_eval_masks_AP_{size}: AP by size (tiny/small/medium/large/huge)
- coco_eval_masks_AR: Mask Average Recall
- Similar metrics for bbox, if enabled

compute_synced
Run the full evaluation pipeline (sync + accumulate + summarize).

Example Usage
Basic Evaluation
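The basic flow is: create the evaluator, feed it model outputs batch by batch, then synchronize, accumulate, and summarize. The sketch below uses a stub evaluator so it runs standalone; only the method names and their calling order come from the API above, while the stub's internals and the stand-in batches are placeholders.

```python
# Sketch of the evaluation loop. The stub stands in for the real
# CocoEvaluator; the method call sequence matches the API described above.

class _StubEvaluator:
    """Stand-in so this sketch runs; substitute the real CocoEvaluator."""
    def __init__(self):
        self.calls = []
    def update(self, outputs):
        self.calls.append("update")
    def synchronize_between_processes(self):
        self.calls.append("sync")
    def accumulate(self):
        self.calls.append("accumulate")
    def summarize(self):
        self.calls.append("summarize")
        return {"coco_eval_masks_AP": 0.0}

evaluator = _StubEvaluator()  # real code: construct CocoEvaluator with your ground truth

batches = [{"pred": 1}, {"pred": 2}]  # stand-in for model outputs over a dataloader
for outputs in batches:
    evaluator.update(outputs)          # one call per batch

evaluator.synchronize_between_processes()  # no-op outside distributed runs
evaluator.accumulate()
metrics = evaluator.summarize()
```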
Distributed Training
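In a distributed run, each process evaluates its own shard of the validation set; synchronize_between_processes gathers predictions so every rank can compute metrics over the full set. Below is a toy single-process simulation of that merge step (the gather-and-union strategy is an assumption about the implementation, which commonly uses an all-gather):

```python
# Toy simulation of cross-process synchronization: each "rank" holds
# predictions for its own images; after the gather, every rank sees the union.

def synchronize(per_rank_predictions):
    """Merge {image_id: prediction} dicts from all ranks (simulated all-gather)."""
    merged = {}
    for rank_preds in per_rank_predictions:
        merged.update(rank_preds)
    return merged

# Two simulated ranks, each holding a disjoint shard of the validation set.
rank0 = {1: "pred_img1", 2: "pred_img2"}
rank1 = {3: "pred_img3"}

all_preds = synchronize([rank0, rank1])
```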
Box and Mask Evaluation
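Evaluating both boxes and masks just means requesting both IoU types; the summary then contains two parallel families of keys. Only the masks key names are documented above; the "coco_eval_boxes_" prefix used here mirrors them and is an assumption:

```python
# Sketch: the summary key families expected when both IoU types are enabled.
# The "coco_eval_boxes_" prefix is an assumption mirroring the masks keys.

PREFIX = {"segm": "coco_eval_masks", "bbox": "coco_eval_boxes"}

def expected_keys(iou_types):
    keys = []
    for t in iou_types:
        p = PREFIX[t]
        keys += [f"{p}_AP", f"{p}_AP_50", f"{p}_AP_75", f"{p}_AR"]
    return keys

keys = expected_keys(["segm", "bbox"])
```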
Custom Max Detections
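Which AR_{k} metrics appear depends on the max-detections values you pass: per the AR section below, AR_50 and AR_75 are reported only if 50 or 75 is among them (for comparison, pycocotools itself defaults to maxDets=[1, 10, 100]). A sketch of that mapping; the helper name is illustrative:

```python
# Sketch: AR metric names produced for a given maxdets setting, following
# the convention in the AR section (AR_k appears iff k is in maxdets).

def ar_metric_names(maxdets):
    names = ["AR"]  # AR at the maximum-detections threshold
    for k in (50, 75):
        if k in maxdets:
            names.append(f"AR_{k}")
    return names

names = ar_metric_names([50, 75, 100])
```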
Normalized Areas
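With normalized areas enabled, object areas are fractions of the image area and the size buckets follow the percentage ranges listed under "Metrics Explained". A sketch of that bucketing (the helper is illustrative, but the thresholds are the documented ones):

```python
# Sketch: size buckets for normalized areas (fractions of image area),
# using the percentage ranges documented under "Metrics Explained".

BUCKETS = [
    ("tiny", 0.001),        # area < 0.1% of image
    ("small", 0.01),        # 0.1% - 1%
    ("medium", 0.10),       # 1% - 10%
    ("large", 0.50),        # 10% - 50%
    ("huge", 0.95),         # 50% - 95%
]

def size_bucket(normalized_area):
    for name, upper in BUCKETS:
        if normalized_area < upper:
            return name
    return "whole_image"    # > 95% of the image

bucket = size_bucket(0.05)  # 5% of the image falls in the 1%-10% range
```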
Metrics Explained
Average Precision (AP)
- AP: mean AP over IoU thresholds [0.5, 0.95] with step 0.05
- AP_50: AP at IoU threshold 0.5 (loose localization)
- AP_75: AP at IoU threshold 0.75 (strict localization)
- AP_{size}: AP for specific object sizes:
  - tiny: very small objects (area < 0.1% of image)
  - small: small objects (0.1% - 1% of image)
  - medium: medium objects (1% - 10% of image)
  - large: large objects (10% - 50% of image)
  - huge: very large objects (50% - 95% of image)
  - whole_image: nearly the entire image (> 95%)
Average Recall (AR)
- AR: mean recall at the maximum-detections threshold
- AR_50: AR at maxDets=50 (if maxdets includes 50)
- AR_75: AR at maxDets=75 (if maxdets includes 75)
- AR_{size}: recall by object size

Postprocessor Requirements
The postprocessor must convert raw model outputs into per-image predictions in COCO format.

COCO Format Requirements
Ground Truth
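Ground truth must follow the standard COCO annotation schema: top-level images, annotations, and categories lists. A minimal example with illustrative values:

```python
# Minimal ground-truth structure in standard COCO format (illustrative values).
coco_gt = {
    "images": [
        {"id": 1, "width": 640, "height": 480, "file_name": "img1.jpg"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 3,
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
            "area": 4000.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 3, "name": "car"},
    ],
}
```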
Predictions
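A sketch of the per-detection entries in the standard COCO results format, which is what the conversion produces: box results carry a bbox and score, segmentation results carry an RLE-encoded mask (values below are illustrative):

```python
# Per-detection entries in the standard COCO results format (illustrative values).
bbox_result = {
    "image_id": 1,
    "category_id": 3,
    "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
    "score": 0.92,
}
segm_result = {
    "image_id": 1,
    "category_id": 3,
    "segmentation": {"size": [480, 640], "counts": "..."},  # RLE mask; counts is a placeholder
    "score": 0.92,
}
```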
Predictions are automatically converted to standard COCO results format.

Notes
- Uses pycocotools internally
- Supports distributed evaluation across multiple GPUs
- Predictions can be dumped to disk for later analysis
- Size buckets automatically adjusted for normalized areas
- Compatible with COCO, LVIS, and custom datasets in COCO format
- For open-vocabulary tasks, set useCats=False
See Also
- cgF1 Evaluation - Comprehensive grounding F1 metric
- pycocotools documentation