SA-Co Dataset
The SA-Co dataset is designed to evaluate open-vocabulary segmentation capabilities with text prompts. It contains images and videos paired with noun phrases (NPs), each exhaustively annotated with instance masks for all objects matching the phrase.
Dataset Components
SA-Co/Gold
High-quality image benchmark with 3 independent annotations per datapoint
SA-Co/Silver
Diverse image benchmark spanning 10 different domains
SA-Co/VEval
Video benchmark with 3 domains for temporal segmentation
Benchmark Scale
SAM 3 achieves 75-80% of human performance on the SA-Co benchmark, which contains:
- 270,000+ unique concepts - over 50 times more than existing benchmarks
- 4 million+ annotated concepts in the training data
- Multiple annotation domains covering diverse visual scenarios
Evaluation Metrics
Primary Metric: cgF1
The official metric for all SA-Co benchmarks is cgF1 (concept-grounded F1 score). This metric evaluates:
- Detection accuracy - correctly identifying object instances
- Segmentation quality - mask precision at the instance level
- Negative prompt handling - correctly rejecting non-existent objects
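The bullets above fold detection, mask quality, and negative-prompt handling into a single score. As an unofficial illustration only (the counts below are made up, and reducing everything to a plain F1 is an assumption for clarity, not the official cgF1 procedure), an F1-style score over concept prompts can be tallied like this:

```python
# Illustrative sketch of an F1-style score over concept prompts.
# NOTE: the counts are hypothetical and this is NOT the official cgF1
# computation; see the SA-Co evaluation code for the real procedure.

def f1(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Positive prompts: the concept is present, so matched masks count as TP,
# unmatched predictions as FP, and missed instances as FN.
# Negative prompts: the concept is absent, so any predicted mask is a FP.
tp, fp, fn = 80, 15, 20
print(round(f1(tp, fp, fn), 3))  # 0.821
```

The key point is that negative prompts can only add false positives, which is why a model that hallucinates objects for absent concepts is penalized even if its positive-prompt masks are good.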
Additional Metrics
- IL_MCC - Instance-level Matthews Correlation Coefficient
- positive_micro_F1 / pmF1 - F1 score computed only on positive (present) prompts
- pHOTA - Promptable Higher Order Tracking Accuracy (video)
- AP - Average Precision (for comparison with standard benchmarks)
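IL_MCC builds on the standard Matthews Correlation Coefficient. For reference, here is the generic MCC formula over a binary confusion matrix; the instance-level matching that produces these counts in the benchmark is not shown, so treat this as a formula sketch rather than the benchmark implementation:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient in [-1, 1].

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal is empty (conventional fallback).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, for illustration only.
print(round(mcc(tp=80, tn=50, fp=15, fn=20), 3))  # 0.563
```

Unlike F1, MCC uses true negatives, which makes it informative when many prompts are negative (the concept is absent from the image).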
Benchmark Results
Image Segmentation Performance
On the SA-Co/Gold benchmark, SAM 3 achieves:

| Model | SA-Co/Gold cgF1 | LVIS AP | LVIS cgF1 |
|---|---|---|---|
| Human | 72.8 | - | - |
| SAM 3 | 54.1 | 48.5 | 37.2 |
| DINO-X | 21.3 | 38.5 | - |
| OWLv2* | 24.6 | 43.4 | 29.3 |
| Gemini 2.5 | 13.0 | - | 13.4 |
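The human-relative figures quoted on this page follow directly from the table above; for example, SAM 3's Gold cgF1 as a fraction of the human baseline:

```python
# Ratio of SAM 3 to human cgF1 on SA-Co/Gold, using the values
# from the table above.
human_cgf1 = 72.8
sam3_cgf1 = 54.1
ratio = sam3_cgf1 / human_cgf1
print(f"{ratio:.1%}")  # 74.3%
```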
Video Segmentation Performance
On the SA-Co/VEval benchmarks:

| Dataset | Human cgF1 | SAM 3 cgF1 | SAM 3 pHOTA |
|---|---|---|---|
| SA-V test | 53.1 | 30.3 | 58.0 |
| YT-Temporal-1B test | 71.2 | 50.8 | 69.9 |
| SmartGlasses test | 58.5 | 36.4 | 63.6 |
SAM 3 achieves approximately 74-75% of human performance across the SA-Co benchmarks, representing a significant advancement in open-vocabulary segmentation capabilities.
Download Locations
All SA-Co datasets are available from two hosting platforms:
Hugging Face
Roboflow
Next Steps
Run Evaluations
Learn how to run evaluations on your own predictions
Dataset Details
Explore the detailed structure of each benchmark