SAM 3 provides comprehensive evaluation benchmarks for promptable concept segmentation (PCS) in both images and videos. The evaluation suite consists of the SA-Co dataset, which includes two image benchmarks (Gold and Silver) and one video benchmark (VEval).

SA-Co Dataset

The SA-Co dataset is designed to evaluate open-vocabulary segmentation capabilities with text prompts. It contains images and videos paired with noun phrases (NPs), each exhaustively annotated with instance masks for all objects matching the phrase.
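As a concrete illustration, one datapoint pairs a noun phrase with every instance mask that matches it, and a phrase with no matching objects serves as a negative prompt. The field names below are hypothetical, chosen for illustration only, and are not the released schema:

```python
# Hypothetical annotation record for one (image, noun phrase) pair.
# An exhaustive annotation lists a mask for EVERY instance matching
# the phrase; a negative prompt has an empty instance list.
record = {
    "image_id": "img_00042",
    "noun_phrase": "red umbrella",
    "instances": [
        {"mask_rle": "...", "bbox": [34, 50, 120, 210]},
        {"mask_rle": "...", "bbox": [300, 41, 90, 180]},
    ],
}

negative_record = {
    "image_id": "img_00042",
    "noun_phrase": "zebra",
    "instances": [],  # concept absent: the model should predict nothing
}

def is_negative(rec):
    """A prompt is negative when no instance in the image matches the phrase."""
    return len(rec["instances"]) == 0

print(is_negative(record), is_negative(negative_record))
```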

Dataset Components

SA-Co/Gold

High-quality image benchmark with 3 independent annotations per datapoint

SA-Co/Silver

Diverse image benchmark spanning 10 different domains

SA-Co/VEval

Video benchmark with 3 domains for temporal segmentation

Benchmark Scale

SAM 3 achieves 75-80% of human performance on the SA-Co benchmark, which contains:
  • 270,000+ unique concepts - over 50 times more than existing benchmarks
  • 4 million+ annotated concepts in the training data
  • Multiple annotation domains covering diverse visual scenarios

Evaluation Metrics

Primary Metric: cgF1

The official metric for all SA-Co benchmarks is cgF1 (concept-grounded F1 score). This metric evaluates:
  • Detection accuracy - correctly identifying object instances
  • Segmentation quality - mask precision at the instance level
  • Negative prompt handling - correctly rejecting non-existent objects
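To make instance-level scoring concrete, here is a minimal sketch: greedily match predicted masks to ground-truth masks at an IoU threshold, then compute F1 over the matches. Masks are simplified to sets of pixel indices; this is an illustrative simplification, not the official cgF1 implementation.

```python
def mask_iou(a, b):
    """IoU between two masks, represented here as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def instance_f1(preds, gts, iou_thresh=0.5):
    """Greedy one-to-one matching of predictions to ground truth, then F1.

    preds, gts: lists of pixel-index sets (stand-ins for instance masks).
    """
    unmatched = list(range(len(gts)))
    tp = 0
    for p in preds:
        best_j, best_iou = None, iou_thresh
        for j in unmatched:
            iou = mask_iou(p, gts[j])
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            unmatched.remove(best_j)
            tp += 1
    fp = len(preds) - tp  # hallucinated instances
    fn = len(gts) - tp    # missed instances
    denom = 2 * tp + fp + fn
    # Predicting nothing on a negative (empty) prompt counts as correct.
    return 2 * tp / denom if denom else 1.0

# Two GT instances; the model finds one accurately and hallucinates one.
gt = [{1, 2, 3, 4}, {10, 11, 12}]
pred = [{1, 2, 3}, {50, 51}]
print(instance_f1(pred, gt))  # 0.5
```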

Additional Metrics

  • IL_MCC - Instance-level Matthews Correlation Coefficient
  • positive_micro_F1 / pmF1 - F1 score computed only on positive (present) prompts
  • pHOTA - Promptable Higher Order Tracking Accuracy (video)
  • AP - Average Precision (for comparison with standard benchmarks)
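IL_MCC is the standard Matthews correlation coefficient applied to instance-level confusion counts (the SAM 3 paper reportedly combines IL_MCC with pmF1 to form cgF1). The textbook formula, as a hedged sketch:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient: +1 perfect, 0 chance-level, -1 inverted.

    Unlike F1, MCC also rewards true negatives, which makes it suitable for
    scoring negative (absent-concept) prompts.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

print(round(mcc(tp=90, tn=85, fp=10, fn=15), 3))
```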

Benchmark Results

Image Segmentation Performance

On the SA-Co/Gold benchmark, SAM 3 compares to human annotators and prior models as follows:

Model         SA-Co/Gold cgF1    LVIS AP    LVIS cgF1
Human         72.8               -          -
SAM 3         54.1               48.5       37.2
DINO-X        21.3               38.5       -
OWLv2*        24.6               43.4       29.3
Gemini 2.5    13.0               -          13.4

Video Segmentation Performance

On the SA-Co/VEval benchmarks:
Dataset               Human cgF1    SAM 3 cgF1    SAM 3 pHOTA
SA-V test             53.1          30.3          58.0
YT-Temporal-1B test   71.2          50.8          69.9
SmartGlasses test     58.5          36.4          63.6
SAM 3 achieves approximately 74-75% of human performance across the SA-Co benchmarks, representing a significant advancement in open-vocabulary segmentation capabilities.
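As a sanity check, the headline ratio can be recomputed from the Gold numbers above (the helper name is ours, not part of the benchmark tooling):

```python
def fraction_of_human(model_score, human_score):
    """Express a model's benchmark score as a fraction of the human score."""
    return model_score / human_score

# SA-Co/Gold cgF1: SAM 3 scores 54.1 against a human score of 72.8
print(f"{fraction_of_human(54.1, 72.8):.1%}")  # ~74.3% of human performance
```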

Download Locations

All SA-Co datasets are available from two hosting platforms:

Hugging Face

Roboflow

Next Steps

Run Evaluations

Learn how to run evaluations on your own predictions

Dataset Details

Explore the detailed structure of each benchmark
