SA-Co/Silver is a large-scale, diverse benchmark for promptable concept segmentation (PCS) in images. Unlike SA-Co/Gold, each datapoint has a single ground-truth annotation. The benchmark covers 10 different domains, from food to robotics to underwater imagery.

Overview

The benchmark contains images paired with noun phrases (NPs), each exhaustively annotated with masks for all object instances matching the phrase. SA-Co/Silver comprises 10 subsets covering diverse visual domains.
Since SA-Co/Silver has only one annotation per datapoint (unlike Gold's triple annotations), results may slightly underestimate model performance, because a single annotation cannot capture every valid interpretation of a query.

Dataset Composition

10 Annotation Domains

BDD100k: Urban driving scenarios from the Berkeley DeepDrive (BDD100k) dataset
  • 5,546 image-NP pairs
  • 13,210 image-NP-masks
  • Domain: Autonomous driving
DROID: Robot manipulation scenarios from diverse environments
  • 9,445 image-NP pairs
  • 11,098 image-NP-masks
  • Domain: Robotics and manipulation
Ego4D: First-person perspective frames from daily activities
  • 12,608 image-NP pairs
  • 24,049 image-NP-masks
  • Domain: Egocentric vision
MyFoodRepo-273: Food dishes and ingredients
  • 20,985 image-NP pairs
  • 28,347 image-NP-masks
  • Domain: Food recognition
GeoDE: Images from geographically diverse locations worldwide
  • 14,850 image-NP pairs
  • 7,570 image-NP-masks
  • Domain: Geographic diversity
iNaturalist-2017: Natural world observations of plants and animals
  • 1,439,051 image-NP pairs
  • 48,899 image-NP-masks
  • Domain: Biodiversity and nature
National Gallery of Art: Artwork images from the National Gallery of Art collection
  • 22,294 image-NP pairs
  • 18,991 image-NP-masks
  • Domain: Art
SA-V: Diverse video frames from the Segment Anything Video (SA-V) dataset
  • 18,337 image-NP pairs
  • 39,683 image-NP-masks
  • Domain: General video understanding
YT-Temporal-1B: Frames from YouTube videos across various categories
  • 7,816 image-NP pairs
  • 12,221 image-NP-masks
  • Domain: Web video
Fathomnet: Marine life and underwater environments
  • 287,193 image-NP pairs
  • 14,174 image-NP-masks
  • Domain: Marine biology

Statistics Table

Domain                     # Image-NPs   # Image-NP-Masks
BDD100k                          5,546             13,210
DROID                            9,445             11,098
Ego4D                           12,608             24,049
MyFoodRepo-273                  20,985             28,347
GeoDE                           14,850              7,570
iNaturalist-2017             1,439,051             48,899
National Gallery of Art         22,294             18,991
SA-V                            18,337             39,683
YT-Temporal-1B                   7,816             12,221
Fathomnet                      287,193             14,174
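
As a quick cross-check of the table above, the per-domain counts can be summed into benchmark-wide totals. The dictionary below simply restates the table:

```python
# Per-domain counts from the statistics table above:
# dataset name -> (image-NP pairs, image-NP-masks)
domains = {
    "BDD100k": (5_546, 13_210),
    "DROID": (9_445, 11_098),
    "Ego4D": (12_608, 24_049),
    "MyFoodRepo-273": (20_985, 28_347),
    "GeoDE": (14_850, 7_570),
    "iNaturalist-2017": (1_439_051, 48_899),
    "National Gallery of Art": (22_294, 18_991),
    "SA-V": (18_337, 39_683),
    "YT-Temporal-1B": (7_816, 12_221),
    "Fathomnet": (287_193, 14_174),
}

total_pairs = sum(pairs for pairs, _ in domains.values())
total_masks = sum(masks for _, masks in domains.values())
print(total_pairs, total_masks)  # 1838125 218242
```

Note that image-NP pairs can greatly outnumber masks (e.g. iNaturalist-2017): a pair whose noun phrase matches no object in the image is a valid negative query and contributes no mask.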

Download Dataset

Annotations

Download GT annotations from:

Images and Frames

Each domain has different download instructions:

GeoDE

# Option 1: Download the processed images from Roboflow:
#    https://universe.roboflow.com/sa-co-silver/geode/

# Option 2: Process raw images yourself
# 1. Download from https://geodiverse-data-collection.cs.princeton.edu/
# 2. Run preprocessing
python preprocess_silver_geode_bdd100k_food_rec.py \
  --annotation_file <ANNOTATIONS>/silver_geode_merged_test.json \
  --raw_images_folder <RAW_GEODE_IMAGES_FOLDER> \
  --processed_images_folder <PROCESSED_GEODE_IMAGES_FOLDER> \
  --dataset_name geode

National Gallery of Art

# Download and preprocess automatically
python download_preprocess_nga.py \
  --annotation_file <ANNOTATIONS>/silver_nga_art_merged_test.json \
  --raw_images_folder <RAW_NGA_IMAGES_FOLDER> \
  --processed_images_folder <PROCESSED_NGA_IMAGES_FOLDER>

BDD100k

# 1. Download 100K Images from http://bdd-data.berkeley.edu/download.html
# 2. Preprocess
python preprocess_silver_geode_bdd100k_food_rec.py \
  --annotation_file <ANNOTATIONS>/silver_bdd100k_merged_test.json \
  --raw_images_folder <RAW_BDD_IMAGES_FOLDER> \
  --processed_images_folder <PROCESSED_BDD_IMAGES_FOLDER> \
  --dataset_name bdd100k

Food Recognition Challenge 2022

# 1. Download from https://www.aicrowd.com/challenges/food-recognition-benchmark-2022
#    File: [Round 2] public_validation_set_2.0.tar.gz
# 2. Preprocess
python preprocess_silver_geode_bdd100k_food_rec.py \
  --annotation_file <ANNOTATIONS>/silver_food_rec_merged_test.json \
  --raw_images_folder <RAW_FOOD_IMAGES_FOLDER> \
  --processed_images_folder <PROCESSED_FOOD_IMAGES_FOLDER> \
  --dataset_name food_rec

iNaturalist

# Download and extract automatically
python download_inaturalist.py \
  --raw_images_folder <RAW_INATURALIST_IMAGES_FOLDER> \
  --processed_images_folder <PROCESSED_INATURALIST_IMAGES_FOLDER>

Fathomnet

# 1. Install FathomNet API
pip install fathomnet

# 2. Download images
python download_fathomnet.py \
  --processed_images_folder <PROCESSED_FATHOMNET_IMAGES_FOLDER>

Annotation Format

The annotation format is identical to SA-Co/Gold, derived from COCO format.

Example from DROID Domain

Images

[
  {
    "id": 10000000,
    "file_name": "AUTOLab_failure_2023-07-07_Fri_Jul__7_18:50:36_2023_recordings_MP4_22008760/00002.jpg",
    "text_input": "the large wooden table",
    "width": 1280,
    "height": 720,
    "queried_category": "3",
    "is_instance_exhaustive": 1,
    "is_pixel_exhaustive": 1
  }
]

Annotations

[
  {
    "area": 0.17324327256944444,
    "id": 1,
    "image_id": 10000000,
    "bbox": [0.0375, 0.5083, 0.8383, 0.4917],
    "segmentation": {
      "counts": "[^R11]f03O0O100O2N100O...",
      "size": [720, 1280]
    },
    "category_id": 1,
    "iscrowd": 0
  }
]
For detailed field descriptions, see the SA-Co/Gold annotation format, which is identical.
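
The sketch below shows one way such a file might be consumed: join annotations to images via `image_id`, then scale the bounding box to pixels. The sample records restate the DROID example above; treating `bbox` as normalized `[x, y, w, h]` fractions of the image size is an assumption based on the values shown (all in [0, 1], with a fractional `area`).

```python
# Sample records restating the DROID example above (normally loaded
# with json.load from the annotation files). Interpreting bbox as
# normalized [x, y, w, h] is an assumption based on the values shown.
images = [{"id": 10000000, "text_input": "the large wooden table",
           "width": 1280, "height": 720}]
annotations = [{"id": 1, "image_id": 10000000,
                "bbox": [0.0375, 0.5083, 0.8383, 0.4917]}]

# Index images by id so each annotation can find its image metadata.
images_by_id = {img["id"]: img for img in images}

def bbox_to_pixels(ann):
    """Convert a normalized [x, y, w, h] bbox to pixel coordinates."""
    img = images_by_id[ann["image_id"]]
    x, y, w, h = ann["bbox"]
    return (x * img["width"], y * img["height"],
            w * img["width"], h * img["height"])

print([round(v, 2) for v in bbox_to_pixels(annotations[0])])
```

Decoding the `segmentation` field itself would use the standard COCO compressed-RLE tools (e.g. `pycocotools.mask.decode` on the `counts`/`size` dict).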

Benchmark Results

Overall Performance

Model         Average cgF1   IL_MCC    pmF1
SAM 3                49.57     0.76   65.17
OWLv2*               11.23     0.32   31.18
Gemini 2.5            9.67     0.19   45.51
OWLv2                 8.18     0.23   32.55
LLMDet-L              6.73     0.17   28.19
gDino-T               3.09     0.12   19.75
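
For context on the metrics: IL_MCC is an image-level Matthews correlation coefficient over whether the model correctly decides that the queried concept is present at all, and each row's cgF1 is, up to rounding, the product IL_MCC × pmF1 (e.g. 0.76 × 65.17 ≈ 49.57 for SAM 3); see the SAM 3 paper for the exact definitions. A minimal sketch of an MCC computation over binary presence labels, using hypothetical confusion counts:

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical counts: the model flags the concept as present on 40 of
# 50 truly-positive images (tp=40, fn=10) and on 5 of 50 truly-negative
# images (fp=5, tn=45).
print(round(mcc(40, 5, 10, 45), 2))
```

Unlike accuracy, MCC stays near 0 for a model that always answers "present", which is why it suits the presence/absence part of the benchmark.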

Per-Domain Results (SAM 3)

Domain                      cgF1   IL_MCC    pmF1
iNaturalist                70.07     0.89   78.73
National Gallery of Art    65.80     0.82   80.67
Food Recognition           52.96     0.79   67.21
Fathomnet                  51.53     0.86   59.98
BDD100k                    46.61     0.78   60.13
DROID                      45.58     0.76   60.35
YT-Temporal-1B             42.07     0.72   58.36
Ego4D                      38.64     0.62   62.56
SA-V                       38.06     0.66   57.62
GeoDE                      44.36     0.67   66.05

Visualization

View examples from the dataset:
# See the example notebook
jupyter notebook examples/saco_gold_silver_vis_example.ipynb

Offline Evaluation

If you have predictions in COCO result format:
# Evaluate all subsets
jupyter notebook examples/saco_gold_silver_eval_example.ipynb

# Or evaluate a single subset
python scripts/eval/standalone_cgf1.py \
  --pred_file /path/to/coco_predictions_segm.json \
  --gt_files /path/to/annotations/silver_bdd100k_merged_test.json
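
If you are producing predictions yourself, the COCO segmentation result format expected above is a flat JSON list with one entry per predicted mask. A minimal sketch (the `counts` value is a placeholder; a real file would hold compressed RLE strings, e.g. from `pycocotools.mask.encode`):

```python
import json

# Minimal COCO-style segmentation results file: one dict per predicted
# mask. "<RLE_STRING>" is a placeholder for a real compressed RLE.
predictions = [
    {
        "image_id": 10000000,
        "category_id": 1,
        "score": 0.93,
        "segmentation": {"size": [720, 1280], "counts": "<RLE_STRING>"},
    }
]

with open("coco_predictions_segm.json", "w") as f:
    json.dump(predictions, f)
```

The resulting file can be passed directly as `--pred_file` to the standalone evaluation script above.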

Next Steps

Run Evaluations

Learn how to evaluate SAM 3 on SA-Co/Silver

SA-Co/VEval

Explore the video benchmark
