SAM 3: Segment Anything with Concepts

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

Key Features

Open-Vocabulary Segmentation

Segment any object from a natural-language description. SAM 3 covers over 270K unique concepts and reaches 75-80% of human performance.

Multi-Modal Prompting

Prompt with text, points, boxes, masks, or combinations thereof for precise segmentation control.
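As an illustrative sketch only (not the actual SAM 3 API), the different prompt modalities can be thought of as one combinable structure; every name below is hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Prompt:
    """Hypothetical container combining the prompt modalities SAM 3 accepts."""
    text: Optional[str] = None                            # open-vocabulary concept, e.g. "a player in white"
    points: List[Tuple[int, int]] = field(default_factory=list)   # (x, y) clicks
    point_labels: List[int] = field(default_factory=list)         # 1 = foreground, 0 = background
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)  # (x1, y1, x2, y2)

    def modalities(self) -> List[str]:
        """List which prompt types are set, e.g. for routing to the right encoder."""
        out = []
        if self.text is not None:
            out.append("text")
        if self.points:
            out.append("points")
        if self.boxes:
            out.append("boxes")
        return out

# Combine a text concept with a refining foreground click.
p = Prompt(text="a player in white", points=[(320, 180)], point_labels=[1])
```

The point here is that modalities compose: a text concept selects what to segment, while points or boxes refine where.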

Video Tracking

Track and segment objects across video frames with temporal consistency and interactive refinement capabilities.
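To build intuition for temporal consistency, here is a toy frame-to-frame association step, greedy IoU matching of previous-frame tracks to current detections. This is a deliberately simplified sketch, not SAM 3's tracker:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.5):
    """Greedily match track IDs to current-frame detections by best IoU."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, thresh
        for i, dbox in enumerate(detections):
            if i in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = i, score
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```

A real tracker replaces box IoU with learned appearance and mask memory, but the identity-maintenance loop is the same shape.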

Unified Architecture

An 848M-parameter model with a decoupled detector-tracker design that scales efficiently with data.

What’s New in SAM 3

Compared to its predecessor SAM 2, SAM 3 introduces:
  • Concept-based segmentation: Exhaustively segment all instances of an open-vocabulary concept specified by text or exemplars
  • Presence token: Improved discrimination between closely related prompts (e.g., “a player in white” vs. “a player in red”)
  • Massive concept coverage: Trained on the largest high-quality open-vocabulary segmentation dataset, spanning more than 4 million unique concepts
  • Decoupled architecture: Separate detector and tracker minimize task interference and improve performance
SAM 3 achieves state-of-the-art results on instance segmentation and box detection benchmarks including LVIS, COCO, and the new SA-Co dataset.
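The presence token separates "is this concept in the image at all?" (recognition) from "where is it?" (localization). A minimal sketch of that factorization, using a global presence probability to gate per-box scores; the function name is hypothetical:

```python
def gated_scores(presence_prob, box_probs):
    """Final confidence = global presence probability x per-box localization score.

    A near-zero presence score for a prompt like "a player in red" suppresses
    every box for that prompt, even when individual localization scores are
    high for a visually similar prompt such as "a player in white".
    """
    return [presence_prob * p for p in box_probs]

# Concept absent: confident-looking boxes are suppressed globally.
low = gated_scores(0.05, [0.9, 0.8])
# Concept present: box scores pass through nearly unchanged.
high = gated_scores(0.98, [0.9, 0.8])
```

Gating on a single global signal is what sharpens discrimination between closely related prompts.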

Performance Highlights

SAM 3 demonstrates exceptional performance across multiple benchmarks:
  • SA-Co/Gold (Instance Segmentation): 54.1 cgF1 (vs. 72.8 human performance)
  • LVIS (Instance Segmentation): 48.5 AP
  • COCO (Box Detection): 56.4 AP
  • SA-V Video Test: 58.0 pHOTA

Common Use Cases

Image Segmentation

Segment objects in images using text descriptions or visual prompts for content analysis and editing.

Video Object Tracking

Track specific objects across video frames for surveillance, sports analysis, or content creation.

Interactive Annotation

Create high-quality annotations with point and box prompts for dataset creation.

Visual Search

Find all instances of specific concepts in large image or video collections.

Get Started

Installation

Install SAM 3 and set up your environment

Quick Start

Run your first segmentation in minutes

Guides

Explore guides for image and video inference

Architecture Overview

SAM 3 consists of three main components:
  1. Shared Vision Encoder: Extracts visual features from images or video frames
  2. Detector: DETR-based model conditioned on text, geometry, and image exemplars
  3. Tracker: Inherits SAM 2 transformer encoder-decoder architecture for video segmentation
The decoupled design allows each component to specialize in its task while sharing a common visual representation.
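The three components above can be sketched structurally as follows. These are stand-in classes to show the data flow (one shared encoding feeding two independent heads), not SAM 3's real modules:

```python
class SharedVisionEncoder:
    """Stand-in for the shared backbone: turns a frame into features."""
    def encode(self, frame):
        return {"frame": frame, "features": f"feat({frame})"}

class Detector:
    """Stand-in for the DETR-style detector conditioned on a text prompt."""
    def detect(self, features, text_prompt):
        return [{"concept": text_prompt, "frame": features["frame"]}]

class Tracker:
    """Stand-in for the SAM 2-style tracker that propagates detections."""
    def __init__(self):
        self.memory = []  # per-frame record, mimicking temporal state
    def track(self, features, detections):
        self.memory.append(detections)
        return {"frame": features["frame"], "tracks": detections}

# The encoder runs once per frame; detector and tracker consume the
# same features but never call into each other.
encoder, detector, tracker = SharedVisionEncoder(), Detector(), Tracker()
for frame in ["f0", "f1"]:
    feats = encoder.encode(frame)
    dets = detector.detect(feats, "a player in white")
    tracker.track(feats, dets)
```

Keeping the detector and tracker behind separate interfaces is what lets each specialize without task interference.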

Next Steps

Ready to get started? Follow our installation guide to set up SAM 3, then try the quick start tutorial to run your first segmentation.
