
Overview

The Sam3Processor class provides a high-level interface for using SAM 3 on images with text and geometric prompts. It handles image preprocessing, prompt encoding, and result post-processing.

Class Initialization

from sam3.model.sam3_image_processor import Sam3Processor

processor = Sam3Processor(
    model,
    resolution=1008,
    device="cuda",
    confidence_threshold=0.5
)

Parameters

• model (Sam3Image, required): The SAM 3 image model instance.
• resolution (int, default: 1008): Input image resolution; images are resized to resolution × resolution.
• device (str, default: "cuda"): Device to run inference on.
• confidence_threshold (float, default: 0.5): Confidence threshold for filtering predictions.

Methods

set_image

Sets the image for inference and computes image embeddings.
state = processor.set_image(image, state=None)
Parameters:
• image (PIL.Image.Image | torch.Tensor | np.ndarray, required): Input image in RGB format. Can be a PIL Image, a PyTorch tensor, or a NumPy array.
• state (dict | None, default: None): Optional state dictionary. If None, a new state is created.

Returns:
• state (dict): Updated state containing image embeddings and metadata:
  • original_height: Original image height
  • original_width: Original image width
  • backbone_out: Backbone feature maps

set_image_batch

Sets a batch of images for inference.
state = processor.set_image_batch(images, state=None)
Parameters:
• images (list[PIL.Image.Image], required): List of PIL images to process.
• state (dict | None, default: None): Optional state dictionary. If None, a new state is created.

Returns:
• state (dict): State containing:
  • original_heights: List of original image heights
  • original_widths: List of original image widths
  • backbone_out: Batch backbone features

set_text_prompt

Sets text prompt and runs inference.
state = processor.set_text_prompt(prompt, state)
Parameters:
• prompt (str, required): Text description of the objects to segment (e.g., "person", "dog").
• state (dict, required): State dictionary from set_image(). Must contain image embeddings.

Returns:
• state (dict): Updated state with segmentation results:
  • masks: Binary masks (bool tensor)
  • masks_logits: Mask logits (float tensor)
  • boxes: Bounding boxes in [x0, y0, x1, y1] format
  • scores: Confidence scores
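Predictions are parallel arrays: scores[i] scores masks[i] and boxes[i]. A minimal sketch of index-based post-processing, using plain Python lists with hypothetical values in place of the real result tensors:

```python
# Stand-ins for state["boxes"] and state["scores"] after set_text_prompt
# (hypothetical values; the real entries are tensors, one row per prediction).
boxes = [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 160.0]]
scores = [0.92, 0.61]

# Predictions share an index, so the highest-scoring box is found by
# picking the index of the largest score.
best = max(range(len(scores)), key=lambda i: scores[i])
best_box, best_score = boxes[best], scores[best]
print(best_box, best_score)  # -> [10.0, 20.0, 110.0, 220.0] 0.92
```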

add_geometric_prompt

Adds a box prompt and runs inference.
state = processor.add_geometric_prompt(box, label, state)
Parameters:
• box (list[float], required): Box in [center_x, center_y, width, height] format, normalized to [0, 1].
• label (bool, required): True for a positive box (include), False for a negative box (exclude).
• state (dict, required): State dictionary with image embeddings.

Returns:
• state (dict): Updated state with new segmentation results.
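Note the format mismatch: prompts use normalized [center_x, center_y, width, height], while result boxes use [x0, y0, x1, y1]. A hypothetical helper (not part of the library) for converting a pixel-space corner box into the prompt format:

```python
def xyxy_to_normalized_cxcywh(box, image_width, image_height):
    """Convert a pixel-space [x0, y0, x1, y1] box to the normalized
    [center_x, center_y, width, height] format add_geometric_prompt expects."""
    x0, y0, x1, y1 = box
    return [
        (x0 + x1) / 2 / image_width,   # center_x in [0, 1]
        (y0 + y1) / 2 / image_height,  # center_y in [0, 1]
        (x1 - x0) / image_width,       # width in [0, 1]
        (y1 - y0) / image_height,      # height in [0, 1]
    ]

# A 200x300 pixel box centered in a 1000x1000 image
print(xyxy_to_normalized_cxcywh([400, 350, 600, 650], 1000, 1000))
# -> [0.5, 0.5, 0.2, 0.3]
```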

reset_all_prompts

Removes all prompts and results from the state.
processor.reset_all_prompts(state)
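Conceptually, this drops the accumulated prompts and their results from the state while keeping the cached image embeddings, so new prompts can be issued without recomputing backbone_out. A rough sketch of that idea on a plain dict (an assumption about the implementation, not the library's actual code):

```python
def reset_all_prompts_sketch(state):
    """Drop prompt and result entries; keep cached image data."""
    for key in ("geometric_prompt", "masks", "masks_logits", "boxes", "scores"):
        state.pop(key, None)
    return state

# Hypothetical state after one round of prompting
state = {"backbone_out": "cached features", "masks": ["m0"], "scores": [0.9]}
reset_all_prompts_sketch(state)
print(sorted(state))  # -> ['backbone_out']
```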

set_confidence_threshold

Updates the confidence threshold and re-filters results.
state = processor.set_confidence_threshold(threshold, state=None)
Parameters:
• threshold (float, required): New confidence threshold (0.0 to 1.0).
• state (dict | None, default: None): State containing existing predictions to re-filter.
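The re-filtering amounts to keeping only the predictions whose score clears the new threshold. A rough sketch of that logic in plain Python (the processor operates on tensors, and whether the comparison is strict is an implementation detail):

```python
def filter_by_confidence(scores, boxes, threshold):
    """Keep only predictions whose score is at least `threshold`."""
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [scores[i] for i in keep], [boxes[i] for i in keep]

# Hypothetical predictions
scores = [0.95, 0.72, 0.41]
boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [8, 2, 12, 9]]
kept_scores, kept_boxes = filter_by_confidence(scores, boxes, 0.8)
print(kept_scores)  # -> [0.95]
```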

Example Usage

Basic Text Prompting

from PIL import Image
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor

# Load model and create processor
model = build_sam3_image_model()
processor = Sam3Processor(model)

# Load image
image = Image.open("image.jpg")

# Set image and text prompt
state = processor.set_image(image)
state = processor.set_text_prompt("person", state)

# Access results
masks = state["masks"]  # Binary masks
boxes = state["boxes"]  # Bounding boxes
scores = state["scores"]  # Confidence scores

Adding Box Prompts

# Set image
state = processor.set_image(image)

# Add positive box (normalized coordinates)
box = [0.5, 0.5, 0.3, 0.4]  # center_x, center_y, width, height
state = processor.add_geometric_prompt(box, label=True, state=state)

# Add negative box to exclude region
exclude_box = [0.7, 0.3, 0.2, 0.2]
state = processor.add_geometric_prompt(exclude_box, label=False, state=state)

Adjusting Confidence Threshold

# Initial inference
state = processor.set_image(image)
state = processor.set_text_prompt("car", state)

print(f"Found {len(state['scores'])} masks")

# Increase threshold to get fewer, higher-confidence results
state = processor.set_confidence_threshold(0.8, state)
print(f"After filtering: {len(state['scores'])} masks")

Batch Processing

# Load multiple images
images = [Image.open(f"image_{i}.jpg") for i in range(5)]

# Process batch
state = processor.set_image_batch(images)
state = processor.set_text_prompt("dog", state)

# Results contain predictions for all images

State Dictionary Structure

The state dictionary contains:
  • original_height / original_heights: Original image dimensions
  • original_width / original_widths: Original image dimensions
  • backbone_out: Cached backbone features
  • geometric_prompt: Current geometric prompts
  • masks: Binary segmentation masks (H, W)
  • masks_logits: Mask logits before thresholding
  • boxes: Bounding boxes in [x0, y0, x1, y1] format
  • scores: Confidence scores for each prediction
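Since the state keeps both masks and masks_logits, custom thresholding is possible. Binary masks are conventionally the logits thresholded at zero, i.e. sigmoid probability ≥ 0.5; this sketch illustrates that convention, which is an assumption about this implementation rather than confirmed behavior:

```python
import math

def logits_to_mask(logits):
    """Threshold mask logits at 0, equivalent to sigmoid(logit) >= 0.5."""
    return [[value > 0 for value in row] for row in logits]

logits = [[2.0, -1.5], [-0.1, 0.3]]  # a tiny hypothetical 2x2 logit map
print(logits_to_mask(logits))  # -> [[True, False], [False, True]]

# Sanity check: a positive logit maps to a probability of at least 0.5
assert 1 / (1 + math.exp(-2.0)) >= 0.5
```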

Notes

  • Call set_image() before adding any prompts
  • Text prompts work best with simple noun phrases
  • Box coordinates are normalized to [0, 1] range
  • Geometric prompts are accumulated (multiple boxes/points)
  • Use reset_all_prompts() to start fresh
