Learn how to run SAM 3 inference on images with text and visual prompts
SAM 3 enables powerful image segmentation using both natural language text prompts and visual prompts like bounding boxes. This guide covers the basics of running inference on images.
```python
import torch

# Turn on tfloat32 for Ampere GPUs
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Use bfloat16 for the entire notebook
torch.autocast("cuda", dtype=torch.bfloat16).__enter__()
```
Use a bounding box to specify which object to segment:
```python
# Box in (x, y, w, h) format, where (x, y) is the top-left corner
box_input_xywh = torch.tensor([480.0, 290.0, 110.0, 360.0]).view(-1, 4)
box_input_cxcywh = box_xywh_to_cxcywh(box_input_xywh)
norm_box_cxcywh = normalize_bbox(box_input_cxcywh, width, height).flatten().tolist()
print("Normalized box input:", norm_box_cxcywh)

processor.reset_all_prompts(inference_state)
inference_state = processor.add_geometric_prompt(
    state=inference_state, box=norm_box_cxcywh, label=True
)
plot_results(img0, inference_state)
```
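The helpers `box_xywh_to_cxcywh` and `normalize_bbox` are used above but not defined in this snippet. A minimal sketch of what they likely do, assuming the standard center-format convention and normalization by image size (check your actual utility module for the real definitions):

```python
import torch

def box_xywh_to_cxcywh(box: torch.Tensor) -> torch.Tensor:
    """Convert (x, y, w, h) boxes to (cx, cy, w, h) center format.
    Assumed behavior: the center is the top-left corner offset by half
    the width/height; width and height pass through unchanged."""
    x, y, w, h = box.unbind(-1)
    return torch.stack([x + w / 2, y + h / 2, w, h], dim=-1)

def normalize_bbox(box: torch.Tensor, width: int, height: int) -> torch.Tensor:
    """Scale (cx, cy, w, h) coordinates into [0, 1] by the image size.
    Assumed behavior: x-coordinates and widths divide by image width,
    y-coordinates and heights by image height."""
    scale = torch.tensor([width, height, width, height], dtype=box.dtype)
    return box / scale
```

With the example box above, `box_xywh_to_cxcywh` maps `[480, 290, 110, 360]` to `[535, 470, 110, 360]`, which `normalize_bbox` then divides elementwise by `(width, height, width, height)`.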