Overview
The SAM 3 video API uses a request-response pattern for all operations. This page documents the request and response formats for each operation type.
Request Structure
All requests are Python dictionaries with a type field:
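As a minimal sketch of this structure, the helper below builds a request dictionary and checks that its type is one of the six operations documented on this page. Only the type field is taken from the text above. The helper name make_request is hypothetical, and attaching session_id and frame_index to an add_prompt request is an assumption based on the names used in the Error Handling section.

```python
# Operation names documented on this page.
KNOWN_TYPES = {
    "start_session",
    "reset_session",
    "close_session",
    "add_prompt",
    "remove_object",
    "propagate_in_video",
}


def make_request(op_type, **fields):
    """Build a request dict. Hypothetical helper, not part of the SAM 3 API."""
    if op_type not in KNOWN_TYPES:
        raise ValueError(f"unknown request type: {op_type!r}")
    # Put "type" last so it cannot be overridden by a stray keyword argument.
    return {**fields, "type": op_type}


# Assumed field names, shown only to illustrate the dictionary shape.
request = make_request("add_prompt", session_id="demo", frame_index=0)
```

Centralizing the type check this way catches a misspelled operation name before the request is sent.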
Session Management
start_session
Start a new inference session on a video or image. Request:
reset_session
Reset session to its initial state (removes all prompts and results). Request:
close_session
Close and clean up a session (frees GPU memory). Request:
Prompting
add_prompt
Add a text, point, or box prompt on a specific video frame. Request:
remove_object
Remove an object from tracking. Request:
Propagation
propagate_in_video
Propagate prompts to get segmentation results across video frames. Request:
Coordinate Systems
Points
Points are in pixel coordinates (x, y):
- x: horizontal position (0 to image_width)
- y: vertical position (0 to image_height)
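Because points use pixels while request boxes use normalized values, it can help to check a point against the image size before sending it. The sketch below is illustrative only: check_point and normalized_to_pixel are hypothetical helpers, not API functions, and the inclusive upper bound follows the ranges as stated above.

```python
def check_point(x, y, image_width, image_height):
    """Return (x, y) as floats, or raise ValueError if outside the stated ranges."""
    if not 0 <= x <= image_width:
        raise ValueError(f"x={x} is outside 0..{image_width}")
    if not 0 <= y <= image_height:
        raise ValueError(f"y={y} is outside 0..{image_height}")
    return float(x), float(y)


def normalized_to_pixel(u, v, image_width, image_height):
    """Scale a normalized (0.0 to 1.0) position to pixel coordinates."""
    return u * image_width, v * image_height


# A point at the center of a 640x480 frame, given in normalized form.
x, y = normalized_to_pixel(0.5, 0.5, 640, 480)
point = check_point(x, y, 640, 480)
```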
Bounding Boxes
Boxes in requests use normalized center-width-height format:
- center_x: horizontal center (0.0 to 1.0)
- center_y: vertical center (0.0 to 1.0)
- width: box width (0.0 to 1.0)
- height: box height (0.0 to 1.0)
- [x0, y0, x1, y1]: top-left and bottom-right corners in pixels
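The two box forms listed above can be converted into each other with simple arithmetic. The following sketch assumes the normalized box is ordered [center_x, center_y, width, height], matching the order of the list, and that the pixel form is [x0, y0, x1, y1]. Both function names are hypothetical and not part of the API.

```python
def xyxy_pixels_to_cxcywh_norm(box, image_width, image_height):
    """Convert pixel corners [x0, y0, x1, y1] to normalized [cx, cy, w, h]."""
    x0, y0, x1, y1 = box
    return [
        (x0 + x1) / 2 / image_width,   # center_x
        (y0 + y1) / 2 / image_height,  # center_y
        (x1 - x0) / image_width,       # width
        (y1 - y0) / image_height,      # height
    ]


def cxcywh_norm_to_xyxy_pixels(box, image_width, image_height):
    """Convert normalized [cx, cy, w, h] back to pixel corners [x0, y0, x1, y1]."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    ]


# A box covering the middle of a 640x480 frame.
normalized = xyxy_pixels_to_cxcywh_norm([160, 120, 480, 360], 640, 480)
```

Dividing by the image size is what makes the box independent of resolution, so the same normalized box describes the same region if the frame is resized.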
Label Conventions
Point Labels
- 1: Foreground point (include this region)
- 0: Background point (exclude this region)
Box Labels
- 1: Positive box (include objects in this box)
- 0: Negative box (exclude objects in this box)
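Since points and boxes share the same 1/0 convention, one small check can cover both. The sketch below pairs each prompt with its label and rejects anything other than 0 or 1. The constant names and pair_with_labels are hypothetical, and how the API expects points and labels to be packed into a request is not shown here.

```python
POSITIVE = 1  # foreground point or positive box: include
NEGATIVE = 0  # background point or negative box: exclude


def pair_with_labels(prompts, labels):
    """Return [(prompt, label), ...] after checking lengths and label values."""
    if len(prompts) != len(labels):
        raise ValueError("each prompt needs exactly one label")
    for label in labels:
        if label not in (POSITIVE, NEGATIVE):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
    return list(zip(prompts, labels))


# One foreground click and one background click, in pixel coordinates.
pairs = pair_with_labels([(10, 20), (30, 40)], [POSITIVE, NEGATIVE])
```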
Error Handling
Invalid requests raise RuntimeError:
- Session not found: Invalid or expired session_id
- Invalid frame index: frame_index out of range
- Missing prompts: Propagation before adding any prompts
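A caller will typically wrap each request in a try/except for RuntimeError. The sketch below shows the pattern against a stub: fake_handle_request is a hypothetical stand-in for the real request handler, and the error message text is invented for illustration, since this page does not list the exact messages.

```python
def fake_handle_request(request, live_sessions):
    """Hypothetical stand-in for the real handler, for illustration only."""
    if request.get("session_id") not in live_sessions:
        raise RuntimeError("Session not found")  # assumed message text
    return {"ok": True}


def safe_call(handler, request, *args):
    """Run a request and return (response, error_message) instead of raising."""
    try:
        return handler(request, *args), None
    except RuntimeError as err:
        return None, str(err)


live = {"session-1"}
response, error = safe_call(
    fake_handle_request,
    {"type": "reset_session", "session_id": "expired"},
    live,
)
if error is not None:
    print(f"request failed: {error}")
```

Catching only RuntimeError keeps programming mistakes such as a TypeError visible rather than silently swallowing them.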