This guide walks through setting up the environment and dependencies for training SAM 3 models.

Prerequisites

  • Python 3.9 or later
  • CUDA 11.8 or later (for GPU training)
  • 16GB+ VRAM per GPU
  • Linux operating system (recommended)

Installation

1. Install SAM 3

First, install the SAM 3 package from source:
git clone https://github.com/facebookresearch/sam3.git
cd sam3
pip install -e .
This installs SAM 3 in editable mode with all core dependencies.
2. Install Training Dependencies

Install additional packages required for training:
pip install hydra-core submitit fvcore iopath tensorboard
Optional dependencies:
# For Weights & Biases logging
pip install wandb

# For COCO evaluation
pip install pycocotools
3. Verify PyTorch Installation

Ensure PyTorch is installed with CUDA support:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
Expected output:
PyTorch: 2.1.0+cu118
CUDA: True
If CUDA is not available, reinstall PyTorch:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
4. Download Assets

Download the BPE tokenizer file required for text encoding:
mkdir -p sam3/assets
cd sam3/assets
wget https://huggingface.co/facebook/sam3/resolve/main/bpe_simple_vocab_16e6.txt.gz
Note the path to this file - you’ll need it in your training config.

Directory Structure

Set up your training workspace:
workspace/
├── sam3/                      # SAM 3 repository
│   ├── train/                 # Training code
│   │   ├── configs/          # Configuration files
│   │   └── train.py          # Main training script
│   └── assets/               # Model assets
│       └── bpe_simple_vocab_16e6.txt.gz
├── datasets/                  # Your datasets
│   └── my_dataset/
│       ├── train/
│       └── test/
└── experiments/               # Training outputs
    └── logs/
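The dataset and experiment directories in this layout can be created in one step. A minimal sketch, run from inside `workspace/` (the `sam3/` tree itself comes from the git clone, so it is not created here):

```python
from pathlib import Path

# Create the dataset and experiment directories from the layout above.
# mkdir(parents=True) behaves like `mkdir -p`.
for d in ["datasets/my_dataset/train", "datasets/my_dataset/test",
          "experiments/logs"]:
    Path(d).mkdir(parents=True, exist_ok=True)
```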

Prepare Your Dataset

1. Format Annotations

Ensure your dataset uses COCO JSON format:
datasets/my_dataset/
├── train/
│   ├── image_001.jpg
│   ├── image_002.jpg
│   └── _annotations.coco.json
└── test/
    ├── image_001.jpg
    └── _annotations.coco.json
The annotations file should contain:
  • images: List of image metadata
  • annotations: Bounding boxes and optional masks
  • categories: Object categories
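A minimal skeleton of such a file, with placeholder values chosen for illustration, looks like this (written as Python so it can be generated programmatically):

```python
import json

# Minimal COCO-format annotation file showing the three required
# top-level keys. All field values below are placeholders.
coco_skeleton = {
    "images": [
        {"id": 1, "file_name": "image_001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100, 100, 200, 150],  # [x, y, width, height]
            "area": 30000,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "object", "supercategory": "none"}],
}

with open("_annotations.coco.json", "w") as f:
    json.dump(coco_skeleton, f, indent=2)
```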
2. Validate Dataset

Verify your annotations are correctly formatted:
import json
from pycocotools.coco import COCO

# Load and validate
coco = COCO('datasets/my_dataset/train/_annotations.coco.json')
print(f"Images: {len(coco.imgs)}")
print(f"Annotations: {len(coco.anns)}")
print(f"Categories: {len(coco.cats)}")
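Beyond loading, a stricter cross-check can catch annotations that reference missing images or undefined categories. This is an illustrative helper (not part of SAM 3 or pycocotools) that works directly on the JSON:

```python
import json

def cross_check(path):
    """Return annotation ids whose image_id or category_id is undefined."""
    with open(path) as f:
        data = json.load(f)
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    orphans = [a["id"] for a in data["annotations"]
               if a["image_id"] not in image_ids]
    unknown = [a["id"] for a in data["annotations"]
               if a["category_id"] not in category_ids]
    return orphans, unknown
```

Both returned lists should be empty for a well-formed dataset.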
3. Prepare Segmentation Masks (Optional)

If training with segmentation, ensure annotations include mask data:
{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "bbox": [100, 100, 200, 150],
  "segmentation": [[x1, y1, x2, y2, ...]],  // Polygon or RLE
  "area": 30000,
  "iscrowd": 0
}
Masks can be in polygon format or RLE (Run-Length Encoding).
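COCO's uncompressed RLE stores alternating run lengths, starting with the run of zeros (so a mask beginning with 1s gets a leading 0 count). Real pycocotools RLE is also column-major and string-compressed; this pure-Python sketch only illustrates the counts idea:

```python
def rle_encode(mask_flat):
    """Encode a flat binary mask as COCO-style alternating run counts.

    Counts always start with the zero-run, so a mask beginning
    with 1s produces a leading 0.
    """
    counts = []
    current, run = 0, 0
    for value in mask_flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return counts

def rle_decode(counts):
    """Invert rle_encode back into a flat binary mask."""
    mask, value = [], 0
    for run in counts:
        mask.extend([value] * run)
        value = 1 - value
    return mask
```

For example, `rle_encode([0, 0, 1, 1, 1, 0])` yields `[2, 3, 1]`, and decoding those counts recovers the original mask.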

Environment Configuration

Set Environment Variables

Create a .env file or export variables:
# CUDA settings
export CUDA_VISIBLE_DEVICES=0,1,2,3

# PyTorch settings
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

# Distributed training
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=1  # Disable InfiniBand transport if it causes issues

# Data paths
export DATASET_ROOT=/path/to/datasets
export EXPERIMENT_ROOT=/path/to/experiments
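Code that consumes these variables typically reads them with local fallbacks. A small sketch (the fallback paths here are examples, not SAM 3 defaults):

```python
import os

# Read the exported data paths, falling back to local directories
# when the variables are not set.
dataset_root = os.environ.get("DATASET_ROOT", "./datasets")
experiment_root = os.environ.get("EXPERIMENT_ROOT", "./experiments")
print(f"datasets: {dataset_root}, experiments: {experiment_root}")
```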

Configure Paths

Update your training config with local paths:
paths:
  # Dataset location
  dataset_root: /path/to/datasets/my_dataset
  
  # Where to save logs and checkpoints
  experiment_log_dir: /path/to/experiments/my_training
  
  # BPE tokenizer path
  bpe_path: /path/to/sam3/assets/bpe_simple_vocab_16e6.txt.gz
  
  # Pretrained checkpoint (optional)
  checkpoint_path: null  # Downloads from HuggingFace if null
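A missing path in this config usually only surfaces as an error mid-launch, so it can be worth a pre-flight check. An illustrative sketch (path values are the same placeholders as above):

```python
from pathlib import Path

def check_paths(paths):
    """Return the subset of config paths that do not exist on disk."""
    return {k: v for k, v in paths.items() if not Path(v).exists()}

# Same placeholder values as the config fragment above.
paths = {
    "dataset_root": "/path/to/datasets/my_dataset",
    "experiment_log_dir": "/path/to/experiments/my_training",
    "bpe_path": "/path/to/sam3/assets/bpe_simple_vocab_16e6.txt.gz",
}

for key, value in check_paths(paths).items():
    print(f"Missing: {key} -> {value}")
```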

GPU Setup

Single GPU

For single GPU training:
launcher:
  num_nodes: 1
  gpus_per_node: 1

submitit:
  use_cluster: False

Multiple GPUs (Single Node)

For multi-GPU training on one machine:
launcher:
  num_nodes: 1
  gpus_per_node: 4  # Number of GPUs

submitit:
  use_cluster: False

Cluster Setup

For SLURM cluster training, see Cluster Training.

Verify Installation

Test that everything is set up correctly:
# Check training script
python -m sam3.train.train --help

# Validate configuration
python -c "from hydra import compose, initialize_config_module; \
  initialize_config_module('sam3.train', version_base='1.2'); \
  cfg = compose(config_name='configs/eval_base.yaml'); \
  print('Config loaded successfully')"
Ensure you have sufficient disk space for:
  • Dataset storage (varies by dataset size)
  • Checkpoints (~5GB per checkpoint)
  • Logs and tensorboard files
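Free space can be checked from Python before a long run. A minimal sketch using the checkpoint estimate above (the number of checkpoints to keep is an example):

```python
import shutil

def free_gb(path="."):
    """Free disk space at `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

# Each checkpoint is roughly 5 GB; budget for the number you plan to keep.
keep = 3
if free_gb(".") < keep * 5:
    print(f"Warning: under {keep * 5} GB free; checkpoints may fill the disk")
```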

Troubleshooting

CUDA Out of Memory

If you encounter OOM errors:
  1. Reduce batch size in config:
    scratch:
      train_batch_size: 1
    
  2. Reduce image resolution:
    scratch:
      resolution: 512  # Default is 1008
    
  3. Enable gradient accumulation:
    scratch:
      gradient_accumulation_steps: 4
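These mitigations trade memory for smaller per-step batches, but gradient accumulation keeps the effective batch size up. A quick sanity check (a sketch, not from the SAM 3 codebase):

```python
def effective_batch_size(per_gpu_batch, accumulation_steps, num_gpus):
    """Global batch size seen by the optimizer per update step."""
    return per_gpu_batch * accumulation_steps * num_gpus

# With the settings above (batch size 1, 4 accumulation steps) on 4 GPUs,
# each optimizer step still sees an effective batch of 16.
print(effective_batch_size(1, 4, 4))
```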
    

Import Errors

If modules are not found:
# Add SAM 3 to PYTHONPATH
export PYTHONPATH=/path/to/sam3:$PYTHONPATH

Slow Data Loading

If data loading is slow:
scratch:
  num_train_workers: 8  # Increase workers
  num_val_workers: 4

Next Steps

Now that your environment is set up:

  • Configuration: Learn about training configuration options
  • Local Training: Start training on local GPUs
