SAM 3 uses Hydra for configuration management, with YAML files defining all training parameters. This page documents the configuration structure and key options.

Configuration Structure

Training configs are located in sam3/train/configs/ and follow this hierarchy:
configs/
├── eval_base.yaml              # Base configuration
├── roboflow_v100/
│   └── roboflow_v100_full_ft_100_images.yaml
├── gold_image_evals/
└── silver_image_evals/

Basic Configuration Example

Here’s a minimal training configuration:
# @package _global_
defaults:
  - _self_

paths:
  dataset_root: /path/to/dataset
  experiment_log_dir: /path/to/experiments
  bpe_path: /path/to/bpe_simple_vocab_16e6.txt.gz

launcher:
  num_nodes: 1
  gpus_per_node: 4
  experiment_log_dir: ${paths.experiment_log_dir}

submitit:
  use_cluster: False
  timeout_hour: 72
  cpus_per_task: 10
  port_range: [10000, 65000]

trainer:
  _target_: sam3.train.trainer.Trainer
  max_epochs: 20
  mode: train
  accelerator: cuda
  seed_value: 123

Configuration Sections

Paths

Define dataset and output paths:
paths:
  # Dataset location
  dataset_root: /path/to/dataset
  
  # Experiment outputs (logs, checkpoints)
  experiment_log_dir: /path/to/experiments/run_001
  
  # BPE tokenizer for text encoding
  bpe_path: /path/to/bpe_simple_vocab_16e6.txt.gz
  
  # Pretrained checkpoint (optional)
  checkpoint_path: null  # Auto-downloads if null

Launcher

Configure distributed training resources:
launcher:
  num_nodes: 1           # Number of compute nodes
  gpus_per_node: 4       # GPUs per node
  experiment_log_dir: ${paths.experiment_log_dir}
  multiprocessing_context: forkserver
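
The product of num_nodes and gpus_per_node determines the total number of training processes. For example (illustrative values):
launcher:
  num_nodes: 2
  gpus_per_node: 8   # 2 nodes x 8 GPUs = 16 training processes in total
  experiment_log_dir: ${paths.experiment_log_dir}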

Submitit (SLURM)

SLURM cluster configuration:
submitit:
  use_cluster: True      # True for SLURM, False for local
  account: null          # SLURM account
  partition: gpu         # SLURM partition
  qos: null             # Quality of Service
  timeout_hour: 72      # Job timeout
  cpus_per_task: 10     # CPUs per task
  port_range: [10000, 65000]  # Distributed training ports
  constraint: null      # Node constraints
  mem_gb: 128          # Memory per node (optional)
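
The same setting can be driven from the command line; assuming the --use-cluster flag of the launch command shown later on this page maps onto submitit.use_cluster:
python -m sam3.train.train -c configs/your_config.yaml --use-cluster 1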

Trainer

Main training parameters:
trainer:
  _target_: sam3.train.trainer.Trainer
  
  # Training duration
  max_epochs: 20
  
  # Mode: train, val, or train_only
  mode: train
  
  # Hardware
  accelerator: cuda
  
  # Reproducibility
  seed_value: 123
  
  # Validation frequency
  val_epoch_freq: 10
  
  # Skip first validation
  skip_first_val: True
  
  # Checkpoint management
  skip_saving_ckpts: false
  
  # Memory management
  empty_gpu_mem_cache_after_eval: True
  
  # Gradient accumulation
  gradient_accumulation_steps: 1
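
gradient_accumulation_steps trades memory for effective batch size: gradients from several forward/backward passes are accumulated before each optimizer step. A worked example with illustrative numbers:
# Effective global batch size
#   = train_batch_size x gpus_per_node x num_nodes x gradient_accumulation_steps
#   = 1 x 4 x 1 x 4 = 16 images per optimizer step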

Model

Model architecture configuration:
trainer:
  model:
    _target_: sam3.model_builder.build_sam3_image_model
    bpe_path: ${paths.bpe_path}
    device: cpu  # Load on CPU first, move to GPU later
    eval_mode: false
    enable_segmentation: True  # Enable mask prediction
    checkpoint_path: ${paths.checkpoint_path}

Data

Dataset and dataloader configuration:
trainer:
  data:
    train:
      _target_: sam3.train.data.torch_dataset.TorchDataset
      dataset:
        _target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
        img_folder: ${paths.dataset_root}/train/
        ann_file: ${paths.dataset_root}/train/_annotations.coco.json
        transforms: ${scratch.train_transforms}
        load_segmentation: ${scratch.enable_segmentation}
        max_ann_per_img: 500000
        training: true
        limit_ids: 100  # Limit to N images (null for all)
      
      shuffle: True
      batch_size: ${scratch.train_batch_size}
      num_workers: ${scratch.num_train_workers}
      pin_memory: True
      drop_last: True
      collate_fn: ${scratch.collate_fn}
    
    val:
      _target_: sam3.train.data.torch_dataset.TorchDataset
      dataset:
        _target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
        img_folder: ${paths.dataset_root}/test/
        ann_file: ${paths.dataset_root}/test/_annotations.coco.json
        transforms: ${scratch.val_transforms}
        training: false
      
      shuffle: False
      batch_size: ${scratch.val_batch_size}
      num_workers: ${scratch.num_val_workers}
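
The img_folder and ann_file settings above imply a Roboflow-style COCO export laid out along these lines (only the annotation file names come from the config; the rest is illustrative):
dataset_root/
├── train/
│   ├── _annotations.coco.json
│   └── <image files>
└── test/
    ├── _annotations.coco.json
    └── <image files>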

Transforms

Data augmentation pipeline:
scratch:
  train_transforms:
    - _target_: sam3.train.transforms.basic_for_api.ComposeAPI
      transforms:
        # Random resize with scale range
        - _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
          sizes:
            _target_: sam3.train.transforms.basic.get_random_resize_scales
            size: ${scratch.resolution}
            min_size: 480
            rounded: false
          max_size:
            _target_: sam3.train.transforms.basic.get_random_resize_max_size
            size: ${scratch.resolution}
          square: true
        
        # Pad to fixed size
        - _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
          size: ${scratch.resolution}
        
        # Convert to tensor
        - _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
        
        # Normalize
        - _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
          mean: [0.5, 0.5, 0.5]
          std: [0.5, 0.5, 0.5]
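
Validation typically uses a deterministic version of this pipeline. A plausible val_transforms sketch, assuming the same transform classes and that a single entry in sizes makes the resize deterministic (not copied from a shipped config):
scratch:
  val_transforms:
    - _target_: sam3.train.transforms.basic_for_api.ComposeAPI
      transforms:
        - _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
          sizes: [${scratch.resolution}]  # single size, so the resize is fixed
          square: true
        - _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
          size: ${scratch.resolution}
        - _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
        - _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
          mean: [0.5, 0.5, 0.5]
          std: [0.5, 0.5, 0.5]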

Optimizer

Optimizer and scheduler configuration:
trainer:
  optim:
    # Mixed precision training
    amp:
      enabled: True
      amp_dtype: bfloat16  # bfloat16 or float16
    
    # Optimizer
    optimizer:
      _target_: torch.optim.AdamW
    
    # Gradient clipping
    gradient_clip:
      _target_: sam3.train.optim.optimizer.GradientClipper
      max_norm: 0.1
      norm_type: 2
    
    # Learning rate schedules
    options:
      lr:
        # Transformer learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.00008  # 8e-5
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
        
        # Vision backbone learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.000025  # 2.5e-5
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
          param_names:
            - 'backbone.vision_backbone.*'
        
        # Language backbone learning rate
        - scheduler:
            _target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
            base_lr: 0.000005  # 5e-6
            timescale: 20
            warmup_steps: 20
            cooldown_steps: 20
          param_names:
            - 'backbone.language_backbone.*'
      
      # Weight decay
      weight_decay:
        - scheduler:
            _target_: fvcore.common.param_scheduler.ConstantParamScheduler
            value: 0.1
        - scheduler:
            _target_: fvcore.common.param_scheduler.ConstantParamScheduler
            value: 0.0
          param_names:
            - '*bias*'
          module_cls_names: ['torch.nn.LayerNorm']
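
Entries with param_names target specific parameter groups via glob patterns; the entry without param_names appears to act as the default for all remaining parameters. If a flat learning rate is preferred over the inverse-square-root schedule, the ConstantParamScheduler already used for weight decay should slot in (a sketch under that assumption, not from a shipped config):
trainer:
  optim:
    options:
      lr:
        - scheduler:
            _target_: fvcore.common.param_scheduler.ConstantParamScheduler
            value: 0.0001  # illustrative flat LR for all parameters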

Loss Functions

Loss configuration for detection and segmentation:
trainer:
  loss:
    all:
      _target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
      
      # Matching strategy
      matcher:
        _target_: sam3.train.matcher.BinaryHungarianMatcherV2
        focal: true
        cost_class: 2.0
        cost_bbox: 5.0
        cost_giou: 2.0
        alpha: 0.25
        gamma: 2
      
      # One-to-many matching
      o2m_weight: 2.0
      o2m_matcher:
        _target_: sam3.train.matcher.BinaryOneToManyMatcher
        alpha: 0.3
        threshold: 0.4
        topk: 4
      
      # Detection losses
      loss_fns_find:
        - _target_: sam3.train.loss.loss_fns.Boxes
          weight_dict:
            loss_bbox: 5.0
            loss_giou: 2.0
        
        - _target_: sam3.train.loss.loss_fns.IABCEMdetr
          weight_dict:
            loss_ce: 20.0
            presence_loss: 20.0
          pos_weight: 10.0
          alpha: 0.25
          gamma: 2
      
      # Segmentation losses (optional): when segmentation is enabled,
      # add the Masks loss to the loss_fns_find list above, e.g.
      #   - _target_: sam3.train.loss.loss_fns.Masks
      #     weight_dict:
      #       loss_mask: 200.0
      #       loss_dice: 10.0

Checkpoint

Checkpoint saving configuration:
trainer:
  checkpoint:
    save_dir: ${launcher.experiment_log_dir}/checkpoints
    save_freq: 0  # 0 = only save last checkpoint
    save_list: [5, 10, 15]  # Also save at specific epochs
    
    # Resume from checkpoint
    resume_from: null  # Path to checkpoint.pt
    
    # Model initialization
    model_weight_initializer: null
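
To resume an interrupted run, point resume_from at the saved checkpoint. A sketch, assuming the trainer writes checkpoint.pt into save_dir as the comment above suggests:
trainer:
  checkpoint:
    resume_from: ${launcher.experiment_log_dir}/checkpoints/checkpoint.pt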

Logging

Logging and monitoring:
trainer:
  logging:
    log_dir: ${launcher.experiment_log_dir}/logs/
    log_freq: 10  # Log every N iterations
    
    # TensorBoard
    tensorboard_writer:
      _target_: sam3.train.utils.logger.make_tensorboard_logger
      log_dir: ${launcher.experiment_log_dir}/tensorboard
      flush_secs: 120
      should_log: True
    
    # Weights & Biases (optional)
    wandb_writer: null
    
    log_level_primary: INFO
    log_level_secondary: ERROR

Distributed Training

Distributed training settings:
trainer:
  distributed:
    backend: nccl  # nccl for GPU, gloo for CPU
    find_unused_parameters: True
    gradient_as_bucket_view: True
    static_graph: False
    comms_dtype: null  # bfloat16, float16, or null
    timeout_mins: 30
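
comms_dtype controls the precision of DDP gradient communication. Since AMP above already trains in bfloat16, compressing all-reduces the same way is a common bandwidth saving (illustrative, at the cost of slightly noisier gradient sums):
trainer:
  distributed:
    comms_dtype: bfloat16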

CUDA Settings

CUDA optimization options:
trainer:
  cuda:
    cudnn_deterministic: false
    cudnn_benchmark: true
    allow_tf32: false
    matmul_allow_tf32: null  # Override for matmul
    cudnn_allow_tf32: null   # Override for cudnn
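
On Ampere or newer GPUs, allowing TF32 typically speeds up float32 matmuls and convolutions with a minor precision trade-off; an illustrative override of the defaults above:
trainer:
  cuda:
    allow_tf32: true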

Scratch Parameters

Common training hyperparameters in the scratch section:
scratch:
  # Model
  enable_segmentation: True
  d_model: 256
  
  # Image processing
  resolution: 1008
  max_ann_per_img: 200
  
  # Normalization
  train_norm_mean: [0.5, 0.5, 0.5]
  train_norm_std: [0.5, 0.5, 0.5]
  
  # Batch size
  train_batch_size: 1
  val_batch_size: 1
  gradient_accumulation_steps: 1
  
  # Workers
  num_train_workers: 10
  num_val_workers: 4
  
  # Learning rates
  lr_scale: 0.1
  lr_transformer: 0.00008
  lr_vision_backbone: 0.000025
  lr_language_backbone: 0.000005
  lrd_vision_backbone: 0.9  # Layer decay
  wd: 0.1  # Weight decay
  
  # Scheduler
  scheduler_timescale: 20
  scheduler_warmup: 20
  scheduler_cooldown: 20

Validating Your Configuration

Always validate your configuration before starting training:
python -m sam3.train.train -c configs/your_config.yaml --use-cluster 0
Check the printed config for any errors or unexpected values.
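
Hydra entry points usually accept key=value overrides in addition to the config file. Assuming this launcher forwards extra arguments to Hydra (an assumption, not confirmed by this page), individual values can be changed without editing the YAML:
python -m sam3.train.train -c configs/your_config.yaml --use-cluster 0 trainer.max_epochs=30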

Configuration Tips

For Small Datasets

  • Reduce learning rate: lr_scale: 0.01
  • More epochs: max_epochs: 50
  • Frequent validation: val_epoch_freq: 5

For Large Datasets

  • Standard learning rate: lr_scale: 0.1
  • Fewer epochs: max_epochs: 20
  • Less frequent validation: val_epoch_freq: 10

For Memory Constraints

  • Smaller resolution: resolution: 512
  • Gradient accumulation: gradient_accumulation_steps: 4
  • Reduce workers: num_train_workers: 2
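
Combined, the settings above map directly onto the scratch section documented earlier:
scratch:
  resolution: 512
  gradient_accumulation_steps: 4
  num_train_workers: 2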

For Speed

  • Disable segmentation: enable_segmentation: False
  • Larger batch size: train_batch_size: 2
  • More workers: num_train_workers: 16

Next Steps

  • Local Training: run training with your configuration
  • Cluster Training: scale to SLURM clusters
