`TrainConfig` holds every hyperparameter and runtime option for an RF-DETR training run. You do not need to instantiate it directly — pass parameters as keyword arguments to `model.train()` and RF-DETR builds the config for you.
The following fields are deprecated and will be removed in v1.9. They still work today but emit a `DeprecationWarning`:

- `group_detr` — set on `ModelConfig` instead
- `ia_bce_loss` — set on `ModelConfig` instead
- `segmentation_head` — set on `ModelConfig` instead
- `num_select` — set on `ModelConfig` instead
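For orientation, a minimal training invocation might look like the sketch below. The `RFDETRBase` class name is an assumption for illustration, and the keyword arguments are drawn from the fields documented on this page — check your installed version before copying:

```python
# Hedged sketch — assumes the rfdetr package exposes an RFDETRBase model class
# and that model.train() accepts the TrainConfig fields below as kwargs.
from rfdetr import RFDETRBase

model = RFDETRBase()
model.train(
    dataset_dir="path/to/dataset",  # COCO or YOLO layout, auto-detected
    output_dir="runs/exp1",         # checkpoints and logs are written here
    epochs=50,
    batch_size="auto",              # probe GPU memory for the largest safe value
    early_stopping=True,
)
```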
Core
Path to the dataset directory. RF-DETR auto-detects the format:

- COCO — directory must contain `train/_annotations.coco.json`
- YOLO — directory must contain a `data.yaml` or `data.yml` file
Directory where checkpoints, TensorBoard logs, and `training_config.json` are written. Created automatically if it does not exist.

Total number of training epochs.
Path to a checkpoint file to resume training from. PyTorch Lightning restores optimizer state, epoch count, and learning rate schedule automatically.
Dataset loader to use. `"roboflow"` and `"coco"` both read COCO-format annotations; `"roboflow"` adds Roboflow-specific augmentation defaults. Use `"yolo"` for YOLO-format datasets.

Random seed for reproducibility. When `None` (the default), no seed is set.

Save a checkpoint every N epochs. Must be ≥ 1.
Stop training when the monitored metric (mAP) stops improving.
Number of epochs with no improvement before training is stopped. Only used when `early_stopping=True`.

Minimum change in the monitored metric to qualify as an improvement.
Use EMA weights to evaluate the early-stopping metric.
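The patience and min-delta fields above combine in the standard early-stopping pattern; a minimal sketch of that logic (not RF-DETR's actual implementation — mAP is treated as higher-is-better, per the monitored metric described above):

```python
def should_stop(map_history, patience, min_delta):
    """Return True once mAP has not improved by at least min_delta
    for `patience` consecutive epochs (higher mAP is better)."""
    best = float("-inf")
    epochs_without_improvement = 0
    for value in map_history:
        if value > best + min_delta:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return True
    return False

# Improvement for three epochs, then a 3-epoch plateau trips patience=3.
should_stop([0.40, 0.45, 0.50, 0.50, 0.501, 0.50], patience=3, min_delta=0.01)
```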
Batch and memory
Per-device micro-batch size. Set to `"auto"` to let RF-DETR probe available GPU memory and choose the largest safe value automatically.

Number of micro-batches to accumulate before an optimizer step. The effective batch size per device is `batch_size × grad_accum_steps`. Must be ≥ 1.

Target per-device effective batch size when `batch_size="auto"`. RF-DETR probes memory to find the largest `batch_size` that fits and sets `grad_accum_steps` accordingly.

Worst-case target count assumed per image during the auto-batch memory probe.

When `use_ema=True`, the probed safe batch size is multiplied by this factor to leave memory headroom for the EMA copy. Must be in (0, 1].

Number of data-loader worker processes per device.

Pin data-loader memory for faster host-to-device transfers. When `None`, PyTorch Lightning applies its default heuristic.

Keep data-loader workers alive between epochs. When `None`, PyTorch Lightning applies its default.

Number of batches to prefetch per data-loader worker. Must be ≥ 1 when set.
Learning rate
Base learning rate applied to decoder and projection head parameters.
Learning rate for the backbone encoder.
Learning rate schedule. `"step"` drops the LR at `lr_drop` epoch; `"cosine"` decays to `lr × lr_min_factor` over the full training run.

Epoch at which the step scheduler drops the learning rate. Only used when `lr_scheduler="step"`.

Minimum LR as a fraction of the initial LR when using the cosine scheduler.
Per-layer learning rate decay factor applied to ViT backbone layers (layer-wise LR decay).
Learning rate decay factor applied across model components.
Number of epochs for linear learning rate warm-up at the start of training.
L2 weight decay applied to all non-bias and non-norm parameters.
Gradient clipping max norm. Set to `0` to disable.

Stochastic depth drop-path rate applied to transformer blocks.
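A sketch of how the warm-up and cosine fields above might combine into a per-epoch learning rate. The exact curve RF-DETR uses is an assumption; this is the common linear-warmup-then-cosine shape, decaying to `lr × lr_min_factor`:

```python
import math

def lr_at(epoch, total_epochs, lr, warmup_epochs, lr_min_factor):
    """Linear warm-up to `lr`, then cosine decay to lr * lr_min_factor."""
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0 over training
    return lr * (lr_min_factor + (1 - lr_min_factor) * cosine)
```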
EMA
Maintain an Exponential Moving Average (EMA) of model weights. The EMA checkpoint (`checkpoint_best_ema.pth`) is typically the best checkpoint for deployment.

EMA decay factor. Higher values give the EMA more inertia.

EMA warm-up steps. The effective decay is ramped up over the first `ema_tau` updates.

Update the EMA every N optimizer steps. Must be ≥ 1.
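The EMA fields above can be sketched as a single update rule. The linear ramp over `ema_tau` steps is an assumption for illustration (real implementations vary in ramp shape), shown here per weight value:

```python
def ema_update(ema_value, model_value, step, decay=0.999, ema_tau=100):
    """One EMA update with the decay linearly ramped over the first
    ema_tau steps, so early updates track the model more closely."""
    effective_decay = decay * min(1.0, step / ema_tau)
    return effective_decay * ema_value + (1 - effective_decay) * model_value
```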
Data augmentation
Enable multi-scale training. Images are randomly resized during training to improve generalisation across object sizes.
Use an expanded set of resize scales during multi-scale training.
Resize images to square dimensions divisible by 64 before feeding them to the model.
When enabled, random resize is implemented via padding rather than stretching.
Advanced augmentation configuration dictionary. See augmentations for the full schema.
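The square-resize behaviour described above implies snapping the training resolution to a multiple of 64. A minimal sketch of that rounding (the rounding direction RF-DETR actually uses is an assumption):

```python
def snap_resolution(size, multiple=64):
    """Round a requested side length up to the nearest multiple of 64,
    giving a square training resolution the model can consume."""
    return ((size + multiple - 1) // multiple) * multiple
```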
Logging
Log training metrics to TensorBoard. Logs are written to `output_dir/`.

Log training metrics to Weights & Biases. Requires `wandb` to be installed and authenticated.

Log training metrics to MLflow. Requires `mlflow` to be installed.

Project name for W&B or MLflow runs.
Run name for W&B or MLflow runs.
Log per-class precision, recall, and AP metrics during validation.
Progress bar style during training. Set to `"rich"` for a richer terminal display, `"tqdm"` for a standard bar, or omit to disable. Passing `True` is treated as `"tqdm"` (legacy behaviour).

Multi-GPU
Number of GPUs (or a device specification string) to use for training. Maps to the PyTorch Lightning `Trainer(devices=...)` argument.

PTL distributed training strategy. Common values: `"auto"`, `"ddp"`, `"ddp_spawn"`, `"fsdp"`, `"deepspeed"`. Invalid values surface as PyTorch Lightning errors.

Number of machines for multi-node training. Maps to `Trainer(num_nodes=...)`. Leave at 1 for single-machine training.

PTL accelerator type. Typically set automatically from the `device` argument; override only when needed.

Convert batch norm layers to `SyncBatchNorm` for multi-GPU training.

Advanced
Weight applied to the classification loss component.
Run validation every N epochs. Must be ≥ 1.
Maximum number of detections per image considered during COCO evaluation.
Compute and log validation loss at the end of each epoch.
Compute and log test loss at the end of training.
Run a full evaluation pass on the test split after training completes.
Use 16-bit floating point precision during validation to reduce memory usage.
Skip saving checkpoint files during training. Useful for quick debugging runs.
Explicit list of class names. When set, these names are embedded in the checkpoint and used in place of names inferred from the dataset.
Synchronise training metrics across distributed processes before logging.
Log training metrics at every step in addition to epoch-level summaries.
Related
Training overview
End-to-end guide to training RF-DETR on a custom dataset.
Training parameters guide
Practical guidance on choosing batch size, learning rate, and schedule.
SegmentationTrainConfig
Extended config for segmentation model training.
Loggers
Set up TensorBoard, W&B, and MLflow logging.