TrainConfig holds every hyperparameter and runtime option for an RF-DETR training run. You do not need to instantiate it directly — pass parameters as keyword arguments to model.train() and RF-DETR builds the config for you.
from rfdetr import RFDETRBase

model = RFDETRBase()
model.train(
    dataset_dir="dataset/",
    epochs=50,
    batch_size=8,
    lr=1e-4,
    output_dir="runs/experiment-1",
)
If you prefer to build a config object explicitly:
from rfdetr.config import TrainConfig

config = TrainConfig(
    dataset_dir="dataset/",
    epochs=50,
    batch_size=8,
)
The following fields are deprecated and will be removed in v1.9. They still work today but emit a DeprecationWarning:
  • group_detr — set on ModelConfig instead
  • ia_bce_loss — set on ModelConfig instead
  • segmentation_head — set on ModelConfig instead
  • num_select — set on ModelConfig instead
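For illustration, a migration sketch for the deprecated fields. The exact import path and signature of ModelConfig are assumptions based on the deprecation notes above, not confirmed API:

```python
# Hypothetical migration sketch: ModelConfig's import path and accepted
# keyword arguments are assumed from the deprecation notes, not verified.
from rfdetr.config import ModelConfig, TrainConfig

model_config = ModelConfig(
    group_detr=13,      # moved from TrainConfig (value is illustrative)
    ia_bce_loss=True,   # moved from TrainConfig
    num_select=300,     # moved from TrainConfig (value is illustrative)
)
train_config = TrainConfig(dataset_dir="dataset/", epochs=50)
```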

Core

dataset_dir
str
required
Path to the dataset directory. RF-DETR auto-detects the format:
  • COCO — directory must contain train/_annotations.coco.json
  • YOLO — directory must contain a data.yaml or data.yml file
See dataset formats for details.
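The auto-detection rule above can be sketched as follows. This illustrates the documented rule only; it is not RF-DETR's actual implementation:

```python
from pathlib import Path

def detect_dataset_format(dataset_dir: str) -> str:
    """Illustrative sketch of the documented format auto-detection rule."""
    root = Path(dataset_dir)
    # COCO layout: train/_annotations.coco.json must exist.
    if (root / "train" / "_annotations.coco.json").is_file():
        return "coco"
    # YOLO layout: a data.yaml or data.yml file must exist.
    if (root / "data.yaml").is_file() or (root / "data.yml").is_file():
        return "yolo"
    raise ValueError(f"Unrecognised dataset layout: {dataset_dir}")
```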
output_dir
str
default:"output"
Directory where checkpoints, TensorBoard logs, and training_config.json are written. Created automatically if it does not exist.
epochs
int
default:"100"
Total number of training epochs.
resume
str
Path to a checkpoint file to resume training from. PyTorch Lightning restores optimizer state, epoch count, and learning rate schedule automatically.
dataset_file
"coco" | "o365" | "roboflow" | "yolo"
default:"roboflow"
Dataset loader to use. "roboflow" and "coco" both read COCO-format annotations; "roboflow" adds Roboflow-specific augmentation defaults. Use "yolo" for YOLO-format datasets.
seed
int
Random seed for reproducibility. When None (the default), no seed is set.
checkpoint_interval
int
default:"10"
Save a checkpoint every N epochs. Must be ≥ 1.
early_stopping
boolean
default:"false"
Stop training when the monitored metric (mAP) stops improving.
early_stopping_patience
int
default:"10"
Number of epochs with no improvement before training is stopped. Only used when early_stopping=True.
early_stopping_min_delta
float
default:"0.001"
Minimum change in the monitored metric to qualify as an improvement.
early_stopping_use_ema
boolean
default:"false"
Use EMA weights to evaluate the early-stopping metric.
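For example, to enable early stopping with the defaults shown above (an illustrative config fragment, not a complete run):

```python
model.train(
    dataset_dir="dataset/",
    epochs=100,
    early_stopping=True,
    early_stopping_patience=10,    # stop after 10 epochs without improvement
    early_stopping_min_delta=0.001,
    early_stopping_use_ema=True,   # evaluate the monitored mAP on EMA weights
)
```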

Batch and memory

batch_size
int | "auto"
default:"4"
Per-device micro-batch size. Set to "auto" to let RF-DETR probe available GPU memory and choose the largest safe value automatically.
grad_accum_steps
int
default:"4"
Number of micro-batches to accumulate before an optimizer step. The effective batch size per device is batch_size × grad_accum_steps. Must be ≥ 1.
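To illustrate the arithmetic: with the defaults batch_size=4 and grad_accum_steps=4, each optimizer step sees an effective batch of 16 per device, and with data-parallel training the global batch additionally scales with the device count:

```python
batch_size = 4        # micro-batch per device
grad_accum_steps = 4  # micro-batches accumulated per optimizer step
devices = 2           # e.g. two GPUs under DDP

effective_per_device = batch_size * grad_accum_steps  # 16
global_batch = effective_per_device * devices         # 32 across two GPUs
```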
auto_batch_target_effective
int
default:"16"
Target per-device effective batch size when batch_size="auto". RF-DETR probes memory to find the largest batch_size that fits and sets grad_accum_steps accordingly.
auto_batch_max_targets_per_image
int
default:"100"
Worst-case target count assumed per image during the auto-batch memory probe.
auto_batch_ema_headroom
float
default:"0.7"
When use_ema=True, the probed safe batch size is multiplied by this factor to leave memory headroom for the EMA copy. Must be in (0, 1].
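How these knobs interact can be sketched as follows. This illustrates the documented behaviour; the library's actual probing code, and in particular whether it rounds grad_accum_steps up exactly like this, is an assumption. Suppose the memory probe found that a micro-batch of 10 fits:

```python
import math

target_effective = 16  # auto_batch_target_effective
probed_safe = 10       # largest micro-batch the memory probe found (example)
ema_headroom = 0.7     # auto_batch_ema_headroom, applied when use_ema=True

# Shrink the probed batch to leave memory headroom for the EMA weight copy.
batch_size = max(1, int(probed_safe * ema_headroom))                 # 7
# Pick grad_accum_steps so batch_size * steps reaches the target.
grad_accum_steps = max(1, math.ceil(target_effective / batch_size))  # 3
```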
num_workers
int
default:"2"
Number of data-loader worker processes per device.
pin_memory
boolean
Pin data-loader memory for faster host-to-device transfers. When None, PyTorch Lightning applies its default heuristic.
persistent_workers
boolean
Keep data-loader workers alive between epochs. When None, PyTorch Lightning applies its default.
prefetch_factor
int
Number of batches to prefetch per data-loader worker. Must be ≥ 1 when set.

Learning rate

lr
float
default:"1e-4"
Base learning rate applied to decoder and projection head parameters.
lr_encoder
float
default:"1.5e-4"
Learning rate for the backbone encoder.
lr_scheduler
"step" | "cosine"
default:"step"
Learning rate schedule. "step" drops the LR at lr_drop epoch; "cosine" decays to lr × lr_min_factor over the full training run.
lr_drop
int
default:"100"
Epoch at which the step scheduler drops the learning rate. Only used when lr_scheduler="step".
lr_min_factor
float
default:"0.0"
Minimum LR as a fraction of the initial LR when using the cosine scheduler.
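The cosine schedule decays from lr down to lr × lr_min_factor over the run. A standard cosine-annealing formula consistent with that description (assumed; not taken from the source code):

```python
import math

def cosine_lr(epoch: int, total_epochs: int, lr: float = 1e-4,
              lr_min_factor: float = 0.0) -> float:
    """Standard cosine annealing from lr down to lr * lr_min_factor."""
    lr_min = lr * lr_min_factor
    progress = epoch / total_epochs
    return lr_min + (lr - lr_min) * 0.5 * (1 + math.cos(math.pi * progress))
```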
lr_vit_layer_decay
float
default:"0.8"
Per-layer learning rate decay factor applied to ViT backbone layers (layer-wise LR decay).
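Layer-wise LR decay gives earlier backbone layers progressively smaller learning rates. A common scheme consistent with the description (illustrative; the exact indexing RF-DETR uses is an assumption):

```python
def layer_lr(base_lr: float, layer_decay: float, layer_idx: int,
             num_layers: int) -> float:
    """Deeper layers keep more of the base LR; earlier layers get less."""
    return base_lr * layer_decay ** (num_layers - layer_idx)

# Example: with lr_encoder=1.5e-4, lr_vit_layer_decay=0.8 and 12 ViT blocks,
# the deepest block trains at the full 1.5e-4, the next at 1.2e-4, and so on.
```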
lr_component_decay
float
default:"0.7"
Learning rate decay factor applied across model components.
warmup_epochs
float
default:"0.0"
Number of epochs for linear learning rate warm-up at the start of training.
weight_decay
float
default:"1e-4"
L2 weight decay applied to all non-bias and non-norm parameters.
clip_max_norm
float
default:"0.1"
Gradient clipping max norm. Set to 0 to disable.
drop_path
float
default:"0.0"
Stochastic depth drop-path rate applied to transformer blocks.

EMA

use_ema
boolean
default:"true"
Maintain an Exponential Moving Average (EMA) of model weights. The EMA checkpoint (checkpoint_best_ema.pth) is typically the best checkpoint for deployment.
ema_decay
float
default:"0.993"
EMA decay factor. Higher values give the EMA more inertia.
ema_tau
int
default:"100"
EMA warm-up steps. The effective decay is ramped up over the first ema_tau updates.
ema_update_interval
int
default:"1"
Update the EMA every N optimizer steps. Must be ≥ 1.
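The EMA update and its warm-up can be sketched as follows. The update rule is the standard exponential moving average; the exponential ramp over roughly ema_tau steps is a common scheme consistent with the description above (assumed, not taken from the source code):

```python
import math

def effective_decay(step: int, decay: float = 0.993, tau: int = 100) -> float:
    """Ramp the decay from 0 toward its final value over ~tau updates."""
    return decay * (1 - math.exp(-step / tau))

def ema_update(ema_w: float, model_w: float, d: float) -> float:
    """Standard EMA step: blend the live weight into the running average."""
    return d * ema_w + (1 - d) * model_w
```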

Data augmentation

multi_scale
boolean
default:"true"
Enable multi-scale training. Images are randomly resized during training to improve generalisation across object sizes.
expanded_scales
boolean
default:"true"
Use an expanded set of resize scales during multi-scale training.
square_resize_div_64
boolean
default:"true"
Resize images to square dimensions divisible by 64 before feeding them to the model.
do_random_resize_via_padding
boolean
default:"false"
When enabled, random resize is implemented via padding rather than stretching.
aug_config
object
Advanced augmentation configuration dictionary. See augmentations for the full schema.

Logging

tensorboard
boolean
default:"true"
Log training metrics to TensorBoard. Logs are written to output_dir/.
wandb
boolean
default:"false"
Log training metrics to Weights & Biases. Requires wandb to be installed and authenticated.
mlflow
boolean
default:"false"
Log training metrics to MLflow. Requires mlflow to be installed.
project
str
Project name for W&B or MLflow runs.
run
str
Run name for W&B or MLflow runs.
log_per_class_metrics
boolean
default:"true"
Log per-class precision, recall, and AP metrics during validation.
progress_bar
"tqdm" | "rich"
Progress bar style during training. Set to "rich" for a richer terminal display, "tqdm" for a standard bar, or omit to disable. Passing True is treated as "tqdm" (legacy behaviour).
See loggers for setup guides for each logger.

Multi-GPU

devices
int | str
default:"1"
Number of GPUs (or a device specification string) to use for training. Maps to the PyTorch Lightning Trainer(devices=...) argument.
strategy
str
default:"auto"
PyTorch Lightning distributed training strategy. Common values: "auto", "ddp", "ddp_spawn", "fsdp", "deepspeed". Invalid values surface as PyTorch Lightning errors.
num_nodes
int
default:"1"
Number of machines for multi-node training. Maps to Trainer(num_nodes=...). Leave at 1 for single-machine training.
accelerator
str
default:"auto"
PyTorch Lightning accelerator type. Typically inferred automatically from the devices argument; override only when needed.
sync_bn
boolean
default:"false"
Convert batch norm layers to SyncBatchNorm for multi-GPU training.
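Putting these together, a two-GPU DDP run might look like this (an illustrative config fragment):

```python
model.train(
    dataset_dir="dataset/",
    devices=2,
    strategy="ddp",
    sync_bn=True,  # keep batch-norm statistics consistent across GPUs
)
```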

Advanced

cls_loss_coef
float
default:"1.0"
Weight applied to the classification loss component.
eval_interval
int
default:"1"
Run validation every N epochs. Must be ≥ 1.
eval_max_dets
int
default:"500"
Maximum number of detections per image considered during COCO evaluation.
compute_val_loss
boolean
default:"true"
Compute and log validation loss at the end of each epoch.
compute_test_loss
boolean
default:"true"
Compute and log test loss at the end of training.
run_test
boolean
default:"false"
Run a full evaluation pass on the test split after training completes.
fp16_eval
boolean
default:"false"
Use 16-bit floating point precision during validation to reduce memory usage.
dont_save_weights
boolean
default:"false"
Skip saving checkpoint files during training. Useful for quick debugging runs.
class_names
list[str]
Explicit list of class names. When set, these names are embedded in the checkpoint and used in place of names inferred from the dataset.
train_log_sync_dist
boolean
default:"false"
Synchronise training metrics across distributed processes before logging.
train_log_on_step
boolean
default:"false"
Log training metrics at every step in addition to epoch-level summaries.

Training overview

End-to-end guide to training RF-DETR on a custom dataset.

Training parameters guide

Practical guidance on choosing batch size, learning rate, and schedule.

SegmentationTrainConfig

Extended config for segmentation model training.

Loggers

Set up TensorBoard, W&B, and MLflow logging.
