RF-DETR supports integration with popular experiment tracking and visualization platforms. You can enable one or more loggers by passing boolean flags to model.train().
A CSV logger is always active regardless of any flags. It requires no extra packages and writes all metrics to {output_dir}/metrics.csv on every validation step.
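Because the CSV log is plain text, it can be inspected with the standard library alone. The sketch below assumes nothing about RF-DETR itself; the file name and columns are synthetic stand-ins for a real `{output_dir}/metrics.csv`, whose columns follow the metric keys listed at the end of this page:

```python
import csv
from pathlib import Path

def latest_metrics(csv_path):
    """Return the last row of a metrics CSV as a dict, or None if the file is empty."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None

# Synthetic file for illustration; a real run writes {output_dir}/metrics.csv.
sample = Path("metrics_sample.csv")
sample.write_text("epoch,val/mAP_50_95\n0,0.412\n1,0.438\n")
print(latest_metrics(sample))  # {'epoch': '1', 'val/mAP_50_95': '0.438'}
```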
Installation
TensorBoard, W&B, and MLflow all require the loggers extras:

```shell
pip install "rfdetr[loggers]"
```
Loggers
TensorBoard

TensorBoard is a toolkit for visualizing and tracking training metrics locally.

TensorBoard logging is enabled by default. Pass tensorboard=False to disable it.

If the tensorboard package is not installed, training continues without error: a UserWarning is emitted and TensorBoard logging is skipped. Install rfdetr[loggers] to avoid this.
Usage

```python
from rfdetr import RFDETRMedium

model = RFDETRMedium()
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir="output",
    # tensorboard=True is the default; pass tensorboard=False to disable
)
```
Viewing logs

Local environment:

```shell
tensorboard --logdir output
```

Then open http://localhost:6006/ in your browser.

Google Colab:

```
%load_ext tensorboard
%tensorboard --logdir output
```
Weights & Biases

Weights & Biases (W&B) is a cloud-based platform for experiment tracking, visualization, and collaboration.

W&B logging is disabled by default. Pass wandb=True to enable it.

Setup

Install the required packages and log in:

```shell
pip install "rfdetr[loggers]"
wandb login
```

You can retrieve your API key at wandb.ai/authorize.

Usage
```python
from rfdetr import RFDETRMedium

model = RFDETRMedium()
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir="output",
    wandb=True,
    project="my-detection-project",
    run="experiment-001",
)
```
Configuration

| Parameter | Description |
|---|---|
| project | Groups related experiments together |
| run | Identifies the individual training session. If not specified, W&B assigns a random name. |
Access your runs at wandb.ai. W&B provides real-time metric visualization, experiment comparison, hyperparameter tracking, system metrics (GPU usage, memory), and training config logging.

MLflow

MLflow is an open-source platform for the machine learning lifecycle that helps track experiments, package code into reproducible runs, and share and deploy models.

MLflow logging is disabled by default. Pass mlflow=True to enable it.

Setup

```shell
pip install "rfdetr[loggers]"
```
Usage

```python
from rfdetr import RFDETRMedium

model = RFDETRMedium()
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir="output",
    mlflow=True,
    project="my-detection-project",
    run="experiment-001",
)
```
Configuration

| Parameter | Description |
|---|---|
| project | Sets the experiment name in MLflow |
| run | Sets the run name (auto-generated if not specified) |
Custom tracking server

To use a custom MLflow tracking server, set environment variables before training:

```python
import os

from rfdetr import RFDETRMedium

os.environ["MLFLOW_TRACKING_URI"] = "https://your-mlflow-server.com"
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-auth-token"

model = RFDETRMedium()
model.train(..., mlflow=True)
```
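Since these are ordinary environment variables, they can equally be exported in the shell before launching your training script, which keeps credentials out of the Python source (the values below are the same placeholders as above):

```shell
export MLFLOW_TRACKING_URI="https://your-mlflow-server.com"
export MLFLOW_TRACKING_TOKEN="your-auth-token"
```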
Viewing logs

Start the MLflow UI:

```shell
mlflow ui --backend-store-uri <OUTPUT_PATH>
```

Then open http://localhost:5000 in your browser.

ClearML

ClearML is an open-source platform for managing, tracking, and automating machine learning experiments.

ClearML is not yet integrated as a native PTL logger. Passing clearml=True to model.train() emits a UserWarning and has no other effect; metrics are not logged to ClearML.
Workaround: ClearML SDK auto-binding

ClearML's SDK captures PyTorch Lightning metrics automatically when a Task is initialised before training begins:

```python
from clearml import Task
from rfdetr import RFDETRMedium

# Initialise before model.train(): ClearML auto-binds to PTL logging
task = Task.init(project_name="my-detection-project", task_name="experiment-001")

model = RFDETRMedium()
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir="output",
    # Do NOT pass clearml=True; it does nothing
)
```
Using multiple loggers

You can enable multiple logging systems simultaneously:

```python
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    tensorboard=True,
    wandb=True,
    mlflow=True,
    project="my-project",
    run="experiment-001",
)
```
This lets you leverage the strengths of different platforms:
- TensorBoard: Local visualization and debugging
- W&B: Cloud-based collaboration and experiment comparison
- MLflow: Model registry and deployment tracking
clearml=True is accepted but has no effect in the current version. Use the ClearML SDK workaround shown above instead.
Attaching custom loggers
To attach a logger not supported by TrainConfig (for example Neptune or Comet), build it yourself and append it to trainer.loggers before calling trainer.fit:

```python
from rfdetr.config import RFDETRMediumConfig, TrainConfig
from rfdetr.training import RFDETRModelModule, RFDETRDataModule, build_trainer

model_config = RFDETRMediumConfig(num_classes=10)
train_config = TrainConfig(
    dataset_dir="path/to/dataset",
    epochs=100,
    output_dir="output",
    tensorboard=True,  # built-in loggers still work
)

module = RFDETRModelModule(model_config, train_config)
datamodule = RFDETRDataModule(model_config, train_config)
trainer = build_trainer(train_config, model_config)

# Attach any additional PTL-compatible logger
from pytorch_lightning.loggers import CSVLogger  # example: use any PTL logger
trainer.loggers.append(CSVLogger(save_dir="output", name="extra"))

trainer.fit(module, datamodule)
```
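The surface a PTL-compatible logger must expose is small: the training loop mainly calls log_metrics and log_hyperparams. The in-memory sketch below mirrors that interface shape without importing pytorch_lightning, so it is only an illustration; a real custom logger should subclass pytorch_lightning.loggers.Logger instead:

```python
class InMemoryLogger:
    """Interface sketch of a PTL-style logger that buffers everything in memory.
    Not a real PTL logger: subclass pytorch_lightning.loggers.Logger for that."""

    def __init__(self):
        self.hyperparams = {}
        self.history = []  # list of (step, metrics-dict) tuples

    def log_hyperparams(self, params):
        # Called once with the training configuration
        self.hyperparams.update(params)

    def log_metrics(self, metrics, step=None):
        # Called repeatedly during training/validation
        self.history.append((step, dict(metrics)))

logger = InMemoryLogger()
logger.log_hyperparams({"lr": 1e-4, "batch_size": 4})
logger.log_metrics({"train/loss": 2.31}, step=0)
```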
Logged metrics reference

All active loggers receive the same set of metric keys:

| Key | When logged | Description |
|---|---|---|
| train/loss | Every step / epoch | Total weighted training loss |
| train/<term> | Every step / epoch | Individual loss terms (e.g. train/loss_bbox) |
| val/loss | Each epoch | Validation loss (if compute_val_loss=True) |
| val/mAP_50_95 | Each eval epoch | COCO box mAP@[.50:.05:.95] |
| val/mAP_50 | Each eval epoch | COCO box [email protected] |
| val/mAP_75 | Each eval epoch | COCO box [email protected] |
| val/mAR | Each eval epoch | COCO mean average recall |
| val/ema_mAP_50_95 | Each eval epoch | EMA-model mAP@[.50:.05:.95] (if EMA active) |
| val/F1 | Each eval epoch | Macro F1 at best confidence threshold |
| val/precision | Each eval epoch | Precision at best F1 threshold |
| val/recall | Each eval epoch | Recall at best F1 threshold |
| val/AP/<class> | Each eval epoch | Per-class AP (if log_per_class_metrics=True) |
| val/segm_mAP_50_95 | Each eval epoch | Segmentation mAP (segmentation models only) |
| val/segm_mAP_50 | Each eval epoch | Segmentation [email protected] (segmentation models only) |
| test/* | After test run | Mirror of val/* keys |
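The val/F1, val/precision, and val/recall rows above are reported at the confidence threshold that maximises F1. The general shape of such a threshold sweep can be sketched as follows; the predictions, counts, and candidate thresholds are made up for illustration and do not reflect RF-DETR's actual evaluation code:

```python
def best_f1_threshold(preds, num_gt, thresholds):
    """preds: list of (confidence, is_true_positive) pairs.
    Returns (threshold, f1, precision, recall) for the threshold maximising F1."""
    best = (None, 0.0, 0.0, 0.0)
    for t in thresholds:
        kept = [tp for conf, tp in preds if conf >= t]
        tp = sum(kept)                 # true positives kept at this threshold
        fp = len(kept) - tp            # false positives kept
        fn = num_gt - tp               # ground truths with no kept match
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best[1]:
            best = (t, f1, precision, recall)
    return best

# Made-up predictions: (confidence, matched a ground-truth box)
preds = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
print(best_f1_threshold(preds, num_gt=4, thresholds=[0.1, 0.3, 0.5, 0.7]))
# → (0.3, 0.75, 0.75, 0.75)
```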