Run RF-DETR object detection models on images, video files, and live streams
RF-DETR is a real-time transformer architecture for object detection, built on a DINOv2 vision transformer backbone. The pretrained models are trained on the Microsoft COCO dataset and achieve state-of-the-art accuracy and latency trade-offs.
RF-DETR offers detection model sizes from Nano to 2XLarge. Choose a size based on your latency and accuracy requirements. To switch sizes, replace the class name or inference alias in your code.
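| Size | Python class | Inference alias | COCO AP50 | COCO AP50:95 | Latency (ms) | Params (M) | Resolution | License |
|------|--------------|-----------------|-----------|--------------|--------------|------------|------------|---------|
| N | RFDETRNano | rfdetr-nano | 67.6 | 48.4 | 2.3 | 30.5 | 384x384 | Apache 2.0 |
| S | RFDETRSmall | rfdetr-small | 72.1 | 53.0 | 3.5 | 32.1 | 512x512 | Apache 2.0 |
| M | RFDETRMedium | rfdetr-medium | 73.6 | 54.7 | 4.4 | 33.7 | 576x576 | Apache 2.0 |
| L | RFDETRLarge | rfdetr-large | 75.1 | 56.5 | 6.8 | 33.9 | 704x704 | Apache 2.0 |
| XL △ | RFDETRXLarge | rfdetr-xlarge | 77.4 | 58.6 | 11.5 | 126.4 | 700x700 | PML 1.0 |
| 2XL △ | RFDETR2XLarge | rfdetr-2xlarge | 78.5 | 60.1 | 17.2 | 126.9 | 880x880 | PML 1.0 |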
△ The XLarge and 2XLarge models require the rfdetr_plus extension. Install it with pip install "rfdetr[plus]" (quoting the requirement keeps shells such as zsh from treating the brackets as a glob pattern). These models are licensed under PML 1.0 and require a Roboflow account.
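As a rough illustration of choosing a size from the table, the sketch below encodes the published AP50:95, latency, and license columns and returns the most accurate model that fits a latency budget. The `ModelSpec` type and `pick_model` helper are hypothetical names introduced here and are not part of the rfdetr package; the latency figures are the table values, so measurements on your own hardware will likely differ.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    python_class: str
    alias: str
    ap50_95: float
    latency_ms: float
    license: str


# Values copied from the size table above (hypothetical container, not an rfdetr API).
MODEL_SPECS = [
    ModelSpec("RFDETRNano", "rfdetr-nano", 48.4, 2.3, "Apache 2.0"),
    ModelSpec("RFDETRSmall", "rfdetr-small", 53.0, 3.5, "Apache 2.0"),
    ModelSpec("RFDETRMedium", "rfdetr-medium", 54.7, 4.4, "Apache 2.0"),
    ModelSpec("RFDETRLarge", "rfdetr-large", 56.5, 6.8, "Apache 2.0"),
    ModelSpec("RFDETRXLarge", "rfdetr-xlarge", 58.6, 11.5, "PML 1.0"),
    ModelSpec("RFDETR2XLarge", "rfdetr-2xlarge", 60.1, 17.2, "PML 1.0"),
]


def pick_model(max_latency_ms: float, allow_pml: bool = False) -> Optional[ModelSpec]:
    """Return the highest-AP model within the latency budget, or None if none fits."""
    candidates = [
        spec
        for spec in MODEL_SPECS
        if spec.latency_ms <= max_latency_ms
        and (allow_pml or spec.license == "Apache 2.0")
    ]
    return max(candidates, key=lambda spec: spec.ap50_95, default=None)


choice = pick_model(max_latency_ms=5.0)
print(choice.python_class, choice.alias)  # RFDETRMedium rfdetr-medium
```

The returned class name or alias is what you would substitute into your own code when switching sizes. Setting `allow_pml=True` also considers the XLarge and 2XLarge models, which need the extension and license described in the note above.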
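```python
import supervision as sv
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

# Load the Medium model; swap the class to change model size.
model = RFDETRMedium()

# Run inference on a single image with a 0.5 confidence threshold.
detections = model.predict("https://media.roboflow.com/dog.jpg", threshold=0.5)

# Map each predicted class ID to its COCO class name.
labels = [COCO_CLASSES[class_id] for class_id in detections.class_id]

# Draw boxes first, then labels, on the source image returned with the detections.
annotated_image = sv.BoxAnnotator().annotate(detections.data["source_image"], detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
```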
predict() returns a supervision.Detections object containing bounding box coordinates, confidence scores, and class IDs. Access the source image via detections.data["source_image"].
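To make the structure of the result more concrete, here is a minimal sketch that walks over the per-detection fields and prints one line per object. It assumes the result exposes the `xyxy`, `confidence`, and `class_id` attributes of `supervision.Detections`; the `summarize_detections` helper, the stand-in object, and the two-entry class map are hypothetical and use made-up values so the example runs without downloading a model.

```python
from types import SimpleNamespace
from typing import Any, List


def summarize_detections(detections: Any, class_names: Any) -> List[str]:
    """Return one readable line per detection (hypothetical helper)."""
    lines = []
    for box, confidence, class_id in zip(
        detections.xyxy, detections.confidence, detections.class_id
    ):
        # Each box is (x_min, y_min, x_max, y_max) in pixel coordinates.
        x_min, y_min, x_max, y_max = (float(v) for v in box)
        name = class_names[int(class_id)]
        lines.append(
            f"{name} {float(confidence):.2f} "
            f"box=({x_min:.0f}, {y_min:.0f}, {x_max:.0f}, {y_max:.0f})"
        )
    return lines


# Stand-in with made-up values; a real call would pass the object from predict().
fake_detections = SimpleNamespace(
    xyxy=[[10.0, 20.0, 110.0, 220.0]],
    confidence=[0.91],
    class_id=[1],
)
print(summarize_detections(fake_detections, {0: "cat", 1: "dog"}))
# ['dog 0.91 box=(10, 20, 110, 220)']
```

With a real model you would pass the object returned by `predict()` together with `COCO_CLASSES` in place of the stand-ins.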
Pass a list of images to predict() to process multiple images in a single forward pass. The method returns a list of supervision.Detections objects in the same order as the input.
```python
import io

import requests
import supervision as sv
from PIL import Image
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

model = RFDETRMedium()

urls = [
    "https://media.roboflow.com/notebooks/examples/dog-2.jpeg",
    "https://media.roboflow.com/notebooks/examples/dog-3.jpeg",
]

# Download each image and decode it as a PIL image.
images = [Image.open(io.BytesIO(requests.get(url).content)) for url in urls]

# Passing a list runs the whole batch in a single forward pass.
detections_list = model.predict(images, threshold=0.5)

# Results come back in input order, so they can be zipped with the images.
for image, detections in zip(images, detections_list):
    labels = [
        f"{COCO_CLASSES[class_id]} {confidence:.2f}"
        for class_id, confidence in zip(detections.class_id, detections.confidence)
    ]

    # Annotate a copy so the original image stays untouched.
    annotated_image = image.copy()
    annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
    annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
    sv.plot_image(annotated_image)
```
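For larger collections, a single forward pass over every image may not fit in memory, so one option is to feed `predict()` fixed-size chunks. The sketch below is one way to do that; `chunked` and `predict_in_batches` are hypothetical helpers, not rfdetr functions. It also wraps a non-list result in a list, on the assumption that a one-image batch might come back as a bare object in some versions; that behavior is an assumption worth checking against your installed release.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_in_batches(
    predict_fn: Callable[[List[Any]], Any],
    images: Sequence[Any],
    batch_size: int = 8,
) -> List[Any]:
    """Call predict_fn on each chunk and return results in input order."""
    results: List[Any] = []
    for batch in chunked(images, batch_size):
        output = predict_fn(batch)
        # Defensive normalization (assumption): treat a non-list result as one item.
        if not isinstance(output, list):
            output = [output]
        results.extend(output)
    return results


# Stand-in predictor so the sketch runs without a model.
def fake_predict(batch: List[str]) -> List[str]:
    return [f"detections for {name}" for name in batch]


print(predict_in_batches(fake_predict, ["a.jpg", "b.jpg", "c.jpg"], batch_size=2))
# ['detections for a.jpg', 'detections for b.jpg', 'detections for c.jpg']
```

With a real model, the call might look like `predict_in_batches(lambda batch: model.predict(batch, threshold=0.5), images)`, with the batch size tuned to the available memory.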