Object Detection

RF-DETR is a real-time transformer architecture for object detection, built on a DINOv2 vision transformer backbone. The pretrained models are trained on the Microsoft COCO dataset and achieve state-of-the-art accuracy and latency trade-offs.

Model sizes

RF-DETR offers detection models in six sizes, from Nano to 2XLarge. Larger models are more accurate but slower, so choose a size based on your latency and accuracy requirements. To switch sizes, replace the Python class name or the inference alias in your code.

Size	Python class	Inference alias	COCO AP50	COCO AP50:95	Latency (ms)	Params (M)	Resolution (px)	License
N	`RFDETRNano`	`rfdetr-nano`	67.6	48.4	2.3	30.5	384x384	Apache 2.0
S	`RFDETRSmall`	`rfdetr-small`	72.1	53.0	3.5	32.1	512x512	Apache 2.0
M	`RFDETRMedium`	`rfdetr-medium`	73.6	54.7	4.4	33.7	576x576	Apache 2.0
L	`RFDETRLarge`	`rfdetr-large`	75.1	56.5	6.8	33.9	704x704	Apache 2.0
XL △	`RFDETRXLarge`	`rfdetr-xlarge`	77.4	58.6	11.5	126.4	700x700	PML 1.0
2XL △	`RFDETR2XLarge`	`rfdetr-2xlarge`	78.5	60.1	17.2	126.9	880x880	PML 1.0

△ The XLarge and 2XLarge models require the rfdetr_plus extension. Install it with pip install "rfdetr[plus]" (the quotes stop shells such as zsh from interpreting the brackets). These models are licensed under PML 1.0 and require a Roboflow account.
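The table can also be used programmatically. The sketch below assumes you want the most accurate model whose listed latency fits a per-image budget; it encodes the table rows and returns a class name and inference alias. `ModelSpec`, `MODEL_TABLE`, and `pick_model` are hypothetical helpers, not part of rfdetr. The latency values are copied from the table and depend on the hardware they were measured on, so benchmark on your own device before relying on them.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    python_class: str
    inference_alias: str
    ap50_95: float
    latency_ms: float
    needs_plus: bool  # True for XL and 2XL (rfdetr[plus], PML 1.0)


# Hypothetical helper data: rows copied from the model-size table above.
MODEL_TABLE = [
    ModelSpec("RFDETRNano", "rfdetr-nano", 48.4, 2.3, False),
    ModelSpec("RFDETRSmall", "rfdetr-small", 53.0, 3.5, False),
    ModelSpec("RFDETRMedium", "rfdetr-medium", 54.7, 4.4, False),
    ModelSpec("RFDETRLarge", "rfdetr-large", 56.5, 6.8, False),
    ModelSpec("RFDETRXLarge", "rfdetr-xlarge", 58.6, 11.5, True),
    ModelSpec("RFDETR2XLarge", "rfdetr-2xlarge", 60.1, 17.2, True),
]


def pick_model(budget_ms: float, allow_plus: bool = False) -> Optional[ModelSpec]:
    """Hypothetical helper: highest-AP model whose listed latency fits the budget."""
    candidates = [
        spec
        for spec in MODEL_TABLE
        if spec.latency_ms <= budget_ms and (allow_plus or not spec.needs_plus)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda spec: spec.ap50_95)


spec = pick_model(budget_ms=5.0)
# 1000 / latency gives a rough model-only frame rate (pre/post-processing excluded).
print(spec.python_class, spec.inference_alias, f"~{1000 / spec.latency_ms:.0f} FPS")
```

You could then instantiate the chosen class with `getattr(rfdetr, spec.python_class)()` after `import rfdetr`, or pass `spec.inference_alias` to `get_model()` in Roboflow Inference.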

Run on an image

The examples below run the Medium model on four kinds of input: a single image, a video file, a webcam stream, and an RTSP stream.

Single image

import supervision as sv
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

model = RFDETRMedium()

detections = model.predict("https://media.roboflow.com/dog.jpg", threshold=0.5)

# Map each predicted class ID to its COCO class name.
labels = [COCO_CLASSES[class_id] for class_id in detections.class_id]

annotated_image = sv.BoxAnnotator().annotate(detections.data["source_image"], detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)

predict() returns a supervision.Detections object containing bounding box coordinates, confidence scores, and class IDs. Access the source image via detections.data["source_image"].
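Because the `class_id` and `confidence` fields of a `supervision.Detections` object are array-like, plain Python is enough to summarize a prediction. The sketch below counts objects per class name above a score cutoff. `summarize_detections` is a hypothetical helper, and the small `names` dictionary with illustrative IDs stands in for `COCO_CLASSES` so the example runs without rfdetr installed.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_detections(
    class_ids: Iterable[int],
    confidences: Iterable[float],
    names: Mapping[int, str],
    min_confidence: float = 0.0,
) -> Counter:
    """Hypothetical helper: count detections per class name, skipping low scores."""
    counts = Counter()
    for class_id, confidence in zip(class_ids, confidences):
        if confidence >= min_confidence:
            # Fall back to the raw ID if the mapping has no entry for it.
            counts[names.get(int(class_id), str(class_id))] += 1
    return counts


# Stand-in data; with RF-DETR you would pass detections.class_id,
# detections.confidence, and COCO_CLASSES instead.
names = {1: "person", 18: "dog"}
print(summarize_detections([18, 1, 18], [0.91, 0.55, 0.72], names, min_confidence=0.6))
```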

Video file

import supervision as sv
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

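model = RFDETRMedium()

# Create the annotators once and reuse them for every frame.
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()


def callback(frame, index):
    # sv.process_video passes BGR frames; reverse the channel axis to get RGB.
    # .copy() turns the negatively strided view into a contiguous array.
    detections = model.predict(frame[:, :, ::-1].copy(), threshold=0.5)

    labels = [
        f"{COCO_CLASSES[class_id]} {confidence:.2f}"
        for class_id, confidence in zip(detections.class_id, detections.confidence)
    ]

    # Annotate a copy so the original frame is left untouched.
    annotated_frame = box_annotator.annotate(frame.copy(), detections)
    annotated_frame = label_annotator.annotate(annotated_frame, detections, labels)
    return annotated_frame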


sv.process_video(
    source_path="<SOURCE_VIDEO_PATH>",
    target_path="<TARGET_VIDEO_PATH>",
    callback=callback,
)

Replace <SOURCE_VIDEO_PATH> and <TARGET_VIDEO_PATH> with your input and output file paths.
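The callback reverses the last axis because frames decoded through OpenCV, which `sv.process_video` relies on, are in BGR order, while the webcam and RTSP examples on this page convert frames to RGB before calling `predict()`. The NumPy sketch below shows what the `[:, :, ::-1]` slice does on a tiny synthetic frame and why the result is a strided view rather than a new array. Whether your rfdetr version actually needs a contiguous copy is an assumption worth checking.

```python
import numpy as np

# A 1x2 "frame" in BGR order: one pure-blue pixel and one pure-red pixel.
frame_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB without moving any pixels.
frame_rgb_view = frame_bgr[:, :, ::-1]
print(frame_rgb_view[0, 0])  # the blue pixel, now in (R, G, B) order
print(frame_rgb_view.flags["C_CONTIGUOUS"])  # a view with a negative stride

# Some consumers (for example torch.from_numpy) reject negative strides,
# so make a contiguous copy when in doubt.
frame_rgb = np.ascontiguousarray(frame_rgb_view)
print(frame_rgb.flags["C_CONTIGUOUS"])
```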

Webcam stream

import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

WEBCAM_INDEX = 0  # Change to the desired webcam index (e.g., 1, 2, ...)
video_capture = cv2.VideoCapture(WEBCAM_INDEX)
if not video_capture.isOpened():
    raise RuntimeError(f"Failed to open webcam: {WEBCAM_INDEX}")

try:
    while True:
        success, frame_bgr = video_capture.read()
        if not success:
            break

        # OpenCV delivers BGR frames; convert to RGB for the model.
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        detections = model.predict(frame_rgb, threshold=0.5)

        labels = [COCO_CLASSES[class_id] for class_id in detections.class_id]

        # Draw on the BGR frame so cv2.imshow displays the correct colors.
        annotated_frame = box_annotator.annotate(frame_bgr, detections)
        annotated_frame = label_annotator.annotate(annotated_frame, detections, labels)

        cv2.imshow("RF-DETR Webcam", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    # Release the camera and close the window even if an error occurs.
    video_capture.release()
    cv2.destroyAllWindows()

WEBCAM_INDEX is usually 0 for the default camera. Press q to quit.
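To check whether the chosen model size keeps up with the camera, you can time each loop iteration. The sketch below is a hypothetical `FPSMeter` class (not part of rfdetr, supervision, or OpenCV) that averages over a sliding window of recent frames. In the webcam loop you would call `tick()` once per iteration, for example right after `cv2.imshow`, and read `fps` when you want to log or display it.

```python
import time
from collections import deque
from typing import Callable, Deque


class FPSMeter:
    """Hypothetical helper: sliding-window frames-per-second estimate."""

    def __init__(self, window: int = 30, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        # Keep window + 1 timestamps so there are `window` frame intervals.
        self._timestamps: Deque[float] = deque(maxlen=window + 1)

    def tick(self) -> None:
        """Record that one frame has been processed."""
        self._timestamps.append(self._clock())

    @property
    def fps(self) -> float:
        if len(self._timestamps) < 2:
            return 0.0
        elapsed = self._timestamps[-1] - self._timestamps[0]
        return (len(self._timestamps) - 1) / elapsed if elapsed > 0 else 0.0


# Usage with a fake clock that advances 25 ms per frame (40 FPS).
fake_time = iter(i * 0.025 for i in range(100))
meter = FPSMeter(window=10, clock=lambda: next(fake_time))
for _ in range(20):
    meter.tick()
print(f"{meter.fps:.1f} FPS")
```

Injecting the clock keeps the class testable without a camera; in the real loop the default `time.perf_counter` is used.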

RTSP stream

import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

RTSP_URL = "<RTSP_STREAM_URL>"
video_capture = cv2.VideoCapture(RTSP_URL)
if not video_capture.isOpened():
    raise RuntimeError(f"Failed to open RTSP stream: {RTSP_URL}")

try:
    while True:
        success, frame_bgr = video_capture.read()
        if not success:
            # The stream ended or a frame could not be read.
            break

        # OpenCV delivers BGR frames; convert to RGB for the model.
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        detections = model.predict(frame_rgb, threshold=0.5)

        labels = [COCO_CLASSES[class_id] for class_id in detections.class_id]

        annotated_frame = box_annotator.annotate(frame_bgr, detections)
        annotated_frame = label_annotator.annotate(annotated_frame, detections, labels)

        cv2.imshow("RF-DETR RTSP", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    # Release the stream and close the window even if an error occurs.
    video_capture.release()
    cv2.destroyAllWindows()

Replace <RTSP_STREAM_URL> with your stream URL (e.g., rtsp://user:password@<camera-ip>/stream). Press q to quit.
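If the camera password contains characters such as `@`, `:`, or `/`, pasting it into the URL verbatim makes the address ambiguous. A minimal sketch, assuming your camera and OpenCV's video backend accept percent-encoded credentials (worth verifying for your device), builds the URL with the standard library. `build_rtsp_url` is a hypothetical helper, the IP address is only an example, and 554 is just the conventional default RTSP port.

```python
from urllib.parse import quote


def build_rtsp_url(user: str, password: str, host: str, path: str = "stream", port: int = 554) -> str:
    """Hypothetical helper: rtsp:// URL with percent-encoded credentials."""
    # safe="" also encodes "/", so no character in the credentials can end
    # the user-info part of the URL early.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{credentials}@{host}:{port}/{path.lstrip('/')}"


url = build_rtsp_url("admin", "p@ss:word", "192.168.1.64")
print(url)
```

The resulting string can be assigned to `RTSP_URL` in place of the `<RTSP_STREAM_URL>` placeholder.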

Batch inference

Pass a list of images to predict() to process multiple images in a single forward pass. The method returns a list of supervision.Detections objects in the same order as the input.

import io
import requests
import supervision as sv
from PIL import Image
from rfdetr import RFDETRMedium
from rfdetr.assets.coco_classes import COCO_CLASSES

model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

urls = [
    "https://media.roboflow.com/notebooks/examples/dog-2.jpeg",
    "https://media.roboflow.com/notebooks/examples/dog-3.jpeg",
]

# Download each image and decode it with Pillow.
images = []
for url in urls:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    images.append(Image.open(io.BytesIO(response.content)))

# One call processes the whole list; results come back in input order.
detections_list = model.predict(images, threshold=0.5)

for image, detections in zip(images, detections_list):
    labels = [
        f"{COCO_CLASSES[class_id]} {confidence:.2f}"
        for class_id, confidence in zip(detections.class_id, detections.confidence)
    ]

    annotated_image = box_annotator.annotate(image.copy(), detections)
    annotated_image = label_annotator.annotate(annotated_image, detections, labels)

    sv.plot_image(annotated_image)
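A single forward pass over a very long list can exhaust GPU memory. One way to bound memory, sketched below under the assumption that `predict()` returns one result per input in input order as described above, is to split the inputs into fixed-size chunks and concatenate the results. `predict_in_chunks` is a hypothetical helper, and `fake_predict` stands in for `model.predict` so the example runs without a model; the best chunk size depends on your GPU and model size.

```python
from typing import Any, Callable, List, Sequence


def predict_in_chunks(
    predict: Callable[[List[Any]], Any],
    inputs: Sequence[Any],
    chunk_size: int = 8,
) -> List[Any]:
    """Hypothetical helper: run `predict` on consecutive chunks, keep input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: List[Any] = []
    for start in range(0, len(inputs), chunk_size):
        chunk = list(inputs[start:start + chunk_size])
        output = predict(chunk)
        # Defensive: treat a non-list output as a single result.
        if not isinstance(output, list):
            output = [output]
        if len(output) != len(chunk):
            raise RuntimeError("predict returned a different number of results than inputs")
        results.extend(output)
    return results


# Stand-in for model.predict(batch, threshold=0.5): one "result" per input.
def fake_predict(batch):
    return [f"detections for {item}" for item in batch]


out = predict_in_chunks(fake_predict, ["a.jpg", "b.jpg", "c.jpg"], chunk_size=2)
print(out)
```

With a real model you would pass something like `lambda batch: model.predict(batch, threshold=0.5)` as `predict`.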

Run with Roboflow Inference

You can also run RF-DETR with the Roboflow Inference library (pip install inference). To switch model size, pass the corresponding inference alias from the table above to get_model().

import requests
import supervision as sv
from PIL import Image
from inference import get_model

model = get_model("rfdetr-medium")

image = Image.open(requests.get("https://media.roboflow.com/dog.jpg", stream=True).raw)
predictions = model.infer(image, confidence=0.5)[0]
detections = sv.Detections.from_inference(predictions)

annotated_image = sv.BoxAnnotator().annotate(image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections)

Related pages

Pretrained models: full model comparison table with accuracy, latency, and parameter counts.
Instance segmentation: run RF-DETR for pixel-level instance segmentation.
Train a model: fine-tune RF-DETR on your own dataset.
Deploy to Roboflow: deploy your model to the Roboflow platform.