CVAT provides powerful auto-annotation capabilities using state-of-the-art deep learning models. These tools can dramatically accelerate annotation workflows by automatically detecting, segmenting, and tracking objects.

Overview

CVAT supports multiple approaches to automatic annotation:

Interactive Tools

Click-based segmentation with SAM2 and other interactive models

Detector Models

Automatic object detection with YOLO, Detectron2, and Transformers

Tracking Models

Multi-frame tracking with SAM2 tracker and other temporal models

Custom Functions

Deploy your own models as auto-annotation functions

Interactive Segmentation

AI Tools (SAM2)

The AI Tools button in the controls sidebar provides access to interactive segmentation models. To use interactive segmentation:
  1. Click the AI Tools button in the left sidebar
  2. Select a label for the annotation
  3. Choose the interaction mode:
    • Positive points: Click inside the object you want to segment
    • Negative points: Click outside to exclude regions
    • Bounding box: Draw a box around the object
  4. The model generates a mask or polygon in real-time
  5. Adjust by adding more positive/negative points
  6. Click Done to create the annotation
  7. Press N to repeat with the same settings
For best results with SAM2, start with a single click in the center of the object. Add positive points in missed regions and negative points in incorrectly included areas.
Available Models:
  • SAM2 (Segment Anything Model 2) - Meta’s foundation model for promptable segmentation
  • IOG (Inside-Outside Guidance) - Efficient interactive segmentation
  • Custom deployed interactive models
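The refinement loop in steps 4-6 boils down to accumulating point prompts and re-running the model after each click. A minimal sketch of that bookkeeping (the `PromptState` class and its field names are our own illustration, not part of CVAT or SAM2):

```python
from dataclasses import dataclass, field

@dataclass
class PromptState:
    """Accumulates SAM2-style point prompts; names are illustrative only."""
    points: list = field(default_factory=list)  # (x, y) click coordinates
    labels: list = field(default_factory=list)  # 1 = positive, 0 = negative

    def add_positive(self, x: int, y: int) -> None:
        self.points.append((x, y))
        self.labels.append(1)

    def add_negative(self, x: int, y: int) -> None:
        self.points.append((x, y))
        self.labels.append(0)

state = PromptState()
state.add_positive(120, 80)  # click inside the object
state.add_negative(40, 200)  # exclude a wrongly included region
# Each change would trigger a new mask prediction, conceptually:
# mask = predictor.predict(point_coords=state.points, point_labels=state.labels)
print(state.points, state.labels)
```

Every added point re-prompts the model, which is why a single well-placed center click followed by a few corrective points usually converges fastest.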

OpenCV Tools

The OpenCV Tools provide classical computer vision algorithms.
Intelligent Scissors
  1. Click the OpenCV Tools button
  2. Select “Intelligent Scissors”
  3. Click along the object boundary
  4. The tool automatically finds edges between clicks
  5. Close the polygon to finish
GrabCut
  1. Draw a rough rectangle around the object
  2. The algorithm segments the foreground
  3. Refine with additional markers
OpenCV tools work entirely in the browser and don’t require a server connection, but are less accurate than deep learning models.

Automatic Detection

Automatic detection runs models over entire frames or videos to detect all objects of specific classes.

Using Auto-Annotation

From the Task Page:
  1. Navigate to your task
  2. Click Actions → Automatic annotation
  3. Select a model:
    • YOLO models (v5, v8, v11, v12) - Fast, accurate object detection
    • Detectron2 models - Research-grade detection and segmentation
    • Transformers models - Hugging Face model hub integration
  4. Configure detection parameters:
    • Threshold: Minimum confidence score (0.0-1.0)
    • Labels mapping: Map model classes to your task labels
  5. Click Annotate to start
  6. Monitor progress in the task details page
Auto-annotation can take considerable time for large videos or datasets. The task will be locked during processing.

YOLO Models

YOLO (You Only Look Once) models provide fast, accurate detection.
Supported Tasks:
  • Object detection (bounding boxes)
  • Instance segmentation (polygons)
  • Pose estimation (skeletons)
  • Oriented object detection (rotated boxes)
  • Classification
Available YOLO Versions:
  • YOLOv5 - Lightweight, fast
  • YOLOv8 - Improved accuracy
  • YOLOv11 - Latest performance
  • YOLO12 - State-of-the-art results
Configuration:
# Using CVAT CLI
cvat-cli auto-annotate create \
  --function-file /path/to/yolo/func.py \
  -p model=str:yolo12n.pt \
  -p device=str:cuda \
  <task-id>
  • Nano models (n): Fastest, lowest accuracy; for real-time or CPU inference
  • Small models (s): Balanced speed and accuracy
  • Medium models (m): Good accuracy, moderate speed
  • Large models (l): High accuracy, slower
  • Extra-large models (x): Best accuracy, slowest; for maximum quality
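The size suffix is just part of the weights filename passed via `-p model=...`. A throwaway helper makes the tradeoff explicit (`pick_yolo_weights` is our own illustration, not a CVAT or Ultralytics API):

```python
# Map a speed/accuracy preference to an Ultralytics-style weights filename.
# The suffixes (n/s/m/l/x) follow the convention described above.
_SUFFIXES = {
    "fastest": "n",   # nano: real-time or CPU inference
    "balanced": "s",  # small: balanced speed and accuracy
    "accurate": "l",  # large: high accuracy, slower
    "best": "x",      # extra-large: maximum quality
}

def pick_yolo_weights(preference: str, version: str = "yolo12") -> str:
    """Hypothetical helper: build a weights filename like 'yolo12n.pt'."""
    return f"{version}{_SUFFIXES[preference]}.pt"

print(pick_yolo_weights("fastest"))  # yolo12n.pt
```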

Detectron2 Models

Facebook Research's Detectron2 provides state-of-the-art detection and segmentation.
Available Architectures:
  • Faster R-CNN - Two-stage detection
  • RetinaNet - Single-stage dense detection
  • Mask R-CNN - Instance segmentation
  • Cascade R-CNN - Multi-stage refinement
Pretrained Datasets:
  • COCO (80 object classes)
  • LVIS (1000+ classes)
  • Custom trained models

Transformers Models

Hugging Face Transformers integration provides access to thousands of models.
Supported Tasks:
  • object-detection - Bounding box detection (DETR, YOLOS, etc.)
  • image-segmentation - Semantic and instance segmentation
  • image-classification - Frame-level classification tags
Configuration:
# Using a Hugging Face model
cvat-cli auto-annotate create \
  --function-file /path/to/transformers/func.py \
  -p model=str:facebook/detr-resnet-50 \
  -p task=str:object-detection \
  -p device=str:cuda \
  <task-id>
Browse models at huggingface.co/models and use the model ID with the Transformers function.
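A Hugging Face `object-detection` pipeline returns a list of dicts with `score`, `label`, and a `box` of pixel corners. A sketch of converting that output into CVAT-style rectangle points, mocked here so it runs without `transformers` installed (`to_rectangles` is our own helper, not part of either library):

```python
def to_rectangles(pipeline_output, threshold=0.5):
    """Convert object-detection pipeline results to (label, [x1, y1, x2, y2])."""
    shapes = []
    for det in pipeline_output:
        if det["score"] < threshold:
            continue  # drop low-confidence detections
        box = det["box"]
        shapes.append(
            (det["label"], [box["xmin"], box["ymin"], box["xmax"], box["ymax"]])
        )
    return shapes

# Mock result in the format produced by transformers.pipeline("object-detection"):
mock = [
    {"score": 0.97, "label": "cat",
     "box": {"xmin": 10, "ymin": 20, "xmax": 200, "ymax": 180}},
    {"score": 0.31, "label": "dog",
     "box": {"xmin": 5, "ymin": 5, "xmax": 50, "ymax": 60}},
]
print(to_rectangles(mock))  # the low-score 'dog' detection is dropped
```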

Automatic Tracking

SAM2 Tracker

SAM2 can track segmented objects across video frames.
Workflow:
  1. Annotate objects on the first frame (manually or with AI tools)
  2. Run SAM2 tracker auto-annotation function
  3. The model propagates masks/polygons across subsequent frames
  4. Review and correct as needed
Configuration:
cvat-cli auto-annotate create \
  --function-file /path/to/sam2/func.py \
  -p model_id=str:facebook/sam2.1-hiera-large \
  -p device=str:cuda \
  <task-id>
Model Options:
  • facebook/sam2.1-hiera-tiny - Fastest, 38.9M parameters
  • facebook/sam2.1-hiera-small - Balanced, 46M parameters
  • facebook/sam2.1-hiera-base-plus - High quality, 80.8M parameters
  • facebook/sam2.1-hiera-large - Best quality, 224.4M parameters
SAM2 tracking is particularly effective for:
  • Objects with complex boundaries
  • Partially occluded objects
  • Non-rigid deformations
  • Variable camera motion

TransT Tracker

TransT provides transformer-based object tracking:
  1. Draw initial bounding box on first frame
  2. Run TransT tracker
  3. The model tracks the object through the video
  4. Outputs interpolated tracks

Advanced Configuration

Model Parameters

Most auto-annotation functions accept parameters.
Common Parameters:

Parameter | Type | Description
model | string | Path or identifier for the model
device | string | PyTorch device: cpu, cuda, cuda:0, etc.
threshold | float | Confidence threshold (0.0-1.0)
labels_mapping | dict | Map model classes to CVAT labels

YOLO-Specific:

Parameter | Type | Description
keypoint_names_path | string | Path to keypoint names file (pose models)
imgsz | int | Input image size (default: 640)
conf | float | Object confidence threshold
iou | float | NMS IoU threshold

Transformers-Specific:

Parameter | Type | Description
task | string | Model task: object-detection, image-segmentation, etc.
threshold | float | Detection threshold
top_k | int | Maximum detections per image
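The `-p` flags used throughout this page follow a `name=type:value` convention. A small parser sketch shows how such strings decompose; this is our own illustration of the syntax, not cvat-cli's actual implementation (note the single quotes around dict values in the CLI examples are shell quoting, not part of the value):

```python
import json

# Supported type prefixes and their Python conversions.
_CASTS = {"str": str, "int": int, "float": float, "dict": json.loads}

def parse_param(arg: str):
    """Parse a 'name=type:value' string like 'threshold=float:0.7'."""
    name, rest = arg.split("=", 1)
    type_name, value = rest.split(":", 1)
    return name, _CASTS[type_name](value)

print(parse_param("threshold=float:0.7"))   # ('threshold', 0.7)
print(parse_param("model=str:yolo12n.pt"))  # ('model', 'yolo12n.pt')
print(parse_param('labels_mapping=dict:{"car": "vehicle"}'))
```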

Label Mapping

When model classes don’t match your task labels:
-p labels_mapping=dict:'{"car": "vehicle", "truck": "vehicle", "person": "pedestrian"}'
Unmapped classes are ignored.

Filtering Results

By Confidence: Set a higher threshold to reduce false positives:
-p threshold=float:0.7  # Only keep detections with >70% confidence
By Class: Use label mapping to filter specific classes:
-p labels_mapping=dict:'{"person": "person"}'  # Only detect persons
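The two filters compose: the threshold drops low-confidence detections, and the mapping both renames classes and discards anything unmapped. A runnable sketch on mock detections (the `filter_detections` helper and the `(label, score)` pair format are our own illustration):

```python
def filter_detections(detections, threshold=0.5, labels_mapping=None):
    """Apply confidence and class filters as the CLI flags above describe.

    `detections` is a mock list of (label, score) pairs for illustration.
    """
    kept = []
    for label, score in detections:
        if score < threshold:
            continue  # below the confidence threshold
        if labels_mapping is not None:
            if label not in labels_mapping:
                continue  # unmapped classes are ignored
            label = labels_mapping[label]
        kept.append((label, score))
    return kept

detections = [("car", 0.9), ("truck", 0.8), ("person", 0.6), ("dog", 0.95)]
mapping = {"car": "vehicle", "truck": "vehicle", "person": "pedestrian"}
print(filter_detections(detections, threshold=0.7, labels_mapping=mapping))
# 'person' falls below the threshold; 'dog' is unmapped and dropped
```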

Post-Processing

After auto-annotation:
  1. Review results: Navigate through frames to check annotations
  2. Adjust boundaries: Refine automatically generated shapes
  3. Remove false positives: Delete incorrect detections
  4. Add missed objects: Manually annotate objects the model missed
  5. Set attributes: Auto-annotation doesn't set attributes; add them manually
  6. Verify tracks: Check that objects maintain consistent IDs across frames
Efficient Review Workflow:
  1. Use arrow keys to quickly navigate between frames
  2. Press H to hide correct objects and focus on errors
  3. Use Del to quickly remove false positives
  4. Press N to quickly add missed objects with repeat drawing

Custom Models

Deploying Custom Functions

You can deploy your own models as CVAT auto-annotation functions.
Requirements:
  1. Python function implementing the auto-annotation interface
  2. Model weights and dependencies
  3. Deployment environment (local, cloud, or Nuclio)
Example Function Structure:
import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models
import PIL.Image

# Declare the labels this function can produce (example labels shown).
spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec("car", 0),
        cvataa.label_spec("person", 1),
    ],
)

# CVAT calls detect() once per frame and collects the returned shapes.
def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    # Run inference (run_inference is a placeholder for your model code).
    detections = run_inference(image)

    # Convert results to CVAT shapes using the SDK helpers.
    return [
        cvataa.rectangle(det.label_id, [det.x1, det.y1, det.x2, det.y2])
        for det in detections
    ]
See the Auto-annotation API reference for complete documentation.

Using Serverless Functions

Deploy functions to Nuclio serverless platform:
  1. Package function:
    nuctl deploy --path /path/to/function
    
  2. Register with CVAT:
    • Navigate to Models page
    • Click “Create Model”
    • Enter function URL and details
  3. Use from UI:
    • Functions appear in auto-annotation model list
    • Configure and run like built-in models
Serverless deployment allows:
  • GPU acceleration
  • Concurrent processing
  • Shared models across users
  • No local compute requirements
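Each serverless function is described by a `function.yaml` next to its handler code; CVAT reads the annotations to list the function as a model. A trimmed sketch of the usual shape (the name, labels, runtime, and handler here are placeholders; use your function's real values):

```yaml
metadata:
  name: my-detector            # placeholder function name
  namespace: cvat
  annotations:
    name: My detector          # display name on the CVAT Models page
    type: detector             # detector | interactor | tracker
    spec: |
      [{ "id": 0, "name": "person", "type": "rectangle" }]
spec:
  description: Custom detection model
  runtime: python:3.9
  handler: main:handler
  eventTimeout: 30s
```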

Performance Tips

Use device=str:cuda whenever a GPU is available; GPU inference can be 10-100x faster than CPU for deep learning models.
Auto-annotate multiple tasks together to amortize model loading time.
  • Use smaller models (e.g., YOLO nano) for initial annotation
  • Use larger models for final pass or difficult cases
  • Consider speed vs. accuracy tradeoff
  • Start with lower threshold (0.3-0.5) to catch all objects
  • Remove false positives manually
  • Higher threshold (0.7-0.9) for high-precision applications

Model Comparison

Model | Speed | Accuracy | Best For
YOLO12n | Very Fast | Good | Real-time, quick annotation
YOLO12l | Medium | Excellent | Balanced production use
SAM2 (interactive) | Medium | Excellent | Complex shapes, pixel accuracy
SAM2 (tracking) | Slow | Excellent | Video segmentation
Detectron2 | Slow | Excellent | Research, maximum accuracy
Transformers | Varies | Varies | Specialized models

Troubleshooting

If a model fails to run, check:
  • The model is properly deployed and accessible
  • Sufficient memory/VRAM is available
  • Network connectivity to the model server
  • Task labels match model output classes
  • Server logs for detailed error messages
If no annotations are produced:
  • The threshold may be too high; try lowering it to 0.3
  • Check the label mapping configuration
  • Verify the model is appropriate for your data (e.g., trained on similar objects)
If results are low quality:
  • Use a larger or better model
  • Adjust the threshold
  • Ensure input images are high quality
  • Consider fine-tuning the model on your specific domain
If processing is slow:
  • Enable GPU acceleration
  • Use a smaller model
  • Process shorter video segments
  • Check whether other processes are using the GPU

Next Steps

Manual Annotation

Refine auto-annotation results manually

Advanced Tools

Use propagation and interpolation

SDK Reference

Build custom auto-annotation functions

Serverless Models

Explore available models
