Overview
CVAT supports multiple approaches to automatic annotation:

- Interactive Tools - click-based segmentation with SAM2 and other interactive models
- Detector Models - automatic object detection with YOLO, Detectron2, and Transformers
- Tracking Models - multi-frame tracking with SAM2 tracker and other temporal models
- Custom Functions - deploy your own models as auto-annotation functions
Interactive Segmentation
AI Tools (SAM2)
The AI Tools button in the controls sidebar provides access to interactive segmentation models.

Using Interactive Segmentation:
- Click the AI Tools button in the left sidebar
- Select a label for the annotation
- Choose the interaction mode:
  - Positive points: click inside the object you want to segment
  - Negative points: click outside to exclude regions
  - Bounding box: draw a box around the object
- The model generates a mask or polygon in real time
- Adjust by adding more positive/negative points
- Click Done to create the annotation
- Press N to repeat with the same settings
Supported models include:
- SAM2 (Segment Anything Model 2) - Meta’s foundation model for promptable segmentation
- IOG (Inside-Outside Guidance) - efficient interactive segmentation
- Custom deployed interactive models
OpenCV Tools
The OpenCV Tools provide classical computer vision algorithms:

Intelligent Scissors:
- Click the OpenCV Tools button
- Select “Intelligent Scissors”
- Click along the object boundary
- The tool automatically finds edges between clicks
- Close the polygon to finish

Foreground segmentation:
- Draw a rough rectangle around the object
- The algorithm segments the foreground
- Refine with additional markers
OpenCV tools work entirely in the browser and don’t require a server connection, but are less accurate than deep learning models.
Automatic Detection
Automatic detection runs models over entire frames or videos to detect all objects of specific classes.

Using Auto-Annotation

From the Task Page:
- Navigate to your task
- Click Actions → Automatic annotation
- Select a model:
  - YOLO models (v5, v8, v11, v12) - fast, accurate object detection
  - Detectron2 models - research-grade detection and segmentation
  - Transformers models - Hugging Face model hub integration
- Configure detection parameters:
  - Threshold: minimum confidence score (0.0-1.0)
  - Labels mapping: map model classes to your task labels
- Click Annotate to start
- Monitor progress in the task details page
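The same run can be started outside the UI with cvat-cli. A minimal sketch, assuming a recent cvat-cli, the SDK’s bundled torchvision detection function, and placeholder credentials (exact flags vary between versions):

```shell
# Auto-annotate task 42 with the SDK's torchvision-based detector
# (server URL, credentials, and task ID are placeholders).
cvat-cli --server-host https://app.cvat.ai --auth user:password \
    task auto-annotate 42 \
    --function-module cvat_sdk.auto_annotation.functions.torchvision_detection \
    -p model_name=str:fasterrcnn_resnet50_fpn_v2
```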
YOLO Models
YOLO (You Only Look Once) models provide fast, accurate detection:

Supported Tasks:
- Object detection (bounding boxes)
- Instance segmentation (polygons)
- Pose estimation (skeletons)
- Oriented object detection (rotated boxes)
- Classification

Available Versions:
- YOLOv5 - lightweight, fast
- YOLOv8 - improved accuracy
- YOLOv11 - latest performance
- YOLO12 - state-of-the-art results
YOLO Model Selection Guide

- Nano models (n): fastest, lowest accuracy - for real-time or CPU inference
- Small models (s): balanced speed and accuracy
- Medium models (m): good accuracy, moderate speed
- Large models (l): high accuracy, slower
- Extra-large models (x): best accuracy, slowest - for maximum quality
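As a rough illustration, the guide can be encoded as a lookup from a quality preference to a weights filename. The helper and its names are hypothetical; only the suffix-to-tradeoff mapping comes from the list above:

```python
# Map a quality preference to a YOLO model-size suffix, following the
# selection guide above (helper and preference names are illustrative).
YOLO_SIZES = {
    "fastest": "n",    # nano: real-time / CPU inference
    "balanced": "s",   # small: balanced speed and accuracy
    "good": "m",       # medium: good accuracy, moderate speed
    "high": "l",       # large: high accuracy, slower
    "best": "x",       # extra-large: best accuracy, slowest
}

def pick_yolo_model(preference: str, family: str = "yolo12") -> str:
    """Return a weights filename such as 'yolo12n.pt' for the preference."""
    return f"{family}{YOLO_SIZES[preference]}.pt"
```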
Detectron2 Models
Facebook Research’s Detectron2 provides state-of-the-art detection and segmentation:

Available Architectures:
- Faster R-CNN - two-stage detection
- RetinaNet - single-stage dense detection
- Mask R-CNN - instance segmentation
- Cascade R-CNN - multi-stage refinement

Pre-trained Weights:
- COCO (80 object classes)
- LVIS (1000+ classes)
- Custom trained models
Transformers Models
Hugging Face Transformers integration provides access to thousands of models:

Supported Tasks:
- object-detection - bounding box detection (DETR, YOLOS, etc.)
- image-segmentation - semantic and instance segmentation
- image-classification - frame-level classification tags
Automatic Tracking
SAM2 Tracker
SAM2 can track segmented objects across video frames:

Workflow:
- Annotate objects on the first frame (manually or with AI tools)
- Run the SAM2 tracker auto-annotation function
- The model propagates masks/polygons across subsequent frames
- Review and correct as needed

Available model sizes:
- facebook/sam2.1-hiera-tiny - fastest, 38.9M parameters
- facebook/sam2.1-hiera-small - balanced, 46M parameters
- facebook/sam2.1-hiera-base-plus - high quality, 80.8M parameters
- facebook/sam2.1-hiera-large - best quality, 224.4M parameters
SAM2 tracking is particularly effective for:
- Objects with complex boundaries
- Partially occluded objects
- Non-rigid deformations
- Variable camera motion
TransT Tracker
TransT provides transformer-based object tracking:
- Draw an initial bounding box on the first frame
- Run the TransT tracker
- The model tracks the object through the video
- Outputs interpolated tracks
Advanced Configuration
Model Parameters
Most auto-annotation functions accept parameters:

Common Parameters:

| Parameter | Type | Description |
|---|---|---|
| model | string | Path or identifier for the model |
| device | string | PyTorch device: cpu, cuda, cuda:0, etc. |
| threshold | float | Confidence threshold (0.0-1.0) |
| labels_mapping | dict | Map model classes to CVAT labels |

YOLO Parameters:

| Parameter | Type | Description |
|---|---|---|
| keypoint_names_path | string | Path to keypoint names file (pose models) |
| imgsz | int | Input image size (default: 640) |
| conf | float | Object confidence threshold |
| iou | float | NMS IoU threshold |

Transformers Parameters:

| Parameter | Type | Description |
|---|---|---|
| task | string | Model task: object-detection, image-segmentation, etc. |
| threshold | float | Detection threshold |
| top_k | int | Maximum detections per image |
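When passed through cvat-cli, parameters like these are written as name=type:value strings, for example device=str:cuda or threshold=float:0.5. A minimal parser for that syntax, assuming only basic scalar types (cvat-cli’s own parser may differ):

```python
# Parse cvat-cli style "-p name=type:value" parameter strings,
# e.g. "device=str:cuda" or "imgsz=int:640" (basic scalar types only).
CASTS = {"str": str, "int": int, "float": float, "bool": lambda v: v == "true"}

def parse_param(arg: str):
    """Split 'name=type:value' into (name, typed_value)."""
    name, typed = arg.split("=", 1)
    type_name, raw = typed.split(":", 1)
    return name, CASTS[type_name](raw)
```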
Label Mapping
When model classes don’t match your task labels, configure a labels mapping to translate model class names to your task’s label names; unmapped classes are dropped.

Filtering Results
By Confidence: Set a higher threshold to reduce false positives.

Post-Processing
After auto-annotation:
- Review results: Navigate through frames to check annotations
- Adjust boundaries: Refine automatically generated shapes
- Remove false positives: Delete incorrect detections
- Add missed objects: Manually annotate objects the model missed
- Set attributes: Auto-annotation doesn’t set attributes; add them manually
- Verify tracks: Check that objects maintain consistent IDs across frames
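Conceptually, the label mapping and confidence filtering above amount to one small pass over raw detections. A plain-dict sketch (the dict shape and function name are illustrative, not the CVAT SDK’s types):

```python
# Apply a labels mapping and a confidence threshold to raw detections.
# Detections are plain dicts: {"label": str, "score": float, "points": [...]}.
def filter_detections(detections, labels_mapping, threshold=0.5):
    kept = []
    for det in detections:
        label = labels_mapping.get(det["label"])  # drop unmapped classes
        if label is None or det["score"] < threshold:
            continue
        kept.append({**det, "label": label})
    return kept
```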
Custom Models
Deploying Custom Functions
You can deploy your own models as CVAT auto-annotation functions:

Requirements:
- Python function implementing the auto-annotation interface
- Model weights and dependencies
- Deployment environment (local, cloud, or Nuclio)
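At its core, such a function pairs a spec declaring the labels it can produce with a detect() callable that converts model output into shapes. The sketch below shows that structure with plain dicts and a stubbed model so it stays self-contained; the real interface uses the helper types in cvat_sdk.auto_annotation (e.g. DetectionFunctionSpec, label_spec, rectangle):

```python
# Minimal shape of an auto-annotation function: a label spec plus a
# detect() that turns model output into rectangle shapes. Plain dicts
# stand in for the cvat_sdk.auto_annotation helper types.
spec = {"labels": [{"id": 0, "name": "cat"}, {"id": 1, "name": "dog"}]}

def fake_model(image):
    # Stand-in for a real model: (class_id, score, x1, y1, x2, y2) tuples.
    return [(0, 0.9, 10, 10, 50, 60), (1, 0.2, 0, 0, 5, 5)]

def detect(context, image, threshold=0.5):
    """Return one rectangle shape per confident detection."""
    shapes = []
    for cls, score, x1, y1, x2, y2 in fake_model(image):
        if score >= threshold:
            shapes.append({"type": "rectangle", "label_id": cls,
                           "points": [x1, y1, x2, y2]})
    return shapes
```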
Using Serverless Functions
Deploy functions to the Nuclio serverless platform:

1. Package the function
2. Register with CVAT:
   - Navigate to the Models page
   - Click “Create Model”
   - Enter the function URL and details
3. Use from the UI:
   - Functions appear in the auto-annotation model list
   - Configure and run like built-in models
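Packaging and registration are typically driven by nuctl (CVAT also ships serverless deploy scripts). A hedged sketch, assuming a function directory ./my-detector containing a function.yaml:

```shell
# Deploy a packaged function to Nuclio under a "cvat" project
# (the path and project name are placeholders).
nuctl create project cvat
nuctl deploy --project-name cvat \
    --path ./my-detector \
    --platform local
```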
Serverless deployment allows:
- GPU acceleration
- Concurrent processing
- Shared models across users
- No local compute requirements
Performance Tips
GPU Acceleration
Always use device=str:cuda for faster processing. A GPU can be 10-100x faster than CPU for deep learning models.

Batch Processing
Auto-annotate multiple tasks together to amortize model loading time.
Model Selection
- Use smaller models (e.g., YOLO nano) for initial annotation
- Use larger models for final pass or difficult cases
- Consider speed vs. accuracy tradeoff
Threshold Tuning
- Start with lower threshold (0.3-0.5) to catch all objects
- Remove false positives manually
- Higher threshold (0.7-0.9) for high-precision applications
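The effect of the threshold is easy to see on a toy list of confidence scores (numbers are illustrative):

```python
# Keep only detections whose confidence clears the threshold.
scores = [0.95, 0.88, 0.72, 0.55, 0.41, 0.33, 0.18]

def survivors(scores, threshold):
    """Return the scores that would survive filtering at this threshold."""
    return [s for s in scores if s >= threshold]
```

At 0.3, six of the seven detections survive; at 0.7, only three do, trading recall for precision.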
Model Comparison
| Model | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLO12n | Very Fast | Good | Real-time, quick annotation |
| YOLO12l | Medium | Excellent | Balanced production use |
| SAM2 (interactive) | Medium | Excellent | Complex shapes, pixel accuracy |
| SAM2 (tracking) | Slow | Excellent | Video segmentation |
| Detectron2 | Slow | Excellent | Research, maximum accuracy |
| Transformers | Varies | Varies | Specialized models |
Troubleshooting
Auto-annotation failed
Check:
- Model is properly deployed and accessible
- Sufficient memory/VRAM available
- Network connectivity to model server
- Task labels match model output classes
- Server logs for detailed error messages
No detections produced
- Threshold may be too high; try lowering it to 0.3
- Check label mapping configuration
- Verify model is appropriate for your data (e.g., trained on similar objects)
Poor quality results
- Use a larger/better model
- Adjust threshold
- Ensure input images are high quality
- Consider fine-tuning model on your specific domain
Processing too slow
- Enable GPU acceleration
- Use a smaller model
- Process shorter video segments
- Check if other processes are using GPU
Next Steps
- Manual Annotation - refine auto-annotation results manually
- Advanced Tools - use propagation and interpolation
- SDK Reference - build custom auto-annotation functions
- Serverless Models - explore available models