What are Serverless Functions?
Serverless functions in CVAT are lightweight, containerized AI models that enable automatic and semi-automatic annotation capabilities. These functions run as independent microservices using Nuclio, a high-performance serverless framework, and can be invoked on demand to accelerate annotation workflows.
Function Types
CVAT supports four types of serverless functions:
Detectors
Automatically detect and annotate objects in images without user input. Detectors scan entire frames and return bounding boxes, polygons, or masks for detected objects. Use cases:
- Bulk annotation of similar objects across multiple frames
- Initial annotation pass before manual refinement
- Quality control and validation
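Under the hood, a detector is a small Nuclio handler that receives a frame and returns a list of shapes. The sketch below is illustrative, not one of CVAT's shipped functions; the payload keys (`image` in the request; `confidence`, `label`, `points`, `type` in the response) follow the convention used by CVAT's built-in detectors, and the hard-coded detection stands in for real model inference.

```python
import base64
import json


def handler(context, event):
    """Nuclio entry point: CVAT POSTs a JSON body with a base64-encoded frame."""
    data = json.loads(event.body)
    image_bytes = base64.b64decode(data["image"])  # raw frame bytes for the model

    # A real function would decode the image and run inference here.
    # This stub returns one hard-coded detection in CVAT's expected shape.
    results = [{
        "confidence": "0.92",
        "label": "person",
        "points": [10.0, 10.0, 120.0, 240.0],  # xtl, ytl, xbr, ybr of a rectangle
        "type": "rectangle",
    }]

    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type="application/json",
        status_code=200,
    )
```

Nuclio supplies the `context` and `event` objects at runtime; the function itself only needs to parse the body and serialize its results.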
Interactors
Semi-automatic annotation tools that require user input (points, boxes) to segment specific objects. Interactors provide precise control over which objects to annotate. Use cases:
- Interactive object segmentation
- Fine-grained annotation of complex shapes
- Annotation of objects that are difficult to detect automatically
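An interactor's handler looks much like a detector's, except the request also carries the user's clicks. This sketch assumes the `pos_points`/`neg_points` keys used by CVAT's built-in interactors; the bounding-polygon "segmentation" is a placeholder for a real model such as SAM or IOG.

```python
import base64
import json


def handler(context, event):
    """Interactor sketch: user clicks guide the segmentation of one object."""
    data = json.loads(event.body)
    image_bytes = base64.b64decode(data["image"])  # not used by this stub
    pos_points = data.get("pos_points", [])  # [x, y] clicks inside the object
    neg_points = data.get("neg_points", [])  # [x, y] clicks on the background

    # Placeholder "segmentation": return the axis-aligned polygon that
    # encloses the positive clicks. A real interactor runs a model here.
    xs = [p[0] for p in pos_points]
    ys = [p[1] for p in pos_points]
    polygon = [[min(xs), min(ys)], [max(xs), min(ys)],
               [max(xs), max(ys)], [min(xs), max(ys)]]

    return context.Response(body=json.dumps(polygon), headers={},
                            content_type="application/json", status_code=200)
```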
Trackers
Automatically track objects across video frames after initial annotation. Trackers maintain object identity throughout the sequence. Use cases:
- Video annotation workflows
- Temporal object tracking
- Reducing manual frame-by-frame annotation effort
ReID (Re-Identification)
Match and link objects across frames based on visual similarity. ReID functions help maintain consistent object identities in multi-object tracking scenarios. Use cases:
- Multi-object tracking
- Cross-camera tracking
- Object re-identification after occlusion
Available Models
CVAT includes several pre-configured serverless functions:

| Model | Type | Framework | Description | Output Type |
|---|---|---|---|---|
| Segment Anything (SAM) | Interactor | PyTorch | Meta’s foundation model for interactive segmentation | Mask |
| YOLO v7 | Detector | ONNX | Fast object detection for 80 COCO classes | Rectangle |
| Mask R-CNN | Detector | OpenVINO | Instance segmentation for 80 COCO classes | Mask |
| RetinaNet R101 | Detector | PyTorch (Detectron2) | Object detection with feature pyramid networks | Rectangle |
| Faster R-CNN | Detector | TensorFlow | Object detection for COCO dataset | Rectangle |
| Face Detection 0205 | Detector | OpenVINO | Specialized face detection with attributes | Rectangle |
| Human Pose (HRNet) | Detector | PyTorch (MMPose) | Keypoint detection for human pose estimation | Points |
| IOG | Interactor | PyTorch | Inside-Outside Guidance for interactive segmentation | Polygon |
| TransT | Tracker | PyTorch | Transformer-based visual object tracker | Rectangle |
Model Details
COCO Classes
Many models (YOLO v7, Mask R-CNN, RetinaNet, Faster R-CNN) support the 80 COCO object classes, including:
- People and animals (person, cat, dog, horse, etc.)
- Vehicles (car, truck, bus, bicycle, motorcycle, etc.)
- Indoor objects (chair, couch, tv, laptop, etc.)
- Food items (pizza, sandwich, apple, etc.)
- Accessories (backpack, handbag, umbrella, etc.)
Architecture
Serverless functions in CVAT follow this architecture:
- CVAT UI: User initiates annotation with a serverless function
- Lambda Manager: Routes requests and manages function invocations
- Nuclio: Serverless platform that hosts and scales functions
- AI Model Container: Isolated environment running the ML model
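The same flow can be driven programmatically: CVAT's REST API exposes deployed functions under its lambda endpoints, and annotation is started by POSTing a request. The sketch below only builds the URL and payload; the endpoint path follows CVAT's lambda API, while the function ID and payload values used in the example are hypothetical.

```python
def build_annotation_request(base_url: str, function_id: str, task_id: int):
    """Return the URL and JSON payload for starting an automatic-annotation run."""
    url = f"{base_url}/api/lambda/requests"
    payload = {
        "function": function_id,  # ID of a deployed serverless function
        "task": task_id,          # the CVAT task to annotate
        "cleanup": False,         # keep existing annotations instead of replacing them
    }
    return url, payload


# Hypothetical example: queue a detector on task 42 of a local CVAT instance.
url, payload = build_annotation_request("http://localhost:8080",
                                        "example-detector", 42)
```

The request is then sent with the usual CVAT authentication; progress can be polled through the same lambda endpoints.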
Performance Characteristics
CPU vs GPU
Most functions support both CPU and GPU execution:
- CPU: Lower cost, suitable for batch processing and non-time-critical tasks
- GPU: Faster inference, better for interactive tools and real-time annotation
Scaling
Nuclio automatically manages:
- Function lifecycle and health checks
- Request routing and load balancing
- Resource allocation per function
- Worker pool management (typically 2 workers per function)
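These settings live in each function's `function.yaml`. The fragment below is an illustrative example, not one of CVAT's shipped configurations: the names and label list are made up, but the layout (CVAT metadata in `metadata.annotations`, worker count under the HTTP trigger's `maxWorkers`) follows the convention used by CVAT's bundled functions.

```yaml
# Illustrative Nuclio function.yaml fragment (names and labels are examples)
metadata:
  name: example-detector
  annotations:
    name: Example Detector   # display name shown in the CVAT UI
    type: detector           # detector | interactor | tracker | reid
    framework: pytorch
    spec: |
      [{ "id": 0, "name": "person" }]
spec:
  handler: main:handler      # module:function Nuclio invokes per request
  runtime: python:3.8
  triggers:
    myHttpTrigger:
      kind: http
      maxWorkers: 2          # worker pool size per function
```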
Integration with CVAT
Serverless functions integrate with CVAT through:
Automatic Annotation
Batch processing mode for detectors:
- Processes all frames in a task or job
- Creates annotations automatically
- Runs as a background job with progress tracking
Interactive Tools
Real-time mode for interactors:
- Immediate feedback during annotation
- Point-and-click or box-based interaction
- Results displayed instantly in the UI
Tracking Mode
Video-specific workflows:
- Initialize tracking with first-frame annotation
- Automatically propagate to subsequent frames
- Maintain object state across frames
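A tracker handler differs from a detector in that it is stateful across calls: each request carries the shapes from the previous frame plus an opaque per-object state, and the response returns both updated. The sketch below assumes the `shapes`/`states` keys used by CVAT's built-in trackers; the pass-through "tracking" and the `frames_tracked` counter are stand-ins for a real model's update step.

```python
import base64
import json


def handler(context, event):
    """Tracker sketch: given previous shapes and states, return updated ones."""
    data = json.loads(event.body)
    image_bytes = base64.b64decode(data["image"])  # the new frame (unused here)
    shapes = data.get("shapes", [])  # rectangles from the previous frame
    states = data.get("states", [])  # opaque per-object tracker state

    # A real tracker would locate each object in the new frame. This stub
    # propagates the previous shapes unchanged and counts tracked frames.
    new_states = []
    for state in states or [None] * len(shapes):
        prev = state or {}
        new_states.append({"frames_tracked": prev.get("frames_tracked", 0) + 1})

    return context.Response(body=json.dumps({"shapes": shapes, "states": new_states}),
                            headers={}, content_type="application/json",
                            status_code=200)
```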
Next Steps
Deploy Functions
Learn how to deploy serverless functions using Nuclio
Custom Models
Create your own serverless functions