Serverless Functions Overview

What are Serverless Functions?

Serverless functions in CVAT are lightweight, containerized AI models that enable automatic and semi-automatic annotation capabilities. These functions run as independent microservices using Nuclio, a high-performance serverless framework, and can be invoked on-demand to accelerate annotation workflows.

Function Types

CVAT supports four types of serverless functions:

Detectors

Automatically detect and annotate objects in images without user input. Detectors scan entire frames and return bounding boxes, polygons, or masks for detected objects. Use cases:

Bulk annotation of similar objects across multiple frames
Initial annotation pass before manual refinement
Quality control and validation
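The shape of a detector's result can be sketched as follows. This is a minimal illustration, not CVAT's implementation: the `to_shapes` helper, the raw detection tuple layout, and the exact field names (`label`, `confidence`, `points`, `type`) are assumptions chosen to mirror the bounding-box output described above.

```python
from typing import Dict, List, Sequence, Tuple

# (label, score, (x_min, y_min, x_max, y_max)) from a hypothetical model.
RawDetection = Tuple[str, float, Tuple[float, float, float, float]]


def to_shapes(detections: Sequence[RawDetection], threshold: float = 0.5) -> List[Dict]:
    """Drop low-confidence detections and format the rest as rectangles."""
    shapes = []
    for label, score, box in detections:
        if score < threshold:
            continue  # skip detections below the confidence threshold
        shapes.append({
            "label": label,
            "confidence": str(score),  # assumed field name and string encoding
            "points": list(box),       # [x_min, y_min, x_max, y_max]
            "type": "rectangle",
        })
    return shapes
```

A threshold like this is one way to trade recall for fewer false positives before the manual refinement pass.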

Interactors

Semi-automatic annotation tools that require user input (points, boxes) to segment specific objects. Interactors provide precise control over which objects to annotate. Use cases:

Interactive object segmentation
Fine-grained annotation of complex shapes
Annotation of objects that are difficult to detect automatically
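The user input an interactor consumes might be modeled like this. The class and every field name are hypothetical, and the rule that a request needs at least one positive point or a box is an assumption made for the sketch, not a documented CVAT requirement.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


@dataclass
class InteractorInput:
    """User guidance for one object; all field names are hypothetical."""
    positive_points: List[Point] = field(default_factory=list)
    negative_points: List[Point] = field(default_factory=list)
    box: Optional[Box] = None

    def is_valid(self) -> bool:
        # Assumption: some positive guidance is needed to pick an object.
        return bool(self.positive_points) or self.box is not None

    def to_payload(self) -> Dict[str, Any]:
        """Convert the guidance into a JSON-serializable dictionary."""
        if not self.is_valid():
            raise ValueError("provide at least one positive point or a box")
        payload: Dict[str, Any] = {
            "positive_points": [list(p) for p in self.positive_points],
            "negative_points": [list(p) for p in self.negative_points],
        }
        if self.box is not None:
            payload["box"] = list(self.box)
        return payload
```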

Trackers

Automatically track objects across video frames after initial annotation. Trackers maintain object identity throughout the sequence. Use cases:

Video annotation workflows
Temporal object tracking
Reducing manual frame-by-frame annotation effort

ReID (Re-Identification)

Match and link objects across frames based on visual similarity. ReID functions help maintain consistent object identities in multi-object tracking scenarios. Use cases:

Multi-object tracking
Cross-camera tracking
Object re-identification after occlusion
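Matching by visual similarity can be illustrated with a small sketch. It assumes each object is already described by an embedding vector and uses greedy cosine-similarity matching with a fixed threshold. This is a simplified stand-in for illustration, not the algorithm a particular ReID model in CVAT uses, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_identities(
    previous: Dict[int, Sequence[float]],
    current: List[Sequence[float]],
    threshold: float = 0.8,
) -> List[Optional[int]]:
    """For each current embedding, return the id of the most similar
    still-unmatched previous object, or None if nothing reaches the threshold."""
    used = set()
    matches: List[Optional[int]] = []
    for embedding in current:
        best_id, best_sim = None, threshold
        for obj_id, prev_embedding in previous.items():
            if obj_id in used:
                continue  # each previous identity is assigned at most once
            sim = cosine_similarity(embedding, prev_embedding)
            if sim >= best_sim:
                best_id, best_sim = obj_id, sim
        if best_id is not None:
            used.add(best_id)
        matches.append(best_id)
    return matches
```

An unmatched object (`None`) would start a new identity, which is how an object that reappears after occlusion can be told apart from a genuinely new one.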

Available Models

CVAT includes several pre-configured serverless functions:

Model	Type	Framework	Description	Output Type
Segment Anything (SAM)	Interactor	PyTorch	Meta’s foundation model for interactive segmentation	Mask
YOLO v7	Detector	ONNX	Fast object detection for 80 COCO classes	Rectangle
Mask R-CNN	Detector	OpenVINO	Instance segmentation for 80 COCO classes	Mask
RetinaNet R101	Detector	PyTorch (Detectron2)	Object detection with feature pyramid networks	Rectangle
Faster R-CNN	Detector	TensorFlow	Object detection for COCO dataset	Rectangle
Face Detection 0205	Detector	OpenVINO	Specialized face detection with attributes	Rectangle
Human Pose (HRNet)	Detector	PyTorch (MMPose)	Keypoint detection for human pose estimation	Points
IOG	Interactor	PyTorch	Inside-Outside Guidance for interactive segmentation	Polygon
TransT	Tracker	PyTorch	Transformer-based visual object tracker	Rectangle

Model Details

COCO Classes

Several models (YOLO v7, Mask R-CNN, RetinaNet R101, Faster R-CNN) support the 80 COCO object classes, which include:

People and animals (person, cat, dog, horse, etc.)
Vehicles (car, truck, bus, bicycle, motorcycle, etc.)
Indoor objects (chair, couch, tv, laptop, etc.)
Food items (pizza, sandwich, apple, etc.)
Accessories (backpack, handbag, umbrella, etc.)

Architecture

Serverless functions in CVAT follow this architecture:

┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   CVAT UI   │────────▶│  CVAT Server │────────▶│   Nuclio    │
│             │◀────────│ Lambda Mgr   │◀────────│  Functions  │
└─────────────┘         └──────────────┘         └─────────────┘
                                                         │
                                                         ▼
                                                  ┌─────────────┐
                                                  │   AI Model  │
                                                  │  Container  │
                                                  └─────────────┘

CVAT UI: User initiates annotation with a serverless function
Lambda Manager: Routes requests and manages function invocations
Nuclio: Serverless platform that hosts and scales functions
AI Model Container: Isolated environment running the ML model
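The routing role of the middle layer can be sketched as a toy in-process dispatcher. The `LambdaManager` class below is a hypothetical stand-in written for this illustration. It does not reproduce CVAT's server code or make any network calls to Nuclio.

```python
from typing import Any, Callable, Dict

# A "function" here is any callable that takes a request payload.
Handler = Callable[[Dict[str, Any]], Any]


class LambdaManager:
    """Toy stand-in for the component that routes UI requests to functions."""

    def __init__(self) -> None:
        self._functions: Dict[str, Handler] = {}

    def register(self, function_id: str, handler: Handler) -> None:
        """Make a function available under the given id."""
        self._functions[function_id] = handler

    def invoke(self, function_id: str, payload: Dict[str, Any]) -> Any:
        """Forward the payload to the named function and return its result."""
        if function_id not in self._functions:
            raise KeyError(f"unknown function: {function_id}")
        return self._functions[function_id](payload)
```

In the real deployment the handler lives in a separate container and the call crosses an HTTP boundary. The sketch only shows the lookup-then-forward pattern.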

Performance Characteristics

CPU vs GPU

Most functions support both CPU and GPU execution:

CPU: Lower cost, suitable for batch processing and non-time-critical tasks
GPU: Faster inference, better for interactive tools and real-time annotation

Scaling

Nuclio automatically manages:

Function lifecycle and health checks
Request routing and load balancing
Resource allocation per function
Worker pool management (typically 2 workers per function)
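As a rough analogy for a fixed worker pool, the sketch below processes requests with two threads. It only illustrates bounded concurrency. Nuclio's actual worker model is configured per function and is not implemented this way, and `run_with_workers` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def run_with_workers(
    handler: Callable[[Any], Any],
    requests: Iterable[Any],
    max_workers: int = 2,
) -> List[Any]:
    """Process requests with at most `max_workers` running concurrently.

    Results are returned in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, requests))
```

With two workers, a third request waits until one of the first two finishes, which is why slow models benefit from more workers or faster hardware.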

Integration with CVAT

Serverless functions integrate with CVAT through:

Automatic Annotation

Batch processing mode for detectors:

Processes all frames in a task or job
Creates annotations automatically
Runs as a background job with progress tracking
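The batch flow above can be sketched as a loop that runs a detector on each frame and reports progress. The `annotate_frames` function and its callback are hypothetical names for this illustration. In CVAT the work runs as a background job rather than an inline loop.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def annotate_frames(
    frame_ids: Iterable[int],
    detect: Callable[[int], List[Any]],
    on_progress: Optional[Callable[[float], None]] = None,
) -> Dict[int, List[Any]]:
    """Run `detect` on every frame and collect the annotations per frame.

    `on_progress` receives the completed fraction (0.0 to 1.0) after each frame.
    """
    frames = list(frame_ids)
    annotations: Dict[int, List[Any]] = {}
    for done, frame_id in enumerate(frames, start=1):
        annotations[frame_id] = detect(frame_id)
        if on_progress is not None:
            on_progress(done / len(frames))
    return annotations
```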

Interactive Tools

Real-time mode for interactors:

Immediate feedback during annotation
Point-and-click or box-based interaction
Results displayed instantly in the UI

Tracking Mode

Video-specific workflows:

Initialize tracking with first-frame annotation
Automatically propagate to subsequent frames
Maintain object state across frames
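The tracking workflow can be sketched as a loop that carries a box and an opaque state from frame to frame. The `propagate` function and the `step` callback signature are assumptions made for this sketch. A real tracker such as TransT would supply its own update logic and state.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = List[float]
# step(frame, previous_box, previous_state) -> (new_box, new_state)
Step = Callable[[Any, Box, Any], Tuple[Box, Any]]


def propagate(initial_box: Box, frames: Iterable[Any], step: Step) -> List[Box]:
    """Propagate a first-frame box through the following frames.

    The state returned by each step is passed to the next one, so the
    tracker can remember what the object looked like.
    """
    box, state = initial_box, None  # state is None until the first step
    track: List[Box] = []
    for frame in frames:
        box, state = step(frame, box, state)
        track.append(box)
    return track
```

Keeping the state outside the step function means the tracker itself can stay stateless between calls, which suits functions that are invoked once per frame.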

Next Steps

Deploy Functions

Learn how to deploy serverless functions using Nuclio

Custom Models

Create your own serverless functions
