What are Serverless Functions?

Serverless functions in CVAT are lightweight, containerized AI models that enable automatic and semi-automatic annotation capabilities. These functions run as independent microservices using Nuclio, a high-performance serverless framework, and can be invoked on-demand to accelerate annotation workflows.

Function Types

CVAT supports four types of serverless functions:

Detectors

Automatically detect and annotate objects in images without user input. Detectors scan entire frames and return bounding boxes, polygons, or masks for detected objects. Use cases:
  • Bulk annotation of similar objects across multiple frames
  • Initial annotation pass before manual refinement
  • Quality control and validation
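As a rough illustration of what a detector hands back, the sketch below turns raw model outputs into a list of shape records and drops low-confidence results. The field names (`confidence`, `label`, `points`, `type`) follow the pattern used in CVAT's example detector functions, but the exact schema, the helper name `to_cvat_shapes`, and the 0.5 threshold are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical raw detection: (label, score, (xtl, ytl, xbr, ybr))
RawDetection = Tuple[str, float, Tuple[float, float, float, float]]


def to_cvat_shapes(detections: Sequence[RawDetection],
                   threshold: float = 0.5) -> List[Dict]:
    """Keep detections at or above `threshold` and format them as rectangles."""
    shapes = []
    for label, score, box in detections:
        if score < threshold:
            continue  # drop low-confidence detections
        shapes.append({
            "confidence": str(score),
            "label": label,
            "points": list(box),  # [xtl, ytl, xbr, ybr]
            "type": "rectangle",
        })
    return shapes


# Two illustrative detections; only the first passes the threshold.
raw = [
    ("person", 0.92, (10.0, 20.0, 110.0, 220.0)),
    ("dog", 0.31, (50.0, 60.0, 90.0, 120.0)),
]
shapes = to_cvat_shapes(raw)
```

A threshold like this is a common way to trade recall for precision before the manual refinement pass.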

Interactors

Semi-automatic annotation tools that require user input (points, boxes) to segment specific objects. Interactors provide precise control over which objects to annotate. Use cases:
  • Interactive object segmentation
  • Fine-grained annotation of complex shapes
  • Annotation of objects that are difficult to detect automatically
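To make the "user input" part concrete, here is a minimal sketch of how a request to an interactor could be assembled from clicked points. The keys `image`, `pos_points`, and `neg_points` mirror what CVAT's example interactor functions read, but treat them, and the helper `build_interactor_request`, as assumptions rather than a documented contract.

```python
import base64
import json
from typing import Sequence, Tuple

Point = Tuple[int, int]  # (x, y) in image pixel coordinates


def build_interactor_request(image_bytes: bytes,
                             pos_points: Sequence[Point],
                             neg_points: Sequence[Point] = ()) -> str:
    """Serialize an image plus positive/negative clicks as a JSON body."""
    if not pos_points:
        raise ValueError("an interactor needs at least one positive point")
    payload = {
        # Images are sent as base64 text so they fit in a JSON document.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "pos_points": [list(p) for p in pos_points],  # clicks on the object
        "neg_points": [list(p) for p in neg_points],  # clicks on background
    }
    return json.dumps(payload)


body = build_interactor_request(b"fake-image-bytes", [(120, 80)], [(10, 10)])
```

Positive points mark the object to segment and negative points mark regions to exclude, which is what gives the annotator control over the result.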

Trackers

Automatically track objects across video frames after initial annotation. Trackers maintain object identity throughout the sequence. Use cases:
  • Video annotation workflows
  • Temporal object tracking
  • Reducing manual frame-by-frame annotation effort
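The idea of carrying an object's identity from frame to frame can be sketched with a toy overlap-based tracker. This is not how TransT or any bundled tracker works internally; it is a simplified stand-in that assumes candidate boxes are available for each frame, and the names `init_track` and `update_track` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xtl, ytl, xbr, ybr)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def init_track(track_id: int, box: Box) -> Dict:
    """Create tracker state from the user's first-frame annotation."""
    return {"id": track_id, "box": box}


def update_track(state: Dict, candidates: List[Box], min_iou: float = 0.3) -> Dict:
    """Move the track to the candidate that best overlaps its previous box."""
    best = max(candidates, key=lambda c: iou(state["box"], c), default=None)
    if best is None or iou(state["box"], best) < min_iou:
        return state  # no good match: keep the last known position
    return {"id": state["id"], "box": best}


state = init_track(7, (0.0, 0.0, 10.0, 10.0))
frames = [
    [(1.0, 0.0, 11.0, 10.0), (50.0, 50.0, 60.0, 60.0)],
    [(2.0, 1.0, 12.0, 11.0)],
]
for candidates in frames:
    state = update_track(state, candidates)
```

The state dictionary is passed in and returned on every call, so the function itself stays stateless while the track keeps its `id`.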

ReID (Re-Identification)

Match and link objects across frames based on visual similarity. ReID functions help maintain consistent object identities in multi-object tracking scenarios. Use cases:
  • Multi-object tracking
  • Cross-camera tracking
  • Object re-identification after occlusion
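Matching by visual similarity usually comes down to comparing feature vectors. The sketch below assumes each object is already described by an embedding and links new detections to known identities with greedy cosine-similarity matching. The function names and the 0.8 threshold are illustrative assumptions, not CVAT's ReID implementation.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_identities(known: Dict[int, Sequence[float]],
                     new: List[Sequence[float]],
                     threshold: float = 0.8) -> List[Optional[int]]:
    """Return, for each new embedding, the matched known id or None."""
    # Score every (known, new) pair and visit the most similar pairs first.
    pairs = sorted(
        ((cosine_similarity(emb, feat), obj_id, idx)
         for obj_id, emb in known.items()
         for idx, feat in enumerate(new)),
        reverse=True,
    )
    result: List[Optional[int]] = [None] * len(new)
    used = set()
    for sim, obj_id, idx in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if obj_id in used or result[idx] is not None:
            continue  # each identity and each detection is matched once
        result[idx] = obj_id
        used.add(obj_id)
    return result


known = {1: [1.0, 0.0], 2: [0.0, 1.0]}
new = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches = match_identities(known, new)
```

Detections left as `None` would be treated as new objects. Greedy matching is simple but not optimal; an assignment solver such as the Hungarian algorithm is a common alternative.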

Available Models

CVAT includes several pre-configured serverless functions:
| Model | Type | Framework | Description | Output Type |
| --- | --- | --- | --- | --- |
| Segment Anything (SAM) | Interactor | PyTorch | Meta’s foundation model for interactive segmentation | Mask |
| YOLO v7 | Detector | ONNX | Fast object detection for 80 COCO classes | Rectangle |
| Mask R-CNN | Detector | OpenVINO | Instance segmentation for 80 COCO classes | Mask |
| RetinaNet R101 | Detector | PyTorch (Detectron2) | Object detection with feature pyramid networks | Rectangle |
| Faster R-CNN | Detector | TensorFlow | Object detection for COCO dataset | Rectangle |
| Face Detection 0205 | Detector | OpenVINO | Specialized face detection with attributes | Rectangle |
| Human Pose (HRNet) | Detector | PyTorch (MMPose) | Keypoint detection for human pose estimation | Points |
| IOG | Interactor | PyTorch | Inside-Outside Guidance for interactive segmentation | Polygon |
| TransT | Tracker | PyTorch | Transformer-based visual object tracker | Rectangle |

Model Details

COCO Classes

Several models (YOLO v7, Mask R-CNN, RetinaNet R101, Faster R-CNN) detect the 80 COCO object classes, which include:
  • People and animals (person, cat, dog, horse, etc.)
  • Vehicles (car, truck, bus, bicycle, motorcycle, etc.)
  • Indoor objects (chair, couch, tv, laptop, etc.)
  • Food items (pizza, sandwich, apple, etc.)
  • Accessories (backpack, handbag, umbrella, etc.)

Architecture

Serverless functions in CVAT follow this architecture:
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   CVAT UI   │────────▶│  CVAT Server │────────▶│   Nuclio    │
│             │◀────────│ Lambda Mgr   │◀────────│  Functions  │
└─────────────┘         └──────────────┘         └─────────────┘
                                                        │
                                                        ▼
                                                  ┌─────────────┐
                                                  │   AI Model  │
                                                  │  Container  │
                                                  └─────────────┘
  1. CVAT UI: User initiates annotation with a serverless function
  2. Lambda Manager: Routes requests and manages function invocations
  3. Nuclio: Serverless platform that hosts and scales functions
  4. AI Model Container: Isolated environment running the ML model
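The last two steps can be sketched as a Nuclio Python function. Nuclio's Python runtime calls an optional `init_context(context)` once per worker and a `handler(context, event)` per request. The placeholder model and the local stubs that stand in for Nuclio's `context` and `event` objects are assumptions added so the example can run outside Nuclio.

```python
import json
from types import SimpleNamespace


def init_context(context):
    # Called once per worker: a real function would load model weights here.
    # The lambda below is a placeholder standing in for an actual model.
    context.user_data.model = lambda image_b64: [
        {"confidence": "0.9", "label": "person",
         "points": [0, 0, 10, 10], "type": "rectangle"}
    ]


def handler(context, event):
    # The body may arrive already parsed (dict) or as raw JSON text/bytes.
    data = event.body if isinstance(event.body, dict) else json.loads(event.body)
    results = context.user_data.model(data["image"])
    return context.Response(body=json.dumps(results), headers={},
                            content_type="application/json", status_code=200)


# --- Local stand-ins for objects Nuclio provides at runtime (hypothetical) ---
def _fake_response(body, headers, content_type, status_code):
    return SimpleNamespace(body=body, headers=headers,
                           content_type=content_type, status_code=status_code)


context = SimpleNamespace(user_data=SimpleNamespace(), Response=_fake_response)
init_context(context)
event = SimpleNamespace(body=json.dumps({"image": "aGVsbG8="}))
response = handler(context, event)
```

Loading the model in `init_context` rather than in `handler` keeps per-request latency low, because the expensive setup runs only when a worker starts.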

Performance Characteristics

CPU vs GPU

Most functions support both CPU and GPU execution:
  • CPU: Lower cost, suitable for batch processing and non-time-critical tasks
  • GPU: Faster inference, better for interactive tools and real-time annotation

Scaling

Nuclio automatically manages:
  • Function lifecycle and health checks
  • Request routing and load balancing
  • Resource allocation per function
  • Worker pool management (typically 2 workers per function)
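The effect of a fixed-size worker pool can be illustrated with an analogy in plain Python. This is not how Nuclio is implemented; it only shows that with two workers, at most two requests are processed at once while the rest wait. The `fake_inference` function and its timing are made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # mirrors the "2 workers per function" figure above

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of requests seen running concurrently


def fake_inference(frame_id: int) -> int:
    """Pretend to run a model while recording how many calls overlap."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # stand-in for model inference time
    with _lock:
        _active -= 1
    return frame_id


# Six requests arrive, but only MAX_WORKERS of them run at any moment.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(fake_inference, range(6)))
```

In a real deployment the worker count is set in the Nuclio function configuration rather than in handler code.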

Integration with CVAT

Serverless functions integrate with CVAT through three modes:

Automatic Annotation

Batch processing mode for detectors:
  • Processes all frames in a task or job
  • Creates annotations automatically
  • Runs as a background job with progress tracking
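A batch pass with progress tracking can be sketched as a loop over frames that reports a completed fraction after each one. The `annotate_frames` helper, the callback interface, and the stand-in detector are hypothetical; CVAT's own job handling is more involved.

```python
from typing import Callable, Dict, List


def annotate_frames(frames: List[bytes],
                    detector: Callable[[bytes], List[Dict]],
                    on_progress: Callable[[float], None]) -> Dict[int, List[Dict]]:
    """Run `detector` on every frame and report progress in the range (0, 1]."""
    annotations: Dict[int, List[Dict]] = {}
    total = len(frames)
    for index, frame in enumerate(frames):
        annotations[index] = detector(frame)
        on_progress((index + 1) / total)  # fraction of frames finished
    return annotations


progress_log: List[float] = []
result = annotate_frames(
    [b"f0", b"f1", b"f2", b"f3"],
    # Stand-in detector that "finds" one car in every frame.
    detector=lambda frame: [{"label": "car", "type": "rectangle"}],
    on_progress=progress_log.append,
)
```

Reporting progress through a callback keeps the annotation loop independent of how the progress is shown, whether in a UI or a log.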

Interactive Tools

Real-time mode for interactors:
  • Immediate feedback during annotation
  • Point-and-click or box-based interaction
  • Results displayed instantly in the UI

Tracking Mode

Video-specific workflows:
  • Initialize tracking with first-frame annotation
  • Automatically propagate to subsequent frames
  • Maintains object state across frames

Next Steps

Deploy Functions

Learn how to deploy serverless functions using Nuclio

Custom Models

Create your own serverless functions
