Overview

CVAT’s serverless architecture allows you to integrate custom AI models by packaging them as Nuclio functions. This guide covers creating functions for detectors, interactors, trackers, and ReID models.

Function Structure

Each serverless function consists of:
my-custom-function/
├── nuclio/
│   ├── function.yaml       # Nuclio function configuration
│   ├── main.py            # Handler and request processing
│   ├── model_handler.py   # Model loading and inference
│   └── requirements.txt   # Python dependencies (optional)
└── README.md              # Documentation (optional)

Creating a Detector Function

Step 1: Define Function Metadata

Create function.yaml to define the function's metadata, labels, and build configuration:
metadata:
  name: my-custom-detector
  namespace: cvat
  annotations:
    name: My Custom Detector
    type: detector
    version: 1
    spec: |
      [
        { "id": 0, "name": "class1", "type": "rectangle" },
        { "id": 1, "name": "class2", "type": "polygon" },
        { "id": 2, "name": "class3", "type": "mask" }
      ]

spec:
  description: Custom object detector
  runtime: 'python:3.10'
  handler: main:handler
  eventTimeout: 30s
  
  build:
    image: cvat.custom.detector
    baseImage: ubuntu:22.04
    directives:
      preCopy:
        - kind: RUN
          value: apt-get update && apt-get install -y python3-pip
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: pip install torch torchvision opencv-python-headless pillow numpy

  triggers:
    myHttpTrigger:
      numWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3

Step 2: Implement Model Handler

Create model_handler.py to load and run your model:
import torch
import cv2
import numpy as np
from typing import List, Dict

class ModelHandler:
    def __init__(self):
        """Initialize and load the model."""
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = self._load_model()
        self.model.eval()
        
    def _load_model(self):
        """Load your custom model."""
        # YourCustomModel is a placeholder for your own architecture
        model = YourCustomModel()
        model.load_state_dict(torch.load('model_weights.pth', map_location=self.device))
        model.to(self.device)
        return model
    
    def preprocess(self, image: np.ndarray) -> torch.Tensor:
        """Preprocess image for model input."""
        # Resize, normalize, convert to tensor
        # Note: cv2.imdecode produces BGR; add cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # here if your model expects RGB input
        image = cv2.resize(image, (640, 640))
        image = image.astype(np.float32) / 255.0
        image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
        return image.to(self.device)
    
    def infer(self, image: np.ndarray, threshold: float = 0.5) -> List[Dict]:
        """Run inference and return detections."""
        preprocessed = self.preprocess(image)
        
        with torch.no_grad():
            predictions = self.model(preprocessed)
        
        return self._postprocess(predictions, image.shape, threshold)
    
    def _postprocess(self, predictions, original_shape, threshold):
        """Convert model output to CVAT format."""
        detections = []
        
        for pred in predictions:
            if pred['score'] < threshold:
                continue
                
            detection = {
                'label': self._get_label_name(pred['class_id']),
                'confidence': float(pred['score']),
                'type': 'rectangle',  # or 'polygon', 'mask'
                'points': self._convert_bbox(pred['bbox'], original_shape)
            }
            detections.append(detection)
        
        return detections
    
    def _get_label_name(self, class_id: int) -> str:
        """Map class ID to label name."""
        labels = ['class1', 'class2', 'class3']
        return labels[class_id]
    
    def _convert_bbox(self, bbox, original_shape):
        """Convert bbox to CVAT format [xtl, ytl, xbr, ybr]."""
        # Assumes the model outputs normalized [0, 1] coordinates;
        # scale them back to the original image size
        h, w = original_shape[:2]
        x1, y1, x2, y2 = bbox
        return [x1 * w, y1 * h, x2 * w, y2 * h]

Step 3: Create Request Handler

Create main.py to handle HTTP requests:
import json
import base64
import cv2
import numpy as np
from model_handler import ModelHandler

def init_context(context):
    """Initialize function context and load model."""
    context.logger.info("Initializing model...")
    context.user_data.model = ModelHandler()
    context.logger.info("Model initialized successfully")

def handler(context, event):
    """Handle inference requests."""
    try:
        # Parse request
        data = event.body
        image_data = base64.b64decode(data['image'])
        threshold = data.get('threshold', 0.5)
        
        # Decode image
        nparr = np.frombuffer(image_data, np.uint8)
        image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        
        # Run inference
        results = context.user_data.model.infer(image, threshold)
        
        # Return response
        return context.Response(
            body=json.dumps(results),
            headers={},
            content_type='application/json',
            status_code=200
        )
    
    except Exception as e:
        context.logger.error(f"Error processing request: {str(e)}")
        return context.Response(
            body=json.dumps({'error': str(e)}),
            headers={},
            content_type='application/json',
            status_code=500
        )

Creating an Interactor Function

Interactors receive user input (points, boxes) for guided segmentation:

Function Metadata

metadata:
  name: my-custom-interactor
  namespace: cvat
  annotations:
    name: My Custom Interactor
    type: interactor
    version: 1
    spec: |
      [
        { "name": "object", "type": "polygon" }
      ]
    min_pos_points: 1
    min_neg_points: 0
    startswith_box: false
    startswith_box_optional: true
    help_message: Click points inside the object to segment it

Handler Implementation

def handler(context, event):
    """Handle interactive segmentation requests."""
    data = event.body
    
    # Extract inputs
    image_data = base64.b64decode(data['image'])
    pos_points = np.array(data['pos_points'])  # [[x1, y1], [x2, y2], ...]
    neg_points = np.array(data['neg_points'])  # [[x1, y1], ...]
    obj_bbox = data.get('obj_bbox')  # Optional [xtl, ytl, xbr, ybr]
    
    # Decode image
    nparr = np.frombuffer(image_data, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    
    # Run interactive segmentation
    mask = context.user_data.model.segment(
        image, 
        pos_points, 
        neg_points, 
        obj_bbox
    )
    
    # Convert mask to polygon (mask_to_polygon is a helper you provide)
    polygon = mask_to_polygon(mask)
    
    return context.Response(
        body=json.dumps([{
            'label': 'object',
            'type': 'polygon',
            'points': polygon.flatten().tolist()
        }]),
        headers={},
        content_type='application/json',
        status_code=200
    )

Creating a Tracker Function

Trackers maintain object state across frames:

Function Metadata

metadata:
  name: my-custom-tracker
  namespace: cvat
  annotations:
    name: My Custom Tracker
    type: tracker
    version: 1
    spec:
    supported_shape_types: rectangle,polygon

Handler Implementation

def handler(context, event):
    """Handle tracking requests."""
    data = event.body
    
    # Extract inputs
    image_data = base64.b64decode(data['image'])
    shapes = data['shapes']  # Initial shapes or None for continuation
    states = data['states']  # Tracking state or [] for initialization
    
    # Decode image
    nparr = np.frombuffer(image_data, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    
    if not states:
        # Initialize tracking
        new_states = []
        for shape in shapes:
            state = context.user_data.model.init_tracker(
                image,
                shape['points'],
                shape['type']
            )
            new_states.append(state)
    else:
        # Continue tracking
        new_states = []
        for state in states:
            updated_state = context.user_data.model.track(
                image,
                state
            )
            new_states.append(updated_state)
    
    # Extract updated shapes
    updated_shapes = [
        context.user_data.model.get_shape(state) 
        for state in new_states
    ]
    
    return context.Response(
        body=json.dumps({
            'shapes': updated_shapes,
            'states': new_states  # Opaque state to pass to next frame
        }),
        headers={},
        content_type='application/json',
        status_code=200
    )
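Because the `states` list round-trips through `json.dumps` and comes back with the next frame, everything you put in a state must be JSON-serializable. A small helper (hypothetical, not part of the CVAT or Nuclio API) that converts numpy values before building the response:

```python
import numpy as np

def to_serializable(state):
    """Recursively convert numpy containers/scalars so state survives json.dumps."""
    if isinstance(state, dict):
        return {k: to_serializable(v) for k, v in state.items()}
    if isinstance(state, (list, tuple)):
        return [to_serializable(v) for v in state]
    if isinstance(state, np.ndarray):
        return state.tolist()
    if isinstance(state, np.generic):
        return state.item()
    return state
```

Apply it to each state before returning, e.g. `new_states = [to_serializable(s) for s in new_states]`.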

Output Formats

Rectangle

{
    'label': 'car',
    'type': 'rectangle',
    'points': [x1, y1, x2, y2],  # [xtl, ytl, xbr, ybr]
    'confidence': 0.95,
    'rotation': 0.0  # Optional
}

Polygon

{
    'label': 'person',
    'type': 'polygon',
    'points': [x1, y1, x2, y2, x3, y3, ...],  # Flat list of coordinates
    'confidence': 0.88
}

Mask

{
    'label': 'dog',
    'type': 'mask',
    'points': [x1, y1, x2, y2, ..., xtl, ytl, xbr, ybr],  # RLE + bbox
    'confidence': 0.92
}
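The run-length encoding above can be produced with a short helper. This is a sketch assuming CVAT's convention that runs alternate starting with background (0) pixels, computed over the mask cropped to its bounding box:

```python
import numpy as np

def mask_to_cvat_rle(mask: np.ndarray) -> list:
    """Encode a binary mask as alternating run lengths (background run first)
    over the bounding-box crop, followed by [xtl, ytl, xbr, ybr]."""
    ys, xs = np.nonzero(mask)
    xtl, ytl, xbr, ybr = int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
    flat = (mask[ytl:ybr + 1, xtl:xbr + 1] > 0).astype(np.uint8).ravel()
    rle, prev, run = [], 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            rle.append(run)
            prev, run = v, 1
    rle.append(run)
    return rle + [xtl, ytl, xbr, ybr]
```

The resulting list goes directly into the `points` field of a `mask`-type detection.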

Skeleton (with Elements)

{
    'label': 'person',
    'type': 'skeleton',
    'points': [],
    'elements': [
        {'label': 'nose', 'type': 'points', 'points': [x, y]},
        {'label': 'left_eye', 'type': 'points', 'points': [x, y]},
        # ... more keypoints
    ]
}

Advanced Features

Attributes

Add attributes to detections:
{
    'label': 'car',
    'type': 'rectangle',
    'points': [100, 200, 300, 400],
    'attributes': [
        {'name': 'color', 'value': 'red'},
        {'name': 'occluded', 'value': 'false'}
    ]
}
Define attributes in function.yaml:
spec: |
  [
    {
      "name": "car",
      "type": "rectangle",
      "attributes": [
        {
          "name": "color",
          "input_type": "select",
          "values": ["red", "blue", "green", "white", "black"]
        },
        {
          "name": "occluded",
          "input_type": "checkbox",
          "values": ["true", "false"]
        }
      ]
    }
  ]

Group Annotations

Group related objects:
[
    {'label': 'person', 'points': [...], 'group_id': 0},
    {'label': 'bicycle', 'points': [...], 'group_id': 0},  # Grouped with the person above
    {'label': 'person', 'points': [...], 'group_id': 1},
    {'label': 'car', 'points': [...], 'group_id': 1}  # A separate group
]

Deployment

Build and Deploy

# Deploy function
nuctl deploy --project-name cvat \
  --path ./my-custom-function/nuclio \
  --file ./my-custom-function/nuclio/function.yaml \
  --platform local

# Verify deployment
nuctl get function my-custom-detector --platform local

Test Function

import requests
import base64
import json

# Read and encode image
with open('test_image.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call function
response = requests.post(
    'http://localhost:8080',  # Function port
    json={'image': image_b64, 'threshold': 0.5}
)

print(json.dumps(response.json(), indent=2))

Best Practices

Performance Optimization

  1. Load Once: Initialize the model in init_context, not per-request in handler
  2. Batch Processing: Process multiple requests efficiently
  3. GPU Utilization: Use GPU when available for faster inference
  4. Model Optimization: Use ONNX, TensorRT, or quantization
  5. Caching: Cache preprocessed data when possible
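The caching idea (point 5) can be as simple as keying preprocessed results on a digest of the raw image bytes, so repeated requests for the same frame skip the decode/resize work. `preprocess_fn` here is a hypothetical callable standing in for your own preprocessing step:

```python
import hashlib

# Bounded cache of preprocessed results, keyed on a digest of the image bytes
_CACHE: dict = {}
_MAX_ENTRIES = 32

def preprocess_cached(image_bytes: bytes, preprocess_fn):
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _CACHE:
        if len(_CACHE) >= _MAX_ENTRIES:
            # Evict the oldest entry (dicts preserve insertion order)
            _CACHE.pop(next(iter(_CACHE)))
        _CACHE[key] = preprocess_fn(image_bytes)
    return _CACHE[key]
```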

Error Handling

def handler(context, event):
    try:
        # Validate input
        if 'image' not in event.body:
            raise ValueError("Missing 'image' field")
        
        # Process request
        results = process_image(event.body)
        
        return context.Response(
            body=json.dumps(results),
            status_code=200
        )
    
    except ValueError as e:
        context.logger.warning(f"Invalid input: {e}")
        return context.Response(
            body=json.dumps({'error': str(e)}),
            status_code=400
        )
    
    except Exception as e:
        context.logger.error(f"Unexpected error: {e}", exc_info=True)
        return context.Response(
            body=json.dumps({'error': 'Internal server error'}),
            status_code=500
        )

Logging

def handler(context, event):
    context.logger.info("Processing request")
    context.logger.debug(f"Threshold: {event.body.get('threshold')}")
    
    try:
        result = process(event.body)
        context.logger.info(f"Found {len(result)} objects")
        return context.Response(body=json.dumps(result), status_code=200)
    except Exception as e:
        context.logger.error(f"Error: {e}", exc_info=True)
        raise

Resource Management

# Limit function resources
platform:
  attributes:
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
        nvidia.com/gpu: "1"  # Request GPU

Testing

Unit Tests

import pytest
import numpy as np
from model_handler import ModelHandler

def test_model_loading():
    handler = ModelHandler()
    assert handler.model is not None

def test_inference():
    handler = ModelHandler()
    image = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)
    results = handler.infer(image, threshold=0.5)
    assert isinstance(results, list)

def test_output_format():
    handler = ModelHandler()
    image = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)
    results = handler.infer(image)
    
    for detection in results:
        assert 'label' in detection
        assert 'type' in detection
        assert 'points' in detection
        assert detection['type'] in ['rectangle', 'polygon', 'mask']

Integration Tests

# Deploy function locally
nuctl deploy --project-name cvat --path ./nuclio --file ./nuclio/function.yaml --platform local

# Get function port
FUNC_PORT=$(nuctl get function my-custom-detector --platform local -o json | jq -r '.status.httpPort')

# Test with sample image
python test_function.py --url "http://localhost:$FUNC_PORT" --image test.jpg

Troubleshooting

Common Issues

Model Not Loading:
  • Check model file path in container
  • Verify dependencies in build directives
  • Increase memory limits
Slow Inference:
  • Use GPU-optimized function variant
  • Optimize model (ONNX, quantization)
  • Adjust numWorkers in function.yaml
Invalid Output Format:
  • Validate against CVAT expected format
  • Check coordinate scaling
  • Test with small dataset first
Memory Errors:
  • Increase container memory limits
  • Reduce batch size
  • Optimize image preprocessing

Examples

Explore existing functions in the CVAT repository:
  • SAM Interactor: serverless/pytorch/facebookresearch/sam/
  • YOLO Detector: serverless/onnx/WongKinYiu/yolov7/
  • TransT Tracker: serverless/pytorch/dschoerk/transt/
  • Mask R-CNN: serverless/openvino/omz/public/mask_rcnn_inception_resnet_v2_atrous_coco/

Next Steps

  • Deployment Guide: Learn how to deploy serverless functions
  • Overview: Understand serverless function types