CVAT provides built-in format conversion through the Datumaro framework. This page covers converting between formats, handling format limitations, and best practices for dataset transformation.

Overview

Format conversion is needed when:
  • Your ML framework requires a specific format
  • Source annotations are in a different format
  • Converting between annotation types (e.g., masks to polygons)
  • Merging datasets from different sources
  • Adapting to format-specific limitations

Using Datumaro for Conversion

Datumaro is CVAT’s built-in dataset framework that handles all format conversions.

Installation

pip install datumaro

Basic Conversion

Convert between any supported formats:
# Convert COCO to YOLO
datum convert -if coco -i /path/to/coco -f yolo -o /path/to/yolo

# Convert Pascal VOC to COCO
datum convert -if voc -i /path/to/voc -f coco -o /path/to/coco

# Convert YOLO to Pascal VOC
datum convert -if yolo -i /path/to/yolo -f voc -o /path/to/voc
Command structure:
  • -if: Input format
  • -i: Input directory
  • -f: Output format
  • -o: Output directory

Python API Conversion

Programmatic format conversion:
import datumaro as dm

# Load dataset in COCO format
dataset = dm.Dataset.import_from('/path/to/coco', 'coco')

# Export to YOLO format
dataset.export('/path/to/yolo', 'yolo', save_media=True)

# Export to Pascal VOC
dataset.export('/path/to/voc', 'voc')

Batch Conversion

Convert to multiple formats:
import datumaro as dm

# Load source dataset
dataset = dm.Dataset.import_from('/path/to/source', 'coco')

# Export to multiple formats
formats = ['yolo', 'voc', 'labelme', 'imagenet']

for fmt in formats:
    output_path = f'/path/to/output/{fmt}'
    dataset.export(output_path, fmt, save_media=True)
    print(f'Exported to {fmt} at {output_path}')

Common Conversion Scenarios

COCO to YOLO

Convert COCO object detection to YOLO format:
import datumaro as dm

# Load COCO dataset
dataset = dm.Dataset.import_from('coco_dataset/', 'coco')

# Export to YOLO
dataset.export('yolo_dataset/', 'yolo', save_media=True)
Considerations:
  • COCO polygons are converted to bounding boxes
  • Category IDs are remapped to YOLO class indices
  • Coordinate normalization is handled automatically
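The automatic coordinate normalization works as follows: COCO stores absolute top-left pixel coordinates, while YOLO stores center coordinates normalized to [0, 1]. A standalone sketch of the conversion (the helper name is illustrative, not part of either toolchain):

```python
def coco_bbox_to_yolo(x, y, w, h, img_w, img_h):
    """Convert a COCO [x, y, width, height] box (absolute pixels,
    top-left origin) to YOLO (x_center, y_center, w, h) in [0, 1]."""
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return (x_center, y_center, w / img_w, h / img_h)

# A 100x50 box at (200, 100) in a 640x480 image
print(coco_bbox_to_yolo(200, 100, 100, 50, 640, 480))
```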

YOLO to COCO

Convert YOLO to COCO for modern frameworks:
import datumaro as dm

# Load YOLO dataset with image dimensions
dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo')

# Export to COCO format
dataset.export('coco_dataset/', 'coco_instances', save_media=True)
Considerations:
  • YOLO bounding boxes become COCO bbox annotations
  • Class names from obj.names are preserved
  • Image dimensions are required for conversion

Pascal VOC to COCO

Convert VOC XML annotations to COCO JSON:
import datumaro as dm

# Load Pascal VOC dataset
dataset = dm.Dataset.import_from('voc_dataset/', 'voc')

# Export to COCO
dataset.export('coco_dataset/', 'coco_instances', save_media=True)

Masks to Polygons

Convert segmentation masks to polygon annotations:
import datumaro as dm

# Load dataset with masks
dataset = dm.Dataset.import_from('dataset_with_masks/', 'coco')

# Convert masks to polygons
dataset = dataset.transform('masks_to_polygons')

# Export with polygons
dataset.export('dataset_with_polygons/', 'coco_instances')

Polygons to Masks

Convert polygons to pixel-level masks:
import datumaro as dm

# Load dataset with polygons
dataset = dm.Dataset.import_from('dataset_with_polygons/', 'coco')

# Convert polygons to masks
dataset = dataset.transform('polygons_to_masks')

# Export with masks
dataset.export('dataset_with_masks/', 'coco_instances')

Bounding Boxes to Polygons

Convert boxes to polygon representations:
import datumaro as dm

# Load dataset with bounding boxes
dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo')

# Convert boxes to masks, then vectorize the masks to polygons
dataset = dataset.transform('boxes_to_masks')
dataset = dataset.transform('masks_to_polygons')

# Export
dataset.export('polygon_dataset/', 'coco_instances')
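Conceptually, turning a box into a polygon just emits the four corner points of the box. A standalone sketch of that mapping (not Datumaro's implementation):

```python
def bbox_to_polygon(x, y, w, h):
    """Return the four corners of an axis-aligned box as a flat
    [x1, y1, x2, y2, ...] point list, clockwise from the top-left."""
    return [x, y, x + w, y, x + w, y + h, x, y + h]

print(bbox_to_polygon(10, 20, 30, 40))
# [10, 20, 40, 20, 40, 60, 10, 60]
```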

CVAT-Specific Conversions

When exporting from CVAT, certain conversions happen automatically:

Ellipses to Masks

CVAT ellipses are automatically converted to masks for formats that don’t support ellipses:
# In CVAT export code (automatic)
from cvat.apps.dataset_manager.formats.transformations import EllipsesToMasks

with GetCVATDataExtractor(instance_data, include_images=save_images) as extractor:
    dataset = StreamDataset.from_extractors(extractor, env=dm_env)
    dataset.transform(EllipsesToMasks)  # Automatic conversion
    dataset.export(temp_dir, format_name, save_media=save_images)
This happens automatically when exporting to:
  • COCO formats
  • YOLO Segmentation
  • Most segmentation formats
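Rasterizing an ellipse reduces to the standard inside-ellipse test: a pixel belongs to the mask when ((x - cx)/rx)^2 + ((y - cy)/ry)^2 <= 1. A pure-Python sketch of the idea (CVAT's EllipsesToMasks uses its own drawing code):

```python
def ellipse_mask(cx, cy, rx, ry, height, width):
    """Binary mask for an axis-aligned ellipse centered at (cx, cy)
    with radii (rx, ry), as a list of rows."""
    return [
        [1 if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1 else 0
         for x in range(width)]
        for y in range(height)
    ]

mask = ellipse_mask(cx=4, cy=4, rx=3, ry=2, height=9, width=9)
```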

Track Keyframes

For video annotations, CVAT ensures track keyframes are set:
# Automatic track keyframe setting
from cvat.apps.dataset_manager.formats.transformations import SetKeyframeForEveryTrackShape

dataset = dataset.transform(SetKeyframeForEveryTrackShape)
This ensures tracking annotations work correctly in imported formats.

Dataset Transformations

Datumaro provides powerful transformations beyond format conversion:

Filtering

Filter dataset by various criteria:
import datumaro as dm

dataset = dm.Dataset.import_from('source/', 'coco')

# Keep only 'person' annotations (annotation-level filter)
filtered = dataset.filter('/item/annotation[label="person"]',
                          filter_annotations=True)

# Keep items that contain at least one 'car' annotation
filtered = dataset.filter('/item[annotation/label="car"]')

# Export filtered dataset
filtered.export('filtered_dataset/', 'coco')

Sampling

Create dataset subsets:
import datumaro as dm

dataset = dm.Dataset.import_from('source/', 'coco')

# Random sampling
sampled = dataset.transform('random_sampler', count=100)

# Split into train/test subsets
test_split = dataset.transform('random_split', splits=[('train', 0.8), ('test', 0.2)])

# Export subsets
for subset_name in ['train', 'test']:
    subset = test_split.get_subset(subset_name)
    subset.export(f'{subset_name}_dataset/', 'coco')
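An 80/20 random split amounts to a seeded shuffle followed by a slice. A format-agnostic sketch over item IDs (the helper name and seed are illustrative):

```python
import random

def split_ids(item_ids, train_ratio=0.8, seed=42):
    """Deterministically shuffle item ids and split them
    into train/test subsets by ratio."""
    ids = sorted(item_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return {'train': ids[:cut], 'test': ids[cut:]}

splits = split_ids([f'img_{i:03d}' for i in range(10)])
print(len(splits['train']), len(splits['test']))  # 8 2
```

Fixing the seed makes the split reproducible across runs, which matters when you re-export the dataset later.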

Label Mapping

Rename or merge labels:
import datumaro as dm

dataset = dm.Dataset.import_from('source/', 'coco')

# Remap labels
mapping = {
    'car': 'vehicle',
    'truck': 'vehicle',
    'bicycle': 'vehicle'
}

remapped = dataset.transform('remap_labels', mapping=mapping)
remapped.export('remapped_dataset/', 'coco')
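Per annotation, remapping is just a dictionary lookup that falls back to the original name. A minimal sketch of the effect:

```python
def remap(labels, mapping):
    """Apply a name->name mapping, leaving unmapped labels unchanged."""
    return [mapping.get(name, name) for name in labels]

mapping = {'car': 'vehicle', 'truck': 'vehicle', 'bicycle': 'vehicle'}
print(remap(['car', 'person', 'truck'], mapping))
# ['vehicle', 'person', 'vehicle']
```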

Image Resizing

Resize images and adjust annotations:
import datumaro as dm

dataset = dm.Dataset.import_from('source/', 'coco')

# Resize to fixed dimensions
resized = dataset.transform('resize', width=640, height=640)

# Resize with aspect ratio
resized = dataset.transform('resize', width=640, height=640, keep_aspect_ratio=True)

resized.export('resized_dataset/', 'coco')

Annotation Normalization

Normalize annotations for consistency:
import datumaro as dm

dataset = dm.Dataset.import_from('source/', 'coco')

# Remove duplicate annotations
normalized = dataset.transform('remove_duplicates')

# Merge overlapping annotations
normalized = normalized.transform('merge_instance_segments')

normalized.export('normalized_dataset/', 'coco')

Format-Specific Considerations

COCO Format

Limitations:
  • No native support for ellipses (converted to masks/polygons)
  • Rotated boxes require polygon representation
  • Attributes stored as custom fields
Best practices:
# Export COCO with proper settings
dataset.export('coco_output/', 'coco_instances',
               save_media=True,
               merge_images=False,  # Keep images separate
               crop_covered=False)  # Don't crop overlapping regions

YOLO Format

Limitations:
  • Only bounding boxes (classic YOLO) or polygons (Ultralytics)
  • No attribute support
  • Requires image dimensions for import
Best practices:
# Provide image dimensions for YOLO import
image_info = {
    'image1': (480, 640),  # (height, width)
    'image2': (1080, 1920)
}

dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo',
                                  image_info=image_info)
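To see why the dimensions are needed, here is a sketch that decodes one line of a YOLO label file ('class cx cy w h', normalized) back into an absolute-pixel box; without the image size this conversion is impossible:

```python
def parse_yolo_line(line, img_h, img_w):
    """Decode 'cls cx cy w h' (normalized) into (cls, x, y, w, h)
    in pixels, with (x, y) the top-left corner."""
    cls, cx, cy, nw, nh = line.split()
    w = float(nw) * img_w
    h = float(nh) * img_h
    x = float(cx) * img_w - w / 2
    y = float(cy) * img_h - h / 2
    return int(cls), x, y, w, h

print(parse_yolo_line('0 0.5 0.5 0.25 0.5', img_h=480, img_w=640))
# (0, 240.0, 120.0, 160.0, 240.0)
```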

Pascal VOC

Limitations:
  • XML-based, less efficient for large datasets
  • Limited attribute support
  • Bounding boxes only (segmentation in separate format)
Best practices:
# Export with proper subset splits
dataset.export('voc_output/', 'voc',
               label_map='voc',  # Use VOC standard labels
               save_media=True,
               apply_colormap=True)  # For segmentation
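For reference, a single object in a VOC annotation file looks roughly like this (a minimal hand-written fragment; real files include additional fields such as pose and truncated):

```xml
<annotation>
  <filename>image1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>200</xmin><ymin>100</ymin><xmax>300</xmax><ymax>150</ymax>
    </bndbox>
  </object>
</annotation>
```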

ImageNet

Limitations:
  • Classification only (no bounding boxes)
  • Directory-based organization
  • No spatial annotations
Best practices:
# Convert a detection dataset to classification
import datumaro as dm
from datumaro.components.annotation import AnnotationType

# Keep only image-level labels, dropping spatial annotations
class KeepOnlyLabels(dm.ItemTransform):
    def transform_item(self, item):
        labels = [ann for ann in item.annotations
                  if ann.type == AnnotationType.label]
        return item.wrap(annotations=labels)

dataset = dataset.transform(KeepOnlyLabels)
dataset.export('imagenet_output/', 'imagenet')
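Since ImageNet format is essentially "one directory per label", the target layout can be sketched with the standard library alone (paths and file names are illustrative):

```python
import os
import shutil
import tempfile

def build_imagenet_layout(samples, out_dir):
    """Copy (image_path, label) pairs into out_dir/<label>/<filename>,
    the layout ImageNet-style loaders expect."""
    for src, label in samples:
        dst = os.path.join(out_dir, label)
        os.makedirs(dst, exist_ok=True)
        shutil.copy(src, dst)

# Demo with empty placeholder files in a temporary directory
root = tempfile.mkdtemp()
for name in ('cat1.jpg', 'dog1.jpg'):
    open(os.path.join(root, name), 'w').close()

out = os.path.join(root, 'imagenet')
build_imagenet_layout(
    [(os.path.join(root, 'cat1.jpg'), 'cat'),
     (os.path.join(root, 'dog1.jpg'), 'dog')], out)
print(sorted(os.listdir(out)))  # ['cat', 'dog']
```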

Handling Annotation Type Mismatches

When converting between formats with different annotation types:

Detection to Segmentation

Convert bounding boxes to masks:
import datumaro as dm
import numpy as np
from datumaro.components.annotation import AnnotationType, Mask

class BoxesToMasks(dm.ItemTransform):
    def transform_item(self, item):
        new_annotations = []
        for ann in item.annotations:
            if ann.type == AnnotationType.bbox:
                # Rasterize the box into a binary mask
                x, y, w, h = map(int, [ann.x, ann.y, ann.w, ann.h])
                img_h, img_w = item.media.size
                mask = np.zeros((img_h, img_w), dtype=np.uint8)
                mask[y:y + h, x:x + w] = 1
                new_annotations.append(Mask(
                    image=mask,
                    label=ann.label,
                    attributes=ann.attributes
                ))
            else:
                new_annotations.append(ann)
        return item.wrap(annotations=new_annotations)

dataset = dataset.transform(BoxesToMasks)

Segmentation to Detection

Convert masks to bounding boxes:
import datumaro as dm

# Automatic conversion using Datumaro's built-in transform
dataset = dataset.transform('shapes_to_boxes')

# Or manually, with a custom ItemTransform
import numpy as np
from datumaro.components.annotation import AnnotationType, Bbox

class MasksToBoxes(dm.ItemTransform):
    def transform_item(self, item):
        new_annotations = []
        for ann in item.annotations:
            if ann.type == AnnotationType.mask:
                # Tight bounding box around the non-zero mask pixels
                ys, xs = np.where(ann.image != 0)
                if len(ys) > 0:
                    new_annotations.append(Bbox(
                        x=int(xs.min()),
                        y=int(ys.min()),
                        w=int(xs.max() - xs.min()),
                        h=int(ys.max() - ys.min()),
                        label=ann.label,
                        attributes=ann.attributes
                    ))
            else:
                new_annotations.append(ann)
        return item.wrap(annotations=new_annotations)

dataset = dataset.transform(MasksToBoxes)

Keypoints to Detection

Extract bounding boxes from keypoint annotations:
import datumaro as dm
import numpy as np
from datumaro.components.annotation import AnnotationType, Bbox

class KeypointsToBoxes(dm.ItemTransform):
    PADDING = 10  # pixels of margin around the keypoint extent

    def transform_item(self, item):
        new_annotations = []
        for ann in item.annotations:
            if ann.type == AnnotationType.points:
                # Bounding box around the keypoints, with padding
                points = np.array(ann.points).reshape(-1, 2)
                x_min, y_min = points.min(axis=0)
                x_max, y_max = points.max(axis=0)
                new_annotations.append(Bbox(
                    x=max(0, x_min - self.PADDING),
                    y=max(0, y_min - self.PADDING),
                    w=x_max - x_min + 2 * self.PADDING,
                    h=y_max - y_min + 2 * self.PADDING,
                    label=ann.label
                ))
            else:
                new_annotations.append(ann)
        return item.wrap(annotations=new_annotations)

dataset = dataset.transform(KeypointsToBoxes)

Validation and Quality Checks

Validate converted datasets:
import datumaro as dm

# Load converted dataset
dataset = dm.Dataset.import_from('converted_dataset/', 'coco')

# Validate for a detection task
from datumaro.plugins.validators import DetectionValidator

validator = DetectionValidator()
report = validator.validate(dataset)

# Print validation issues
for issue in report['validation_reports']:
    print(f"Issue: {issue['anomaly_type']}")
    print(f"Severity: {issue['severity']}")
    print(f"Description: {issue['description']}")

# Check dataset statistics
from datumaro.components.operations import compute_ann_statistics

stats = compute_ann_statistics(dataset)
print(f"Items: {stats['images count']}")
print(f"Annotations: {stats['annotations count']}")

Troubleshooting

Missing Annotations After Conversion

Problem: Some annotations disappeared after conversion.
Solutions:
  1. Check if target format supports the annotation type
  2. Verify labels exist in target format
  3. Check for invalid coordinates or empty annotations
# Debug missing annotations
import datumaro as dm

source = dm.Dataset.import_from('source/', 'coco')
converted = dm.Dataset.import_from('converted/', 'yolo')

def count_annotations(ds):
    return sum(len(item.annotations) for item in ds)

print(f"Source annotations: {count_annotations(source)}")
print(f"Converted annotations: {count_annotations(converted)}")

# Check annotation types
for item in source:
    for ann in item.annotations:
        print(f"Type: {ann.type}, Label: {ann.label}")

Coordinate Mismatches

Problem: Bounding boxes or polygons are misplaced after conversion.
Solutions:
  1. Verify image dimensions are correct
  2. Check coordinate normalization (YOLO uses normalized coords)
  3. Ensure coordinate systems match (some formats use different origins)
# Verify coordinates
for item in dataset:
    print(f"Image: {item.id}, Size: {item.media.size}")
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.bbox:
            print(f"  Bbox: x={ann.x}, y={ann.y}, w={ann.w}, h={ann.h}")
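A quick sanity check catches most origin and normalization mistakes: every box must lie inside its image, and a box whose coordinates are all at most 1 was probably left normalized. A standalone helper (the name and messages are illustrative):

```python
def check_bbox(x, y, w, h, img_w, img_h):
    """Return a list of human-readable problems with an absolute-pixel box."""
    problems = []
    if w <= 0 or h <= 0:
        problems.append('non-positive size')
    if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
        problems.append('outside image bounds')
    if x <= 1 and y <= 1 and w <= 1 and h <= 1:
        problems.append('looks normalized, expected pixels')
    return problems

print(check_bbox(600, 400, 100, 100, img_w=640, img_h=480))
# ['outside image bounds']
```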

Label Mapping Errors

Problem: Labels are incorrectly mapped or missing.
Solutions:
  1. Provide explicit label mapping
  2. Check for case sensitivity in label names
  3. Verify label IDs match between formats
# Remap label names explicitly before export
mapping = {
    'Person': 'person',
    'CAR': 'car',
    'Bike': 'bicycle'
}

dataset = dataset.transform('remap_labels', mapping=mapping)
dataset.export('output/', 'yolo')

Best Practices

  1. Always validate after conversion - Check statistics and sample images
  2. Preserve original datasets - Keep source data before conversion
  3. Use Datumaro format for intermediate storage - It preserves all information
  4. Test with small samples first - Verify conversion works before processing large datasets
  5. Document label mappings - Keep track of label changes between formats
  6. Handle edge cases - Empty annotations, overlapping regions, etc.
  7. Check format documentation - Understand target format limitations
  8. Use version control - Track dataset versions and conversions
