CVAT supports eight shape types plus frame-level tags, covering the most common computer vision annotation tasks. Each type is suited to particular use cases and object characteristics.

Annotation Types Overview

  • Rectangle: Axis-aligned bounding boxes
  • Polygon: Arbitrary closed shapes
  • Polyline: Open line segments
  • Points: Individual point markers
  • Ellipse: Elliptical shapes
  • Cuboid: 3D boxes (2D projection)
  • Skeleton: Articulated keypoint structures
  • Mask: Pixel-accurate segmentation
  • Tag: Frame-level labels

Rectangle (Bounding Box)

Description

Rectangles are axis-aligned bounding boxes defined by opposite corners. They’re the fastest annotation type and are widely used for object detection.

When to Use

  • Object detection tasks
  • When exact boundaries aren’t critical
  • Quick initial annotation
  • Objects with roughly rectangular extent
  • Training YOLO, Faster R-CNN, and similar detectors

Drawing Methods

Method 1: Classic (2 Points)
  1. Activate rectangle tool (Shift+B)
  2. Click and drag from one corner to the opposite corner
  3. Release to finish
Method 2: Extreme Points (4 Points)
  1. Select “By 4 Points” in the tool options
  2. Click the four extreme points:
    • Topmost point of the object
    • Bottommost point
    • Leftmost point
    • Rightmost point
  3. CVAT fits a bounding box automatically
The extreme-points method is faster for tilted or irregularly shaped objects because you don't need to estimate where the box corners lie.
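The fitted box is the coordinate-wise minimum and maximum of the four clicks, so the order of the clicks does not matter. Below is a minimal sketch. It assumes each click is an (x, y) pixel tuple. `box_from_extreme_points` is a hypothetical helper, not a CVAT function.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_from_extreme_points(points: Sequence[Point]) -> Box:
    """Return the axis-aligned box that touches all four extreme points.

    Hypothetical helper: click order (top, bottom, left, right) is irrelevant
    because the box is just the min/max over each coordinate.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four extreme points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Topmost, bottommost, leftmost, rightmost clicks on a tilted object
clicks = [(120, 40), (95, 210), (30, 130), (200, 115)]
print(box_from_extreme_points(clicks))  # (30, 40, 200, 210)
```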

Properties

  • Coordinates: (x1, y1, x2, y2) where (x1, y1) is top-left and (x2, y2) is bottom-right
  • Rotation: Can be rotated after creation, after which the box is no longer axis-aligned
  • Attributes: Supports all attribute types
  • In tracks: Interpolates position and size
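The stored coordinates are two opposite corners, so converting to the box layouts used by common detector formats is plain arithmetic. Below is a minimal sketch. It assumes the usual COCO convention (x_min, y_min, width, height) and the usual YOLO convention (center and size, normalized by image width and height). The helper names are hypothetical, and rotation is ignored.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left, bottom-right


def to_coco_xywh(box: Box) -> Tuple[float, float, float, float]:
    """Corner format -> COCO-style (x_min, y_min, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def to_yolo(box: Box, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Corner format -> YOLO-style (cx, cy, w, h), each normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    return (
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    )


box = (100.0, 50.0, 300.0, 250.0)
print(to_coco_xywh(box))       # (100.0, 50.0, 200.0, 200.0)
print(to_yolo(box, 640, 400))  # (0.3125, 0.375, 0.3125, 0.5)
```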

Best Practices

Tight Fit

Keep minimal margin around the object. The box should touch the object at the extreme points.

Include Full Object

Don’t cut off any part of the object, even if partially occluded (mark as occluded instead).

Consistency

Maintain a consistent margin across all instances of the same class.

Overlap Handling

Use Z-order to indicate which object is in front when boxes overlap.

Polygon

Description

Polygons are closed shapes defined by a sequence of vertices. They provide precise object boundaries for irregular shapes.

When to Use

  • Instance segmentation tasks
  • Objects with irregular boundaries
  • Precise boundary annotation required
  • Semantic segmentation with discrete objects
  • Mask R-CNN and similar model training

Drawing

  1. Activate polygon tool (Shift+P)
  2. Click to place each vertex along the object boundary
  3. Place vertices at corners and direction changes
  4. Close the polygon:
    • Double-click the first vertex, or
    • Press N, or
    • Click the first point again
Vertex Placement Strategy: Place more vertices where the boundary curves sharply, fewer on straight edges. Aim for smooth, accurate boundaries without excessive points.
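You can apply the "fewer vertices on straight edges" rule after the fact with the Ramer-Douglas-Peucker algorithm. It recursively keeps only the vertices that deviate from a straight chord by more than a tolerance. This is a generic simplification technique, not a CVAT feature. The function names and the pixel tolerance are illustrative assumptions. For a closed polygon, pass the vertex list with the first vertex repeated at the end.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from p to segment ab (distance to a if the segment is degenerate)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the parameter to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only vertices farther than `tolerance` px from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord between the endpoints
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify both halves; drop the duplicated split point
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# The slight bump at (5, 0.2) and the midpoint (10, 5) are dropped
print(simplify([(0, 0), (5, 0.2), (10, 0), (10, 5), (10, 10)], 1.0))
# [(0, 0), (10, 0), (10, 10)]
```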

Editing

Add Vertex:
  1. Select the polygon
  2. Click on an edge where you want to insert a point
  3. A new vertex is created
Move Vertex:
  1. Select the polygon
  2. Drag any vertex to reposition it
Remove Vertex:
  1. Select the polygon
  2. Right-click a vertex → “Delete point”, or
  3. Select vertex and press Del
Context Menu:
  • Right-click edge → Add point
  • Right-click vertex → Delete point

Properties

  • Coordinates: Array of (x, y) vertices
  • Minimum points: 3 (triangle)
  • Attributes: Supports all attribute types
  • In tracks: Interpolates vertex positions (can morph shape)

Best Practices

  • Zoom in for pixel-accurate boundaries
  • Follow edges precisely, especially for training segmentation models
  • Close gaps carefully at occlusion boundaries
  • Optimize points: Use enough for accuracy, not excessive
  • Winding order: Keep a consistent direction (clockwise or counterclockwise) for all objects
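You can check winding direction with the shoelace formula, because the sign of the signed area shows which way the vertices run. Image coordinates have y pointing down, so a positive value corresponds to clockwise order as seen on screen. Below is a minimal sketch. The clockwise target convention and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(vertices: List[Point]) -> float:
    """Shoelace formula. With y growing downward (image coordinates),
    a positive result means the vertices run clockwise on screen."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return total / 2.0


def make_clockwise(vertices: List[Point]) -> List[Point]:
    """Return the polygon with on-screen clockwise winding (hypothetical convention)."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return list(vertices) if signed_area(vertices) > 0 else list(reversed(vertices))


square = [(0, 0), (0, 10), (10, 10), (10, 0)]  # counterclockwise on screen
print(signed_area(square))     # -100.0
print(make_clockwise(square))  # [(10, 0), (10, 10), (0, 10), (0, 0)]
```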

Polyline

Description

Polylines are open chains of connected line segments; unlike polygons, they are not closed. They represent linear features without interior regions.

When to Use

  • Lane markings
  • Trajectories and paths
  • Borders and boundaries
  • Linear infrastructure (roads, power lines)
  • Centerlines
  • Motion paths

Drawing

  1. Activate polyline tool (Shift+L)
  2. Click to place vertices along the line
  3. Finish the line:
    • Press N, or
    • Double-click the last point

Editing

Same as polygons:
  • Add/remove vertices
  • Drag vertices to reposition
  • Extend or shorten the line

Properties

  • Coordinates: Array of (x, y) vertices (ordered)
  • Minimum points: 2 (line segment)
  • Open shape: No interior, just the path
  • In tracks: Interpolates as a flexible path

Best Practices

  • Place vertices at direction changes and curves
  • For lanes, annotate the center line unless specified otherwise
  • Maintain consistent direction (e.g., always left-to-right)
  • Extend to frame boundaries if the feature continues beyond the visible area
Polylines are especially useful for autonomous driving datasets where lane detection is critical.
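The direction convention and simple length checks are easy to enforce in post-processing. Below is a minimal sketch. It assumes vertices are (x, y) pixel tuples and judges "left-to-right" only by the first and last x-coordinates. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polyline_length(points: List[Point]) -> float:
    """Sum of segment lengths along the ordered vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def left_to_right(points: List[Point]) -> List[Point]:
    """Reverse the vertex order if the line was drawn right-to-left.

    Hypothetical convention: compares only the first and last x-coordinates.
    """
    if len(points) < 2:
        raise ValueError("a polyline needs at least 2 vertices")
    return list(points) if points[0][0] <= points[-1][0] else list(reversed(points))


lane = [(400, 300), (250, 360), (100, 480)]  # drawn right-to-left
print(left_to_right(lane))                         # [(100, 480), (250, 360), (400, 300)]
print(polyline_length([(0, 0), (3, 4), (3, 10)]))  # 11.0
```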

Points

Description

Points are individual point markers used for small objects, object centers, or counting many instances.

When to Use

  • Very small objects (< 5 pixels)
  • Object centers or key points (single point per object)
  • Counting (e.g., cells, people in crowds)
  • Sparse annotation
  • Click-based detection

Drawing

  1. Activate points tool (Shift+.)
  2. Optionally set the number of points to place
  3. Click to place each point
  4. Press N after the last point (or auto-completes after specified count)

Multiple Points in One Annotation

Points annotations can contain multiple points:
  • Each click adds a point to the same annotation
  • Useful for objects with multiple markers
  • All points share the same label and attributes

Properties

  • Coordinates: Array of (x, y) positions
  • Minimum points: 1
  • In tracks: Interpolates point positions

Best Practices

Center Placement

When marking object centers, be consistent (geometric center, visual center, etc.)
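For asymmetric objects, different definitions of "center" can land in noticeably different places. If your guidelines define the center from an outline, the two choices below are reproducible. Both helpers are hypothetical and assume the object outline is available as polygon vertices.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bbox_center(vertices: List[Point]) -> Point:
    """Center of the axis-aligned bounding box around the vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid (center of mass) of a simple polygon."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a /= 2.0
    if a == 0:
        raise ValueError("degenerate polygon (zero area)")
    return (cx / (6 * a), cy / (6 * a))


triangle = [(0, 0), (6, 0), (0, 6)]
print(bbox_center(triangle))       # (3.0, 3.0)
print(polygon_centroid(triangle))  # (2.0, 2.0)
```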

Zoom In

Always zoom in for pixel-accurate point placement

Single vs Multiple

Use single-point annotations for discrete objects, multi-point for objects with multiple markers

Visibility

Points can be hard to see - increase point size in settings

Ellipse

Description

Ellipses are oval shapes defined by a center, two radii, and a rotation angle. They offer a middle ground between rectangles and polygons.

When to Use

  • Circular or oval objects (balls, wheels, heads)
  • When rotation matters
  • Faster than polygons, more accurate than rectangles
  • Objects with elliptical cross-sections

Drawing

  1. Activate ellipse tool
  2. Click and drag to define center and initial size
  3. Move mouse to adjust shape and rotation
  4. Click to finish

Editing

  • Drag center: Move the ellipse
  • Drag edge handles: Resize radii
  • Rotation handles: Rotate the ellipse

Properties

  • Coordinates: Center (cx, cy), radii (rx, ry), and rotation angle
  • Shape: Always elliptical
  • In tracks: Interpolates position, size, and rotation
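A rotated ellipse's axis-aligned bounding box follows from its radii and angle. This helps when a downstream format only accepts rectangles. Below is a minimal sketch. It assumes the (cx, cy, rx, ry, angle) parameterization listed above, with the angle in degrees. The result does not depend on whether rotation is measured clockwise or counterclockwise.

```python
import math
from typing import Tuple


def ellipse_bbox(cx: float, cy: float, rx: float, ry: float,
                 angle_deg: float) -> Tuple[float, float, float, float]:
    """Axis-aligned (x1, y1, x2, y2) box that encloses a rotated ellipse."""
    t = math.radians(angle_deg)
    # Half-extents of the rotated ellipse along the image x and y axes
    half_w = math.sqrt((rx * math.cos(t)) ** 2 + (ry * math.sin(t)) ** 2)
    half_h = math.sqrt((rx * math.sin(t)) ** 2 + (ry * math.cos(t)) ** 2)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


print(ellipse_bbox(100, 80, 40, 20, 0))   # (60.0, 60.0, 140.0, 100.0)
print(ellipse_bbox(100, 80, 40, 20, 90))  # approximately (80, 40, 120, 120)
```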

Best Practices

  • Fit tightly to the object boundary
  • Use for circular objects where rectangles waste space
  • Consider rotation for tilted objects
  • Alternative to polygons when approximation is acceptable

Cuboid

Description

Cuboids are 3D boxes rendered in 2D as wireframes. They represent objects with depth in monocular images.

When to Use

  • Vehicles (cars, trucks, buses)
  • Buildings and structures
  • Furniture
  • Packages and boxes
  • Any object where 3D extent matters
  • Monocular 3D object detection

Drawing Methods

Method 1: From Rectangle
  1. Activate cuboid tool
  2. Draw the front face as a rectangle
  3. Drag to extend the depth (rear face)
  4. Adjust perspectives and edges
Method 2: By 4 Points
  1. Select “By 4 Points” method
  2. Click four corners that define the visible faces
  3. CVAT constructs the 3D cuboid

Structure

A cuboid consists of:
  • Front face: 4 points
  • Rear face: 4 points
  • Edges: Connecting lines
Total: 8 vertices defining the 3D box.

Editing

  • Drag faces: Move front or rear face
  • Drag vertices: Adjust individual corners
  • Drag edges: Adjust perspective
  • Whole cuboid: Drag center to move entire object

Properties

  • Coordinates: 8 vertices (x, y) for each corner
  • Projection: Perspective projection of 3D box onto 2D image
  • Attributes: Supports attributes such as width, height, and length for recording physical dimensions
  • In tracks: Interpolates all 8 vertices
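The rule that the rear face should look smaller follows from perspective projection. Below is a minimal sketch using an ideal pinhole camera. The box size, focal length, and principal point are hypothetical values. The corner ordering is also an assumption and not necessarily the order CVAT uses for cuboid vertices.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def box_corners(center: Point3D, size: Point3D) -> List[Point3D]:
    """8 corners of a 3D box in camera coordinates (x right, y down, z forward).
    The first 4 corners form the near (front) face, the last 4 the far (rear) face."""
    cx, cy, cz = center
    width, height, length = size
    corners: List[Point3D] = []
    for dz in (-length / 2, length / 2):  # near face first, then far face
        for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
            corners.append((cx + sx * width / 2, cy + sy * height / 2, cz + dz))
    return corners


def project(p: Point3D, f: float, u0: float, v0: float) -> Point2D:
    """Ideal pinhole projection without lens distortion: u = f*x/z + u0, v = f*y/z + v0."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (f * x / z + u0, f * y / z + v0)


# Hypothetical car-sized box: 2 m wide, 1.5 m tall, 4 m long, centered 10 m ahead
corners = box_corners(center=(0.0, 0.0, 10.0), size=(2.0, 1.5, 4.0))
pixels = [project(c, f=800.0, u0=640.0, v0=360.0) for c in corners]
front_width = pixels[1][0] - pixels[0][0]
rear_width = pixels[5][0] - pixels[4][0]
print(round(front_width, 1), round(rear_width, 1))  # 200.0 133.3
```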

Best Practices

Ensure the cuboid follows the image perspective: the rear face should appear smaller when it is farther from the camera.
Adjust vertices to match visible edges. Not all edges may be visible depending on viewpoint.
Use width/height/length attributes to record physical dimensions when known.
Maintain consistent front/rear face assignment (e.g., front = vehicle front).
Cuboids are particularly valuable for autonomous driving datasets where 3D understanding is critical for planning and control.

Skeleton

Description

Skeletons are articulated structures composed of keypoints (nodes) connected by edges. They represent objects with defined structural relationships.

When to Use

  • Human pose estimation
  • Animal pose
  • Articulated objects (robots, machinery)
  • Hand keypoints
  • Facial landmarks
  • Any structured keypoint annotation

Setup

Skeletons require label configuration with:
  1. Keypoints: List of named points (e.g., “nose”, “left_eye”, “right_shoulder”)
  2. Edges: Connections between keypoints (e.g., “left_shoulder” to “left_elbow”)
Example Configuration (COCO 17-keypoint pose, abbreviated):
{
  "name": "person",
  "type": "skeleton",
  "sublabels": [
    {"name": "nose"},
    {"name": "left_eye"},
    {"name": "right_eye"},
    {"name": "left_ear"},
    {"name": "right_ear"},
    {"name": "left_shoulder"},
    {"name": "right_shoulder"},
    // ... more keypoints
  ],
  "svg": "<skeleton edges definition>"
}
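The example above abbreviates the keypoint list and leaves the edge definition as a placeholder. The sketch below expresses the same idea as plain data: the 17 COCO keypoint names in COCO order, plus a named edge list and a check that every edge joins two known keypoints. It does not reproduce CVAT's label JSON or its `svg` edge encoding. The edge subset and the function names are illustrative assumptions.

```python
from typing import List, Tuple

# The 17 COCO person keypoints, in COCO's order
COCO_KEYPOINTS: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Illustrative subset of edges (shoulders and arms only), not a full skeleton
ARM_EDGES: List[Tuple[str, str]] = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
]


def validate_skeleton(keypoints: List[str], edges: List[Tuple[str, str]]) -> None:
    """Raise ValueError on duplicate names, unknown edge endpoints, or self-loops."""
    names = set(keypoints)
    if len(names) != len(keypoints):
        raise ValueError("duplicate keypoint names")
    for a, b in edges:
        if a not in names or b not in names:
            raise ValueError(f"edge ({a}, {b}) references an unknown keypoint")
        if a == b:
            raise ValueError(f"self-loop on {a}")


def edges_to_indices(keypoints: List[str],
                     edges: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Translate named edges into 0-based keypoint index pairs."""
    index = {name: i for i, name in enumerate(keypoints)}
    return [(index[a], index[b]) for a, b in edges]


validate_skeleton(COCO_KEYPOINTS, ARM_EDGES)
print(edges_to_indices(COCO_KEYPOINTS, ARM_EDGES))
# [(5, 6), (5, 7), (7, 9), (6, 8), (8, 10)]
```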

Drawing

  1. Activate skeleton tool
  2. Select the skeleton label
  3. Click to place each keypoint in order:
    • Follow the defined keypoint sequence
    • Place each point at the anatomical location
  4. CVAT automatically draws edges between connected keypoints
Zoom in when placing keypoints on small or distant people for accuracy. You can adjust keypoint positions after placement.

Editing

  • Move keypoints: Drag any keypoint to reposition
  • Visibility states: Mark keypoints as visible, occluded, or outside the frame
  • Add attributes: Each keypoint can have visibility attributes
  • Edges: Automatically update when keypoints move

Properties

  • Structure: N keypoints with M edges
  • Keypoint states: Visible, occluded, outside frame
  • Coordinates: (x, y) for each keypoint
  • In tracks: Interpolates keypoint positions
  • Topology: Fixed edge structure defined in label
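In COCO keypoint exports, each keypoint becomes an (x, y, v) triple. The flag v is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). Below is a minimal sketch of that conversion. Two details are assumptions, not CVAT's export code: mapping the "outside" state to v = 0, and the (x, y, state) input layout.

```python
from typing import List, Tuple

# COCO visibility flags: 0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible
# (mapping "outside" to 0 is an assumption for this sketch)
COCO_VISIBILITY = {"visible": 2, "occluded": 1, "outside": 0}


def to_coco_keypoints(points: List[Tuple[float, float, str]]) -> Tuple[List[float], int]:
    """Flatten (x, y, state) triples into COCO's [x1, y1, v1, x2, y2, v2, ...]
    layout and count the labeled keypoints (v > 0)."""
    flat: List[float] = []
    for x, y, state in points:
        v = COCO_VISIBILITY[state]
        if v == 0:
            x, y = 0, 0  # COCO stores unlabeled keypoints as (0, 0)
        flat.extend([x, y, v])
    num_keypoints = sum(1 for i in range(2, len(flat), 3) if flat[i] > 0)
    return flat, num_keypoints


kps = [(210.5, 95.0, "visible"), (220.0, 90.0, "occluded"), (0.0, 0.0, "outside")]
print(to_coco_keypoints(kps))
# ([210.5, 95.0, 2, 220.0, 90.0, 1, 0, 0, 0], 2)
```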

Common Skeleton Formats

COCO Keypoints (17 points):
  • Full body pose with major joints
  • Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles
MPII (16 points):
  • Similar to COCO, slight variations
Hand Keypoints (21 points):
  • Wrist + 5 fingers × 4 keypoints each
Face Keypoints (68+ points):
  • Facial landmarks for expression analysis

Best Practices

Consistent Order

Always place keypoints in the defined order for consistency

Occluded Points

Mark occluded keypoints appropriately, don’t skip them

Symmetric Placement

For symmetric objects (humans), maintain left/right consistency

Zoom for Accuracy

Especially important for distant or small skeletons

Mask

Description

Masks are pixel-accurate binary segmentation masks. They represent objects at the highest level of detail.

When to Use

  • Pixel-perfect segmentation required
  • Complex, irregular boundaries
  • Objects with holes or intricate details
  • Semantic segmentation
  • Medical imaging
  • High-precision applications

Drawing

Masks use a specialized brush toolbox:
  1. Activate mask tool
  2. Select a label
  3. The brush toolbox appears with tools:
Brush Tool (Shift+1):
  • Paint to add mask pixels
  • Adjustable size and shape (circle/square)
  • Click and drag to paint
Eraser Tool (Shift+2):
  • Remove mask pixels
  • Same size/shape controls as brush
Polygon Add (Shift+3):
  • Add a polygonal region to the mask
  • Click vertices, close polygon
  • Region is filled and added to mask
Polygon Remove (Shift+4):
  • Remove a polygonal region from the mask
  • Useful for quickly removing large areas
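Polygon Add and Polygon Remove amount to rasterizing a polygon, then setting or clearing the pixels it covers. Below is a minimal sketch using an even-odd point-in-polygon test at pixel centers. CVAT's own rasterization may treat boundary pixels differently. The function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # mask[row][col], 1 = inside the mask
Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apply_polygon(mask: Mask, poly: List[Point], add: bool) -> None:
    """Polygon Add (add=True) or Polygon Remove (add=False), in place.
    A pixel counts as covered when its center lies inside the polygon."""
    for r, row in enumerate(mask):
        for c in range(len(row)):
            if point_in_polygon(c + 0.5, r + 0.5, poly):
                row[c] = 1 if add else 0


mask = [[0] * 6 for _ in range(4)]
apply_polygon(mask, [(0, 0), (6, 0), (6, 4), (0, 4)], add=True)   # fill everything
apply_polygon(mask, [(2, 1), (4, 1), (4, 3), (2, 3)], add=False)  # cut a hole
for row in mask:
    print(row)
# [1, 1, 1, 1, 1, 1]
# [1, 1, 0, 0, 1, 1]
# [1, 1, 0, 0, 1, 1]
# [1, 1, 1, 1, 1, 1]
```

This tests every pixel against every edge, so it illustrates the logic rather than an efficient implementation.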

Brush Settings

  • Size: 1-100 pixels
  • Form: Circle or square
  • Shortcuts:
    • [ - Decrease brush size
    • ] - Increase brush size
Efficient Mask Workflow:
  1. Use Polygon Add to roughly fill the object
  2. Use Brush to refine edges
  3. Use Eraser to remove overpainting
  4. Use Polygon Remove to cut out holes

Editing

  1. Select the mask annotation
  2. The brush toolbox reappears
  3. Use brush/eraser to modify
  4. Click “Done” or press N to finish editing

Properties

  • Storage: Run-length encoded (RLE) for efficiency
  • Resolution: Pixel-level
  • Binary: Each pixel is either in or out of the mask
  • In tracks: Each keyframe stores a full mask; masks are not interpolated between keyframes
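Run-length encoding stores a binary mask as the lengths of alternating runs of 0s and 1s. This is compact when a mask contains large uniform regions. Below is a generic sketch over a row-major flattened mask, starting with a run of 0s that may be empty. CVAT's actual serialization may differ, for example in scan order or in how it records the mask's position.

```python
from typing import List


def rle_encode(bits: List[int]) -> List[int]:
    """Run lengths of alternating 0s and 1s, always starting with a run of 0s
    (possibly of length 0). `bits` is a flattened row-major binary mask."""
    runs: List[int] = []
    current, count = 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


def rle_decode(runs: List[int]) -> List[int]:
    """Inverse of rle_encode: expand the runs, alternating 0 and 1."""
    bits: List[int] = []
    value = 0
    for length in runs:
        bits.extend([value] * length)
        value = 1 - value
    return bits


mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
flat = [b for row in mask for b in row]
runs = rle_encode(flat)
print(runs)                      # [2, 2, 1, 3, 4]
print(rle_decode(runs) == flat)  # True
```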

Performance

Masks are computationally intensive:
  • Large masks: May slow down the canvas
  • Many masks: Consider using polygons if possible
  • Video masks: Require keyframes on each annotated frame

Best Practices

Rough out the shape with Polygon Add, then refine with brush. This is much faster than painting everything.
Use large brush for interiors, small brush for edges. Adjust frequently with [ and ].
Zoom to at least 200% when painting boundary pixels for accuracy.
Masks can be memory-intensive; save your work regularly to avoid loss.
Use SAM2 or other interactive tools to generate initial masks, then refine manually.
In video annotation, masks do not interpolate between keyframes like other shape types. You must create a mask on each frame where the object is visible.

Tag

Description

Tags are frame-level labels without spatial extent. They classify entire frames or indicate frame properties.

When to Use

  • Image classification
  • Scene classification
  • Frame attributes (weather, time of day)
  • Event detection (frame contains event X)
  • Multi-label frame classification
  • Metadata annotation

Creating Tags

  1. Click the Tag tool in the controls sidebar
  2. Select a tag label
  3. Click Tag to apply to current frame
  4. The tag appears in the objects sidebar
Multiple tags can be applied to the same frame. Tags are independent of shape annotations.

Properties

  • No spatial extent: Tags have no position or shape
  • Frame-specific: Each tag applies to one frame
  • Attributes: Tags support attributes
  • Label-based: Each tag has a label (tag type)
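For multi-label frame classification, per-frame tags are often converted into a multi-hot matrix with one column per tag label. Below is a minimal sketch. The frame-to-tags dictionary and the label names are hypothetical inputs, not a CVAT export format.

```python
from typing import Dict, List


def multi_hot(frame_tags: Dict[int, List[str]], labels: List[str],
              num_frames: int) -> List[List[int]]:
    """One row per frame, one column per tag label; 1 if the frame carries that tag.
    Frames without tags get an all-zero row."""
    column = {label: i for i, label in enumerate(labels)}
    rows = [[0] * len(labels) for _ in range(num_frames)]
    for frame, tags in frame_tags.items():
        for tag in tags:
            rows[frame][column[tag]] = 1
    return rows


labels = ["day", "night", "rain"]  # hypothetical tag labels
tags = {0: ["day"], 1: ["night", "rain"], 3: ["day", "rain"]}
print(multi_hot(tags, labels, num_frames=4))
# [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 1]]
```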

Tag Annotation Workspace

For rapid tag annotation, switch to the Tag Annotation Workspace:
  1. Select workspace → “Tag Annotation”
  2. Interface shows:
    • Large image/frame display
    • Tag buttons for quick application
    • Keyboard shortcuts for each tag
  3. Navigate frames quickly with arrow keys
  4. Apply tags with number keys or clicks

Best Practices

  • Use consistent criteria for tag application
  • Document tag definitions in project guidelines
  • Use attributes for tag variants (e.g., “weather” tag with “sunny”/“rainy” attribute)
  • Leverage keyboard shortcuts for rapid annotation

Object Types: Shape vs Track

All annotation types (except tags) can be created as either shapes or tracks:

Shape

  • Exists on a single frame only
  • Used for static images or one-time appearances
  • No temporal continuity
  • ObjectType: SHAPE

Track

  • Exists across multiple frames
  • Represents the same object over time
  • Supports interpolation between keyframes
  • Maintains a single client ID across frames
  • ObjectType: TRACK
You can convert shapes to tracks and vice versa using the object context menu.

Track Keyframes

In tracks:
  • Keyframes: Frames where you manually set the object’s position/shape
  • Interpolated frames: Frames between keyframes where CVAT automatically estimates position
  • Mark a frame as keyframe with K
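An interpolated frame can be pictured as a blend of the two surrounding keyframes. Below is a minimal sketch using linear interpolation of rectangle corners. Treat it as a simplified model, not CVAT's exact algorithm for every shape type. It also ignores any state that hides the object between keyframes. The function name and the keyframe dictionary are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def interpolate(keyframes: Dict[int, Box], frame: int) -> Box:
    """Linearly interpolate a rectangle between the surrounding keyframes.
    Before the first or after the last keyframe, the nearest keyframe is held."""
    if not keyframes:
        raise ValueError("a track needs at least one keyframe")
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    i = bisect_right(frames, frame)
    f0, f1 = frames[i - 1], frames[i]
    t = (frame - f0) / (f1 - f0)  # 0 at f0, approaching 1 near f1
    b0, b1 = keyframes[f0], keyframes[f1]
    x1, y1, x2, y2 = (a + t * (b - a) for a, b in zip(b0, b1))
    return (x1, y1, x2, y2)


track = {0: (10, 10, 50, 50), 10: (30, 20, 70, 60)}  # two keyframes
print(interpolate(track, 5))  # (20.0, 15.0, 60.0, 55.0)
```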

Choosing the Right Annotation Type

Recommended type by task:
  • Object Detection: Rectangle
  • Instance Segmentation: Polygon or Mask
  • Semantic Segmentation: Mask
  • Lane Detection: Polyline
  • Pose Estimation: Skeleton
  • Crowd Counting: Points
  • Image Classification: Tag
  • 3D Object Detection: Cuboid

Export Formats

Different annotation types are supported by different export formats:
  • Formats: COCO, YOLO, Pascal VOC, CVAT for images, CVAT for video, Cityscapes, KITTI
  • Annotation types: Rectangle, Polygon, Polyline, Points, Mask, Skeleton, Cuboid, Tag
Check your target export format before starting annotation to ensure it supports your chosen annotation types.

Next Steps

  • Manual Annotation: Learn workflows for creating these annotations
  • Advanced Tools: Master interpolation and track editing
  • Auto-Annotation: Generate annotations automatically with AI
  • Export Datasets: Export your annotations in various formats
