Shapes and Objects

CVAT supports eight shape types plus frame-level tags. Each annotation type suits particular tasks and object characteristics.

Annotation Types Overview

Rectangle

Axis-aligned bounding boxes

Polygon

Arbitrary closed shapes

Polyline

Open line segments

Points

Individual point markers

Ellipse

Elliptical shapes

Cuboid

3D boxes (2D projection)

Skeleton

Articulated keypoint structures

Mask

Pixel-accurate segmentation

Tag

Frame-level labels

Rectangle (Bounding Box)

Description

Rectangles are bounding boxes defined by two opposite corners; they are axis-aligned unless rotated after creation. They’re usually the fastest shape to draw and are widely used for object detection.

When to Use

Object detection tasks
When exact boundaries aren’t critical
Quick initial annotation
Objects with roughly rectangular extent
YOLO, Faster R-CNN, and similar detector training

Drawing Methods

Method 1: Classic (2 Points)

Activate rectangle tool (Shift+B)
Click and drag from one corner to the opposite corner
Release to finish

Method 2: Extreme Points (4 Points)

Select “By 4 Points” in the tool options
Click the four extreme points:
- Topmost point of the object
- Bottommost point
- Leftmost point
- Rightmost point
CVAT fits a bounding box automatically

The extreme points method is often faster for objects at an angle or with irregular shapes, because you don’t need to estimate where the bounding box corners fall.
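The box fitted from extreme points is simply the smallest axis-aligned rectangle that contains all of the clicked points. A minimal sketch of that idea follows; the function name and the click coordinates are hypothetical, and this is not taken from CVAT's own code.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bbox_from_extreme_points(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (x1, y1, x2, y2): the tightest axis-aligned box around the points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Four clicks: topmost, bottommost, leftmost, rightmost (illustrative values).
clicks = [(120.0, 40.0), (135.0, 210.0), (60.0, 130.0), (190.0, 115.0)]
box = bbox_from_extreme_points(clicks)
print(box)  # (60.0, 40.0, 190.0, 210.0)
```

The order of the clicks does not matter for the result, which is why the method tolerates tilted or irregular objects.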

Properties

Coordinates: (x1, y1, x2, y2) where (x1, y1) is top-left and (x2, y2) is bottom-right
Rotation: Can be rotated after creation (the box is then no longer axis-aligned)
Attributes: Supports all attribute types
In tracks: Interpolates position and size
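For detector training, corner coordinates are often converted to the normalized center format used in YOLO label files. The sketch below assumes an unrotated box in pixel coordinates and a known image size; the function name is illustrative and not part of any CVAT API.

```python
from typing import Tuple


def xyxy_to_yolo(x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert pixel corners to (cx, cy, w, h), each normalized to [0, 1]."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")
    cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, as a fraction of image width
    h = (y2 - y1) / img_h        # box height, as a fraction of image height
    return cx, cy, w, h


# A 200x200 px box on a 400x500 px image.
print(xyxy_to_yolo(100, 50, 300, 250, img_w=400, img_h=500))
```

A rotated rectangle would first need to be replaced by its enclosing axis-aligned box, since this format has no rotation field.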

Best Practices

Tight Fit

Keep minimal margin around the object. The box should touch the object at the extreme points.

Include Full Object

Don’t cut off any part of the object, even if partially occluded (mark as occluded instead).

Consistency

Maintain consistent margin across all instances of the same class.

Overlap Handling

Use Z-order to indicate which object is in front when boxes overlap.

Polygon

Description

Polygons are closed shapes defined by a sequence of vertices. They provide precise object boundaries for irregular shapes.

When to Use

Instance segmentation tasks
Objects with irregular boundaries
Precise boundary annotation required
Semantic segmentation with discrete objects
Mask R-CNN and similar model training

Drawing

Activate polygon tool (Shift+P)
Click to place each vertex along the object boundary
Place vertices at corners and direction changes
Close the polygon:
- Double-click the first vertex, or
- Press N, or
- Click the first point again

Vertex Placement Strategy: Place more vertices where the boundary curves sharply, fewer on straight edges. Aim for smooth, accurate boundaries without excessive points.

Editing

Add Vertex:

Select the polygon
Click on an edge where you want to insert a point
A new vertex is created

Move Vertex:

Select the polygon
Drag any vertex to reposition it

Remove Vertex:

Select the polygon
Right-click a vertex → “Delete point”, or
Select vertex and press Del

Context Menu:

Right-click edge → Add point
Right-click vertex → Delete point

Properties

Coordinates: Array of (x, y) vertices
Minimum points: 3 (triangle)
Attributes: Supports all attribute types
In tracks: Interpolates vertex positions (can morph shape)

Best Practices

Zoom in for pixel-accurate boundaries
Follow edges precisely, especially for training segmentation models
Close gaps carefully at occlusion boundaries
Optimize points: Use enough vertices for accuracy, but no more than needed
Vertex direction: Place vertices in a consistent direction (clockwise or counterclockwise) for all objects
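Vertex direction can be checked after the fact with the shoelace formula, whose sign depends on the winding order. The sketch below assumes image coordinates (x to the right, y downward), where a positive signed area corresponds to clockwise order on screen; the helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(vertices: List[Point]) -> float:
    """Shoelace formula; the sign encodes the winding direction."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total / 2.0


def make_clockwise(vertices: List[Point]) -> List[Point]:
    """Return the vertices in clockwise on-screen order (y axis pointing down)."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return list(vertices) if signed_area(vertices) > 0 else list(reversed(vertices))


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # clockwise on screen
print(signed_area(square))                      # 100.0
print(signed_area(list(reversed(square))))      # -100.0
```

The absolute value of the result is the polygon area in square pixels, which is also handy for spotting accidental tiny polygons.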

Polyline

Description

Polylines are open chains of connected line segments. They represent linear features that have no interior region.

When to Use

Lane markings
Trajectories and paths
Borders and boundaries
Linear infrastructure (roads, power lines)
Centerlines
Motion paths

Drawing

Activate polyline tool (Shift+L)
Click to place vertices along the line
Finish the line:
- Press N, or
- Double-click the last point

Editing

Same as polygons:

Add/remove vertices
Drag vertices to reposition
Extend or shorten the line

Properties

Coordinates: Array of (x, y) vertices (ordered)
Minimum points: 2 (line segment)
Open shape: No interior, just the path
In tracks: Interpolates as a flexible path
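Because a polyline is just an ordered list of vertices, quantities such as its length follow directly from consecutive pairs. A small sketch, using a hypothetical helper name and made-up coordinates:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polyline_length(vertices: List[Point]) -> float:
    """Sum of Euclidean distances between consecutive vertices, in pixels."""
    if len(vertices) < 2:
        raise ValueError("a polyline needs at least 2 vertices")
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


lane = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(polyline_length(lane))  # 11.0
```

Unlike a polygon, there is no segment from the last vertex back to the first, so the pairing stops at the final vertex.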

Best Practices

Place vertices at direction changes and curves
For lanes, annotate the center line unless specified otherwise
Maintain consistent direction (e.g., always left-to-right)
Extend to frame boundaries if the feature continues beyond the visible area

Polylines are especially useful for autonomous driving datasets where lane detection is critical.

Points

Description

Points are individual point markers used for small objects, object centers, or counting many instances.

When to Use

Very small objects (< 5 pixels)
Object centers or key points (single point per object)
Counting (e.g., cells, people in crowds)
Sparse annotation
Click-based detection

Drawing

Activate points tool (Shift+.)
Optionally set the number of points to place
Click to place each point
Press N after the last point (the annotation completes automatically once the specified number of points is reached)

Multiple Points in One Annotation

Points annotations can contain multiple points:

Each click adds a point to the same annotation
Useful for objects with multiple markers
All points share the same label and attributes
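For counting tasks, this means the number of marked instances is the total number of points, not the number of annotations. The sketch below assumes a simplified, hypothetical in-memory layout (a list of dicts with `label` and `points` keys); it is not CVAT's export schema.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical layout: each annotation carries one label and one or more points.
annotations = [
    {"label": "cell", "points": [(12.0, 30.5), (44.0, 18.0), (70.2, 65.0)]},
    {"label": "cell", "points": [(90.0, 10.0)]},
    {"label": "person", "points": [(200.0, 120.0), (230.0, 118.0)]},
]


def count_points_per_label(items: List[dict]) -> Dict[str, int]:
    """Count individual points per label across all points annotations."""
    counts: Counter = Counter()
    for item in items:
        counts[item["label"]] += len(item["points"])
    return dict(counts)


print(count_points_per_label(annotations))  # {'cell': 4, 'person': 2}
```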

Properties

Coordinates: Array of (x, y) positions
Minimum points: 1
In tracks: Interpolates point positions

Best Practices

Center Placement

When marking object centers, be consistent (geometric center, visual center, etc.)

Zoom In

Always zoom in for pixel-accurate point placement

Single vs Multiple

Use single-point annotations for discrete objects, multi-point for objects with multiple markers

Visibility

Points can be hard to see; increase the point size in settings

Ellipse

Description

Ellipses are oval shapes defined by center and two radii. They provide a middle ground between rectangles and polygons.

When to Use

Circular or oval objects (balls, wheels, heads)
When rotation matters
Faster than polygons, more accurate than rectangles
Objects with elliptical cross-sections

Drawing

Activate ellipse tool
Click and drag to define center and initial size
Move mouse to adjust shape and rotation
Click to finish

Editing

Drag center: Move the ellipse
Drag edge handles: Resize radii
Rotation handles: Rotate the ellipse

Properties

Coordinates: Center (cx, cy), radii (rx, ry), and rotation angle
Shape: Always elliptical
In tracks: Interpolates position, size, and rotation
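When an export format accepts only rectangles, an ellipse can be reduced to its enclosing axis-aligned box. The sketch below uses the standard formula for the extent of a rotated ellipse; it assumes the rotation is given in degrees, and the function name is hypothetical.

```python
import math
from typing import Tuple


def ellipse_bbox(cx: float, cy: float, rx: float, ry: float,
                 angle_deg: float = 0.0) -> Tuple[float, float, float, float]:
    """Axis-aligned (x1, y1, x2, y2) box enclosing a rotated ellipse."""
    t = math.radians(angle_deg)
    # Half-extents of the rotated ellipse along the image x and y axes.
    half_w = math.sqrt((rx * math.cos(t)) ** 2 + (ry * math.sin(t)) ** 2)
    half_h = math.sqrt((rx * math.sin(t)) ** 2 + (ry * math.cos(t)) ** 2)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


print(ellipse_bbox(100, 50, 40, 20))        # unrotated: (60.0, 30.0, 140.0, 70.0)
print(ellipse_bbox(100, 50, 40, 20, 90.0))  # radii swap roles, up to rounding
```

For an unrotated ellipse the half-extents reduce to the radii themselves, which is a quick sanity check on the formula.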

Best Practices

Fit tightly to the object boundary
Use for circular objects where rectangles waste space
Consider rotation for tilted objects
Alternative to polygons when approximation is acceptable

Cuboid

Description

Cuboids are 3D boxes rendered in 2D as wire frames. They represent objects with depth in monocular images.

When to Use

Vehicles (cars, trucks, buses)
Buildings and structures
Furniture
Packages and boxes
Any object where 3D extent matters
Monocular 3D object detection

Drawing Methods

Method 1: From Rectangle

Activate cuboid tool
Draw the front face as a rectangle
Drag to extend the depth (rear face)
Adjust perspectives and edges

Method 2: By 4 Points

Select “By 4 Points” method
Click four corners that define the visible faces
CVAT constructs the 3D cuboid

Structure

A cuboid consists of:

Front face: 4 points
Rear face: 4 points
Edges: Connecting lines

Total: 8 vertices defining the 3D box.

Editing

Drag faces: Move front or rear face
Drag vertices: Adjust individual corners
Drag edges: Adjust perspective
Whole cuboid: Drag center to move entire object

Properties

Coordinates: 8 vertices (x, y) for each corner
Projection: Perspective projection of 3D box onto 2D image
Attributes: Supports dimensions (width, height, length)
In tracks: Interpolates all 8 vertices
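The eight-vertex structure can be modeled as two four-point faces. The sketch below is a hypothetical container (the class and field names are assumptions, not CVAT's data model) that also derives the 2D box enclosing the whole cuboid.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cuboid2D:
    """A projected cuboid stored as a front face and a rear face."""
    front: List[Point]
    rear: List[Point]

    def __post_init__(self) -> None:
        if len(self.front) != 4 or len(self.rear) != 4:
            raise ValueError("each face needs exactly 4 points")

    @property
    def vertices(self) -> List[Point]:
        """All 8 projected corners, front face first."""
        return self.front + self.rear

    def bbox(self) -> Tuple[float, float, float, float]:
        """Axis-aligned (x1, y1, x2, y2) box around all 8 vertices."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))


car = Cuboid2D(
    front=[(100, 200), (200, 200), (200, 300), (100, 300)],
    rear=[(140, 170), (220, 170), (220, 250), (140, 250)],
)
print(len(car.vertices), car.bbox())  # 8 (100, 170, 220, 300)
```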

Best Practices

Perspective Consistency

Ensure the cuboid follows the image perspective. Rear face should be smaller if farther away.

Visible Faces

Adjust vertices to match visible edges. Not all edges may be visible depending on viewpoint.

Dimension Attributes

Use width/height/length attributes to record physical dimensions when known.

Orientation

Maintain consistent front/rear face assignment (e.g., front = vehicle front).

Cuboids are particularly valuable for autonomous driving datasets where 3D understanding is critical for planning and control.

Skeleton

Description

Skeletons are articulated structures composed of keypoints (nodes) connected by edges. They represent objects with defined structural relationships.

When to Use

Human pose estimation
Animal pose
Articulated objects (robots, machinery)
Hand keypoints
Facial landmarks
Any structured keypoint annotation

Setup

Skeletons require label configuration with:

Keypoints: List of named points (e.g., “nose”, “left_eye”, “right_shoulder”)
Edges: Connections between keypoints (e.g., “left_shoulder” to “left_elbow”)

Example Configuration (COCO 17-keypoint pose):

{
  "name": "person",
  "type": "skeleton",
  "sublabels": [
    {"name": "nose"},
    {"name": "left_eye"},
    {"name": "right_eye"},
    {"name": "left_ear"},
    {"name": "right_ear"},
    {"name": "left_shoulder"},
    {"name": "right_shoulder"},
    // ... more keypoints
  ],
  "svg": "<skeleton edges definition>"
}

Drawing

Activate skeleton tool
Select the skeleton label
Click to place each keypoint in order:
- Follow the defined keypoint sequence
- Place each point at the anatomical location
CVAT automatically draws edges between connected keypoints

Zoom in when placing keypoints on small or distant people for accuracy. You can adjust keypoint positions after placement.

Editing

Move keypoints: Drag any keypoint to reposition
Visibility states: Mark keypoints as visible, occluded, or outside the frame
Add attributes: Each keypoint can have visibility attributes
Edges: Automatically update when keypoints move

Properties

Structure: N keypoints with M edges
Keypoint states: Visible, occluded, outside frame
Coordinates: (x, y) for each keypoint
In tracks: Interpolates keypoint positions
Topology: Fixed edge structure defined in label
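Since the topology is fixed by the label, it is worth checking a skeleton definition before annotation starts. The sketch below validates a plain list of keypoint names and name pairs; this representation and the function name are assumptions for illustration, not CVAT's SVG-based edge format.

```python
from typing import List, Tuple


def validate_skeleton(keypoints: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(set(keypoints)) != len(keypoints):
        problems.append("duplicate keypoint names")
    known = set(keypoints)
    for a, b in edges:
        if a == b:
            problems.append(f"edge ({a}, {b}) connects a keypoint to itself")
        for name in (a, b):
            if name not in known:
                problems.append(f"edge ({a}, {b}) references unknown keypoint {name!r}")
    return problems


keypoints = ["nose", "left_eye", "right_eye", "left_shoulder", "left_elbow"]
edges = [
    ("nose", "left_eye"),
    ("nose", "right_eye"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),  # "left_wrist" is not defined above
]
for problem in validate_skeleton(keypoints, edges):
    print(problem)
```

Here the last edge is reported because it names a keypoint that the label does not define.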

Common Skeleton Formats

COCO Keypoints (17 points):

Full body pose with major joints
Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles

MPII (16 points):

Similar to COCO, but with a different joint set (for example, pelvis, thorax, and head top instead of eyes and ears)

Hand Keypoints (21 points):

Wrist + 5 fingers × 4 keypoints each

Face Keypoints (68+ points):

Facial landmarks for expression analysis

Best Practices

Consistent Order

Always place keypoints in the defined order for consistency

Occluded Points

Mark occluded keypoints appropriately, don’t skip them

Symmetric Placement

For symmetric objects (humans), maintain left/right consistency

Zoom for Accuracy

Especially important for distant or small skeletons

Mask

Description

Masks are pixel-accurate binary segmentation masks. They represent objects at the highest level of detail.

When to Use

Pixel-perfect segmentation required
Complex, irregular boundaries
Objects with holes or intricate details
Semantic segmentation
Medical imaging
High-precision applications

Drawing

Masks use a specialized brush toolbox:

Activate mask tool
Select a label
The brush toolbox appears with tools:

Brush Tool (Shift+1):

Paint to add mask pixels
Adjustable size and shape (circle/square)
Click and drag to paint

Eraser Tool (Shift+2):

Remove mask pixels
Same size/shape controls as brush

Polygon Add (Shift+3):

Add a polygonal region to the mask
Click vertices, close polygon
Region is filled and added to mask

Polygon Remove (Shift+4):

Remove a polygonal region from the mask
Useful for quickly removing large areas

Brush Settings

Size: 1-100 pixels
Form: Circle or square
Shortcuts:
- [ - Decrease brush size
- ] - Increase brush size

Efficient Mask Workflow:

Use Polygon Add to roughly fill the object
Use Brush to refine edges
Use Eraser to remove overpainting
Use Polygon Remove to cut out holes

Editing

Select the mask annotation
The brush toolbox reappears
Use brush/eraser to modify
Click “Done” or press N to finish editing

Properties

Storage: Run-length encoded (RLE) for efficiency
Resolution: Pixel-level
Binary: Each pixel is either in or out of the mask
In tracks: Each keyframe stores a full mask; masks are not interpolated between keyframes (they appear or disappear)
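Run-length encoding stores a binary mask as the lengths of alternating runs of 0s and 1s instead of one value per pixel. The sketch below shows the general idea on a flattened mask, with the first count always describing a run of zeros; this convention is an assumption for illustration, and CVAT's exact on-disk layout may differ.

```python
from typing import List


def rle_encode(bits: List[int]) -> List[int]:
    """Encode a flat 0/1 sequence as alternating run lengths, starting with zeros."""
    counts = []
    current = 0  # by convention the first run counts zeros (it may be empty)
    run = 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current = b
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts: List[int]) -> List[int]:
    """Invert rle_encode."""
    bits: List[int] = []
    value = 0
    for run in counts:
        bits.extend([value] * run)
        value = 1 - value  # runs alternate between 0 and 1
    return bits


row = [0, 0, 1, 1, 1, 0, 1]
encoded = rle_encode(row)
print(encoded)                     # [2, 3, 1, 1]
print(rle_decode(encoded) == row)  # True
```

The saving grows with mask size: a large object with a simple outline yields a few runs per row rather than thousands of pixel values.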

Performance

Masks are computationally intensive:

Large masks: May slow down the canvas
Many masks: Consider using polygons if possible
Video masks: Require keyframes on each annotated frame

Best Practices

Use Polygon Tools First

Rough out the shape with Polygon Add, then refine with brush. This is much faster than painting everything.

Appropriate Brush Size

Use a large brush for interiors and a small brush for edges. Adjust frequently with [ and ].

Zoom In for Edges

Zoom to at least 200% when painting boundary pixels for accuracy.

Save Frequently

Masks can be memory-intensive; save your work regularly to avoid loss.

Consider Auto-Annotation

Use SAM2 or other interactive tools to generate initial masks, then refine manually.

In video annotation, masks do not interpolate between keyframes like other shape types. You must create a mask on each frame where the object is visible.

Object Types: Shape vs Track

All annotation types (except tags) can be created as either shapes or tracks:

Shape

Exists on a single frame only
Used for static images or one-time appearances
No temporal continuity
ObjectType: SHAPE

Track

Exists across multiple frames
Represents the same object over time
Supports interpolation between keyframes
Maintains a single client ID across frames
ObjectType: TRACK

You can convert shapes to tracks and vice versa using the object context menu.

Track Keyframes

In tracks:

Keyframes: Frames where you manually set the object’s position/shape
Interpolated frames: Frames between keyframes where CVAT automatically estimates position
Mark a frame as a keyframe with K
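The estimate on an interpolated frame can be pictured as a blend of the two surrounding keyframes. The sketch below uses plain linear interpolation on an unrotated rectangle; this is an assumption chosen for illustration, and CVAT's actual interpolation (especially for rotated or multi-vertex shapes) may be more involved.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]
Keyframe = Tuple[int, Box]  # (frame number, (x1, y1, x2, y2))


def interpolate_box(kf_a: Keyframe, kf_b: Keyframe, frame: int) -> Box:
    """Linearly interpolate a box for a frame between two keyframes."""
    frame_a, box_a = kf_a
    frame_b, box_b = kf_b
    if frame_a >= frame_b:
        raise ValueError("keyframes must be in increasing frame order")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame lies outside the keyframe interval")
    t = (frame - frame_a) / (frame_b - frame_a)  # 0.0 at kf_a, 1.0 at kf_b
    x1, y1, x2, y2 = (a + t * (b - a) for a, b in zip(box_a, box_b))
    return (x1, y1, x2, y2)


start = (0, (10.0, 10.0, 50.0, 50.0))
end = (10, (30.0, 20.0, 90.0, 80.0))
print(interpolate_box(start, end, 5))  # (20.0, 15.0, 70.0, 65.0)
```

If the object changes direction or speed between the two keyframes, the straight-line estimate drifts, which is the cue to add another keyframe.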

Choosing the Right Annotation Type

By Task

Task	Recommended Type
Object Detection	Rectangle
Instance Segmentation	Polygon or Mask
Semantic Segmentation	Mask
Lane Detection	Polyline
Pose Estimation	Skeleton
Crowd Counting	Points
Image Classification	Tag
3D Object Detection	Cuboid

Export Formats

Different annotation types are supported by different export formats:

Format	Rectangle	Polygon	Polyline	Points	Mask	Skeleton	Cuboid	Tag
COCO	✓	✓	✗	✓	✓	✓	✗	✗
YOLO	✓	✗	✗	✗	✗	✗	✗	✗
Pascal VOC	✓	✗	✗	✗	✗	✗	✗	✗
CVAT for images	✓	✓	✓	✓	✓	✓	✓	✓
CVAT for video	✓	✓	✓	✓	✓	✓	✓	✓
Cityscapes	✓	✓	✗	✗	✗	✗	✗	✗
KITTI	✓	✗	✗	✗	✗	✗	✓	✗

Check your target export format before starting annotation to ensure it supports your chosen annotation types.
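This check can be automated by transcribing the table into a lookup. The sketch below copies the support matrix above as-is (so it is only as current as that table) and reports which of a project's annotation types a format would drop; the names `SUPPORT` and `unsupported_types` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

ALL_TYPES: Set[str] = {
    "rectangle", "polygon", "polyline", "points",
    "mask", "skeleton", "cuboid", "tag",
}

# Transcribed from the export format table above.
SUPPORT: Dict[str, Set[str]] = {
    "COCO": {"rectangle", "polygon", "points", "mask", "skeleton"},
    "YOLO": {"rectangle"},
    "Pascal VOC": {"rectangle"},
    "CVAT for images": set(ALL_TYPES),
    "CVAT for video": set(ALL_TYPES),
    "Cityscapes": {"rectangle", "polygon"},
    "KITTI": {"rectangle", "cuboid"},
}


def unsupported_types(export_format: str, used_types: Iterable[str]) -> List[str]:
    """Return the annotation types in use that the format does not support."""
    if export_format not in SUPPORT:
        raise KeyError(f"unknown export format: {export_format}")
    return sorted(set(used_types) - SUPPORT[export_format])


print(unsupported_types("COCO", ["rectangle", "polyline", "cuboid"]))
# ['cuboid', 'polyline']
print(unsupported_types("CVAT for images", ["mask", "tag"]))
# []
```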

Next Steps

Manual Annotation

Learn workflows for creating these annotations

Advanced Tools

Master interpolation and track editing

Auto-Annotation

Generate annotations automatically with AI

Export Datasets

Export your annotations in various formats
