Annotation Types Overview
| Type | Description |
|---|---|
| Rectangle | Axis-aligned bounding boxes |
| Polygon | Arbitrary closed shapes |
| Polyline | Open line segments |
| Points | Individual point markers |
| Ellipse | Elliptical shapes |
| Cuboid | 3D boxes (2D projection) |
| Skeleton | Articulated keypoint structures |
| Mask | Pixel-accurate segmentation |
| Tag | Frame-level labels |
Rectangle (Bounding Box)
Description
Rectangles are axis-aligned bounding boxes defined by two opposite corners. They are the fastest annotation type to draw and are widely used for object detection.
When to Use
- Object detection tasks
- When exact boundaries aren’t critical
- Quick initial annotation
- Objects with roughly rectangular extent
- YOLO, Faster R-CNN, and similar detector training
Drawing Methods
Method 1: Classic (2 Points)
- Activate the rectangle tool (Shift+B)
- Click and drag from one corner to the opposite corner
- Release to finish
Method 2: By 4 Points
- Select “By 4 Points” in the tool options
- Click the four extreme points:
  - Topmost point of the object
  - Bottommost point
  - Leftmost point
  - Rightmost point
- CVAT fits a bounding box automatically
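Conceptually, the By 4 Points method reduces to taking the minimum and maximum of the clicked coordinates. The sketch below illustrates that idea; it is not CVAT's implementation, and `box_from_extreme_points` is a hypothetical helper name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def box_from_extreme_points(points: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (x1, y1, x2, y2) for the tightest axis-aligned box around the points.

    The click order (top, bottom, left, right) does not matter: the box is
    simply the min/max of the x and y coordinates.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four extreme points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Top, bottom, left, right extreme points of an object (image y grows downward).
clicks = [(120.0, 40.0), (130.0, 210.0), (60.0, 150.0), (190.0, 100.0)]
print(box_from_extreme_points(clicks))  # (60.0, 40.0, 190.0, 210.0)
```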
Properties
- Coordinates: (x1, y1, x2, y2), where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner
- Rotation: Can be rotated after creation (the box is then no longer axis-aligned)
- Attributes: Supports all attribute types
- In tracks: Interpolates position and size
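Many detector pipelines, YOLO among them, expect boxes as a normalized center, width, and height rather than as corner coordinates. A minimal conversion sketch, assuming an unrotated rectangle in the (x1, y1, x2, y2) convention above; `corners_to_yolo` is a hypothetical name, not part of CVAT.

```python
from typing import Tuple


def corners_to_yolo(
    box: Tuple[float, float, float, float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert (x1, y1, x2, y2) pixel corners to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("expected the top-left corner before the bottom-right corner")
    cx = (x1 + x2) / 2 / img_w  # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h  # box center, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return (cx, cy, w, h)


print(corners_to_yolo((100, 50, 300, 250), img_w=400, img_h=400))
# (0.5, 0.375, 0.5, 0.5)
```

A rotated rectangle would first need to be reduced to an axis-aligned box, since this representation has no angle.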
Best Practices
Tight Fit
Keep minimal margin around the object. The box should touch the object at the extreme points.
Include Full Object
Don’t cut off any part of the object, even if partially occluded (mark as occluded instead).
Consistency
Maintain consistent margin across all instances of the same class.
Overlap Handling
Use Z-order to indicate which object is in front when boxes overlap.
Polygon
Description
Polygons are closed shapes defined by a sequence of vertices. They provide precise object boundaries for irregular shapes.
When to Use
- Instance segmentation tasks
- Objects with irregular boundaries
- Precise boundary annotation required
- Semantic segmentation with discrete objects
- Mask R-CNN and similar model training
Drawing
- Activate the polygon tool (Shift+P)
- Click to place each vertex along the object boundary
- Place vertices at corners and direction changes
- Close the polygon:
  - Double-click the first vertex, or
  - Press N, or
  - Click the first point again
Editing
Add Vertex:
- Select the polygon
- Click on an edge where you want to insert a point
- A new vertex is created
Move Vertex:
- Select the polygon
- Drag any vertex to reposition it
Delete Vertex:
- Select the polygon
- Right-click a vertex → “Delete point”, or
- Select the vertex and press Del
Quick Reference:
- Right-click edge → Add point
- Right-click vertex → Delete point
Properties
- Coordinates: Array of (x, y) vertices
- Minimum points: 3 (triangle)
- Attributes: Supports all attribute types
- In tracks: Interpolates vertex positions (can morph shape)
Best Practices
- Zoom in for pixel-accurate boundaries
- Follow edges precisely, especially for training segmentation models
- Close gaps carefully at occlusion boundaries
- Optimize points: Use enough for accuracy, not excessive
- Clockwise/counterclockwise: Maintain consistent direction for all objects
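One way to enforce a consistent vertex direction is to check the sign of the shoelace area and reverse the vertex list when needed. The sketch below picks clockwise, as seen on screen with y growing downward, as the convention; that choice and the helper names are assumptions for illustration, not a CVAT requirement.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(vertices: List[Point]) -> float:
    """Shoelace formula. In image coordinates (y grows downward) a positive
    result means the vertices run clockwise as seen on screen."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return total / 2.0


def make_clockwise(vertices: List[Point]) -> List[Point]:
    """Return the vertices in on-screen clockwise order, reversing if needed."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return list(vertices) if signed_area(vertices) > 0 else list(reversed(vertices))


square_ccw = [(0, 0), (0, 10), (10, 10), (10, 0)]
print(signed_area(square_ccw))     # -100.0
print(make_clockwise(square_ccw))  # [(10, 0), (10, 10), (0, 10), (0, 0)]
```

The absolute value of `signed_area` is also the polygon's area in square pixels, which is handy for spotting accidental slivers.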
Polyline
Description
Polylines are open paths made of connected line segments; unlike polygons, they are not closed. They represent linear features without interior regions.
When to Use
- Lane markings
- Trajectories and paths
- Borders and boundaries
- Linear infrastructure (roads, power lines)
- Centerlines
- Motion paths
Drawing
- Activate the polyline tool (Shift+L)
- Click to place vertices along the line
- Finish the line:
  - Press N, or
  - Double-click the last point
Editing
Same as polygons:
- Add/remove vertices
- Drag vertices to reposition
- Extend or shorten the line
Properties
- Coordinates: Array of ordered (x, y) vertices
- Minimum points: 2 (line segment)
- Open shape: No interior, just the path
- In tracks: Interpolates as a flexible path
Best Practices
- Place vertices at direction changes and curves
- For lanes, annotate the center line unless specified otherwise
- Maintain consistent direction (e.g., always left-to-right)
- Extend to frame boundaries if the feature continues beyond the visible area
Polylines are especially useful for autonomous driving datasets where lane detection is critical.
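The direction and minimum-point rules for polylines are easy to check in post-processing. A minimal sketch, assuming a left-to-right convention judged by the x-coordinates of the first and last vertices; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polyline_length(points: List[Point]) -> float:
    """Sum of the Euclidean lengths of consecutive segments."""
    if len(points) < 2:
        raise ValueError("a polyline needs at least 2 points")
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def left_to_right(points: List[Point]) -> List[Point]:
    """Reverse the vertex order if the line was drawn right-to-left."""
    return list(points) if points[0][0] <= points[-1][0] else list(reversed(points))


lane = [(640.0, 700.0), (400.0, 500.0), (300.0, 420.0)]
print(left_to_right(lane))                         # [(300.0, 420.0), (400.0, 500.0), (640.0, 700.0)]
print(polyline_length([(0, 0), (3, 4), (3, 10)]))  # 11.0
```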
Points
Description
Points are individual point markers used for small objects, object centers, or counting many instances.
When to Use
- Very small objects (< 5 pixels)
- Object centers or key points (single point per object)
- Counting (e.g., cells, people in crowds)
- Sparse annotation
- Click-based detection
Drawing
- Activate the points tool (Shift+.)
- Optionally set the number of points to place
- Click to place each point
- Press N after the last point (or drawing completes automatically after the specified count)
Multiple Points in One Annotation
Points annotations can contain multiple points:
- Each click adds a point to the same annotation
- Useful for objects with multiple markers
- All points share the same label and attributes
Properties
- Coordinates: Array of (x, y) positions
- Minimum points: 1
- In tracks: Interpolates point positions
Best Practices
Center Placement
When marking object centers, be consistent (geometric center, visual center, etc.)
Zoom In
Always zoom in for pixel-accurate point placement
Single vs Multiple
Use single-point annotations for discrete objects, multi-point for objects with multiple markers
Visibility
Points can be hard to see; increase the point size in settings
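“Geometric center” can mean more than one thing. For an irregular outline, the center of its bounding box and the area centroid of its polygon can differ noticeably, so pick one definition and apply it everywhere. The sketch below computes both; the function names are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_center(x1: float, y1: float, x2: float, y2: float) -> Point:
    """Midpoint of an axis-aligned bounding box."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area centroid of a simple (non-self-intersecting) polygon.

    Works for either vertex direction because the signed area in the
    denominator flips sign together with the sums in the numerator.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Standard formula is sum / (6 * A); with A = area2 / 2 that is sum / (3 * area2).
    return (cx / (3 * area2), cy / (3 * area2))


l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(box_center(0, 0, 4, 4))                                 # (2.0, 2.0)
print(tuple(round(v, 3) for v in polygon_centroid(l_shape)))  # (1.357, 1.357)
```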
Ellipse
Description
Ellipses are oval shapes defined by a center and two radii. They provide a middle ground between rectangles and polygons.
When to Use
- Circular or oval objects (balls, wheels, heads)
- When rotation matters
- Faster than polygons, more accurate than rectangles
- Objects with elliptical cross-sections
Drawing
- Activate ellipse tool
- Click and drag to define center and initial size
- Move mouse to adjust shape and rotation
- Click to finish
Editing
- Drag center: Move the ellipse
- Drag edge handles: Resize radii
- Rotation handles: Rotate the ellipse
Properties
- Coordinates: Center (cx, cy), radii (rx, ry), and rotation angle
- Shape: Always elliptical
- In tracks: Interpolates position, size, and rotation
Best Practices
- Fit tightly to the object boundary
- Use for circular objects where rectangles waste space
- Consider rotation for tilted objects
- Alternative to polygons when approximation is acceptable
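How much space a rectangle wastes around an elliptical object can be quantified by comparing the ellipse's area (π·rx·ry) with the area of the smallest axis-aligned box that encloses it. A sketch, assuming the rotation angle is given in degrees; the sign convention of the angle does not affect the enclosing box. Function names are hypothetical.

```python
import math
from typing import Tuple


def ellipse_bbox(
    cx: float, cy: float, rx: float, ry: float, angle_deg: float
) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned (x1, y1, x2, y2) box enclosing a rotated ellipse."""
    t = math.radians(angle_deg)
    half_w = math.hypot(rx * math.cos(t), ry * math.sin(t))
    half_h = math.hypot(rx * math.sin(t), ry * math.cos(t))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def box_coverage(rx: float, ry: float, angle_deg: float) -> float:
    """Fraction of the enclosing axis-aligned box covered by the ellipse."""
    x1, y1, x2, y2 = ellipse_bbox(0.0, 0.0, rx, ry, angle_deg)
    return (math.pi * rx * ry) / ((x2 - x1) * (y2 - y1))


print(round(box_coverage(30, 30, 0), 3))   # 0.785
print(round(box_coverage(40, 10, 45), 3))  # 0.37
```

A circle fills about 79% of its box, while an elongated ellipse tilted by 45° fills only about 37%, which is where rotation pays off for tilted objects.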
Cuboid
Description
Cuboids are 3D boxes rendered in 2D as wireframes. They represent objects with depth in monocular images.
When to Use
- Vehicles (cars, trucks, buses)
- Buildings and structures
- Furniture
- Packages and boxes
- Any object where 3D extent matters
- Monocular 3D object detection
Drawing Methods
Method 1: From Rectangle
- Activate the cuboid tool
- Draw the front face as a rectangle
- Drag to extend the depth (rear face)
- Adjust the perspective and edges
Method 2: By 4 Points
- Select the “By 4 Points” method
- Click four corners that define the visible faces
- CVAT constructs the 3D cuboid
Structure
A cuboid consists of:
- Front face: 4 points
- Rear face: 4 points
- Edges: Connecting lines
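The front face, rear face, and connecting lines add up to 12 wireframe edges. A small sketch that derives them, assuming both faces list their corners in the same order so that front corner i pairs with rear corner i; this ordering is an assumption for illustration, not CVAT's storage format.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cuboid_edges(front: List[Point], rear: List[Point]) -> List[Tuple[Point, Point]]:
    """Return the 12 wireframe edges of a projected cuboid."""
    if len(front) != 4 or len(rear) != 4:
        raise ValueError("each face needs exactly 4 corners")
    edges = []
    for i in range(4):
        j = (i + 1) % 4
        edges.append((front[i], front[j]))  # front face outline
        edges.append((rear[i], rear[j]))    # rear face outline
        edges.append((front[i], rear[i]))   # depth edge joining the two faces
    return edges


front = [(100, 100), (200, 100), (200, 200), (100, 200)]
rear = [(140, 80), (220, 80), (220, 160), (140, 160)]
print(len(cuboid_edges(front, rear)))  # 12
```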
Editing
- Drag faces: Move front or rear face
- Drag vertices: Adjust individual corners
- Drag edges: Adjust perspective
- Whole cuboid: Drag center to move entire object
Properties
- Coordinates: 8 vertices, one (x, y) pair for each corner
- Projection: Perspective projection of a 3D box onto the 2D image
- Attributes: Supports dimensions (width, height, length)
- In tracks: Interpolates all 8 vertices
Best Practices
Perspective Consistency
Ensure the cuboid follows the image perspective. Rear face should be smaller if farther away.
Visible Faces
Adjust vertices to match visible edges. Not all edges may be visible depending on viewpoint.
Dimension Attributes
Use width/height/length attributes to record physical dimensions when known.
Orientation
Maintain consistent front/rear face assignment (e.g., front = vehicle front).
Cuboids are particularly valuable for autonomous driving datasets where 3D understanding is critical for planning and control.
Skeleton
Description
Skeletons are articulated structures composed of keypoints (nodes) connected by edges. They represent objects with defined structural relationships.
When to Use
- Human pose estimation
- Animal pose
- Articulated objects (robots, machinery)
- Hand keypoints
- Facial landmarks
- Any structured keypoint annotation
Setup
Skeletons require a label configuration with:
- Keypoints: List of named points (e.g., “nose”, “left_eye”, “right_shoulder”)
- Edges: Connections between keypoints (e.g., “left_shoulder” to “left_elbow”)
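Before annotating, it can help to sanity-check a skeleton definition: keypoint names should be unique and every edge should join two defined keypoints. The list-and-tuple representation below is a hypothetical in-memory model, not CVAT's label configuration schema.

```python
from typing import Dict, List, Tuple


def validate_skeleton(keypoints: List[str], edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Check a skeleton definition and return an adjacency map.

    Raises ValueError for duplicate keypoint names, undefined edge
    endpoints, or edges that connect a keypoint to itself.
    """
    if len(set(keypoints)) != len(keypoints):
        raise ValueError("keypoint names must be unique")
    adjacency: Dict[str, List[str]] = {name: [] for name in keypoints}
    for a, b in edges:
        if a not in adjacency or b not in adjacency:
            raise ValueError(f"edge ({a}, {b}) references an undefined keypoint")
        if a == b:
            raise ValueError(f"edge ({a}, {b}) connects a keypoint to itself")
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


arm = ["left_shoulder", "left_elbow", "left_wrist"]
links = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]
print(validate_skeleton(arm, links)["left_elbow"])  # ['left_shoulder', 'left_wrist']
```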
Drawing
- Activate skeleton tool
- Select the skeleton label
- Click to place each keypoint in order:
- Follow the defined keypoint sequence
- Place each point at the anatomical location
- CVAT automatically draws edges between connected keypoints
Editing
- Move keypoints: Drag any keypoint to reposition
- Visibility states: Mark keypoints as visible/occluded/not visible
- Add attributes: Each keypoint can have visibility attributes
- Edges: Automatically update when keypoints move
Properties
- Structure: N keypoints with M edges
- Keypoint states: Visible, occluded, outside frame
- Coordinates: (x, y) for each keypoint
- In tracks: Interpolates keypoint positions
- Topology: Fixed edge structure defined in label
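When skeletons are exported as COCO-style keypoints, each point becomes an (x, y, v) triple, where COCO defines v = 0 as not labeled, v = 1 as labeled but not visible, and v = 2 as labeled and visible. The mapping from the keypoint states listed above to those flags is an assumption made for this sketch; CVAT's exporter may handle edge cases differently.

```python
from typing import Dict, List, Tuple

# Assumed mapping from keypoint states to the COCO visibility flag v.
COCO_VISIBILITY: Dict[str, int] = {"visible": 2, "occluded": 1, "outside": 0}


def to_coco_keypoints(
    order: List[str], points: Dict[str, Tuple[float, float, str]]
) -> Tuple[List[float], int]:
    """Flatten keypoints into COCO's [x1, y1, v1, x2, y2, v2, ...] layout.

    Returns the flat list and the number of labeled keypoints (v > 0).
    """
    flat: List[float] = []
    labeled = 0
    for name in order:
        x, y, state = points[name]
        v = COCO_VISIBILITY[state]
        if v == 0:
            x, y = 0.0, 0.0  # COCO writes unlabeled keypoints as 0, 0, 0
        else:
            labeled += 1
        flat.extend([x, y, v])
    return flat, labeled


order = ["nose", "left_eye", "right_eye"]
points = {
    "nose": (50.0, 40.0, "visible"),
    "left_eye": (45.0, 35.0, "occluded"),
    "right_eye": (70.0, 30.0, "outside"),
}
print(to_coco_keypoints(order, points))
# ([50.0, 40.0, 2, 45.0, 35.0, 1, 0.0, 0.0, 0], 2)
```

Keeping occluded points (v = 1) instead of skipping them preserves the fixed keypoint order that the skeleton topology relies on.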
Common Skeleton Formats
COCO Keypoints (17 points):
- Full body pose with major joints
- Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles
Other Body Formats:
- Similar to COCO, with slight variations
Hand Keypoints (21 points):
- Wrist + 5 fingers × 4 keypoints each
Face Keypoints:
- Facial landmarks for expression analysis
Best Practices
Consistent Order
Always place keypoints in the defined order for consistency
Occluded Points
Mark occluded keypoints appropriately, don’t skip them
Symmetric Placement
For symmetric objects (humans), maintain left/right consistency
Zoom for Accuracy
Especially important for distant or small skeletons
Mask
Description
Masks are pixel-accurate binary segmentation annotations. They represent objects at the highest level of detail.
When to Use
- Pixel-perfect segmentation required
- Complex, irregular boundaries
- Objects with holes or intricate details
- Semantic segmentation
- Medical imaging
- High-precision applications
Drawing
Masks use a specialized brush toolbox:
- Activate the mask tool
- Select a label
- The brush toolbox appears with these tools:
Brush (Shift+1):
- Paint to add mask pixels
- Adjustable size and shape (circle/square)
- Click and drag to paint
Eraser (Shift+2):
- Remove mask pixels
- Same size/shape controls as the brush
Polygon Add (Shift+3):
- Add a polygonal region to the mask
- Click vertices, then close the polygon
- The region is filled and added to the mask
Polygon Remove (Shift+4):
- Remove a polygonal region from the mask
- Useful for quickly removing large areas
Brush Settings
- Size: 1-100 pixels
- Form: Circle or square
- Shortcuts:
  - [ decreases brush size
  - ] increases brush size
Editing
- Select the mask annotation
- The brush toolbox reappears
- Use brush/eraser to modify
- Click “Done” or press N to finish editing
Properties
- Storage: Run-length encoded (RLE) for efficiency
- Resolution: Pixel-level
- Binary: Each pixel is either in or out of the mask
- In tracks: Each keyframe stores a full mask, interpolation is not applied (masks appear/disappear)
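Run-length encoding stores a binary mask as alternating run lengths of 0s and 1s, which is compact because object masks consist mostly of long uniform runs. The sketch below shows a generic row-major RLE round trip to illustrate the idea; CVAT's own serialization may differ in its details, and the function names are hypothetical.

```python
from typing import List


def rle_encode(mask: List[List[int]]) -> List[int]:
    """Encode a binary mask row by row.

    Counts alternate 0-run, 1-run, 0-run, ... and always start with a 0-run,
    which has length 0 when the first pixel is 1.
    """
    counts: List[int] = []
    current, run = 0, 0
    for row in mask:
        for pixel in row:
            if pixel not in (0, 1):
                raise ValueError("mask pixels must be 0 or 1")
            if pixel != current:
                counts.append(run)
                current, run = pixel, 0
            run += 1
    counts.append(run)
    return counts


def rle_decode(counts: List[int], width: int) -> List[List[int]]:
    """Invert rle_encode for a mask with the given row width."""
    flat: List[int] = []
    for i, n in enumerate(counts):
        flat.extend([i % 2] * n)  # even positions are 0-runs, odd are 1-runs
    return [flat[r:r + width] for r in range(0, len(flat), width)]


mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
counts = rle_encode(mask)
print(counts)                               # [2, 2, 1, 2, 1]
print(rle_decode(counts, width=4) == mask)  # True
```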
Performance
Masks are computationally intensive:
- Large masks: May slow down the canvas
- Many masks: Consider using polygons if possible
- Video masks: Require keyframes on each annotated frame
Best Practices
Use Polygon Tools First
Rough out the shape with Polygon Add, then refine with brush. This is much faster than painting everything.
Appropriate Brush Size
Use a large brush for interiors and a small brush for edges. Adjust the size frequently with [ and ].
Zoom In for Edges
Zoom to at least 200% when painting boundary pixels for accuracy.
Save Frequently
Masks can be memory-intensive; save your work regularly to avoid loss.
Consider Auto-Annotation
Use SAM2 or other interactive tools to generate initial masks, then refine manually.
Tag
Description
Tags are frame-level labels without spatial extent. They classify entire frames or indicate frame properties.
When to Use
- Image classification
- Scene classification
- Frame attributes (weather, time of day)
- Event detection (frame contains event X)
- Multi-label frame classification
- Metadata annotation
Creating Tags
- Click the Tag tool in the controls sidebar
- Select a tag label
- Click Tag to apply to current frame
- The tag appears in the objects sidebar
Multiple tags can be applied to the same frame. Tags are independent of shape annotations.
Properties
- No spatial extent: Tags have no position or shape
- Frame-specific: Each tag applies to one frame
- Attributes: Tags support attributes
- Label-based: Each tag has a label (tag type)
Tag Annotation Workspace
For rapid tag annotation, switch to the Tag Annotation Workspace:
- Select workspace → “Tag Annotation”
- Interface shows:
  - Large image/frame display
  - Tag buttons for quick application
  - Keyboard shortcuts for each tag
- Navigate frames quickly with arrow keys
- Apply tags with number keys or clicks
Best Practices
- Use consistent criteria for tag application
- Document tag definitions in project guidelines
- Use attributes for tag variants (e.g., “weather” tag with “sunny”/“rainy” attribute)
- Leverage keyboard shortcuts for rapid annotation
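For multi-label frame classification, per-frame tags are typically turned into a multi-hot matrix with one column per tag label. A sketch, assuming tags have already been collected into a hypothetical `{frame_index: set_of_labels}` dictionary:

```python
from typing import Dict, List, Set


def multi_hot(frame_tags: Dict[int, Set[str]], labels: List[str], num_frames: int) -> List[List[int]]:
    """One row per frame, one column per tag label; 1 if the tag is present."""
    index = {label: i for i, label in enumerate(labels)}
    rows = [[0] * len(labels) for _ in range(num_frames)]
    for frame, tags in frame_tags.items():
        for tag in tags:
            rows[frame][index[tag]] = 1
    return rows


labels = ["daytime", "rain", "pedestrian_crossing"]
tags = {0: {"daytime"}, 2: {"daytime", "rain"}}
print(multi_hot(tags, labels, num_frames=3))  # [[1, 0, 0], [0, 0, 0], [1, 1, 0]]
```

An untagged frame becomes a row of zeros, so project guidelines should state whether “no tag” means “none apply” or “not yet reviewed”.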
Object Types: Shape vs Track
All annotation types (except tags) can be created as either shapes or tracks:
Shape
- Exists on a single frame only
- Used for static images or one-time appearances
- No temporal continuity
- ObjectType: SHAPE
Track
- Exists across multiple frames
- Represents the same object over time
- Supports interpolation between keyframes
- Maintains a single client ID across frames
- ObjectType: TRACK
You can convert shapes to tracks and vice versa using the object context menu.
Track Keyframes
In tracks:
- Keyframes: Frames where you manually set the object’s position/shape
- Interpolated frames: Frames between keyframes where CVAT automatically estimates position
- Mark a frame as a keyframe with K
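Between two keyframes, the simplest way to estimate a rectangle is linear interpolation of each coordinate. The sketch below shows that scheme for boxes only; it is a mental model rather than CVAT's code, and other shape types may be interpolated with more involved logic. The helper names are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: t = 0 gives a, t = 1 gives b."""
    return a + (b - a) * t


def interpolate_box(keyframes: Dict[int, Box], frame: int) -> Box:
    """Estimate a rectangle on a non-key frame from the nearest keyframes."""
    if frame in keyframes:
        return keyframes[frame]
    before = [f for f in keyframes if f < frame]
    after = [f for f in keyframes if f > frame]
    if not before or not after:
        raise ValueError("frame lies outside the keyframed range")
    f0, f1 = max(before), min(after)
    t = (frame - f0) / (f1 - f0)  # fraction of the way from f0 to f1
    b0, b1 = keyframes[f0], keyframes[f1]
    return (
        lerp(b0[0], b1[0], t),
        lerp(b0[1], b1[1], t),
        lerp(b0[2], b1[2], t),
        lerp(b0[3], b1[3], t),
    )


track = {0: (0.0, 0.0, 10.0, 10.0), 10: (100.0, 50.0, 120.0, 70.0)}
print(interpolate_box(track, 5))  # (50.0, 25.0, 65.0, 40.0)
```

This is also why adding a keyframe where the motion changes direction or speed improves the interpolated frames around it.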
Choosing the Right Annotation Type
| Task | Recommended Type |
|---|---|
| Object Detection | Rectangle |
| Instance Segmentation | Polygon or Mask |
| Semantic Segmentation | Mask |
| Lane Detection | Polyline |
| Pose Estimation | Skeleton |
| Crowd Counting | Points |
| Image Classification | Tag |
| 3D Object Detection | Cuboid |
Export Formats
Different annotation types are supported by different export formats:
| Format | Rectangle | Polygon | Polyline | Points | Mask | Skeleton | Cuboid | Tag |
|---|---|---|---|---|---|---|---|---|
| COCO | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| YOLO | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Pascal VOC | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| CVAT for images | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| CVAT for video | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cityscapes | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| KITTI | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
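The export format table can also be kept as data, which makes it easy to check which formats can carry every annotation type a project uses. A sketch that transcribes the rows of that table (Ellipse has no column there, so it is omitted); the names are hypothetical.

```python
from typing import Dict, List, Set

ALL_TYPES = {"rectangle", "polygon", "polyline", "points", "mask", "skeleton", "cuboid", "tag"}

# Rows transcribed from the export format table.
SUPPORT: Dict[str, Set[str]] = {
    "COCO": {"rectangle", "polygon", "points", "mask", "skeleton"},
    "YOLO": {"rectangle"},
    "Pascal VOC": {"rectangle"},
    "CVAT for images": set(ALL_TYPES),
    "CVAT for video": set(ALL_TYPES),
    "Cityscapes": {"rectangle", "polygon"},
    "KITTI": {"rectangle", "cuboid"},
}


def formats_supporting(required: Set[str]) -> List[str]:
    """Formats (in table order) that can carry every required annotation type."""
    unknown = required - ALL_TYPES
    if unknown:
        raise ValueError(f"unknown annotation types: {sorted(unknown)}")
    return [name for name, types in SUPPORT.items() if required <= types]


print(formats_supporting({"rectangle", "polygon"}))
# ['COCO', 'CVAT for images', 'CVAT for video', 'Cityscapes']
```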
Next Steps
Manual Annotation
Learn workflows for creating these annotations
Advanced Tools
Master interpolation and track editing
Auto-Annotation
Generate annotations automatically with AI
Export Datasets
Export your annotations in various formats