Quick Validation
Validate any dataset with the validate command.
Validation Levels
Panlabel performs several levels of checks:
Structural Checks
- Valid JSON/XML/CSV syntax
- Required fields present
- Correct data types
- Valid ID references (categories, images)
Semantic Checks
- Bounding boxes within image dimensions
- Non-negative coordinates and dimensions
- Valid bbox format (x_min < x_max, y_min < y_max)
- Category consistency
- Unique IDs where required
Format-Specific Checks
- YOLO: Normalized coordinates in [0, 1] range
- COCO: Valid bbox [x, y, width, height] format
- VOC: Valid pixel coordinates
- Label Studio: Valid percentage coordinates and result structure
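The coordinate conventions behind these format-specific checks can be sketched in Python. This is an illustration only, not Panlabel's implementation: the function names are hypothetical, and the exact rules Panlabel applies (for example, whether zero-size boxes are errors or warnings) are assumptions.

```python
def yolo_ok(box):
    """YOLO: [x_center, y_center, width, height], each normalized to [0, 1]."""
    return len(box) == 4 and all(0.0 <= v <= 1.0 for v in box)


def coco_ok(box):
    """COCO: [x, y, width, height] in pixels, measured from the top-left corner."""
    if len(box) != 4:
        return False
    x, y, w, h = box
    return x >= 0 and y >= 0 and w > 0 and h > 0


def voc_ok(box):
    """VOC: [xmin, ymin, xmax, ymax] in pixels, with min strictly below max."""
    if len(box) != 4:
        return False
    xmin, ymin, xmax, ymax = box
    return 0 <= xmin < xmax and 0 <= ymin < ymax


def label_studio_ok(value):
    """Label Studio rectangle value: x, y, width, height as percentages (0-100)."""
    keys = ("x", "y", "width", "height")
    return all(k in value and 0 <= value[k] <= 100 for k in keys)
```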
Understanding Validation Reports
Error vs Warning Severity
Errors indicate serious problems that will likely cause failures.
Strict Mode
Treat warnings as errors for stricter validation. With --strict, validation fails if there are any warnings, not just errors. This is useful for:
- Quality-gated CI/CD pipelines
- Ensuring high-quality training datasets
- Preventing edge cases in production
Validation in Conversion
By default, the convert command validates input before converting.
Skip Validation (Not Recommended)
For trusted inputs where validation is a performance bottleneck, validation can be skipped during conversion.
Strict Validation in Conversion
Combine --strict with conversion to enforce zero warnings.
JSON Output for Automation
Validation results are also available in machine-readable JSON form.
Common Validation Issues
Invalid category references
Problem: Annotations reference category IDs that don’t exist in the dataset’s category list.
Fix: Ensure all category IDs in annotations match IDs in your categories list. For COCO format, every annotation's category_id must match the id of an entry in categories.
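A minimal sketch of this check for a COCO-style dictionary already loaded into memory (for example with json.load). The helper name find_invalid_category_refs is hypothetical and is not part of Panlabel.

```python
def find_invalid_category_refs(coco):
    """Return IDs of annotations whose category_id is not in the categories list."""
    valid_ids = {cat["id"] for cat in coco.get("categories", [])}
    return [
        ann["id"]
        for ann in coco.get("annotations", [])
        if ann.get("category_id") not in valid_ids
    ]


dataset = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 11, "image_id": 1, "category_id": 7, "bbox": [1, 1, 5, 5]},
    ],
}
bad_refs = find_invalid_category_refs(dataset)
```

Annotations returned by the helper can then be remapped to a valid category or removed before re-running validation.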
Out-of-bounds bounding boxes
Problem: Bounding box coordinates extend beyond image dimensions.
Fix: Clip bounding boxes to image dimensions or fix annotation tools that produced invalid coordinates. For COCO format, ensure x + width <= image_width and y + height <= image_height.
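Clipping a COCO-style box might look like the sketch below. The function name clip_coco_bbox is hypothetical; it assumes the bbox is [x, y, width, height] in pixels.

```python
def clip_coco_bbox(bbox, image_width, image_height):
    """Clip a COCO [x, y, width, height] box so it lies inside the image."""
    x, y, w, h = bbox
    # Convert to corners, clamp each corner to the image, then convert back.
    x_min = min(max(x, 0), image_width)
    y_min = min(max(y, 0), image_height)
    x_max = min(max(x + w, 0), image_width)
    y_max = min(max(y + h, 0), image_height)
    return [x_min, y_min, x_max - x_min, y_max - y_min]


clipped = clip_coco_bbox([90, 50, 30, 20], image_width=100, image_height=60)
```

A box that lies entirely outside the image collapses to zero width or height after clipping, so it is worth checking for degenerate boxes afterwards.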
Missing image references
Problem: Annotations reference images that don’t exist in the dataset.
Fix: Ensure every annotation’s image_id matches an entry in the images list. This often happens when filtering or splitting datasets.
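One way to repair a filtered or split COCO-style dataset is to drop annotations whose image is gone. The sketch below assumes the dataset is a dictionary with images and annotations lists; drop_orphan_annotations is a hypothetical helper name.

```python
def drop_orphan_annotations(coco):
    """Return a copy without annotations whose image_id is missing, plus the drop count."""
    image_ids = {img["id"] for img in coco["images"]}
    kept = [ann for ann in coco["annotations"] if ann["image_id"] in image_ids]
    dropped = len(coco["annotations"]) - len(kept)
    # Shallow copy: the input dictionary is left unmodified.
    return {**coco, "annotations": kept}, dropped


subset = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 2, "image_id": 2, "category_id": 1, "bbox": [0, 0, 5, 5]},
    ],
}
cleaned, dropped_count = drop_orphan_annotations(subset)
```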
Degenerate bounding boxes
Problem: Bounding boxes with zero area (width or height is 0).
Fix: Remove or fix these annotations. They typically indicate labeling errors or incorrect coordinate calculations. For COCO format, the bbox is [x, y, width, height], so both width and height must be greater than 0.
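A small sketch for separating degenerate COCO-style annotations from valid ones, so the bad ones can be reviewed before deletion. The name split_degenerate is hypothetical.

```python
def split_degenerate(annotations):
    """Split annotations into (valid, degenerate) by bbox width and height."""
    valid, degenerate = [], []
    for ann in annotations:
        _, _, width, height = ann["bbox"]
        # Negative sizes are treated as degenerate too, not only zero.
        if width > 0 and height > 0:
            valid.append(ann)
        else:
            degenerate.append(ann)
    return valid, degenerate


valid_anns, bad_anns = split_degenerate([
    {"id": 1, "bbox": [10, 10, 20, 30]},
    {"id": 2, "bbox": [10, 10, 0, 30]},
    {"id": 3, "bbox": [10, 10, 20, 0]},
])
```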
Inverted coordinates
Problem: Minimum coordinate greater than maximum (xmin > xmax or ymin > ymax).
Fix: Swap coordinate pairs to ensure min < max. This usually happens with manual coordinate entry or incorrect coordinate system conversions.
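For corner-style boxes such as VOC, the swap can be sketched as follows. The function name normalize_corners is hypothetical.

```python
def normalize_corners(xmin, ymin, xmax, ymax):
    """Reorder corner coordinates so that min <= max on each axis."""
    return min(xmin, xmax), min(ymin, ymax), max(xmin, xmax), max(ymin, ymax)


fixed = normalize_corners(50, 10, 20, 40)  # x coordinates were inverted
```

If the two values on an axis are equal, the box is still degenerate after the swap and needs separate handling.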
Duplicate IDs
Problem: Multiple annotations, images, or categories share the same ID.
Fix: Ensure all IDs within each collection (annotations, images, categories) are unique. When merging datasets, reassign IDs to avoid conflicts.
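Reassigning IDs while merging two COCO-style datasets might look like the sketch below. It rests on two assumptions: both datasets use the same categories list, and every annotation's image_id exists in its own dataset. The function name merge_coco is hypothetical.

```python
def merge_coco(first, second):
    """Merge two COCO-style dicts, renumbering image and annotation IDs from 1."""
    images, annotations = [], []
    next_image_id, next_ann_id = 1, 1
    for dataset in (first, second):
        id_map = {}  # old image ID -> new image ID, scoped to this dataset
        for img in dataset["images"]:
            id_map[img["id"]] = next_image_id
            images.append({**img, "id": next_image_id})
            next_image_id += 1
        for ann in dataset["annotations"]:
            annotations.append(
                {**ann, "id": next_ann_id, "image_id": id_map[ann["image_id"]]}
            )
            next_ann_id += 1
    # Assumption: categories are identical, so the first list is reused as-is.
    return {"images": images, "annotations": annotations,
            "categories": first["categories"]}


cats = [{"id": 1, "name": "cat"}]
part_a = {"images": [{"id": 1, "file_name": "a.jpg"}], "categories": cats,
          "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]}]}
part_b = {"images": [{"id": 1, "file_name": "b.jpg"}], "categories": cats,
          "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [2, 2, 5, 5]}]}
merged = merge_coco(part_a, part_b)
```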
Images without annotations
Problem: Images in the dataset have no associated annotations.
Fix: This is usually not an error; negative examples (images with no objects) can be valuable for training. However, if unexpected:
- Verify annotations weren’t accidentally lost during processing
- Check if this is an incomplete labeling job
- Use --strict mode if you want to enforce that all images must have annotations
Some formats like TFOD automatically drop images without annotations during conversion.
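To see which images are affected before deciding whether they are intentional negatives, a COCO-style dataset can be scanned as in this sketch. The helper name images_without_annotations is hypothetical.

```python
def images_without_annotations(coco):
    """Return IDs of images that no annotation refers to."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    return [img["id"] for img in coco["images"] if img["id"] not in annotated]


sample = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [{"id": 1, "image_id": 2, "category_id": 1, "bbox": [0, 0, 5, 5]}],
}
unlabeled = images_without_annotations(sample)
```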
YOLO: Coordinates out of [0,1] range
Problem: YOLO format uses normalized coordinates, but values are outside the valid range.
Fix: YOLO coordinates should be normalized by image dimensions: divide x values and widths by the image width, and y values and heights by the image height.
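The normalization can be sketched as a conversion from a pixel-space COCO box [x, y, width, height] to YOLO's [x_center, y_center, width, height]. The function name coco_to_yolo is hypothetical.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO pixel bbox to normalized YOLO center format."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / image_width,   # x_center, normalized by image width
        (y + h / 2) / image_height,  # y_center, normalized by image height
        w / image_width,
        h / image_height,
    ]


yolo_box = coco_to_yolo([10, 20, 30, 40], image_width=100, image_height=200)
```

Values outside [0, 1] after this step usually mean the source box extended past the image, or that the wrong image dimensions were used.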
Best Practices
Validate Early and Often
Run validation:
- After data collection/labeling
- Before training
- After format conversions
- Before publishing datasets