Panlabel’s validation system helps you catch dataset errors before they cause problems in training or conversion.

Quick Validation

Validate any dataset with the validate command:
panlabel validate --format coco dataset.json
Output:
Validation Report:
  ✓ Dataset structure is valid
  ✓ All image references are present
  ✓ All bounding boxes are within image bounds
  ✓ No duplicate annotation IDs
  ✓ All category IDs are valid

Result: PASS (0 errors, 0 warnings)

Validation Levels

Panlabel performs several levels of checks:

Structural Checks

  • Valid JSON/XML/CSV syntax
  • Required fields present
  • Correct data types
  • Valid ID references (categories, images)

Semantic Checks

  • Bounding boxes within image dimensions
  • Non-negative coordinates and dimensions
  • Valid bbox format (x_min < x_max, y_min < y_max)
  • Category consistency
  • Unique IDs where required

Format-Specific Checks

  • YOLO: Normalized coordinates in [0, 1] range
  • COCO: Valid bbox [x, y, width, height] format
  • VOC: Valid pixel coordinates
  • Label Studio: Valid percentage coordinates and result structure
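To make the semantic and format-specific checks concrete, here is an illustrative sketch of two of them in Python. This is not Panlabel's implementation, just the underlying logic:

```python
def check_bbox_in_bounds(bbox, img_w, img_h):
    """Semantic check: a COCO-style [x, y, width, height] box must fit inside the image."""
    x, y, w, h = bbox
    return x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= img_w and y + h <= img_h

def check_yolo_normalized(cx, cy, w, h):
    """Format-specific check: YOLO coordinates must be normalized to [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in (cx, cy, w, h))

print(check_bbox_in_bounds([150, 200, 250, 300], 200, 200))  # False: x + w = 400 > 200
print(check_yolo_normalized(1.5, 0.5, 0.1, 0.1))             # False: center_x > 1.0
```

Both failing inputs correspond to the example errors shown later in this page.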

Understanding Validation Reports

Error vs Warning Severity

Errors indicate serious problems that will likely cause failures:
panlabel validate --format coco broken_dataset.json
Output:
Validation Report:
  ✗ Error: Annotation 42 has invalid category_id: 999 (not found in categories)
  ✗ Error: Image 'img_123.jpg' referenced by annotation but not in images list
  ✗ Error: Bounding box in annotation 15 is outside image bounds: [150, 200, 250, 300] for 200x200 image

Result: FAIL (3 errors, 0 warnings)
Warnings indicate potential issues that may or may not cause problems:
panlabel validate --format coco dataset.json
Output:
Validation Report:
  ⚠ Warning: Annotation 10 has area 0 (degenerate bbox)
  ⚠ Warning: Image 'empty.jpg' has no annotations
  ⚠ Warning: Category 'background' has no annotations

Result: PASS (0 errors, 3 warnings)

Strict Mode

Treat warnings as errors for stricter validation:
panlabel validate --format coco --strict dataset.json
With --strict, the validation fails if there are any warnings, not just errors. This is useful for:
  • Quality-gated CI/CD pipelines
  • Ensuring high-quality training datasets
  • Preventing edge cases in production
Example Output:
Validation Report:
  ⚠ Warning: Image 'empty.jpg' has no annotations
  ⚠ Warning: Annotation 10 has area 0 (degenerate bbox)

Result: FAIL in strict mode (0 errors, 2 warnings)
Use --strict during dataset preparation and curation. For production pipelines processing diverse inputs, standard validation may be more appropriate.

Validation in Conversion

By default, the convert command validates input before converting:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out
If validation fails, the conversion is aborted:
Validation Report:
  ✗ Error: Annotation 42 has invalid category_id: 999

Error: Validation failed (1 error, 0 warnings)
Conversion aborted.
For trusted inputs where validation is a performance bottleneck:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out \
  --no-validate
Using --no-validate may result in corrupted output if your input has errors. Only use this for datasets you’ve already validated or generated programmatically with strong guarantees.

Strict Validation in Conversion

Combine --strict with conversion to enforce zero warnings:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out \
  --strict \
  --allow-lossy

JSON Output for Automation

Get machine-readable validation results:
panlabel validate --format coco --output json dataset.json
Output:
{
  "status": "fail",
  "errors": [
    {
      "severity": "error",
      "code": "invalid_category_id",
      "message": "Annotation 42 has invalid category_id: 999 (not found in categories)"
    }
  ],
  "warnings": [
    {
      "severity": "warning",
      "code": "empty_image",
      "message": "Image 'empty.jpg' has no annotations"
    }
  ],
  "summary": {
    "total_errors": 1,
    "total_warnings": 1
  }
}
This enables automated quality checks in CI/CD:
import subprocess
import json
import sys

# Run validation and capture the machine-readable report from stdout
result = subprocess.run([
    'panlabel', 'validate',
    '--format', 'coco',
    '--output', 'json',
    'dataset.json'
], capture_output=True, text=True)

report = json.loads(result.stdout)

# Fail the pipeline if the report contains any errors
if report['status'] == 'fail':
    print(f"Validation failed: {report['summary']['total_errors']} errors")
    sys.exit(1)

print("Validation passed!")

Common Validation Issues

Problem: Annotations reference category IDs that don’t exist in the dataset’s category list.
Error: Annotation 42 has invalid category_id: 999 (not found in categories)
Fix: Ensure all category IDs in annotations match IDs in your categories list. For COCO format:
{
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "car"}
  ],
  "annotations": [
    {"id": 1, "category_id": 1, ...},  // ✓ Valid
    {"id": 2, "category_id": 999, ...}  // ✗ Invalid - no category 999
  ]
}
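To track down the offending annotations before fixing them, a few lines of Python suffice. This is an illustrative sketch against an in-memory COCO-style dict, not a Panlabel API:

```python
# Minimal COCO-style dataset mirroring the example above
dataset = {
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}],
    "annotations": [
        {"id": 1, "category_id": 1},
        {"id": 2, "category_id": 999},
    ],
}

# Collect the set of known category IDs, then flag annotations outside it
valid_ids = {c["id"] for c in dataset["categories"]}
bad = [a["id"] for a in dataset["annotations"] if a["category_id"] not in valid_ids]
print(bad)  # → [2]
```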
Problem: Bounding box coordinates extend beyond image dimensions.
Error: Bounding box in annotation 15 is outside image bounds: [150, 200, 250, 300] for 200x200 image
Fix: Clip bounding boxes to image dimensions or fix annotation tools that produced invalid coordinates. For COCO format, ensure x + width <= image_width and y + height <= image_height.
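Clipping can be sketched in a few lines. This is an illustrative helper, not a Panlabel function; note that a box at or beyond the image edge clips down to zero area, which should then be dropped as a degenerate bbox:

```python
def clip_bbox(bbox, img_w, img_h):
    """Clip a COCO [x, y, width, height] box to the image bounds."""
    x, y, w, h = bbox
    x = max(0, min(x, img_w))
    y = max(0, min(y, img_h))
    w = min(w, img_w - x)   # remaining width from x to the right edge
    h = min(h, img_h - y)   # remaining height from y to the bottom edge
    return [x, y, w, h]

# The out-of-bounds box from the error above: y = 200 sits on the bottom
# edge of a 200x200 image, so the clipped box is degenerate (height 0)
print(clip_bbox([150, 200, 250, 300], 200, 200))  # → [150, 200, 50, 0]
```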
Problem: Annotations reference images that don’t exist in the dataset.
Error: Image 'img_123.jpg' referenced by annotation but not in images list
Fix: Ensure every annotation’s image_id matches an entry in the images list. This often happens when filtering or splitting datasets.
Problem: Bounding boxes with zero area (width or height is 0).
Warning: Annotation 10 has area 0 (degenerate bbox)
Fix: Remove or fix these annotations. They typically indicate labeling errors or incorrect coordinate calculations. For COCO format:
// Bad: zero width or height
{"bbox": [100, 100, 0, 50], ...}  // width = 0
{"bbox": [100, 100, 50, 0], ...}  // height = 0

// Good: positive dimensions
{"bbox": [100, 100, 50, 50], ...}
Problem: Minimum coordinate greater than maximum (xmin > xmax or ymin > ymax).
Error: Invalid bbox: x_min (200) > x_max (100)
Fix: Swap coordinate pairs to ensure min < max. This usually happens with manual coordinate entry or incorrect coordinate system conversions.
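The swap is trivial to automate; a minimal sketch (illustrative, not Panlabel's code):

```python
def normalize_corners(x_min, y_min, x_max, y_max):
    """Ensure min <= max on both axes by swapping reversed pairs."""
    if x_min > x_max:
        x_min, x_max = x_max, x_min
    if y_min > y_max:
        y_min, y_max = y_max, y_min
    return x_min, y_min, x_max, y_max

# The reversed x pair from the error above
print(normalize_corners(200, 50, 100, 150))  # → (100, 50, 200, 150)
```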
Problem: Multiple annotations, images, or categories share the same ID.
Error: Duplicate annotation ID: 42 appears 2 times
Fix: Ensure all IDs within each collection (annotations, images, categories) are unique. When merging datasets, reassign IDs to avoid conflicts.
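One common reassignment strategy when merging is to offset the second dataset's IDs past the first dataset's maximum. This sketch handles annotation IDs only; a full merge would need the same offsetting for image and category IDs and for the `image_id`/`category_id` references that point at them:

```python
def merge_annotations(a, b):
    """Merge two annotation lists, offsetting IDs in `b` to avoid collisions."""
    offset = max((ann["id"] for ann in a), default=0)
    merged = list(a)
    for ann in b:
        # Copy each annotation with a shifted, now-unique ID
        merged.append({**ann, "id": ann["id"] + offset})
    return merged

left = [{"id": 1}, {"id": 2}]
right = [{"id": 1}, {"id": 2}]
print([ann["id"] for ann in merge_annotations(left, right)])  # → [1, 2, 3, 4]
```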
Problem: Images in the dataset have no associated annotations.
Warning: Image 'empty.jpg' has no annotations
Fix: This is usually not an error - negative examples (images with no objects) can be valuable for training. However, if unexpected:
  • Verify annotations weren’t accidentally lost during processing
  • Check if this is an incomplete labeling job
  • Use --strict mode if you want to enforce that all images must have annotations
Some formats like TFOD automatically drop images without annotations during conversion.
Problem: YOLO format uses normalized coordinates, but values are outside valid range.
Error: YOLO coordinate out of range: center_x = 1.5 (expected 0.0-1.0)
Fix: YOLO coordinates should be normalized by image dimensions:
# Bad: absolute pixel coordinates
0 150 200 50 75  # ✗ Invalid

# Good: normalized to [0, 1] (same box on a 400x400 image)
0 0.375 0.500 0.125 0.1875  # ✓ Valid
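The normalization itself is simple arithmetic: divide the box center and size by the image dimensions. A sketch (illustrative helper, with an assumed 640x480 image):

```python
def to_yolo(bbox, img_w, img_h):
    """Convert a pixel [x, y, width, height] box to normalized YOLO (cx, cy, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w,   # box center x, as a fraction of image width
            (y + h / 2) / img_h,   # box center y, as a fraction of image height
            w / img_w,             # box width fraction
            h / img_h)             # box height fraction

# A 64x48 box centered in a 640x480 image
print(to_yolo([288, 216, 64, 48], 640, 480))  # → (0.5, 0.5, 0.1, 0.1)
```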

Best Practices

Validate Early and Often

Run validation:
  • After data collection/labeling
  • Before training
  • After format conversions
  • Before publishing datasets

Use Validation in CI/CD

# .github/workflows/validate-dataset.yml
name: Validate Dataset

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: anomalyco/panlabel-action@v1
      - run: panlabel validate --format coco --strict annotations.json

Combine with Stats for Quality Insights

After validation passes, analyze dataset quality:
# Validate first
panlabel validate --format coco dataset.json

# Then get statistics
panlabel stats --format coco dataset.json
This workflow helps identify not just errors, but also dataset imbalances and quality issues.

Keep Validation Logs

For large datasets, save validation reports for auditing:
panlabel validate --format coco --output json dataset.json > validation_report.json

Related Commands

  • convert - Convert formats with built-in validation
  • stats - Analyze dataset quality after validation
  • diff - Compare datasets to track changes
