Panlabel’s validation system helps you catch dataset errors before they cause problems in training or conversion.

Quick Validation

Validate any dataset with the validate command:
panlabel validate --format coco dataset.json
Output:
Validation Report:
  ✓ Dataset structure is valid
  ✓ All image references are present
  ✓ All bounding boxes are within image bounds
  ✓ No duplicate annotation IDs
  ✓ All category IDs are valid

Result: PASS (0 errors, 0 warnings)

Validation Levels

Panlabel performs several levels of checks:

Structural Checks

  • Valid JSON/XML/CSV syntax
  • Required fields present
  • Correct data types
  • Valid ID references (categories, images)

Semantic Checks

  • Bounding boxes within image dimensions
  • Non-negative coordinates and dimensions
  • Valid bbox format (x_min < x_max, y_min < y_max)
  • Category consistency
  • Unique IDs where required

Format-Specific Checks

  • YOLO: Normalized coordinates in [0, 1] range
  • COCO: Valid bbox [x, y, width, height] format
  • VOC: Valid pixel coordinates
  • Label Studio: Valid percentage coordinates and result structure
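To make the semantic and format-specific checks concrete, here is an illustrative sketch of two of them in Python. This is not Panlabel's implementation, just the underlying logic:

```python
def check_bbox_in_bounds(bbox, img_w, img_h):
    """Semantic check: a COCO-style [x, y, width, height] box must fit inside the image."""
    x, y, w, h = bbox
    return x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= img_w and y + h <= img_h

def check_yolo_normalized(cx, cy, w, h):
    """Format-specific check: YOLO coordinates must be normalized to [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in (cx, cy, w, h))

print(check_bbox_in_bounds([150, 200, 250, 300], 200, 200))  # False: x + w = 400 > 200
print(check_yolo_normalized(1.5, 0.5, 0.1, 0.1))             # False: center_x > 1.0
```

Both failing inputs correspond to the example errors shown later in this page.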

Understanding Validation Reports

Error vs Warning Severity

Errors indicate serious problems that will likely cause failures:
panlabel validate --format coco broken_dataset.json
Output:
Validation Report:
  ✗ Error: Annotation 42 has invalid category_id: 999 (not found in categories)
  ✗ Error: Image 'img_123.jpg' referenced by annotation but not in images list
  ✗ Error: Bounding box in annotation 15 is outside image bounds: [150, 200, 250, 300] for 200x200 image

Result: FAIL (3 errors, 0 warnings)
Warnings indicate potential issues that may or may not cause problems:
panlabel validate --format coco dataset.json
Output:
Validation Report:
  ⚠ Warning: Annotation 10 has area 0 (degenerate bbox)
  ⚠ Warning: Image 'empty.jpg' has no annotations
  ⚠ Warning: Category 'background' has no annotations

Result: PASS (0 errors, 3 warnings)

Strict Mode

Treat warnings as errors for stricter validation:
panlabel validate --format coco --strict dataset.json
With --strict, the validation fails if there are any warnings, not just errors. This is useful for:
  • Quality-gated CI/CD pipelines
  • Ensuring high-quality training datasets
  • Preventing edge cases in production
Example Output:
Validation Report:
  ⚠ Warning: Image 'empty.jpg' has no annotations
  ⚠ Warning: Annotation 10 has area 0 (degenerate bbox)

Result: FAIL in strict mode (0 errors, 2 warnings)
Use --strict during dataset preparation and curation. For production pipelines processing diverse inputs, standard validation may be more appropriate.

Validation in Conversion

By default, the convert command validates input before converting:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out
If validation fails, the conversion is aborted:
Validation Report:
  ✗ Error: Annotation 42 has invalid category_id: 999

Error: Validation failed (1 error, 0 warnings)
Conversion aborted.
For trusted inputs where validation is a performance bottleneck:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out \
  --no-validate
Using --no-validate may result in corrupted output if your input has errors. Only use this for datasets you’ve already validated or generated programmatically with strong guarantees.

Strict Validation in Conversion

Combine --strict with conversion to enforce zero warnings:
panlabel convert --from coco --to yolo \
  --input dataset.json \
  --output ./yolo_out \
  --strict \
  --allow-lossy

JSON Output for Automation

Get machine-readable validation results:
panlabel validate --format coco --output json dataset.json
Output:
{
  "status": "fail",
  "errors": [
    {
      "severity": "error",
      "code": "invalid_category_id",
      "message": "Annotation 42 has invalid category_id: 999 (not found in categories)"
    }
  ],
  "warnings": [
    {
      "severity": "warning",
      "code": "empty_image",
      "message": "Image 'empty.jpg' has no annotations"
    }
  ],
  "summary": {
    "total_errors": 1,
    "total_warnings": 1
  }
}
This enables automated quality checks in CI/CD:
import subprocess
import json
import sys

# Run validation and capture the machine-readable report from stdout
result = subprocess.run([
    'panlabel', 'validate',
    '--format', 'coco',
    '--output', 'json',
    'dataset.json'
], capture_output=True, text=True)

report = json.loads(result.stdout)

# Fail the pipeline if the report contains any errors
if report['status'] == 'fail':
    print(f"Validation failed: {report['summary']['total_errors']} errors")
    sys.exit(1)

print("Validation passed!")

Common Validation Issues

Problem: Annotations reference category IDs that don’t exist in the dataset’s category list.
Error: Annotation 42 has invalid category_id: 999 (not found in categories)
Fix: Ensure all category IDs in annotations match IDs in your categories list. For COCO format:
{
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "car"}
  ],
  "annotations": [
    {"id": 1, "category_id": 1, ...},  // ✓ Valid
    {"id": 2, "category_id": 999, ...}  // ✗ Invalid - no category 999
  ]
}
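To track down the offending annotations before fixing them, a few lines of Python suffice. This is an illustrative sketch against an in-memory COCO-style dict, not a Panlabel API:

```python
# Minimal COCO-style dataset mirroring the example above
dataset = {
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}],
    "annotations": [
        {"id": 1, "category_id": 1},
        {"id": 2, "category_id": 999},
    ],
}

# Collect the set of known category IDs, then flag annotations outside it
valid_ids = {c["id"] for c in dataset["categories"]}
bad = [a["id"] for a in dataset["annotations"] if a["category_id"] not in valid_ids]
print(bad)  # → [2]
```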
Problem: Bounding box coordinates extend beyond image dimensions.
Error: Bounding box in annotation 15 is outside image bounds: [150, 200, 250, 300] for 200x200 image
Fix: Clip bounding boxes to image dimensions or fix annotation tools that produced invalid coordinates. For COCO format, ensure x + width <= image_width and y + height <= image_height.
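Clipping can be sketched in a few lines. This is an illustrative helper, not a Panlabel function; note that a box at or beyond the image edge clips down to zero area, which should then be dropped as a degenerate bbox:

```python
def clip_bbox(bbox, img_w, img_h):
    """Clip a COCO [x, y, width, height] box to the image bounds."""
    x, y, w, h = bbox
    x = max(0, min(x, img_w))
    y = max(0, min(y, img_h))
    w = min(w, img_w - x)   # remaining width from x to the right edge
    h = min(h, img_h - y)   # remaining height from y to the bottom edge
    return [x, y, w, h]

# The out-of-bounds box from the error above: y = 200 sits on the bottom
# edge of a 200x200 image, so the clipped box is degenerate (height 0)
print(clip_bbox([150, 200, 250, 300], 200, 200))  # → [150, 200, 50, 0]
```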
Problem: Annotations reference images that don’t exist in the dataset.
Error: Image 'img_123.jpg' referenced by annotation but not in images list
Fix: Ensure every annotation’s image_id matches an entry in the images list. This often happens when filtering or splitting datasets.
Problem: Bounding boxes with zero area (width or height is 0).
Warning: Annotation 10 has area 0 (degenerate bbox)
Fix: Remove or fix these annotations. They typically indicate labeling errors or incorrect coordinate calculations. For COCO format:
// Bad: zero width or height
{"bbox": [100, 100, 0, 50], ...}  // width = 0
{"bbox": [100, 100, 50, 0], ...}  // height = 0

// Good: positive dimensions
{"bbox": [100, 100, 50, 50], ...}
Problem: Minimum coordinate greater than maximum (xmin > xmax or ymin > ymax).
Error: Invalid bbox: x_min (200) > x_max (100)
Fix: Swap coordinate pairs to ensure min < max. This usually happens with manual coordinate entry or incorrect coordinate system conversions.
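The swap is trivial to automate; a minimal sketch (illustrative, not Panlabel's code):

```python
def normalize_corners(x_min, y_min, x_max, y_max):
    """Ensure min <= max on both axes by swapping reversed pairs."""
    if x_min > x_max:
        x_min, x_max = x_max, x_min
    if y_min > y_max:
        y_min, y_max = y_max, y_min
    return x_min, y_min, x_max, y_max

# The reversed x pair from the error above
print(normalize_corners(200, 50, 100, 150))  # → (100, 50, 200, 150)
```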
Problem: Multiple annotations, images, or categories share the same ID.
Error: Duplicate annotation ID: 42 appears 2 times
Fix: Ensure all IDs within each collection (annotations, images, categories) are unique. When merging datasets, reassign IDs to avoid conflicts.
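One common reassignment strategy when merging is to offset the second dataset's IDs past the first dataset's maximum. This sketch handles annotation IDs only; a full merge would need the same offsetting for image and category IDs and for the `image_id`/`category_id` references that point at them:

```python
def merge_annotations(a, b):
    """Merge two annotation lists, offsetting IDs in `b` to avoid collisions."""
    offset = max((ann["id"] for ann in a), default=0)
    merged = list(a)
    for ann in b:
        # Copy each annotation with a shifted, now-unique ID
        merged.append({**ann, "id": ann["id"] + offset})
    return merged

left = [{"id": 1}, {"id": 2}]
right = [{"id": 1}, {"id": 2}]
print([ann["id"] for ann in merge_annotations(left, right)])  # → [1, 2, 3, 4]
```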
Problem: Images in the dataset have no associated annotations.
Warning: Image 'empty.jpg' has no annotations
Fix: This is usually not an error - negative examples (images with no objects) can be valuable for training. However, if unexpected:
  • Verify annotations weren’t accidentally lost during processing
  • Check if this is an incomplete labeling job
  • Use --strict mode if you want to enforce that all images must have annotations
Some formats like TFOD automatically drop images without annotations during conversion.
Problem: YOLO format uses normalized coordinates, but values are outside valid range.
Error: YOLO coordinate out of range: center_x = 1.5 (expected 0.0-1.0)
Fix: YOLO coordinates should be normalized by image dimensions:
# Bad: absolute pixel coordinates
0 150 200 50 75  # ✗ Invalid

# Good: normalized to [0, 1] (same box on a 400x400 image)
0 0.375 0.500 0.125 0.1875  # ✓ Valid
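The normalization itself is simple arithmetic: divide the box center and size by the image dimensions. A sketch (illustrative helper, with an assumed 640x480 image):

```python
def to_yolo(bbox, img_w, img_h):
    """Convert a pixel [x, y, width, height] box to normalized YOLO (cx, cy, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w,   # box center x, as a fraction of image width
            (y + h / 2) / img_h,   # box center y, as a fraction of image height
            w / img_w,             # box width fraction
            h / img_h)             # box height fraction

# A 64x48 box centered in a 640x480 image
print(to_yolo([288, 216, 64, 48], 640, 480))  # → (0.5, 0.5, 0.1, 0.1)
```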

Best Practices

Validate Early and Often

Run validation:
  • After data collection/labeling
  • Before training
  • After format conversions
  • Before publishing datasets

Use Validation in CI/CD

# .github/workflows/validate-dataset.yml
name: Validate Dataset

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: anomalyco/panlabel-action@v1
      - run: panlabel validate --format coco --strict annotations.json

Combine with Stats for Quality Insights

After validation passes, analyze dataset quality:
# Validate first
panlabel validate --format coco dataset.json

# Then get statistics
panlabel stats --format coco dataset.json
This workflow helps identify not just errors, but also dataset imbalances and quality issues.

Keep Validation Logs

For large datasets, save validation reports for auditing:
panlabel validate --format coco --output json dataset.json > validation_report.json

Related Commands

  • convert - Convert formats with built-in validation
  • stats - Analyze dataset quality after validation
  • diff - Compare datasets to track changes
