Validation

The validation module checks a dataset for structural integrity, data quality, and geometric validity.

Overview

Validation examines your dataset for:
  • Structural issues: Duplicate IDs, invalid references
  • Data quality: Empty names, invalid dimensions
  • Geometric validity: Well-formed bounding boxes that stay within image bounds
Each issue is classified as either an error or a warning.

validate_dataset

The main validation function:
pub fn validate_dataset(
    dataset: &Dataset,
    opts: &ValidateOptions,
) -> ValidationReport
Example:
use panlabel::validation::{validate_dataset, ValidateOptions};
use panlabel::ir::io_coco_json;
use std::path::Path;

// Read dataset
let dataset = io_coco_json::read_coco_json(Path::new("data.json"))?;

// Validate with default options
let opts = ValidateOptions { strict: false };
let report = validate_dataset(&dataset, &opts);

// Check results
if report.error_count() > 0 {
    eprintln!("Validation errors found:");
    eprintln!("{}", report);
} else if report.warning_count() > 0 {
    println!("Validation warnings:");
    println!("{}", report);
} else {
    println!("✓ Dataset is valid");
}

ValidateOptions

Options controlling validation behavior:
pub struct ValidateOptions {
    /// If true, treat warnings as errors
    pub strict: bool,
}
Strict mode:
// Strict validation - warnings become errors
let opts = ValidateOptions { strict: true };
let report = validate_dataset(&dataset, &opts);

if report.error_count() > 0 || report.warning_count() > 0 {
    return Err("Validation failed".into());
}
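
To make the difference between the two modes concrete, here is a minimal sketch of a pass/fail gate. The `passes` helper is hypothetical and not part of panlabel; it only assumes that a report exposes the error and warning counts used in the snippet above.

```rust
/// Hypothetical helper (not part of panlabel): decides whether a validation
/// run counts as a success, given the two counts taken from a report.
pub fn passes(error_count: usize, warning_count: usize, strict: bool) -> bool {
    if strict {
        // Strict: any issue at all fails the run.
        error_count == 0 && warning_count == 0
    } else {
        // Non-strict: only errors fail the run; warnings are tolerated.
        error_count == 0
    }
}
```

Checking both counts in strict mode gives the same result whether or not the library itself promotes warnings to errors, so the gate does not depend on that detail.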

ValidationReport

The validation report contains all issues found:
pub struct ValidationReport {
    pub issues: Vec<ValidationIssue>,
}
Methods:
impl ValidationReport {
    /// Returns true if no issues were found
    pub fn is_clean(&self) -> bool;
    
    /// Count of error-level issues
    pub fn error_count(&self) -> usize;
    
    /// Count of warning-level issues
    pub fn warning_count(&self) -> usize;
    
    /// Filter issues by severity
    pub fn errors(&self) -> impl Iterator<Item = &ValidationIssue>;
    pub fn warnings(&self) -> impl Iterator<Item = &ValidationIssue>;
}
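
As a rough mental model, the counting methods can be read as filters over `issues`. The sketch below uses locally defined stand-in types (`Report`, `Issue`, `Severity`) and a hypothetical `with_severity` helper; it is an assumption about the behavior, not panlabel's actual implementation.

```rust
// Local stand-ins for illustration only; the real types live in panlabel.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
}

pub struct Issue {
    pub severity: Severity,
    pub message: String,
}

pub struct Report {
    pub issues: Vec<Issue>,
}

impl Report {
    /// Hypothetical helper: iterate over issues with the given severity.
    pub fn with_severity(&self, wanted: Severity) -> impl Iterator<Item = &Issue> + '_ {
        self.issues.iter().filter(move |issue| issue.severity == wanted)
    }

    pub fn error_count(&self) -> usize {
        self.with_severity(Severity::Error).count()
    }

    pub fn warning_count(&self) -> usize {
        self.with_severity(Severity::Warning).count()
    }

    /// Clean means no issues of any severity.
    pub fn is_clean(&self) -> bool {
        self.issues.is_empty()
    }
}

/// Build a small report with one error and two warnings.
pub fn sample_report() -> Report {
    let issue = |severity, message: &str| Issue { severity, message: message.to_string() };
    Report {
        issues: vec![
            issue(Severity::Error, "Duplicate image ID 1"),
            issue(Severity::Warning, "Empty filename"),
            issue(Severity::Warning, "Duplicate category name 'person'"),
        ],
    }
}
```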
Display: The report implements Display for human-readable output:
println!("{}", report);
Output:
Validation Report

✗ Errors (3):
  [Image 1] Duplicate image ID 1 (first seen at index 0)
  [Annotation 5] References non-existent image 999
  [Annotation 7] Bounding box (650.0, 490.0, 700.0, 550.0) extends outside image bounds (0, 0, 640, 480)

⚠ Warnings (2):
  [Category 2] Duplicate category name 'person' (also used by category 1)
  [Image 3] Empty filename

Summary: 3 errors, 2 warnings

ValidationIssue

Individual validation issues:
pub struct ValidationIssue {
    pub severity: Severity,
    pub code: IssueCode,
    pub message: String,
    pub context: IssueContext,
}

pub enum Severity {
    Error,
    Warning,
}

Issue Codes

Stable, machine-readable codes for each validation check:
pub enum IssueCode {
    // Image issues
    DuplicateImageId,
    InvalidImageDimensions,
    EmptyFileName,
    
    // Category issues
    DuplicateCategoryId,
    DuplicateCategoryName,
    EmptyCategoryName,
    
    // Annotation issues
    DuplicateAnnotationId,
    MissingImageRef,
    MissingCategoryRef,
    
    // Bounding box issues
    BBoxNotFinite,
    InvalidBBoxOrdering,
    InvalidBBoxArea,
    BBoxOutOfBounds,
}
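
Because the codes are plain enum variants, they lend themselves to tallying. The following sketch counts occurrences per code using a local stand-in enum with a subset of the variants; deriving `Ord` on it is an assumption made here so it can key a `BTreeMap`, and `count_by_code` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

// Local stand-in with a subset of the codes listed above. Deriving `Ord`
// is an assumption of this sketch, not a statement about panlabel's enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum IssueCode {
    DuplicateImageId,
    MissingImageRef,
    BBoxOutOfBounds,
}

/// Hypothetical helper: count how many times each code occurs.
pub fn count_by_code(codes: &[IssueCode]) -> BTreeMap<IssueCode, usize> {
    let mut counts = BTreeMap::new();
    for &code in codes {
        // Insert 0 on first sight, then increment.
        *counts.entry(code).or_insert(0) += 1;
    }
    counts
}
```

If the real enum does not implement `Ord` or `Hash`, the same tally can be built with one `matches!` filter per code, as in the usage patterns further down.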

Issue Context

Context about where the issue occurred:
pub enum IssueContext {
    Image { id: u64 },
    Category { id: u64 },
    Annotation { id: u64 },
}
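
A context can be turned into a short label with a single `match`. The sketch below redefines the enum locally so it compiles on its own; `context_label` is a hypothetical helper, and the label format simply mirrors the bracketed prefixes in the sample report output.

```rust
// Local stand-in mirroring the enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueContext {
    Image { id: u64 },
    Category { id: u64 },
    Annotation { id: u64 },
}

/// Hypothetical helper: render a context as a label such as "Image 1".
pub fn context_label(ctx: &IssueContext) -> String {
    match ctx {
        IssueContext::Image { id } => format!("Image {}", id),
        IssueContext::Category { id } => format!("Category {}", id),
        IssueContext::Annotation { id } => format!("Annotation {}", id),
    }
}
```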

Validation Checks

Image Validation

Duplicate IDs:
// ✗ Error: Multiple images with same ID
Dataset {
    images: vec![
        Image::new(1u64, "img1.jpg", 640, 480),
        Image::new(1u64, "img2.jpg", 640, 480), // Duplicate ID 1
    ],
    ...
}
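
One common way to detect duplicates while remembering where an ID first appeared (as in the "first seen at index 0" message) is a single pass with a hash map. This is a sketch of that general technique, not panlabel's implementation; `find_duplicate_ids` is a hypothetical name and it works on bare `u64` IDs.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical helper: returns `(index, id, first_index)` for every entry
/// whose ID repeats one seen earlier in the slice.
pub fn find_duplicate_ids(ids: &[u64]) -> Vec<(usize, u64, usize)> {
    let mut first_seen: HashMap<u64, usize> = HashMap::new();
    let mut duplicates = Vec::new();
    for (index, &id) in ids.iter().enumerate() {
        match first_seen.entry(id) {
            // Already recorded: report this index together with the first one.
            Entry::Occupied(seen) => duplicates.push((index, id, *seen.get())),
            // First occurrence: remember where it was seen.
            Entry::Vacant(slot) => {
                slot.insert(index);
            }
        }
    }
    duplicates
}
```

The same approach applies unchanged to category and annotation IDs.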
Invalid dimensions:
// ✗ Error: Zero or negative dimensions
Image::new(1u64, "img.jpg", 0, 480)  // Width is 0
Empty filename:
// ⚠ Warning: Empty filename
Image::new(1u64, "", 640, 480)

Category Validation

Duplicate IDs:
// ✗ Error: Multiple categories with same ID
Dataset {
    categories: vec![
        Category::new(1u64, "person"),
        Category::new(1u64, "car"), // Duplicate ID 1
    ],
    ...
}
Duplicate names:
// ⚠ Warning: Multiple categories with same name
Dataset {
    categories: vec![
        Category::new(1u64, "person"),
        Category::new(2u64, "person"), // Duplicate name
    ],
    ...
}
Empty name:
// ⚠ Warning: Empty category name
Category::new(1u64, "")

Annotation Validation

Duplicate IDs:
// ✗ Error: Multiple annotations with same ID
Dataset {
    annotations: vec![
        Annotation::new(1u64, 1u64, 1u64, bbox1),
        Annotation::new(1u64, 1u64, 1u64, bbox2), // Duplicate ID 1
    ],
    ...
}
Invalid references:
// ✗ Error: References non-existent image
Annotation::new(
    1u64,
    999u64, // Image 999 doesn't exist
    1u64,
    bbox,
)

// ✗ Error: References non-existent category
Annotation::new(
    1u64,
    1u64,
    999u64, // Category 999 doesn't exist
    bbox,
)
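
Reference checks of this kind usually come down to set membership. The sketch below collects the known image and category IDs into hash sets and reports every annotation that points outside them. `AnnotationRef`, `RefProblem`, and `check_refs` are hypothetical names introduced for illustration; panlabel's own types and logic may differ.

```rust
use std::collections::HashSet;

/// Stand-in for an annotation, holding only the fields needed here.
pub struct AnnotationRef {
    pub id: u64,
    pub image_id: u64,
    pub category_id: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RefProblem {
    MissingImage { annotation: u64, image: u64 },
    MissingCategory { annotation: u64, category: u64 },
}

/// Hypothetical helper: list every dangling image or category reference.
pub fn check_refs(
    image_ids: &[u64],
    category_ids: &[u64],
    annotations: &[AnnotationRef],
) -> Vec<RefProblem> {
    let images: HashSet<u64> = image_ids.iter().copied().collect();
    let categories: HashSet<u64> = category_ids.iter().copied().collect();
    let mut problems = Vec::new();
    for ann in annotations {
        if !images.contains(&ann.image_id) {
            problems.push(RefProblem::MissingImage { annotation: ann.id, image: ann.image_id });
        }
        if !categories.contains(&ann.category_id) {
            problems.push(RefProblem::MissingCategory {
                annotation: ann.id,
                category: ann.category_id,
            });
        }
    }
    problems
}
```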

Bounding Box Validation

Non-finite coordinates:
// ✗ Error: NaN or infinite coordinates
let bbox = BBoxXYXY::<Pixel>::from_xyxy(
    f64::NAN,
    20.0,
    100.0,
    200.0,
);
Invalid ordering:
// ✗ Error: max < min
let bbox = BBoxXYXY::<Pixel>::from_xyxy(
    100.0,  // xmin
    20.0,   // ymin
    10.0,   // xmax < xmin
    200.0,  // ymax
);
Zero or negative area:
// ⚠ Warning: Zero area (point box)
let bbox = BBoxXYXY::<Pixel>::from_xyxy(
    10.0, 20.0, 10.0, 20.0
);
Out of bounds:
// ✗ Error: Extends outside image dimensions
let image = Image::new(1u64, "img.jpg", 640, 480);
let bbox = BBoxXYXY::<Pixel>::from_xyxy(
    600.0,
    400.0,
    800.0,  // > image width (640)
    600.0,  // > image height (480)
);
Note: Validation uses a 0.5 pixel tolerance for floating-point precision.
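
The four bounding box checks can be combined into one function. The sketch below is an illustration under stated assumptions: it takes raw `f64` coordinates instead of `BBoxXYXY<Pixel>`, it applies the 0.5 pixel tolerance only to the bounds comparison, and `check_bbox` and `BBoxProblem` are hypothetical names. How panlabel applies the tolerance internally is not specified here.

```rust
/// Tolerance in pixels, taken from the note above.
pub const TOLERANCE: f64 = 0.5;

#[derive(Debug, PartialEq, Eq)]
pub enum BBoxProblem {
    NotFinite,
    InvalidOrdering,
    ZeroArea,
    OutOfBounds,
}

/// Hypothetical helper: check one (xmin, ymin, xmax, ymax) box against an
/// image of the given size.
pub fn check_bbox(
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    width: u32,
    height: u32,
) -> Vec<BBoxProblem> {
    let mut problems = Vec::new();
    // NaN or infinite values make the later comparisons meaningless, so stop.
    if ![xmin, ymin, xmax, ymax].iter().all(|v| v.is_finite()) {
        problems.push(BBoxProblem::NotFinite);
        return problems;
    }
    if xmax < xmin || ymax < ymin {
        problems.push(BBoxProblem::InvalidOrdering);
    } else if xmax == xmin || ymax == ymin {
        problems.push(BBoxProblem::ZeroArea);
    }
    // Assumption: the tolerance widens the image rectangle on every side.
    let (w, h) = (f64::from(width), f64::from(height));
    if xmin < -TOLERANCE || ymin < -TOLERANCE || xmax > w + TOLERANCE || ymax > h + TOLERANCE {
        problems.push(BBoxProblem::OutOfBounds);
    }
    problems
}
```

Under these assumptions a box ending at x = 640.4 in a 640-pixel-wide image passes, while one ending at x = 640.6 is reported as out of bounds.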

Usage Patterns

Validate After Reading

use panlabel::validation::{validate_dataset, ValidateOptions};
use panlabel::ir::io_coco_json;

let dataset = io_coco_json::read_coco_json(path)?;

let opts = ValidateOptions { strict: false };
let report = validate_dataset(&dataset, &opts);

if !report.is_clean() {
    eprintln!("Validation issues found:");
    eprintln!("{}", report);
    
    if report.error_count() > 0 {
        return Err("Dataset has errors".into());
    }
}

Validate Before Writing

use panlabel::validation::{validate_dataset, ValidateOptions};
// Assumes io_yolo sits next to io_coco_json under panlabel::ir
use panlabel::ir::io_yolo;

let opts = ValidateOptions { strict: true };
let report = validate_dataset(&dataset, &opts);

if report.error_count() > 0 || report.warning_count() > 0 {
    return Err("Dataset validation failed".into());
}

// Safe to write
io_yolo::write_yolo_dir(output_path, &dataset)?;

Filter Issues by Type

let report = validate_dataset(&dataset, &ValidateOptions::default());

// Only show errors
for issue in report.errors() {
    eprintln!("Error: {}", issue.message);
}

// Count specific issue types
use panlabel::validation::IssueCode;

let bbox_errors = report.issues.iter()
    .filter(|i| matches!(i.code, IssueCode::BBoxOutOfBounds))
    .count();

println!("{} bounding boxes are out of bounds", bbox_errors);

JSON Export

The report is serializable for programmatic use:
let report = validate_dataset(&dataset, &ValidateOptions::default());
let json = serde_json::to_string_pretty(&report.as_json())?;
println!("{}", json);

Complete Example

use panlabel::ir::*;
use panlabel::validation::*;
use panlabel::PanlabelError;
use std::path::Path;

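/// Reads a COCO JSON dataset, prints the validation report, and returns the
/// dataset only if it has no errors. Despite the name, nothing is modified.
fn validate_and_fix(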
    input_path: &Path,
) -> Result<Dataset, PanlabelError> {
    // Read dataset
    let dataset = io_coco_json::read_coco_json(input_path)?;
    
    // Validate
    let opts = ValidateOptions { strict: false };
    let report = validate_dataset(&dataset, &opts);
    
    // Display report
    if !report.is_clean() {
        println!("Validation issues found:");
        println!("{}", report);
    }
    
    // Check for critical errors
    if report.error_count() > 0 {
        eprintln!("\nCannot proceed with {} errors", report.error_count());
        return Err("Validation failed".into());
    }
    
    // Warn about non-critical issues
    if report.warning_count() > 0 {
        println!("\n⚠ Dataset has {} warnings but can be used", report.warning_count());
    } else {
        println!("\n✓ Dataset is valid");
    }
    
    Ok(dataset)
}
