The stats module provides comprehensive dataset analysis, computing statistics about images, annotations, bounding boxes, and label distributions.

Main Function

stats_dataset

Compute a full statistics report for a dataset.
pub fn stats_dataset(
    dataset: &Dataset,
    opts: &StatsOptions
) -> StatsReport
Analyzes a dataset and produces a structured statistics report including:
  • Summary counts (images, categories, annotations)
  • Label distribution and histogram
  • Bounding box statistics (size, aspect ratio, area)
  • Image resolution distribution
  • Annotation density metrics
  • Category co-occurrence patterns
Parameters:
  • dataset - The dataset to analyze
  • opts - Statistics options (number of top labels and co-occurrence pairs, out-of-bounds tolerance, histogram bar width)
Returns: A StatsReport containing all computed statistics
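
The box-size metrics listed above reduce to simple per-box arithmetic. The sketch below is not panlabel's code. It assumes a hypothetical `BoxSize` type holding width and height in pixels, and it borrows the COCO small/medium/large thresholds (32² and 96² px²) purely for illustration, because this page does not say how `AreaDistribution` buckets areas.

```rust
/// Hypothetical box size in pixels (not a panlabel type).
#[derive(Debug, Clone, Copy)]
struct BoxSize {
    w: f64,
    h: f64,
}

#[derive(Debug, PartialEq)]
enum AreaBucket {
    Small,
    Medium,
    Large,
}

impl BoxSize {
    fn area(&self) -> f64 {
        self.w * self.h
    }

    /// Width divided by height; `None` when the height is not positive.
    fn aspect_ratio(&self) -> Option<f64> {
        if self.h > 0.0 {
            Some(self.w / self.h)
        } else {
            None
        }
    }

    /// COCO-style buckets: below 32² is small, below 96² is medium, otherwise large.
    /// These thresholds are an assumption for illustration only.
    fn area_bucket(&self) -> AreaBucket {
        let a = self.area();
        if a < 32.0 * 32.0 {
            AreaBucket::Small
        } else if a < 96.0 * 96.0 {
            AreaBucket::Medium
        } else {
            AreaBucket::Large
        }
    }
}

fn main() {
    let boxes = [
        BoxSize { w: 20.0, h: 10.0 },
        BoxSize { w: 64.0, h: 64.0 },
        BoxSize { w: 200.0, h: 100.0 },
    ];
    for b in &boxes {
        println!(
            "{:?}: area={} aspect={:?} bucket={:?}",
            b,
            b.area(),
            b.aspect_ratio(),
            b.area_bucket()
        );
    }
}
```

Returning `Option` for the aspect ratio keeps zero-height boxes out of the ratio distribution instead of producing an infinite value.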

Types

StatsOptions

Configuration for statistics computation.
pub struct StatsOptions {
    pub top_labels: usize,
    pub top_pairs: usize,
    pub oob_tolerance_px: f64,
    pub bar_width: usize,
}
Fields:
  • top_labels - Number of top labels to show in the histogram. Default: 10
  • top_pairs - Number of top co-occurrence pairs to show. Default: 10
  • oob_tolerance_px - Tolerance in pixels for out-of-bounds checks. Default: 0.5
  • bar_width - Width of histogram bars in characters (for text output). Default: 20
Default values:
impl Default for StatsOptions {
    fn default() -> Self {
        Self {
            top_labels: 10,
            top_pairs: 10,
            oob_tolerance_px: 0.5,
            bar_width: 20,
        }
    }
}
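
Because every field is public and `StatsOptions` implements `Default`, you can override a single field with struct update syntax and keep the documented defaults for the rest. The sketch declares a local copy of the struct so it compiles on its own. With panlabel you would import `panlabel::stats::StatsOptions` instead, assuming the type is not marked `#[non_exhaustive]` (the page does not say). `wide_histogram_options` is a hypothetical helper.

```rust
// Local mirror of the documented StatsOptions so this sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub struct StatsOptions {
    pub top_labels: usize,
    pub top_pairs: usize,
    pub oob_tolerance_px: f64,
    pub bar_width: usize,
}

impl Default for StatsOptions {
    fn default() -> Self {
        Self {
            top_labels: 10,
            top_pairs: 10,
            oob_tolerance_px: 0.5,
            bar_width: 20,
        }
    }
}

/// Override only the histogram settings; `top_pairs` and
/// `oob_tolerance_px` keep their default values.
fn wide_histogram_options() -> StatsOptions {
    StatsOptions {
        top_labels: 25,
        bar_width: 40,
        ..Default::default()
    }
}

fn main() {
    let opts = wide_histogram_options();
    println!("{:?}", opts);
}
```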

StatsReport

Comprehensive dataset statistics.
pub struct StatsReport {
    pub summary: SummarySection,
    pub labels: LabelsSection,
    pub bboxes: BBoxStats,
    pub image_resolutions: ImageResolutionStats,
    pub annotation_density: AnnotationDensityStats,
    pub area_distribution: AreaDistribution,
    pub aspect_ratios: AspectRatioDistribution,
    pub per_category_bbox: PerCategoryBBoxStats,
    pub cooccurrence_top_pairs: CooccurrenceTopPairs,
    pub bar_width: usize,
}
The report can be serialized to JSON or formatted as human-readable text/HTML.
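
`cooccurrence_top_pairs` summarizes which categories tend to appear together. One plausible way to compute it, shown here as a sketch rather than panlabel's actual algorithm, is to count each unordered pair of distinct categories once per image and keep the `top_pairs` most frequent. The input shape (a list of category names per image) and the function `top_cooccurrence_pairs` are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Count how many images contain each unordered pair of distinct categories,
/// then return the `top_pairs` most frequent pairs.
fn top_cooccurrence_pairs(
    images: &[Vec<&str>],
    top_pairs: usize,
) -> Vec<((String, String), usize)> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for labels in images {
        // Deduplicate and sort so each pair is counted once per image, keyed as (a, b) with a < b.
        let unique: Vec<&str> = labels
            .iter()
            .copied()
            .collect::<BTreeSet<_>>()
            .into_iter()
            .collect();
        for i in 0..unique.len() {
            for j in (i + 1)..unique.len() {
                let key = (unique[i].to_string(), unique[j].to_string());
                *counts.entry(key).or_insert(0) += 1;
            }
        }
    }
    let mut pairs: Vec<_> = counts.into_iter().collect();
    // Highest count first; ties broken alphabetically so the output is deterministic.
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs.truncate(top_pairs);
    pairs
}

fn main() {
    let images = vec![
        vec!["person", "car", "person"],
        vec!["person", "car", "dog"],
        vec!["dog", "person"],
        vec!["cat"],
    ];
    for ((a, b), n) in top_cooccurrence_pairs(&images, 2) {
        println!("{a} + {b}: {n}");
    }
}
```

Counting per image rather than per annotation means an image with five `person` boxes and one `car` box still contributes one to the `car`/`person` pair.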

SummarySection

High-level dataset counts.
pub struct SummarySection {
    pub total_images: usize,
    pub total_categories: usize,
    pub total_annotations: usize,
    pub images_with_annotations: usize,
    pub images_without_annotations: usize,
}
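
The two image counts partition the dataset, so `images_with_annotations + images_without_annotations` should equal `total_images`. Below is a minimal sketch of how these counts could be derived. It assumes images are identified by hypothetical `u64` ids and that each annotation records the id of its image. The `summarize` helper is illustrative, not part of panlabel.

```rust
use std::collections::HashSet;

/// Local mirror of the documented SummarySection.
#[derive(Debug, PartialEq)]
pub struct SummarySection {
    pub total_images: usize,
    pub total_categories: usize,
    pub total_annotations: usize,
    pub images_with_annotations: usize,
    pub images_without_annotations: usize,
}

/// Hypothetical helper: `image_ids` lists every image, and `annotation_image_ids`
/// holds the image id of each annotation (one entry per annotation).
fn summarize(
    image_ids: &[u64],
    total_categories: usize,
    annotation_image_ids: &[u64],
) -> SummarySection {
    // Set of image ids referenced by at least one annotation.
    let annotated: HashSet<u64> = annotation_image_ids.iter().copied().collect();
    let images_with_annotations = image_ids
        .iter()
        .filter(|id| annotated.contains(*id))
        .count();
    SummarySection {
        total_images: image_ids.len(),
        total_categories,
        total_annotations: annotation_image_ids.len(),
        images_with_annotations,
        images_without_annotations: image_ids.len() - images_with_annotations,
    }
}

fn main() {
    // Three images, two categories, three annotations (two on image 1, one on image 3).
    let summary = summarize(&[1, 2, 3], 2, &[1, 1, 3]);
    println!("{:?}", summary);
}
```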

LabelsSection

Label distribution information.
pub struct LabelsSection {
    pub label_counts: Vec<LabelCount>,
}

pub struct LabelCount {
    pub label: String,
    pub count: usize,
}
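
The text output draws one bar per label, with `top_labels` limiting how many labels are shown and `bar_width` setting the length of the longest bar. The sketch below shows one way to do that scaling. The descending sort, the alphabetical tie-break, the rounding, and the `render_histogram` helper are assumptions, and panlabel's own formatter may lay the lines out differently.

```rust
/// Local mirror of the documented LabelCount.
pub struct LabelCount {
    pub label: String,
    pub count: usize,
}

/// Render the `top_labels` most frequent labels as text bars,
/// scaled so the most frequent label gets exactly `bar_width` characters.
fn render_histogram(counts: &[LabelCount], top_labels: usize, bar_width: usize) -> Vec<String> {
    let mut sorted: Vec<&LabelCount> = counts.iter().collect();
    sorted.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.label.cmp(&b.label)));
    sorted.truncate(top_labels);
    let max = sorted.first().map_or(0, |lc| lc.count);
    sorted
        .iter()
        .map(|lc| {
            // Proportional length, rounded to the nearest whole character.
            let len = if max == 0 {
                0
            } else {
                (lc.count * bar_width + max / 2) / max
            };
            format!("{:<10} {} {}", lc.label, "#".repeat(len), lc.count)
        })
        .collect()
}

fn main() {
    let counts = vec![
        LabelCount { label: "car".to_string(), count: 25 },
        LabelCount { label: "person".to_string(), count: 50 },
        LabelCount { label: "dog".to_string(), count: 10 },
        LabelCount { label: "cat".to_string(), count: 5 },
    ];
    // top_labels = 3 drops "cat"; "person" gets the full 20 characters.
    for line in render_histogram(&counts, 3, 20) {
        println!("{line}");
    }
}
```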

BBoxStats

Bounding box quality metrics.
pub struct BBoxStats {
    pub total: usize,
    pub zero_area: usize,
    pub negative_area: usize,
    pub out_of_bounds: usize,
    pub degenerate: usize,
}
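
The page does not spell out the exact checks behind these counters. As a hedged illustration of how `oob_tolerance_px` could interact with them, the sketch assumes boxes are stored as absolute pixel corners (`x_min`, `y_min`, `x_max`, `y_max`), treats a negative width or height as negative area, and flags a box as out of bounds only when it extends past the image by more than the tolerance. `Corners`, `BBoxCounts`, and `check_boxes` are hypothetical names. The `degenerate` counter is omitted because its definition is not documented here.

```rust
/// Hypothetical box in absolute pixel corner coordinates.
#[derive(Debug, Clone, Copy)]
struct Corners {
    x_min: f64,
    y_min: f64,
    x_max: f64,
    y_max: f64,
}

/// Subset of the documented BBoxStats counters (`degenerate` omitted).
#[derive(Debug, Default, PartialEq)]
struct BBoxCounts {
    total: usize,
    zero_area: usize,
    negative_area: usize,
    out_of_bounds: usize,
}

fn check_boxes(boxes: &[Corners], img_w: f64, img_h: f64, oob_tolerance_px: f64) -> BBoxCounts {
    let mut c = BBoxCounts::default();
    let tol = oob_tolerance_px;
    for b in boxes {
        c.total += 1;
        let (w, h) = (b.x_max - b.x_min, b.y_max - b.y_min);
        if w < 0.0 || h < 0.0 {
            // Inverted corners: counted here as negative area.
            c.negative_area += 1;
        } else if w == 0.0 || h == 0.0 {
            c.zero_area += 1;
        }
        // Only overshoot beyond the tolerance counts as out of bounds.
        if b.x_min < -tol || b.y_min < -tol || b.x_max > img_w + tol || b.y_max > img_h + tol {
            c.out_of_bounds += 1;
        }
    }
    c
}

fn main() {
    let boxes = [
        Corners { x_min: 10.0, y_min: 10.0, x_max: 50.0, y_max: 50.0 }, // valid
        Corners { x_min: 20.0, y_min: 20.0, x_max: 20.0, y_max: 40.0 }, // zero width
        Corners { x_min: 60.0, y_min: 10.0, x_max: 40.0, y_max: 30.0 }, // x_max < x_min
        Corners { x_min: -0.3, y_min: 0.0, x_max: 100.4, y_max: 100.0 }, // within 0.5 px
        Corners { x_min: 90.0, y_min: 90.0, x_max: 101.0, y_max: 99.0 }, // 1 px past the right edge
    ];
    println!("{:?}", check_boxes(&boxes, 100.0, 100.0, 0.5));
}
```

With the default tolerance of 0.5 px, the fourth box overshoots by less than the tolerance and is accepted. With a tolerance of 0 it would be flagged as well.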

Example

use panlabel::ir;
use panlabel::stats::{stats_dataset, StatsOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a dataset
    let dataset = ir::io_coco_json::read_coco_json("dataset.json")?;

    // Compute statistics with default options
    let report = stats_dataset(&dataset, &StatsOptions::default());

    // Access summary data
    println!("Total images: {}", report.summary.total_images);
    println!("Total annotations: {}", report.summary.total_annotations);
    println!("Zero-area bboxes: {}", report.bboxes.zero_area);

    // Display as text
    println!("{}", report);

    // Serialize to JSON
    let json = serde_json::to_string_pretty(&report)?;
    println!("{}", json);

    Ok(())
}

Custom Options Example

use panlabel::stats::{stats_dataset, StatsOptions};

let opts = StatsOptions {
    top_labels: 20,        // Show top 20 labels
    top_pairs: 15,         // Show top 15 co-occurrence pairs
    oob_tolerance_px: 1.0, // Allow 1px out-of-bounds tolerance
    bar_width: 40,         // Wider histogram bars
};

// `dataset` is a Dataset loaded as in the first example
let report = stats_dataset(&dataset, &opts);

HTML Report Generation

use panlabel::stats::{stats_dataset, StatsOptions, html};

// `dataset` is a Dataset loaded as in the first example
let report = stats_dataset(&dataset, &StatsOptions::default());

// Generate self-contained HTML report
let html_output = html::render_html(&report)?;
std::fs::write("report.html", html_output)?;
