The stats command analyzes annotation datasets and generates statistics reports covering category distribution, annotation counts, image and bounding box dimensions, annotation quality, and label co-occurrence.
Usage
Parameters
Path to the dataset to analyze. Can be a file or directory depending on the format.
Input format. If omitted, Panlabel auto-detects the format.
Supported values: ir-json, coco, cvat, label-studio, tfod, yolo, voc
Aliases: coco-json, cvat-xml, label-studio-json, ls, tfod-csv, ultralytics, yolov8, yolov5, pascal-voc, voc-xml
When auto-detection fails for a JSON file, stats falls back to reading it as ir-json.
Number of top labels and label pairs to show in the report. Useful for large datasets with many categories.
Tolerance in pixels for out-of-bounds checks. Annotations within this tolerance of the image boundary are not flagged as out-of-bounds.
Output format for the statistics report. Options:
- text: Human-readable text report with ASCII visualizations
- json: Machine-readable JSON with full statistics
- html: Self-contained HTML report with interactive charts
Statistics Included
The stats report includes:
Dataset Overview
- Total images, annotations, and categories
- Images with/without annotations
- Average annotations per image
Category Distribution
- Annotation count per category
- Visual bar charts (text mode) or interactive charts (HTML mode)
- Top N most frequent categories (controlled by --top)
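The top-N ranking and text-mode bar chart described above can be sketched in a few lines of Python. This is an illustration only, not Panlabel's implementation: the function name `top_categories`, the `width` parameter, and the `#` bar character are assumptions made for the example.

```python
from collections import Counter


def top_categories(labels, top=10, width=30):
    """Return (label, count, bar) rows for the `top` most frequent labels.

    `labels` holds one category name per annotation. Bars are scaled so the
    most frequent category spans `width` characters (hypothetical layout).
    """
    counts = Counter(labels)
    if not counts:
        return []
    largest = max(counts.values())
    rows = []
    for label, count in counts.most_common(top):
        bar = "#" * max(1, round(width * count / largest))
        rows.append((label, count, bar))
    return rows


labels = ["car", "car", "person", "car", "dog", "person"]
for label, count, bar in top_categories(labels, top=2, width=12):
    print(f"{label:<8} {count:>3} {bar}")
```

With `top=2`, the `dog` category is dropped from the output, which is how a top-N limit keeps the report short for datasets with many categories.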
Dimension Analysis
- Image dimension distribution
- Bounding box size statistics (min, max, average)
- Aspect ratio analysis
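As a rough illustration of the bounding box figures listed above, the following Python sketch computes min, max, and average for box width, height, and aspect ratio. It assumes boxes are given as `(xmin, ymin, xmax, ymax)` tuples in pixels; the function names and the returned dictionary layout are hypothetical and do not reflect Panlabel's report schema.

```python
from statistics import mean


def summarize(values):
    """Return min, max, and average of a sequence, or None if it is empty."""
    values = list(values)
    if not values:
        return None
    return {"min": min(values), "max": max(values), "avg": mean(values)}


def bbox_stats(boxes):
    """Summarize width, height, and aspect ratio (width / height) of boxes."""
    widths = [xmax - xmin for xmin, _, xmax, _ in boxes]
    heights = [ymax - ymin for _, ymin, _, ymax in boxes]
    # Skip zero-height boxes so the aspect ratio stays defined.
    ratios = [w / h for w, h in zip(widths, heights) if h > 0]
    return {
        "width": summarize(widths),
        "height": summarize(heights),
        "aspect_ratio": summarize(ratios),
    }


stats = bbox_stats([(0, 0, 10, 20), (5, 5, 45, 25)])
print(stats["width"], stats["aspect_ratio"])
```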
Quality Metrics
- Out-of-bounds annotations (beyond --tolerance)
- Empty or zero-area bounding boxes
- Images with duplicate annotations
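To make the tolerance behavior concrete, here is a minimal Python sketch of an out-of-bounds and zero-area check. It is an assumption about how such a check could work, not Panlabel's actual code: the function `check_box`, the `(xmin, ymin, xmax, ymax)` box layout, and the issue strings are all hypothetical.

```python
def check_box(box, image_width, image_height, tolerance=0.0):
    """Return a list of quality issues for one bounding box.

    A box is flagged as out-of-bounds only when it extends past the image
    edge by more than `tolerance` pixels (assumed semantics).
    """
    xmin, ymin, xmax, ymax = box
    issues = []
    if xmax - xmin <= 0 or ymax - ymin <= 0:
        issues.append("zero-area")
    if (
        xmin < -tolerance
        or ymin < -tolerance
        or xmax > image_width + tolerance
        or ymax > image_height + tolerance
    ):
        issues.append("out-of-bounds")
    return issues


# 0.4 px past the right edge is within a 0.5 px tolerance, so nothing is flagged.
print(check_box((0, 0, 100.4, 50), 100, 50, tolerance=0.5))
# 1 px past the right edge exceeds the tolerance.
print(check_box((0, 0, 101, 50), 100, 50, tolerance=0.5))
```

A small nonzero tolerance is useful when coordinates were rounded or converted between formats, since sub-pixel overshoot would otherwise be reported as an error.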
Co-occurrence Analysis
- Top N category pairs that appear together (controlled by --top)
- Useful for understanding object relationships in your dataset
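One plausible way to compute category co-occurrence is to count, for each image, every unordered pair of distinct labels that appear in it. The Python sketch below follows that assumption; the function name `top_label_pairs` and the per-image counting rule are illustrative and may differ from what Panlabel does internally.

```python
from collections import Counter
from itertools import combinations


def top_label_pairs(image_labels, top=10):
    """Return the `top` most common label pairs across images.

    `image_labels` is a list with one list of category names per image.
    Each pair is counted at most once per image (assumed rule).
    """
    pairs = Counter()
    for labels in image_labels:
        # Deduplicate and sort so ("car", "person") and ("person", "car") match.
        for pair in combinations(sorted(set(labels)), 2):
            pairs[pair] += 1
    return pairs.most_common(top)


images = [
    ["car", "person", "car"],
    ["car", "person"],
    ["dog", "person"],
]
print(top_label_pairs(images, top=1))
```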
Examples
Basic Statistics
Explicit Format with JSON Output
HTML Report
Show Top 20 Categories
YOLO Dataset Statistics
Custom Tolerance for OOB Checks
Output Examples
Text Report
JSON Report Structure
HTML Report
The HTML output creates a self-contained report with:
- Interactive bar charts for category distribution
- Searchable/sortable tables
- Collapsible sections
- Responsive design for mobile viewing
- No external dependencies (all CSS/JS embedded)
Use Cases
Dataset Quality Assessment
Pre-Training Analysis
Automated Monitoring
Compare Dataset Versions
Performance Notes
- Stats computation is fast even for large datasets (millions of annotations)
- JSON output is more verbose but easier to parse programmatically
- HTML generation adds minimal overhead and produces self-contained files
- Use --top to limit output size for datasets with hundreds of categories
See Also
- Validate Command: Validate dataset quality
- Diff Command: Compare two datasets
- Sample Command: Create balanced subsets based on stats