Panlabel can automatically detect annotation formats, eliminating the need to specify --from explicitly. This makes commands more concise and reduces errors from format mismatches.

Using Auto-Detection

Use --from auto with the convert command:
panlabel convert --from auto --to yolo \
  --input dataset.json \
  --output ./yolo_dataset
Or simply omit --from (defaults to auto):
panlabel convert --to yolo \
  --input dataset.json \
  --output ./yolo_dataset
The stats, diff, and sample commands also support auto-detection:
# Auto-detect format
panlabel stats dataset.json

# Compare formats automatically
panlabel diff dataset_a.json ./yolo_dataset

# Sample with auto-detection
panlabel sample --input dataset.json --output sample.json --n 100
Auto-detection is enabled by default for stats, diff, and sample commands.

Detection Heuristics

Panlabel uses a multi-layered approach to detect formats:

File Type Detection

First, Panlabel examines the input path: the file extension, or whether the path is a directory:
| Extension | Initial Guess | Next Step |
| --- | --- | --- |
| .json | JSON format | Inspect content to distinguish COCO, Label Studio, or IR JSON |
| .xml | XML format | Inspect content to distinguish CVAT or VOC |
| .csv | TFOD format | Directly recognized as TensorFlow Object Detection format |
| Directory | Directory-based | Check for YOLO, VOC, or CVAT directory layouts |
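
The first-pass dispatch described in this table can be sketched in Python. This is an illustration only, not Panlabel's Rust implementation: the function name `initial_guess`, the returned labels, and the case-insensitive extension match are all assumptions.

```python
from pathlib import Path


def initial_guess(path: str) -> str:
    """Return a coarse first guess for an input path (hypothetical helper)."""
    p = Path(path)
    if p.is_dir():
        return "directory"  # next step: check YOLO, VOC, or CVAT layouts
    ext = p.suffix.lower()  # assumption: extensions are matched case-insensitively
    if ext == ".json":
        return "json"  # next step: inspect content (COCO, Label Studio, IR JSON)
    if ext == ".xml":
        return "xml"  # next step: inspect the root element (CVAT or VOC)
    if ext == ".csv":
        return "tfod"  # recognized directly, no content inspection
    raise ValueError("unrecognized file extension")
```

A path such as `dataset.data` falls through to the error, which is the case where --from must be given explicitly.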

JSON Format Detection

For JSON files, Panlabel inspects the structure (from detect_json_format in lib.rs:987-1092):

Array-Root JSON

If the JSON root is an array:
[
  {
    "data": {"image": "https://example.com/image.jpg"},
    "annotations": [...]
  },
  ...
]
→ Detected as Label Studio task export format.
Label Studio uses an array of task objects, each with a data object containing an image field.

Object-Root JSON

If the JSON root is an object, Panlabel inspects the structure of annotations[0].bbox.

COCO Format: bbox is an array of 4 numbers [x, y, width, height]:
{
  "images": [...],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 50, 200, 150]
    }
  ],
  "categories": [...]
}
→ Detected as COCO.

IR JSON Format: bbox is an object with min/max or xmin/ymin/xmax/ymax fields:
{
  "images": [...],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": {
        "min": {"x": 100, "y": 50},
        "max": {"x": 300, "y": 200}
      }
    }
  ],
  "categories": [...]
}
→ Detected as IR JSON.
Detection fails if the annotations array is empty or the first annotation lacks a bbox field. In these cases, specify --from explicitly.
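
The JSON rules above can be approximated with a short Python sketch. It is not the code in lib.rs. The returned labels are illustrative, the final error message is hypothetical, and requiring every task in the array to have data.image is an assumption about how strict the Label Studio check is.

```python
import json


def detect_json_format(text: str) -> str:
    """Classify a JSON document as 'label-studio', 'coco', or 'ir-json'."""
    root = json.loads(text)

    if isinstance(root, list):
        # Label Studio task export: an array of tasks, each with data.image.
        # Assumption: every task is checked, and an empty array is rejected.
        if root and all(
            isinstance(task, dict)
            and isinstance(task.get("data"), dict)
            and "image" in task["data"]
            for task in root
        ):
            return "label-studio"
        raise ValueError("array-root JSON not recognized")

    annotations = root.get("annotations") if isinstance(root, dict) else None
    if not isinstance(annotations, list):
        raise ValueError("missing or invalid 'annotations' array")
    if not annotations:
        raise ValueError("empty 'annotations' array")

    first = annotations[0]
    if not isinstance(first, dict) or "bbox" not in first:
        raise ValueError("first annotation has no 'bbox' field")

    bbox = first["bbox"]
    if isinstance(bbox, list) and len(bbox) == 4:
        return "coco"  # [x, y, width, height]
    if isinstance(bbox, dict):
        return "ir-json"  # min/max or xmin/ymin/xmax/ymax
    raise ValueError("unrecognized 'bbox' structure")  # hypothetical message
```

Only the first annotation is inspected, which is why an empty annotations array gives the detector nothing to work with.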

XML Format Detection

For XML files, Panlabel inspects the root element (from detect_xml_format in lib.rs:1094-1119):
| Root Element | Detected Format | Notes |
| --- | --- | --- |
| <annotations> | CVAT XML | CVAT for images export |
| <annotation> | Error | Single VOC file; use directory path instead |
| Other | Error | Unknown XML format |

CVAT XML Example

<annotations>
  <version>1.1</version>
  <meta>...</meta>
  <image id="0" name="image1.jpg" width="800" height="600">
    <box label="person" xtl="100" ytl="50" xbr="300" ybr="200"/>
  </image>
</annotations>
→ Detected as CVAT.

VOC XML Single File

<annotation>
  <filename>image1.jpg</filename>
  <size>
    <width>800</width>
    <height>600</height>
  </size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>50</ymin>
      <xmax>300</xmax>
      <ymax>200</ymax>
    </bndbox>
  </object>
</annotation>
→ Error: Single VOC XML files are not auto-detected. Use --from voc with the VOC dataset directory.
Pascal VOC uses a directory layout with multiple XML files. Auto-detection works for VOC directories, not individual XML files.
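
The root-element check can be sketched in Python with the standard library's XML parser. This is an illustration of the table above rather than Panlabel's own code, and the error message strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def detect_xml_format(text: str) -> str:
    """Return 'cvat' for a CVAT export; raise for anything else."""
    root = ET.fromstring(text)
    if root.tag == "annotations":
        return "cvat"  # CVAT for images export
    if root.tag == "annotation":
        # A single Pascal VOC file: only the VOC directory is auto-detected.
        raise ValueError("single VOC XML file; use the VOC dataset directory")
    raise ValueError(f"unknown XML root element <{root.tag}>")
```

The check depends only on the root tag, so the two formats are told apart by a singular versus plural element name.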

Directory Layout Detection

For directories, Panlabel checks for specific structures (from detect_dir_format in lib.rs:882-941):

YOLO Directory

Recognized if any of these conditions are met:
  1. Directory contains a labels/ subdirectory with .txt files
  2. Directory is named labels/ (case-insensitive) and contains .txt files
Example structure:
yolo_dataset/
├── images/
│   ├── img1.jpg
│   └── img2.jpg
└── labels/
    ├── img1.txt
    └── img2.txt
Or directly pointing to labels/:
labels/
├── img1.txt
└── img2.txt
→ Detected as YOLO.

VOC Directory

Recognized if any of these conditions are met:
  1. Directory contains Annotations/ (with .xml files) and JPEGImages/ subdirectories
  2. Directory is named Annotations/ (case-insensitive), contains .xml files, and has sibling JPEGImages/ directory
Example structure:
VOC_dataset/
├── Annotations/
│   ├── img1.xml
│   └── img2.xml
└── JPEGImages/
    ├── img1.jpg
    └── img2.jpg
Or pointing to Annotations/:
VOC_dataset/
├── Annotations/  ← point here
│   ├── img1.xml
│   └── img2.xml
└── JPEGImages/
    ├── img1.jpg
    └── img2.jpg
→ Detected as VOC.

CVAT Directory

Recognized if:
  • Directory contains annotations.xml at root
Example structure:
cvat_export/
├── annotations.xml
└── images/
    ├── img1.jpg
    └── img2.jpg
→ Detected as CVAT.

Ambiguous Cases

If a directory matches multiple formats, detection fails with a clear error:
panlabel convert --from auto --to coco --input ./ambiguous_dir --output output.json
Output:
Error: Format detection failed for './ambiguous_dir'
Reason: directory matches both YOLO and VOC layouts. Use --from to specify format explicitly.
Possible ambiguities:
  • YOLO + VOC: Directory has both labels/ with .txt files and Annotations/ with .xml files
  • YOLO + CVAT: Directory has labels/ with .txt files and annotations.xml
  • VOC + CVAT: Directory has Annotations/ with .xml files and annotations.xml
To resolve ambiguous cases, use --from to specify the format explicitly:
panlabel convert --from yolo --to coco \
  --input ./ambiguous_dir \
  --output output.json
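
The directory rules and the ambiguity check can be sketched together in Python. This is a rough approximation under stated assumptions, not the logic in lib.rs: the helper names are hypothetical, the error wording is simplified, and only the names documented as case-insensitive (a directory itself named labels/ or Annotations/) are compared case-insensitively.

```python
from pathlib import Path


def _has_files(directory: Path, suffix: str) -> bool:
    """True if `directory` exists and directly contains a file with `suffix`."""
    return directory.is_dir() and any(
        f.is_file() and f.suffix.lower() == suffix for f in directory.iterdir()
    )


def detect_dir_format(path: str) -> str:
    """Return 'yolo', 'voc', or 'cvat'; raise if zero or several layouts match."""
    d = Path(path).resolve()
    name = d.name.lower()
    matches = []

    # YOLO: a labels/ subdirectory with .txt files, or the directory is labels/.
    if _has_files(d / "labels", ".txt") or (name == "labels" and _has_files(d, ".txt")):
        matches.append("yolo")

    # VOC: Annotations/ (.xml) plus JPEGImages/, seen from the dataset root
    # or from inside Annotations/ with JPEGImages/ as a sibling.
    from_root = _has_files(d / "Annotations", ".xml") and (d / "JPEGImages").is_dir()
    from_annotations = (
        name == "annotations"
        and _has_files(d, ".xml")
        and (d.parent / "JPEGImages").is_dir()
    )
    if from_root or from_annotations:
        matches.append("voc")

    # CVAT: annotations.xml at the directory root.
    if (d / "annotations.xml").is_file():
        matches.append("cvat")

    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("directory does not match any known layout")
    raise ValueError("directory matches multiple layouts: " + ", ".join(matches))
```

Collecting every match before deciding, instead of returning on the first hit, is what lets the ambiguous cases surface as errors rather than silently picking one format.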

When to Use Explicit Format

Empty Datasets

Auto-detection fails for empty datasets:
panlabel convert --from auto --to yolo --input empty.json --output ./yolo_out
Output:
Error: Format detection failed for 'empty.json'
Reason: empty 'annotations' array. Cannot determine format from empty dataset. Use --from to specify format explicitly.
Solution:
panlabel convert --from coco --to yolo \
  --input empty.json \
  --output ./yolo_out

Missing Bounding Boxes

If the first annotation lacks a bbox field:
panlabel convert --from auto --to yolo --input dataset.json --output ./yolo_out
Output:
Error: Format detection failed for 'dataset.json'
Reason: first annotation has no 'bbox' field. Cannot determine format.
Solution: Specify --from explicitly.

Ambiguous Directory Layouts

When a directory matches multiple formats:
panlabel convert --from yolo --to coco \
  --input ./my_dataset \
  --output output.json

Custom or Non-Standard Formats

If your dataset uses a non-standard structure or extension:
# Custom .data extension
panlabel convert --from coco --to yolo \
  --input dataset.data \
  --output ./yolo_out

Single VOC XML Files

VOC XML files are not auto-detected individually:
# ✗ Won't work - single VOC file
panlabel convert --from auto --to coco \
  --input image1.xml \
  --output output.json

# ✓ Use directory path instead
panlabel convert --from voc --to coco \
  --input ./VOC_dataset \
  --output output.json

Debugging Detection Issues

If auto-detection fails unexpectedly:
  1. Check the error message - Panlabel provides detailed reasons for detection failures
  2. Verify file structure - Ensure your dataset follows the expected layout for your format
  3. Inspect sample data - Check that the first annotation has a bbox field (for JSON formats)
  4. Use explicit format - When in doubt, specify --from explicitly
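
For step 3, a small Python helper can summarize the fields that JSON detection depends on before you rerun Panlabel. This is a hypothetical debugging aid, not part of Panlabel; the function name `summarize_json` and the summary keys are assumptions.

```python
import json


def summarize_json(text: str) -> dict:
    """Summarize the parts of a JSON document that auto-detection inspects."""
    root = json.loads(text)
    summary = {"root": type(root).__name__}  # "dict" or "list" for typical inputs
    if isinstance(root, dict):
        annotations = root.get("annotations")
        if isinstance(annotations, list):
            summary["annotation_count"] = len(annotations)
            if annotations and isinstance(annotations[0], dict):
                bbox = annotations[0].get("bbox")
                # "list" suggests COCO, "dict" suggests IR JSON.
                summary["first_bbox"] = (
                    type(bbox).__name__ if bbox is not None else "missing"
                )
        else:
            summary["annotation_count"] = None  # missing or not an array
    return summary
```

An annotation_count of 0 or a first_bbox of "missing" corresponds to the two JSON failure cases described earlier, where --from is required.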

Common Error Messages

| Error Message | Cause | Solution |
| --- | --- | --- |
| unrecognized file extension | Unknown file type | Use --from to specify format |
| empty 'annotations' array | No annotations to inspect | Use --from or add annotations |
| first annotation has no 'bbox' field | Invalid annotation structure | Fix annotation or use --from |
| directory matches both YOLO and VOC layouts | Ambiguous directory | Use --from to specify format |
| missing or invalid 'annotations' array | Malformed JSON | Fix JSON structure or use --from |
| array-root JSON not recognized | Unknown array-root format | Check if it's valid Label Studio format |

Stats Command Fallback

The stats command has special fallback behavior for JSON files:
panlabel stats dataset.json
If detection fails for a .json file, stats falls back to reading the file as ir-json. This is convenient for IR JSON files that may have non-standard structures during development.
This fallback only applies to stats. The convert and validate commands will still fail if detection fails.
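
The fallback can be pictured as a thin wrapper around detection. The Python sketch below is an assumption about the shape of the behavior, not Panlabel's code: `format_for_stats` and the `detect` callback are hypothetical names, and detection failure is modeled as a ValueError.

```python
from typing import Callable


def format_for_stats(path: str, detect: Callable[[str], str]) -> str:
    """Return the detected format, falling back to 'ir-json' for .json files."""
    try:
        return detect(path)
    except ValueError:
        # Only .json inputs get the fallback; other failures propagate.
        if path.lower().endswith(".json"):
            return "ir-json"
        raise
```

A convert-style caller would call `detect` directly and let the error surface, which matches the behavior described above.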

Best Practices

Use Auto-Detection for Interactive Work

Auto-detection makes exploratory data analysis faster:
# Quick stats without remembering format
panlabel stats dataset.json

# Compare different formats easily
panlabel diff dataset_v1.json ./yolo_dataset

Use Explicit Format in Scripts

For production scripts, specify formats explicitly for:
  • Predictability
  • Better error messages
  • Documentation clarity
#!/bin/bash
# Production conversion script
panlabel convert \
  --from coco \
  --to yolo \
  --input "$INPUT_FILE" \
  --output "$OUTPUT_DIR" \
  --allow-lossy

Document Expected Formats

In documentation and READMEs, always specify formats explicitly:
# Converting Our Dataset

Our annotations are in COCO format. Convert to YOLO:

```bash
panlabel convert --from coco --to yolo \
  --input annotations.json \
  --output ./yolo_dataset
```

Test Detection with Sample Data

Before processing large datasets, test detection with a sample:
# Create small sample first
panlabel sample --input large_dataset.json --output sample.json --n 10

# Test auto-detection
panlabel stats sample.json

# If detection works, process full dataset
panlabel convert --from auto --to yolo \
  --input large_dataset.json \
  --output ./yolo_full
