Pascal VOC (Visual Object Classes) is a classic XML-based format for object detection. Panlabel supports reading and writing VOC datasets.

Overview

  • Path type: Directory with Annotations/ and optional JPEGImages/
  • Lossiness: Lossy (see below)
  • Bbox format: Pixel-space XYXY [xmin, ymin, xmax, ymax]
  • Use case: Legacy datasets, academic benchmarks

Directory Structure

dataset/
├── Annotations/
│   ├── img1.xml
│   ├── img2.xml
│   └── train/
│       └── img3.xml
└── JPEGImages/
    ├── img1.jpg
    ├── img2.jpg
    └── train/
        └── img3.jpg

Key Components

  • Annotations/: XML files, one per image (required)
  • JPEGImages/: Image files (optional, not read by Panlabel)

XML Structure

<?xml version="1.0" encoding="utf-8"?>
<annotation>
  <folder>JPEGImages</folder>
  <filename>img1.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occluded>0</occluded>
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
  <object>
    <name>car</name>
    <bndbox>
      <xmin>350</xmin>
      <ymin>200</ymin>
      <xmax>600</xmax>
      <ymax>450</ymax>
    </bndbox>
  </object>
</annotation>
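As a rough illustration of how this structure maps to data, the following sketch parses one annotation document with Python's standard library. It is not Panlabel's implementation; the function name `parse_voc_annotation` and the shape of the returned dict are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Parse one VOC annotation document into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Coordinates are kept exactly as written: pixel-space XYXY.
            "bbox": [float(box.findtext(tag))
                     for tag in ("xmin", "ymin", "xmax", "ymax")],
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }

SAMPLE = """
<annotation>
  <filename>img1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>150</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""

parsed = parse_voc_annotation(SAMPLE)
print(parsed["filename"], parsed["objects"][0]["bbox"])
```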

Bounding Box Format

VOC uses pixel-space XYXY coordinates (same as IR):
<bndbox>
  <xmin>100</xmin>
  <ymin>150</ymin>
  <xmax>300</xmax>
  <ymax>400</ymax>
</bndbox>
  • xmin: Left edge in pixels
  • ymin: Top edge in pixels
  • xmax: Right edge in pixels
  • ymax: Bottom edge in pixels
No coordinate conversion needed (already XYXY).
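Formats that store boxes as XYWH (for example COCO) do need a conversion when moving to or from VOC. A minimal sketch of that arithmetic, using helper names that are made up for this example and are not part of Panlabel:

```python
def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def xywh_to_xyxy(box):
    """Convert [x, y, width, height] back to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# The box from the example above: 200 px wide, 250 px tall.
print(xyxy_to_xywh([100, 150, 300, 400]))  # [100, 150, 200, 250]
```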

Object Attributes

VOC supports several object-level attributes:
  • pose: Object pose (e.g., “Frontal”, “Left”, “Unspecified”)
  • truncated: 1 if object is cut off at image boundary, 0 otherwise
  • difficult: 1 if object is hard to recognize, 0 otherwise
  • occluded: 1 if object is occluded, 0 otherwise (non-standard but supported)
Panlabel stores these as annotation attributes.

Attribute Mapping

Reading:
<truncated>1</truncated>  →  IR attribute: {"truncated": "1"}
<difficult>0</difficult>  →  IR attribute: {"difficult": "0"}
<occluded>yes</occluded>  →  IR attribute: {"occluded": "yes"}
<pose>Left</pose>         →  IR attribute: {"pose": "Left"}
Writing:
  • Retrieves attributes from IR annotation
  • Normalizes boolean values:
    • true/yes/1 → 1
    • false/no/0 → 0
    • Other values → omitted
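The reading side of this mapping can be sketched as follows: each of the four supported tags is copied into a string-valued attribute dict without interpretation. The names `VOC_ATTRIBUTE_TAGS` and `read_object_attributes` are hypothetical, and trimming surrounding whitespace is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# The four object-level tags this page lists as supported.
VOC_ATTRIBUTE_TAGS = ("pose", "truncated", "difficult", "occluded")

def read_object_attributes(obj):
    """Collect supported tags from an <object> element as raw strings."""
    attrs = {}
    for tag in VOC_ATTRIBUTE_TAGS:
        value = obj.findtext(tag)
        if value is not None:
            # Assumption: surrounding whitespace is trimmed, value is otherwise untouched.
            attrs[tag] = value.strip()
    return attrs

obj = ET.fromstring(
    "<object><name>person</name><pose>Left</pose>"
    "<truncated>1</truncated><occluded>yes</occluded></object>"
)
print(read_object_attributes(obj))
```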

Reader Behavior

Input Path

Accepts:
  • Dataset root containing Annotations/
  • Annotations/ directory directly
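One way to accept both path forms is sketched below. How Panlabel actually detects the layout is not described on this page, so the name-based check and the function `find_annotations_dir` are assumptions for illustration only.

```python
from pathlib import Path

def find_annotations_dir(input_path):
    """Return the Annotations/ directory for either accepted input form."""
    path = Path(input_path)
    candidate = path / "Annotations"
    if candidate.is_dir():
        # Input was the dataset root.
        return candidate
    if path.is_dir() and path.name == "Annotations":
        # Input was the Annotations/ directory itself.
        return path
    raise FileNotFoundError(f"no Annotations/ directory found at {path}")
```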

Reading Process

  1. Discover layout (find Annotations/ directory)
  2. Scan Annotations/ flat only (non-recursive)
  3. Parse each XML file:
    • Extract <filename>, <width>, <height>
    • Extract <depth> (stored as image attribute)
    • Parse all <object> elements
  4. Assign deterministic IDs:
    • Image IDs: by <filename> (lexicographic)
    • Category IDs: by class name (lexicographic)
    • Annotation IDs: by XML file order, then <object> order
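The ID assignment in step 4 can be sketched roughly as below. This assumes 1-based IDs and that the caller passes the parsed files in the XML file order already chosen by the reader; neither detail is stated on this page, and `assign_ids` is a hypothetical helper.

```python
def assign_ids(parsed_files):
    """Assign deterministic IDs to images, categories, and annotations.

    parsed_files: list of dicts like
        {"filename": "img1.jpg", "objects": [{"name": "person"}, ...]}
    given in XML file order.
    """
    # Images and categories are numbered by lexicographic order of their names.
    filenames = sorted({p["filename"] for p in parsed_files})
    names = sorted({o["name"] for p in parsed_files for o in p["objects"]})
    image_ids = {f: i for i, f in enumerate(filenames, start=1)}
    category_ids = {n: i for i, n in enumerate(names, start=1)}

    # Annotations are numbered in XML file order, then <object> order.
    annotations = []
    for parsed in parsed_files:
        for obj in parsed["objects"]:
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_ids[parsed["filename"]],
                "category_id": category_ids[obj["name"]],
            })
    return image_ids, category_ids, annotations
```

Because every ordering depends only on names and file order, running the reader twice on the same dataset yields the same IDs.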

Coordinate Policy

Reads xmin/ymin/xmax/ymax exactly as provided (no 0/1-based adjustment).

Nested XML Warning

Nested XML files (e.g., Annotations/train/img.xml) are skipped with a warning:
Warning: VOC reader scans Annotations/ flat (non-recursive); skipping 2 nested .xml file(s), e.g. train/img3.xml
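To check ahead of time which files a flat scan would pick up and which would be skipped, something like the following can be used. It is a sketch of the behavior described here, not Panlabel code, and `scan_annotations` is a made-up name.

```python
from pathlib import Path

def scan_annotations(annotations_dir):
    """Split .xml files into top-level files and nested (skipped) ones."""
    annotations_dir = Path(annotations_dir)
    # Flat scan: only files directly inside Annotations/.
    flat = sorted(p for p in annotations_dir.glob("*.xml") if p.is_file())
    # Anything deeper would be skipped by a non-recursive reader.
    nested = sorted(
        p.relative_to(annotations_dir).as_posix()
        for p in annotations_dir.rglob("*.xml")
        if p.parent != annotations_dir
    )
    return flat, nested
```

If `nested` is non-empty, moving those files up into Annotations/ before conversion avoids losing them.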

Writer Behavior

Output Structure

output/
├── Annotations/
│   ├── img1.xml
│   └── train/
│       └── img3.xml
└── JPEGImages/
    └── README.txt

Writing Process

  1. Create Annotations/ and JPEGImages/ directories
  2. Write JPEGImages/README.txt placeholder
  3. For each image:
    • Create XML file at Annotations/<stem>.xml
    • Preserve subdirectory structure from file_name
    • Write all annotations sorted by annotation ID
  4. Does not copy image binaries
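Step 3 can be approximated with the standard library as shown below. The input dict keys (`file_name`, `width`, `height`, `id`, `category_name`, `bbox`) are assumptions for this sketch, as is writing `file_name` verbatim into `<filename>`; Panlabel's actual IR types and output details may differ.

```python
import xml.etree.ElementTree as ET

def build_voc_xml(image, annotations):
    """Build a VOC <annotation> document for one image (illustrative only)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = "JPEGImages"
    ET.SubElement(root, "filename").text = image["file_name"]
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(image["width"])
    ET.SubElement(size, "height").text = str(image["height"])
    # Objects are emitted in ascending annotation ID order.
    for ann in sorted(annotations, key=lambda a: a["id"]):
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = ann["category_name"]
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), ann["bbox"]):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling it with an empty annotation list still produces a valid document with no `<object>` elements.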

Depth Attribute

Writes <depth> from the image attribute "depth" if present:
<size>
  <width>640</width>
  <height>480</height>
  <depth>3</depth>
</size>

Boolean Normalization

Writes normalized boolean attributes:
true/yes/1  →  1
false/no/0  →  0
other       →  omitted
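A minimal sketch of this normalization rule follows. The function name is hypothetical, and treating values case-insensitively is an assumption that this page does not confirm.

```python
def normalize_voc_bool(value):
    """Map a boolean-like attribute value to "1" or "0", or None to omit it."""
    # Assumption: matching ignores case and surrounding whitespace.
    text = str(value).strip().lower()
    if text in ("true", "yes", "1"):
        return "1"
    if text in ("false", "no", "0"):
        return "0"
    return None  # Unrecognized values are left out of the XML.

for raw in ("yes", "0", "maybe"):
    print(raw, "->", normalize_voc_bool(raw))
```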

Empty Images

Writes XML files for images without annotations:
<?xml version="1.0" encoding="utf-8"?>
<annotation>
  <folder>JPEGImages</folder>
  <filename>img2.jpg</filename>
  <size>
    <width>800</width>
    <height>600</height>
  </size>
</annotation>

Lossiness

VOC format is lossy. Not preserved:
  • Dataset-level metadata/licenses
  • Image-level license/date metadata
  • Annotation confidence
  • Category supercategory
  • Custom attributes (except pose, truncated, difficult, occluded)
Preserved:
  • Image filenames and dimensions
  • Image depth (as attribute)
  • Category names
  • Bounding box coordinates (XYXY)
  • Standard object attributes (pose, truncated, difficult, occluded)
Panlabel does not copy image binaries during conversion. After writing, copy the images into the JPEGImages/ directory yourself.

Usage

Read VOC

panlabel convert dataset/ output.json --input-format voc --output-format ir-json
Or point at the Annotations/ directory directly:
panlabel convert dataset/Annotations/ output.json --input-format voc --output-format ir-json

Write VOC

panlabel convert input.json voc-output/ --input-format ir-json --output-format voc
Then manually copy images:
cp -r original/JPEGImages/* voc-output/JPEGImages/

Subdirectory Structure

The VOC writer preserves subdirectory structure in the output.
Input IR:
{"file_name": "train/img1.jpg", ...}
Output:
Annotations/train/img1.xml
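The path mapping above amounts to swapping the image extension for .xml and prefixing Annotations/. A small sketch, with `annotation_path_for` being a name invented for this example:

```python
from pathlib import PurePosixPath

def annotation_path_for(file_name):
    """Map an IR file_name to its XML path under Annotations/."""
    relative = PurePosixPath(file_name).with_suffix(".xml")
    return (PurePosixPath("Annotations") / relative).as_posix()

print(annotation_path_for("train/img1.jpg"))  # Annotations/train/img1.xml
```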

See Also

  • YOLO Format: Another directory-based format
  • Format Overview: Compare all supported formats
