Intermediate Representation

The Intermediate Representation (IR) is Panlabel’s canonical, format-agnostic representation of object detection datasets. All format conversions flow through the IR, similar to how Pandoc uses an internal AST for document conversion.

Design Principles

The IR follows three key principles:
  1. Type Safety: Uses newtypes and marker types to prevent common errors at compile time (e.g., mixing pixel and normalized coordinates)
  2. Canonical Format: Uses a single, well-defined coordinate system (XYXY in pixel space) to avoid ambiguity
  3. Permissive Construction: Allows “invalid” data to be represented (e.g., negative coordinates), so validation can report issues rather than panic during parsing
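The third principle can be illustrated with a standalone sketch. The `RawBBox` type and its methods below are illustrative stand-ins, not panlabel's actual API: a box with reversed or negative coordinates is representable, and validation reports the problem instead of panicking.

```rust
// Illustrative stand-in for a permissively constructed bounding box.
// Construction never panics; validity is checked separately.
struct RawBBox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

impl RawBBox {
    fn is_ordered(&self) -> bool {
        self.xmin <= self.xmax && self.ymin <= self.ymax
    }
    fn is_finite(&self) -> bool {
        [self.xmin, self.ymin, self.xmax, self.ymax]
            .iter()
            .all(|v| v.is_finite())
    }
}

fn main() {
    // "Invalid" data is representable: max < min, negative coordinate.
    let suspect = RawBBox { xmin: 100.0, ymin: -5.0, xmax: 10.0, ymax: 50.0 };
    // Validation flags the issue rather than failing during parsing.
    assert!(!suspect.is_ordered());
    assert!(suspect.is_finite());
}
```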

Core Types

Dataset

The top-level structure representing a complete annotation dataset:
pub struct Dataset {
    pub info: DatasetInfo,
    pub licenses: Vec<License>,
    pub images: Vec<Image>,
    pub categories: Vec<Category>,
    pub annotations: Vec<Annotation>,
}
Example:
use panlabel::ir::{Dataset, Image, Category, Annotation, BBoxXYXY, Pixel};

let dataset = Dataset {
    images: vec![Image::new(1u64, "image.jpg", 640, 480)],
    categories: vec![Category::new(1u64, "person")],
    annotations: vec![
        Annotation::new(
            1u64,
            1u64,
            1u64,
            BBoxXYXY::<Pixel>::from_xyxy(10.0, 20.0, 100.0, 200.0),
        )
    ],
    ..Default::default()
};

DatasetInfo

Metadata about the dataset:
pub struct DatasetInfo {
    pub name: Option<String>,
    pub version: Option<String>,
    pub description: Option<String>,
    pub url: Option<String>,
    pub year: Option<u32>,
    pub contributor: Option<String>,
    pub date_created: Option<String>,
}

Image

Represents an image in the dataset:
pub struct Image {
    pub id: ImageId,
    pub file_name: String,
    pub width: u32,
    pub height: u32,
    pub license_id: Option<LicenseId>,
    pub date_captured: Option<String>,
    pub attributes: BTreeMap<String, String>,
}
Builder pattern:
let image = Image::new(1u64, "photo.jpg", 1920, 1080)
    .with_license(1u64)
    .with_date_captured("2024-01-15");

Category

Represents a class label:
pub struct Category {
    pub id: CategoryId,
    pub name: String,
    pub supercategory: Option<String>,
}
Example:
let category = Category::new(1u64, "car");
let hierarchical = Category::with_supercategory(2u64, "sedan", "car");

Annotation

Represents a bounding box annotation:
pub struct Annotation {
    pub id: AnnotationId,
    pub image_id: ImageId,
    pub category_id: CategoryId,
    pub bbox: BBoxXYXY<Pixel>,
    pub confidence: Option<f64>,
    pub attributes: BTreeMap<String, String>,
}
Builder pattern:
let annotation = Annotation::new(
        1u64,
        1u64,
        1u64,
        BBoxXYXY::from_xyxy(10.0, 20.0, 100.0, 200.0),
    )
    .with_confidence(0.95)
    .with_attribute("occluded", "false")
    .with_attribute("truncated", "true");

License

Represents a license that can be associated with images:
pub struct License {
    pub id: LicenseId,
    pub name: String,
    pub url: Option<String>,
}
Example:
let license = License::new(1u64, "CC BY 4.0");
let with_url = License::with_url(
    2u64,
    "MIT",
    "https://opensource.org/licenses/MIT"
);

Bounding Box Types

BBoxXYXY

The canonical bounding box representation using XYXY format (xmin, ymin, xmax, ymax):
pub struct BBoxXYXY<TSpace> {
    pub min: Coord<TSpace>,
    pub max: Coord<TSpace>,
}
The TSpace parameter enforces coordinate space safety:
  • BBoxXYXY<Pixel>: Absolute pixel coordinates
  • BBoxXYXY<Normalized>: Normalized coordinates (0.0-1.0)
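How such a marker parameter works can be sketched independently of panlabel (the types below are illustrative, not the crate's definitions): the space parameter is carried as zero-sized `PhantomData`, so pixel and normalized boxes are distinct types at compile time with no runtime cost.

```rust
use std::marker::PhantomData;

// Illustrative marker types; panlabel defines its own Pixel/Normalized.
struct Pixel;
struct Normalized;

struct BBox<TSpace> {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    _space: PhantomData<TSpace>,
}

// Only accepts pixel-space boxes; a BBox<Normalized> is a type error here.
fn pixel_area(b: &BBox<Pixel>) -> f64 {
    (b.xmax - b.xmin) * (b.ymax - b.ymin)
}

fn main() {
    let px = BBox::<Pixel> { xmin: 0.0, ymin: 0.0, xmax: 10.0, ymax: 10.0, _space: PhantomData };
    assert_eq!(pixel_area(&px), 100.0);
    // The marker is zero-sized: both instantiations have the same layout.
    assert_eq!(
        std::mem::size_of::<BBox<Pixel>>(),
        std::mem::size_of::<BBox<Normalized>>()
    );
}
```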
Construction:
use panlabel::ir::{BBoxXYXY, Pixel};

// Direct XYXY construction
let bbox = BBoxXYXY::<Pixel>::from_xyxy(10.0, 20.0, 100.0, 80.0);

// From XYWH (COCO-style: x, y, width, height)
let bbox = BBoxXYXY::<Pixel>::from_xywh(10.0, 20.0, 90.0, 60.0);

// From center-based format (YOLO-style: cx, cy, w, h)
let bbox = BBoxXYXY::<Pixel>::from_cxcywh(55.0, 50.0, 90.0, 60.0);
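All three constructors above describe the same box. The underlying arithmetic can be checked with a standalone sketch (plain functions below, not panlabel's API):

```rust
// XYWH -> XYXY: the max corner is the min corner plus the extent.
fn xywh_to_xyxy(x: f64, y: f64, w: f64, h: f64) -> (f64, f64, f64, f64) {
    (x, y, x + w, y + h)
}

// CXCYWH -> XYXY: each edge is half an extent away from the center.
fn cxcywh_to_xyxy(cx: f64, cy: f64, w: f64, h: f64) -> (f64, f64, f64, f64) {
    (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
}

fn main() {
    // Both conversions recover the direct XYXY box (10, 20, 100, 80).
    assert_eq!(xywh_to_xyxy(10.0, 20.0, 90.0, 60.0), (10.0, 20.0, 100.0, 80.0));
    assert_eq!(cxcywh_to_xyxy(55.0, 50.0, 90.0, 60.0), (10.0, 20.0, 100.0, 80.0));
}
```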
Operations:
let bbox = BBoxXYXY::<Pixel>::from_xyxy(10.0, 20.0, 100.0, 80.0);

// Access coordinates
assert_eq!(bbox.xmin(), 10.0);
assert_eq!(bbox.ymin(), 20.0);
assert_eq!(bbox.xmax(), 100.0);
assert_eq!(bbox.ymax(), 80.0);

// Calculate dimensions
assert_eq!(bbox.width(), 90.0);
assert_eq!(bbox.height(), 60.0);
assert_eq!(bbox.area(), 5400.0);

// Convert to other formats
let (x, y, w, h) = bbox.to_xywh();
let (cx, cy, w, h) = bbox.to_cxcywh();

// Validation
assert!(bbox.is_finite());   // All coords are finite (not NaN/infinite)
assert!(bbox.is_ordered());  // min <= max for both axes

// Compute IoU
let other = BBoxXYXY::<Pixel>::from_xyxy(5.0, 5.0, 15.0, 15.0);
let iou = bbox.iou(&other);
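The IoU arithmetic itself can be sketched without the library (illustrative functions below, not panlabel's implementation): intersection area over union area, with non-overlapping boxes yielding 0.0.

```rust
type Box4 = (f64, f64, f64, f64); // (xmin, ymin, xmax, ymax)

fn iou(a: Box4, b: Box4) -> f64 {
    // Intersection extents, clamped to zero when the boxes do not overlap.
    let iw = (a.2.min(b.2) - a.0.max(b.0)).max(0.0);
    let ih = (a.3.min(b.3) - a.1.max(b.1)).max(0.0);
    let inter = iw * ih;
    let area_a = (a.2 - a.0) * (a.3 - a.1);
    let area_b = (b.2 - b.0) * (b.3 - b.1);
    inter / (area_a + area_b - inter)
}

fn main() {
    let bbox = (10.0, 20.0, 100.0, 80.0);
    // (5, 5, 15, 15) ends at y = 15, above bbox's ymin of 20: no overlap.
    assert_eq!(iou(bbox, (5.0, 5.0, 15.0, 15.0)), 0.0);
    // The left half of bbox: intersection 2700, union 5400, IoU 0.5.
    assert_eq!(iou(bbox, (10.0, 20.0, 55.0, 80.0)), 0.5);
}
```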
Coordinate conversion:
use panlabel::ir::{BBoxXYXY, Pixel, Normalized};

let pixel_bbox = BBoxXYXY::<Pixel>::from_xyxy(0.0, 0.0, 640.0, 480.0);
let normalized = pixel_bbox.to_normalized(1920.0, 1080.0);

let back_to_pixel = normalized.to_pixel(1920.0, 1080.0);
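The conversion scales each coordinate by the image extent along its axis; a standalone sketch of that arithmetic (illustrative functions, not panlabel's API):

```rust
type Box4 = (f64, f64, f64, f64); // (xmin, ymin, xmax, ymax)

// Pixel -> normalized: divide x-coords by width, y-coords by height.
fn to_normalized(b: Box4, w: f64, h: f64) -> Box4 {
    (b.0 / w, b.1 / h, b.2 / w, b.3 / h)
}

// Normalized -> pixel: the inverse scaling.
fn to_pixel(b: Box4, w: f64, h: f64) -> Box4 {
    (b.0 * w, b.1 * h, b.2 * w, b.3 * h)
}

fn main() {
    let pixel = (0.0, 0.0, 640.0, 480.0);
    let norm = to_normalized(pixel, 1920.0, 1080.0);
    // 640/1920 = 1/3 of the width, 480/1080 = 4/9 of the height.
    assert!((norm.2 - 1.0 / 3.0).abs() < 1e-12);
    assert!((norm.3 - 4.0 / 9.0).abs() < 1e-12);
    // Round-tripping recovers the original box up to float rounding.
    let back = to_pixel(norm, 1920.0, 1080.0);
    assert!((back.2 - 640.0).abs() < 1e-9 && (back.3 - 480.0).abs() < 1e-9);
}
```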

Coordinate System

Panlabel uses pixel-based XYXY coordinates as the canonical representation:
  • Origin: Top-left corner (0, 0)
  • X-axis: Increases to the right
  • Y-axis: Increases downward
  • Format: (xmin, ymin, xmax, ymax) in absolute pixel coordinates
  • Type: f64 for all coordinates (allows sub-pixel precision)
Example:
(0,0) ──────────► X

  │    (10, 20)
  │       ┌──────────┐
  │       │          │
  │       │  Object  │
  │       │          │
  │       └──────────┘
  │              (100, 80)

  Y
The box from (10, 20) to (100, 80) has:
  • Width: 90 pixels
  • Height: 60 pixels
  • Area: 5,400 square pixels

Type Safety

Panlabel uses newtype wrappers to prevent ID confusion:
pub struct ImageId(u64);
pub struct CategoryId(u64);
pub struct AnnotationId(u64);
pub struct LicenseId(u64);
This prevents bugs like passing an ImageId where a CategoryId is expected:
// This won't compile:
let image_id = ImageId::new(1);
let category_id: CategoryId = image_id; // ❌ Type error!

// IDs can be created from u64:
let image_id = ImageId::new(1);
let image_id: ImageId = 1u64.into();
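The pattern can be reproduced in a standalone sketch (illustrative definitions; the traits derived on panlabel's real ID types may differ). Implementing `From<u64>` also provides `Into<ImageId>` for free, which is what makes `1u64.into()` work:

```rust
// Newtype wrappers: same in-memory representation, distinct types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ImageId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CategoryId(u64);

impl From<u64> for ImageId {
    fn from(v: u64) -> Self {
        ImageId(v)
    }
}

fn main() {
    let a = ImageId::from(1);
    let b: ImageId = 1u64.into(); // Into<ImageId> comes for free from From
    assert_eq!(a, b);
    // let c: CategoryId = a; // ❌ mismatched types: ImageId is not CategoryId
}
```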

Complete Example

use panlabel::ir::*;
use std::collections::BTreeMap;

fn create_sample_dataset() -> Dataset {
    Dataset {
        info: DatasetInfo {
            name: Some("My Dataset".to_string()),
            version: Some("1.0".to_string()),
            description: Some("A sample dataset for object detection".to_string()),
            ..Default::default()
        },
        licenses: vec![
            License::with_url(
                1u64,
                "CC BY 4.0",
                "https://creativecommons.org/licenses/by/4.0/"
            ),
        ],
        images: vec![
            Image::new(1u64, "image001.jpg", 1920, 1080)
                .with_license(1u64)
                .with_date_captured("2024-03-01"),
        ],
        categories: vec![
            Category::new(1u64, "person"),
            Category::with_supercategory(2u64, "car", "vehicle"),
        ],
        annotations: vec![
            Annotation::new(
                1u64,
                1u64,
                1u64,
                BBoxXYXY::<Pixel>::from_xyxy(100.0, 150.0, 300.0, 450.0),
            )
            .with_confidence(0.98)
            .with_attribute("occluded", "false"),
        ],
    }
}
