DataFile

DataFile represents a physical data file tracked by an Iceberg table. It carries the file path, format, partition data, record counts, file size, and column-level metrics used during planning.

What it contains

  • File path and format
  • Partition values for the file
  • Record count and file size
  • Column metrics such as null counts and bounds
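
To illustrate why the column metrics matter during planning, here is a minimal sketch of bounds-based file pruning for an equality filter on a single long column. FileStats and BoundsPruning are hypothetical stand-ins invented for this example; they are not part of the Iceberg API, and real planning handles many more types and predicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the per-file metrics of ONE long column.
// This is not the Iceberg DataFile interface.
final class FileStats {
    final String path;
    final long lowerBound;   // smallest non-null value in the file
    final long upperBound;   // largest non-null value in the file
    final long recordCount;  // total rows in the file
    final long nullCount;    // rows where the column is null

    FileStats(String path, long lowerBound, long upperBound, long recordCount, long nullCount) {
        this.path = path;
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
        this.recordCount = recordCount;
        this.nullCount = nullCount;
    }
}

final class BoundsPruning {
    // Returns false only when the metrics prove no row can satisfy "column == value".
    static boolean mightContain(FileStats file, long value) {
        if (file.nullCount == file.recordCount) {
            return false; // every value is null, so nothing can equal a non-null literal
        }
        return value >= file.lowerBound && value <= file.upperBound;
    }

    // Keeps the paths of files that still have to be read for the filter.
    static List<String> plan(List<FileStats> files, long value) {
        List<String> kept = new ArrayList<>();
        for (FileStats file : files) {
            if (mightContain(file, value)) {
                kept.add(file.path);
            }
        }
        return kept;
    }
}
```

Given three files with bounds [1, 100], bounds [101, 200], and one where every value is null, planning for the value 150 keeps only the second file. The check is conservative: a file whose bounds include the value may still contain no matching row, so it is read and filtered later.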

Common usage

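You most often work with DataFile when you append new files, inspect scan tasks, or rewrite existing files as part of maintenance operations. The builder below only describes a file that already exists; it does not write any data.

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;

// Describe an existing Parquet file; the format is inferred from the file extension.
// For a partitioned spec, also set the partition (for example with withPartitionPath).
DataFile dataFile = DataFiles.builder(table.spec())
    .withPath("s3://warehouse/db/table/data.parquet")
    .withFileSizeInBytes(10_485_760L)
    .withRecordCount(100_000L)
    .build();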
