The DeleteFiles interface provides an API for removing data files from an Iceberg table.

Overview

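DeleteFiles accumulates file deletions, produces a new snapshot of the table, and commits that snapshot as the current snapshot. Use it to remove data files that are no longer needed, or to delete whole files whose rows all match a filter expression. Deleting a file removes it from the table's current state only: older snapshots still reference it, and the file stays in storage until those snapshots are expired.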

Interface

public interface DeleteFiles extends SnapshotUpdate<DeleteFiles>

Core Methods

deleteFile() with Path

Deletes a file by its path.
DeleteFiles deleteFile(CharSequence path)
Parameters:
  • path - A fully-qualified file path to remove from the table
Returns: This for method chaining
Description: To remove a file from the table, this path must exactly match a path in the table's metadata. Paths that are different but equivalent will not be removed. For example, file:/path/file.avro is equivalent to file:///path/file.avro, but deleting file:/path/file.avro would not remove a file recorded as file:///path/file.avro.
Example:
table.newDelete()
    .deleteFile("/data/date=2024-01-15/data-001.parquet")
    .commit();
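Because matching is done on the exact string, one option is to resolve a requested path against the locations already stored in table metadata before calling deleteFile. The sketch below is a hypothetical, stdlib-only helper: PathMatching, canonical, and findExactPath are illustrative names, not Iceberg API. It assumes every path parses as a java.net.URI and treats two locations as equivalent when scheme, authority, and path agree.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Iceberg: maps equivalent URI spellings
// to the exact string recorded in table metadata.
final class PathMatching {

    // Canonical form: lower-cased scheme + "://" + authority + path.
    // "file:/p" and "file:///p" both have no authority, so they collapse
    // to the same canonical string.
    static String canonical(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        return scheme + "://" + authority + uri.getPath();
    }

    // Returns the metadata path that is equivalent to `requested`, if any.
    static Optional<String> findExactPath(List<String> metadataPaths, String requested) {
        String target = canonical(requested);
        return metadataPaths.stream()
            .filter(p -> canonical(p).equals(target))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> metadataPaths = List.of("file:///path/file.avro");
        String requested = "file:/path/file.avro";

        // Exact string comparison, which deleteFile relies on, fails...
        System.out.println(metadataPaths.contains(requested)); // false
        // ...while the canonical comparison finds the stored spelling.
        System.out.println(findExactPath(metadataPaths, requested).orElse("<no match>"));
    }
}
```

The string returned by findExactPath can then be passed to deleteFile(CharSequence). When a DataFile from a scan or manifest is at hand, deleteFile(DataFile) avoids the issue entirely, since it uses the path recorded in metadata. Paths containing characters that are illegal in URIs (such as unescaped spaces) make URI.create throw IllegalArgumentException, so a production version would need extra handling.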

deleteFile() with DataFile

Deletes a file tracked by a DataFile.
default DeleteFiles deleteFile(DataFile file)
Parameters:
  • file - A DataFile to remove from the table
Returns: This for method chaining
Example:
DataFile fileToDelete = ...; // From scan or manifest

table.newDelete()
    .deleteFile(fileToDelete)
    .commit();

deleteFromRowFilter()

Deletes files that match an expression on data rows.
DeleteFiles deleteFromRowFilter(Expression expr)
Parameters:
  • expr - An expression on rows in the table
Returns: This for method chaining
Throws: ValidationException if a file can contain both rows that match and rows that do not
Description: A file is a candidate for deletion if it could contain any rows that match the expression (candidates are selected using an inclusive projection). A candidate is deleted only if all of its rows must match the expression (checked using a strict projection). This guarantees that a file is deleted if and only if every row in it matches.
Example:
import org.apache.iceberg.expressions.Expressions;

// Delete all files for a specific partition
table.newDelete()
    .deleteFromRowFilter(Expressions.equal("date", "2024-01-15"))
    .commit();
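The candidate/strict rule can be illustrated with a simplified model for the predicate `column == value`. This is a sketch under a stated assumption, not Iceberg's implementation: each file is summarized only by the minimum and maximum value of the filtered column (for example, the day of a date partition), and IllegalStateException stands in for ValidationException. RowFilterModel and FileRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the deleteFromRowFilter rule for `column == value`.
// Illustration only; Iceberg evaluates projections internally.
final class RowFilterModel {

    // Hypothetical stand-in for a data file: path plus column min/max.
    record FileRange(String path, int min, int max) {}

    // Inclusive check: could any row in the file match?
    static boolean rowsMightMatch(FileRange f, int value) {
        return f.min() <= value && value <= f.max();
    }

    // Strict check: must every row in the file match?
    static boolean rowsMustMatch(FileRange f, int value) {
        return f.min() == value && f.max() == value;
    }

    // Returns the paths that would be deleted, or throws when a candidate
    // file may contain both matching and non-matching rows.
    static List<String> selectForDelete(List<FileRange> files, int value) {
        List<String> toDelete = new ArrayList<>();
        for (FileRange f : files) {
            if (!rowsMightMatch(f, value)) {
                continue; // not a candidate: no row can match
            }
            if (!rowsMustMatch(f, value)) {
                // stands in for Iceberg's ValidationException
                throw new IllegalStateException("File has matching and non-matching rows: " + f.path());
            }
            toDelete.add(f.path());
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<FileRange> files = List.of(
            new FileRange("day15.parquet", 15, 15),
            new FileRange("day16.parquet", 16, 16));
        System.out.println(selectForDelete(files, 15)); // [day15.parquet]

        List<FileRange> mixed = List.of(new FileRange("days14to16.parquet", 14, 16));
        try {
            selectForDelete(mixed, 15);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model a file spanning days 14 to 16 is a candidate for `day == 15` but cannot be proven to contain only matching rows, so the whole operation fails rather than deleting rows from days 14 and 16.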

caseSensitive()

Enables or disables case sensitive expression binding.
DeleteFiles caseSensitive(boolean caseSensitive)
Parameters:
  • caseSensitive - Whether expression binding should be case sensitive
Returns: This for method chaining
Example:
table.newDelete()
    .caseSensitive(false)
    .deleteFromRowFilter(Expressions.equal("DATE", "2024-01-15"))
    .commit();

validateFilesExist()

Enables validation that deleted files still exist when committing.
default DeleteFiles validateFilesExist()
Returns: This for method chaining
Description: Validates that every file being deleted is still part of the table when the operation commits. If a concurrent commit has already removed one of those files, this commit fails with a ValidationException instead of succeeding silently.
Example:
table.newDelete()
    .deleteFile(fileToDelete)
    .validateFilesExist()
    .commit();
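The effect of validateFilesExist() can be sketched as a set computation. This is our reading of the behavior, modeled in plain Java rather than Iceberg code: `live` is the set of data file paths in the table at commit time, `staged` is what was passed to deleteFile, and IllegalStateException stands in for ValidationException. The assumption that unvalidated deletes silently skip missing paths is labeled in the comments; ValidateExistModel is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

// Simplified model of validateFilesExist(); not Iceberg code.
final class ValidateExistModel {

    // Returns the table's file set after the delete, or throws if
    // validation is on and some staged path is no longer in the table.
    static Set<String> commitDelete(Set<String> live, Set<String> staged, boolean validateFilesExist) {
        Set<String> missing = new TreeSet<>(staged);
        missing.removeAll(live);
        if (validateFilesExist && !missing.isEmpty()) {
            // stands in for ValidationException; the table is left unchanged
            throw new IllegalStateException("Missing files to delete: " + missing);
        }
        // Assumption: without validation, paths already gone are skipped
        Set<String> next = new TreeSet<>(live);
        next.removeAll(staged);
        return next;
    }

    public static void main(String[] args) {
        // A concurrent commit removed b.parquet before this delete commits
        Set<String> live = Set.of("a.parquet", "c.parquet");
        Set<String> staged = Set.of("a.parquet", "b.parquet");

        System.out.println(commitDelete(live, staged, false)); // [c.parquet]
        try {
            commitDelete(live, staged, true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```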

Examples

Delete Single File

import org.apache.iceberg.Table;
import org.apache.iceberg.DeleteFiles;

// Delete by path
String filePath = "/data/date=2024-01-15/old-file.parquet";

table.newDelete()
    .deleteFile(filePath)
    .commit();

System.out.println("Deleted file: " + filePath);

Delete Multiple Files

import java.util.Arrays;
import java.util.List;

// Delete multiple files
List<String> filesToDelete = Arrays.asList(
    "/data/file-001.parquet",
    "/data/file-002.parquet",
    "/data/file-003.parquet"
);

DeleteFiles delete = table.newDelete();
for (String path : filesToDelete) {
    delete.deleteFile(path);
}
delete.commit();

System.out.println("Deleted " + filesToDelete.size() + " files");

Delete Files from Scan

import org.apache.iceberg.TableScan;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.expressions.Expressions;

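// Find files that may contain rows older than oldTimestamp.
// planFiles() returns every file that *might* match, so files that also
// hold newer rows are deleted too.
TableScan scan = table.newScan()
    .filter(Expressions.lessThan("timestamp", oldTimestamp));

DeleteFiles delete = table.newDelete();

// CloseableIterable.close() throws IOException; the enclosing method
// must declare or handle it.
try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
    for (FileScanTask task : tasks) {
        delete.deleteFile(task.file());
    }
}

delete.commit();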
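A scan filter plans every file that might hold at least one matching row, so deleting all planned files can drop rows newer than the cutoff. The plain-Java model below contrasts that selection with one that keeps only files whose upper bound is already below the cutoff. It assumes each file is summarized by the min and max of a `timestamp` column; ScanVsStrict and FileBounds are hypothetical names, and the numbers are arbitrary.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative model: each file carries the min and max of a timestamp
// column. Not Iceberg code.
final class ScanVsStrict {

    record FileBounds(String path, long minTs, long maxTs) {}

    // Files a scan on `timestamp < cutoff` may return: any file that
    // might hold at least one matching row.
    static List<String> scanStyle(List<FileBounds> files, long cutoff) {
        return files.stream()
            .filter(f -> f.minTs() < cutoff)
            .map(FileBounds::path)
            .collect(Collectors.toList());
    }

    // Files whose rows are all older than the cutoff; only these can be
    // removed without dropping newer rows.
    static List<String> fullyExpired(List<FileBounds> files, long cutoff) {
        return files.stream()
            .filter(f -> f.maxTs() < cutoff)
            .map(FileBounds::path)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileBounds> files = List.of(
            new FileBounds("old.parquet", 100, 200),
            new FileBounds("straddles.parquet", 150, 400),
            new FileBounds("new.parquet", 350, 500));
        long cutoff = 300;
        System.out.println(scanStyle(files, cutoff));    // [old.parquet, straddles.parquet]
        System.out.println(fullyExpired(files, cutoff)); // [old.parquet]
    }
}
```

Alternatively, deleteFromRowFilter applies the all-rows-must-match check itself and fails instead of silently removing straddling files.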

Delete Partition

import org.apache.iceberg.expressions.Expressions;

// Delete entire partition
String targetDate = "2024-01-15";

table.newDelete()
    .deleteFromRowFilter(Expressions.equal("date", targetDate))
    .commit();

System.out.println("Deleted partition: date=" + targetDate);

Delete Multiple Partitions

import java.util.Arrays;
import java.util.List;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

// Delete multiple partitions
List<String> datesToDelete = Arrays.asList(
    "2024-01-01",
    "2024-01-02",
    "2024-01-03"
);

// Build an OR expression. Starting from alwaysFalse() avoids a null filter
// when the list is empty; or(alwaysFalse(), x) simplifies to x.
Expression filter = Expressions.alwaysFalse();
for (String date : datesToDelete) {
    filter = Expressions.or(filter, Expressions.equal("date", date));
}

table.newDelete()
    .deleteFromRowFilter(filter)
    .commit();

System.out.println("Deleted " + datesToDelete.size() + " partitions");
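Folding a list into one OR filter works best when it starts from an identity element, so an empty list yields a filter that matches nothing instead of a null. The sketch below shows the same fold with java.util.function.Predicate so it runs without Iceberg; `d -> false` plays the role of Expressions.alwaysFalse(), and OrFold is a hypothetical name. For the Iceberg case, Expressions.in("date", ...) is another way to express a set of equality matches.

```java
import java.util.List;
import java.util.function.Predicate;

// OR-folding with an identity element, using Predicate instead of
// Iceberg expressions.
final class OrFold {

    static Predicate<String> anyOf(List<String> dates) {
        Predicate<String> filter = d -> false; // identity for OR: matches nothing
        for (String date : dates) {
            filter = filter.or(date::equals);
        }
        return filter;
    }

    public static void main(String[] args) {
        Predicate<String> p = anyOf(List.of("2024-01-01", "2024-01-02"));
        System.out.println(p.test("2024-01-02")); // true
        System.out.println(p.test("2024-01-05")); // false
        System.out.println(anyOf(List.of()).test("2024-01-01")); // false
    }
}
```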

Delete with Validation

import org.apache.iceberg.exceptions.ValidationException;

// Delete with validation that files exist
DataFile fileToDelete = findFileToDelete();

try {
    table.newDelete()
        .deleteFile(fileToDelete)
        .validateFilesExist()
        .commit();
    
    System.out.println("Successfully deleted file");
    
} catch (ValidationException e) {
    System.err.println("File is no longer in the table: " + e.getMessage());
    // Another commit already removed the file from the table
}

Conditional Delete

// Delete files matching complex criteria
Expression filter = Expressions.and(
    Expressions.equal("category", "archived"),
    Expressions.lessThan("modified_time", cutoffTime)
);

table.newDelete()
    .deleteFromRowFilter(filter)
    .commit();

Delete Small Files

import org.apache.iceberg.DataFile;

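// Delete files smaller than threshold.
// This removes the rows in those files from the table; it does not compact
// them. To merge small files without losing rows, rewrite them instead
// (for example with table.newRewrite()).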
long minFileSize = 10 * 1024 * 1024; // 10 MB

TableScan scan = table.newScan();
DeleteFiles delete = table.newDelete();
int deleteCount = 0;

try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
    for (FileScanTask task : tasks) {
        DataFile file = task.file();
        if (file.fileSizeInBytes() < minFileSize) {
            delete.deleteFile(file);
            deleteCount++;
        }
    }
}

if (deleteCount > 0) {
    delete.commit();
    System.out.println("Deleted " + deleteCount + " small files");
}
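Since removing small files also removes their rows, a dry run that reports what would be lost can serve as a safety check before committing. The sketch below uses a hypothetical FileInfo record standing in for the size and record count a DataFile reports through fileSizeInBytes() and recordCount(); SmallFileDryRun and the sample numbers are illustrative.

```java
import java.util.List;

// Illustrative dry run: count the files below a size threshold and the
// rows they hold. FileInfo is a hypothetical stand-in for DataFile.
final class SmallFileDryRun {

    record FileInfo(String path, long sizeBytes, long recordCount) {}

    record Summary(int files, long rows) {}

    static Summary summarize(List<FileInfo> files, long minFileSize) {
        int count = 0;
        long rows = 0;
        for (FileInfo f : files) {
            if (f.sizeBytes() < minFileSize) {
                count++;
                rows += f.recordCount();
            }
        }
        return new Summary(count, rows);
    }

    public static void main(String[] args) {
        long minFileSize = 10L * 1024 * 1024; // 10 MB
        List<FileInfo> files = List.of(
            new FileInfo("a.parquet", 2L * 1024 * 1024, 1_000),
            new FileInfo("b.parquet", 64L * 1024 * 1024, 50_000),
            new FileInfo("c.parquet", 512L * 1024, 200));
        Summary s = summarize(files, minFileSize);
        System.out.println(s.files() + " files, " + s.rows() + " rows would be removed");
    }
}
```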

Delete with Time-Based Filter

import java.time.Instant;
import java.time.temporal.ChronoUnit;

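// Delete data older than 90 days.
// Assumes `timestamp` is a timestamp or timestamptz column: Iceberg stores
// those values as microseconds since the epoch, and a long literal compared
// with such a column is read as microseconds.
Instant cutoff = Instant.now().minus(90, ChronoUnit.DAYS);
long cutoffMicros = ChronoUnit.MICROS.between(Instant.EPOCH, cutoff);

table.newDelete()
    .deleteFromRowFilter(
        Expressions.lessThan("timestamp", cutoffMicros)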
    )
    .commit();

System.out.println("Deleted data older than 90 days");
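Iceberg represents timestamp values as microseconds since the epoch, so a cutoff passed as a long should be in that unit. The sketch below (TimestampCutoff is a hypothetical name) converts an Instant to microseconds and shows what happens when a millisecond value is read as microseconds. It assumes the filtered column is a timestamp type; if the column is a plain long holding milliseconds, the millisecond value is the right one.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Converts a cutoff to microseconds since the epoch.
final class TimestampCutoff {

    static long toMicros(Instant instant) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }

    public static void main(String[] args) {
        Instant cutoff = Instant.parse("2024-01-15T00:00:00Z");
        System.out.println(toMicros(cutoff)); // 1705276800000000

        // Misreading the millisecond value as microseconds yields an
        // instant in January 1970, so almost nothing would match.
        long millis = cutoff.toEpochMilli();
        System.out.println(Instant.EPOCH.plus(millis, ChronoUnit.MICROS));
    }
}
```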

Incremental Delete Pattern

class IncrementalDeleter {
    private final Table table;
    private final DeleteFiles delete;
    private int deleteCount = 0;
    
    public IncrementalDeleter(Table table) {
        this.table = table;
        this.delete = table.newDelete();
    }
    
    public void deleteFile(DataFile file) {
        delete.deleteFile(file);
        deleteCount++;
    }
    
    public void deleteFile(String path) {
        delete.deleteFile(path);
        deleteCount++;
    }
    
    public void commit() {
        if (deleteCount > 0) {
            delete.commit();
            System.out.println("Deleted " + deleteCount + " files");
        }
    }
}

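// Usage: commit only after every file has been staged, so an exception
// thrown partway through the loop leaves the table unchanged
IncrementalDeleter deleter = new IncrementalDeleter(table);
for (DataFile file : filesToDelete) {
    if (shouldDelete(file)) {
        deleter.deleteFile(file);
    }
}
deleter.commit();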
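Staging deletes and committing once, only after the staging loop completes, means a failure partway through leaves the table untouched. The plain-Java model below illustrates that property; StagedBatch is a hypothetical name, a Set of paths stands in for the table's live files, and an empty path simulates an error mid-loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Model of staging deletes and applying them in one step; not Iceberg code.
final class StagedBatch {

    private final Set<String> table;
    private final List<String> staged = new ArrayList<>();

    StagedBatch(Set<String> table) {
        this.table = table;
    }

    void deleteFile(String path) {
        staged.add(path); // nothing changes in the table yet
    }

    // Applies every staged delete at once.
    void commit() {
        table.removeAll(staged);
    }

    // Stages each path, then commits only if the loop finished.
    static void deleteAll(Set<String> table, List<String> paths) {
        StagedBatch batch = new StagedBatch(table);
        for (String p : paths) {
            if (p.isEmpty()) {
                throw new IllegalArgumentException("empty path"); // simulated failure
            }
            batch.deleteFile(p);
        }
        batch.commit();
    }

    public static void main(String[] args) {
        Set<String> table = new LinkedHashSet<>(List.of("a.parquet", "b.parquet", "c.parquet"));
        try {
            deleteAll(table, List.of("a.parquet", "", "b.parquet"));
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + table); // [a.parquet, b.parquet, c.parquet]
        }
        deleteAll(table, List.of("a.parquet", "b.parquet"));
        System.out.println(table); // [c.parquet]
    }
}
```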

Case-Insensitive Delete

// Delete using case-insensitive column names
table.newDelete()
    .caseSensitive(false)
    .deleteFromRowFilter(
        Expressions.equal("STATUS", "deleted")
    )
    .commit();

Safe Delete with Error Handling

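import java.util.List;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFiles;
import org.apache.iceberg.Table;
import org.apache.iceberg.exceptions.CommitFailedException;
import org.apache.iceberg.exceptions.ValidationException;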

public void safeDelete(Table table, List<DataFile> files) {
    try {
        DeleteFiles delete = table.newDelete()
            .validateFilesExist();
        
        for (DataFile file : files) {
            delete.deleteFile(file);
        }
        
        delete.commit();
        
        System.out.println("Successfully deleted " + files.size() + " files");
        
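    } catch (ValidationException e) {
        System.err.println("Validation failed: " + e.getMessage());
        // At least one file is no longer part of the table; nothing was deleted

    } catch (CommitFailedException e) {
        System.err.println("Commit failed: " + e.getMessage());
        // Thrown after the automatic commit retries (commit.retry.num-retries)
        // are exhausted; retry with a fresh DeleteFiles or surface the error
    }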
}

Important Notes

Path Matching

Paths must match exactly:
// These are equivalent but won't match each other:
// - "file:/path/file.avro"
// - "file:///path/file.avro"

// Always use the exact path from table metadata
DataFile file = ...;
table.newDelete()
    .deleteFile(file.path()) // Use exact path
    .commit();

Atomic Operations

All deletions staged on one DeleteFiles instance are committed atomically as a single snapshot:
// Either all files are deleted or none are
DeleteFiles delete = table.newDelete();
for (DataFile file : files) {
    delete.deleteFile(file);
}
delete.commit(); // Atomic

Expression-Based Deletion

Files are deleted only if all of their rows must match the expression:
// Only deletes files where ALL rows match the expression
table.newDelete()
    .deleteFromRowFilter(Expressions.equal("status", "archived"))
    .commit();

// If any candidate file may contain both matching and non-matching rows,
// commit() throws a ValidationException and no files are deleted
