Filesystem

The filesystem destination allows you to load data into local or cloud file systems including AWS S3, Google Cloud Storage, Azure Blob Storage, and more. It supports multiple file formats and table formats like Delta Lake and Apache Iceberg.

Install dlt with Filesystem

To use filesystem as a destination, install dlt with the filesystem extra:
pip install "dlt[filesystem]"

Quick Start

Here’s a simple example to get you started:
import dlt

# Define your data source
@dlt.resource
def my_data():
    yield {"id": 1, "name": "Alice"}
    yield {"id": 2, "name": "Bob"}

# Create the pipeline - the target location comes from the bucket_url
# configured for the filesystem destination (config.toml or environment)
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="filesystem",
    dataset_name="my_dataset"
)

# Run the pipeline
info = pipeline.run(my_data())
print(info)

Configuration

Basic Configuration

bucket_url (string, required)
The filesystem path or bucket URL. Supports various protocols:
  • Local: file:///path/to/directory
  • AWS S3: s3://bucket-name/path
  • GCS: gs://bucket-name/path
  • Azure: az://container-name/path
  • Memory: memory://m (in-memory, for testing)

layout (string)
Layout of files in the destination using placeholders like {table_name}, {load_id}, {file_id}, etc.

extra_placeholders (dict)
Additional custom placeholders for the layout.

current_datetime (callable)
Function that provides the current datetime for time-based placeholders.

max_state_files (int, default: 100)
Maximum number of pipeline state files to keep. Set to 0 or a negative value to disable cleanup.
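
The options above can be combined in your project's config.toml; a sketch using values from this page (the bucket URL and layout are examples to adapt):

```toml
[destination.filesystem]
bucket_url = "s3://my-bucket/data"
layout = "{table_name}/{load_id}.{file_id}.{ext}"
max_state_files = 100
```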

Supported Protocols

Local Filesystem

import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem("file:///tmp/my_data"),
    dataset_name="my_dataset"
)

AWS S3

import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem("s3://my-bucket/data"),
    dataset_name="my_dataset"
)
Configure AWS credentials:
[destination.filesystem]
bucket_url = "s3://my-bucket/data"

[destination.filesystem.credentials]
aws_access_key_id = "your-access-key"
aws_secret_access_key = "your-secret-key"
region_name = "us-east-1"
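
Credentials can also be supplied through environment variables instead of TOML; dlt derives the variable names by joining the TOML sections with double underscores. A sketch of the equivalent setup:

```shell
# Equivalent environment-variable configuration (TOML sections joined with "__")
export DESTINATION__FILESYSTEM__BUCKET_URL="s3://my-bucket/data"
export DESTINATION__FILESYSTEM__CREDENTIALS__AWS_ACCESS_KEY_ID="your-access-key"
export DESTINATION__FILESYSTEM__CREDENTIALS__AWS_SECRET_ACCESS_KEY="your-secret-key"
export DESTINATION__FILESYSTEM__CREDENTIALS__REGION_NAME="us-east-1"
```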

Google Cloud Storage

import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem("gs://my-bucket/data"),
    dataset_name="my_dataset"
)
Configure GCS credentials:
[destination.filesystem]
bucket_url = "gs://my-bucket/data"

[destination.filesystem.credentials]
project_id = "my-project"
private_key = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
client_email = "[email protected]"

Azure Blob Storage

import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem("az://my-container/data"),
    dataset_name="my_dataset"
)
Configure Azure credentials:
[destination.filesystem]
bucket_url = "az://my-container/data"

[destination.filesystem.credentials]
azure_storage_account_name = "mystorageaccount"
azure_storage_account_key = "your-account-key"

Hugging Face Datasets

import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem("hf://datasets/my-username/my-dataset"),
    dataset_name="my_dataset"
)
Configure Hugging Face credentials:
[destination.filesystem]
bucket_url = "hf://datasets/my-username/my-dataset"

[destination.filesystem.credentials]
token = "hf_..."

File Formats

The filesystem destination supports multiple file formats: jsonl (the default), parquet, and csv. Select one with loader_file_format:
pipeline = dlt.pipeline(
    destination="filesystem",
    dataset_name="my_dataset"
)
pipeline.run(my_data(), loader_file_format="jsonl")

File Layout

Customize the directory structure and file naming:
import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem(
        bucket_url="s3://my-bucket/data",
        layout="{table_name}/{load_id}.{file_id}.{ext}"
    ),
    dataset_name="my_dataset"
)

Available Placeholders

  • {table_name}: Name of the table
  • {schema_name}: Name of the schema
  • {load_id}: Unique load identifier
  • {file_id}: File number within the load
  • {ext}: File extension (jsonl, parquet, csv)
  • {timestamp}: Load timestamp
  • {curr_date}: Current date
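
To make the placeholders concrete, here is an illustration (not part of dlt's API) of how a layout template expands into a file path; the sample values are hypothetical stand-ins for what dlt substitutes at load time:

```python
# Illustration only: expanding a layout template the way dlt does at load time.
# All substituted values below are hypothetical examples.
layout = "{table_name}/{load_id}.{file_id}.{ext}"

path = layout.format(
    table_name="my_data",      # resource/table name
    load_id="1700000000.123",  # unique id of the load package
    file_id="0",               # file number within the load
    ext="jsonl",               # loader file format extension
)
print(path)  # my_data/1700000000.123.0.jsonl
```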

Custom Placeholders

Define custom placeholders:
import dlt
from dlt.destinations import filesystem
from datetime import datetime

pipeline = dlt.pipeline(
    destination=filesystem(
        bucket_url="s3://my-bucket/data",
        layout="year={year}/month={month}/{table_name}.{ext}",
        extra_placeholders={
            "year": datetime.now().year,
            "month": datetime.now().month
        }
    ),
    dataset_name="my_dataset"
)
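
A small, hypothetical refinement of the example above: the raw year/month integers produce paths like month=9, which sort after month=10 lexically. Zero-padding the values keeps partition paths in order:

```python
from datetime import datetime, timezone

# Zero-pad the date parts so partition paths sort lexically
# (month=09 sorts before month=10).
now = datetime.now(timezone.utc)
extra_placeholders = {
    "year": f"{now.year:04d}",
    "month": f"{now.month:02d}",
}
print(extra_placeholders)
```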

Table Formats

Delta Lake

Create Delta Lake tables:
import dlt

@dlt.resource(table_format="delta")
def delta_data():
    yield {"id": 1, "value": "data"}

pipeline = dlt.pipeline(
    destination="filesystem",
    dataset_name="my_dataset"
)

pipeline.run(delta_data(), loader_file_format="parquet")
Delta Lake requires the Parquet file format and the deltalake package (pip install "dlt[deltalake]").

Apache Iceberg

Create Apache Iceberg tables:
import dlt

@dlt.resource(table_format="iceberg")
def iceberg_data():
    yield {"id": 1, "value": "data"}

pipeline = dlt.pipeline(
    destination="filesystem",
    dataset_name="my_dataset"
)

pipeline.run(iceberg_data(), loader_file_format="parquet")
Iceberg requires the Parquet file format and the pyiceberg package (pip install "dlt[pyiceberg]").

Write Dispositions

Regular Tables

For regular file-based tables, append and replace are supported; replace is implemented as truncate-and-insert, and merge falls back to append:
@dlt.resource(write_disposition="replace")
def replace_data():
    yield {"id": 1, "value": "new"}

Delta/Iceberg Tables

Delta and Iceberg tables support all write dispositions:
@dlt.resource(
    write_disposition="append",
    table_format="delta"
)
def append_data():
    yield {"id": 1, "value": "new"}

Querying Data

Query filesystem data using DuckDB:
import dlt
import duckdb

@dlt.resource
def my_data():
    yield {"id": 1, "name": "Alice"}

pipeline = dlt.pipeline(
    destination="filesystem",
    dataset_name="my_dataset"
)

# Write Parquet files so DuckDB can query them directly
pipeline.run(my_data(), loader_file_format="parquet")

# Query the Parquet files (adjust the glob to match your bucket_url)
conn = duckdb.connect()
df = conn.execute("""
    SELECT * FROM read_parquet('file:///tmp/my_data/**/*.parquet')
""").df()

print(df.head())

Advanced Features

Partitioning

Partition data using the layout:
import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem(
        bucket_url="s3://my-bucket/data",
        layout="{table_name}/year={year}/month={month}/data.{file_id}.{ext}"
    ),
    dataset_name="my_dataset"
)

Compression

Parquet files are compressed automatically by the writer, and jsonl files are gzip-compressed by default:
import dlt

pipeline = dlt.pipeline(
    destination="filesystem",
    dataset_name="my_dataset"
)

pipeline.run(my_data(), loader_file_format="parquet")

State Management

Control pipeline state file retention:
import dlt
from dlt.destinations import filesystem

pipeline = dlt.pipeline(
    destination=filesystem(
        bucket_url="s3://my-bucket/data",
        max_state_files=50  # Keep last 50 state files
    ),
    dataset_name="my_dataset"
)

Use Cases

Filesystem destination is ideal for:
  • Data Lake Storage: Store raw data in cloud storage
  • Staging Area: Stage data before loading to warehouses
  • Backup and Archive: Long-term data retention
  • Data Sharing: Share data files across systems
  • Cost Optimization: Lower storage costs compared to warehouses
  • Hugging Face Datasets: Publish datasets to Hugging Face

Performance Tips

  1. Use Parquet: Better compression and query performance
  2. Partition Data: Use date-based partitioning for large datasets
  3. Table Formats: Use Delta/Iceberg for ACID guarantees
  4. Batch Size: Adjust batch size for optimal file sizes
  5. Compression: Leverage Parquet’s built-in compression
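
Tip 4 (batch size) is typically controlled through dlt's data writer settings. The option names below (file_max_items, file_max_bytes) are assumptions to verify against your dlt version's configuration reference:

```toml
[normalize.data_writer]
# Rotate files once they reach ~100k rows or ~128 MB (example values;
# option names are assumptions - check your dlt version's docs)
file_max_items = 100000
file_max_bytes = 134217728
```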

Limitations

  • Regular file tables don’t support merge operations
  • Delta/Iceberg require Parquet format
  • No built-in query engine (use DuckDB, Spark, etc.)
  • File-level operations only (no row-level updates for regular files)

Additional Resources

  • Delta Lake: Learn about the Delta Lake format
  • Apache Iceberg: Learn about Iceberg tables
