
Overview

The @dlt.resource decorator transforms any generator (yielding) function into a dlt resource, or wraps data directly into a resource. A resource represents a location within a source that holds data with a specific structure (schema) or coming from a specific origin, such as a REST API endpoint, database table, or tab in Google Sheets. A dlt resource is a Python representation that combines data and metadata (table schema) describing the structure and instructing the loading of the data. A dlt resource is also an Iterable and can be used like any other iterable object (list, tuple, etc.).

Signature

@dlt.resource(
    data: Any = None,
    name: str = None,
    table_name: str = None,
    max_table_nesting: int = None,
    write_disposition: str | dict = None,
    columns: dict | Type = None,
    primary_key: str | List[str] = None,
    merge_key: str | List[str] = None,
    schema_contract: dict = None,
    table_format: str = None,
    file_format: str = None,
    references: list = None,
    nested_hints: dict = None,
    selected: bool = True,
    spec: Type[BaseConfiguration] = None,
    parallelized: bool = False,
    incremental: Incremental = None,
    section: str = None,
    standalone: bool = None,
)

Parameters

data
Callable | Iterable
required
A function to be decorated or data compatible with dlt run. Can be a generator function, list, iterator, or any iterable.
name
str
default:"None"
A name of the resource that by default also becomes the name of the table to which the data is loaded. If not present, the name of the decorated function will be used.
table_name
str | Callable
default:"None"
A table name, if different from name. This argument also accepts a callable that is used to dynamically create tables for stream-like resources yielding many datatypes.
max_table_nesting
int
default:"None"
A schema hint that sets the maximum nesting depth of child tables; nodes beyond this depth are loaded as structs or JSON.
write_disposition
str | dict
default:"'append'"
Controls how to write data to a table. Accepts a shorthand string literal or a configuration dictionary. Allowed shorthand string literals:
  • append: Always add new data at the end of the table
  • replace: Replace existing data with new data
  • skip: Prevent data from loading
  • merge: Deduplicate and merge data based on primary_key and merge_key hints
For advanced usage, use a configuration dictionary. For example, to obtain an SCD2 table:
write_disposition={"disposition": "merge", "strategy": "scd2"}
columns
dict | List | Type
default:"None"
A list, dict, or pydantic model of column schemas: a typed dictionary describing column names, data types, write disposition, and performance hints that gives you full control over the created table schema. When the argument is a pydantic model, it is also used to validate the data yielded by the resource.
primary_key
str | List[str]
default:"None"
A column name or a list of column names that comprise a primary key. Typically used with “merge” write disposition to deduplicate loaded data.
merge_key
str | List[str]
default:"None"
A column name or a list of column names that define a merge key. Typically used with “merge” write disposition to remove overlapping data ranges (e.g., to keep a single record for a given day).
schema_contract
dict
default:"None"
Schema contract settings that will be applied to this resource.
table_format
str
default:"None"
Defines the storage format of the table. Currently only “iceberg” is supported on Athena, and “delta” on the filesystem. Other destinations ignore this hint.
file_format
str
default:"None"
Format of the file in which resource data is stored. Useful when importing external files. Use "preferred" to force the file format preferred by the destination used. This setting supersedes the load_file_format passed to the pipeline run method.
references
list
default:"None"
A list of references to other tables’ columns. Format:
[{
    'referenced_table': 'other_table',
    'columns': ['col1', 'col2'],
    'referenced_columns': ['other_col1', 'other_col2']
}]
Table and column names will be normalized according to the configured naming convention.
nested_hints
dict
default:"None"
Hints for nested tables created by this resource.
selected
bool
default:"True"
When True, dlt pipeline will extract and load this resource. If False, the resource will be ignored.
spec
Type[BaseConfiguration]
default:"None"
A specification of configuration and secret values required by the resource.
parallelized
bool
default:"False"
If True, the resource generator will be extracted in parallel with other resources. Transformers that return items are also parallelized.
incremental
Incremental
default:"None"
An incremental configuration for the resource to enable incremental loading.
section
str
default:"None"
Configuration section that comes right after ‘sources’ in the default layout. If not present, the current Python module name will be used. The default layout is sources.<section>.<name>.<key_name>. Note that the resource section is used only when a single resource is passed to the pipeline.
standalone
bool
default:"None"
Deprecated. Past functionality got merged into regular resource.

Returns

resource
DltResource
A DltResource instance which may be loaded, iterated or combined with other resources into a pipeline.

Configuration Injection

When used as a decorator, the resource may automatically bind function arguments to secret and config values:
import dlt
from dlt.sources.helpers import requests

@dlt.resource
def user_games(username, chess_url: str = dlt.config.value, api_secret = dlt.secrets.value):
    return requests.get(
        f"{chess_url}/games/{username}",
        headers={"Authorization": f"Bearer {api_secret}"}
    )

list(user_games("magnuscarlsen"))
In this example:
  • username is a required, explicit Python argument
  • chess_url is required and will be taken from config.toml if not explicitly passed
  • api_secret is required and will be taken from secrets.toml if not explicitly passed
Note: If the decorated function is an inner function, passing of credentials will be disabled.
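For the user_games example above, the injected values could come from TOML files. A sketch of one possible layout, assuming the simplest section resolution (the exact section path depends on your source module and the layout described under the section parameter):

```toml
# config.toml
[sources.user_games]
chess_url = "https://api.chess.com/pub/player"

# secrets.toml
[sources.user_games]
api_secret = "please set me up!"
```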

Usage Examples

Basic Resource as Decorator

import dlt

@dlt.resource
def users():
    yield [{"id": 1, "name": "Alice"}]
    yield [{"id": 2, "name": "Bob"}]

pipeline = dlt.pipeline(destination="duckdb")
pipeline.run(users())

Resource from Data

import dlt

data = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
users_resource = dlt.resource(data, name="users")

pipeline = dlt.pipeline(destination="duckdb")
pipeline.run(users_resource)

Resource with Merge Write Disposition

@dlt.resource(
    write_disposition="merge",
    primary_key="id"
)
def products():
    yield [
        {"id": 1, "name": "Widget", "price": 10.99},
        {"id": 2, "name": "Gadget", "price": 24.99}
    ]

Resource with Incremental Loading

import dlt

@dlt.resource(
    primary_key="id",
    write_disposition="append"
)
def events(created_at=dlt.sources.incremental("created_at")):
    # Only fetch events after the last seen created_at value;
    # fetch_events is a placeholder for your own API call
    for event in fetch_events(since=created_at.last_value):
        yield event

Resource with Column Schema

@dlt.resource(
    columns={
        "id": {"data_type": "bigint", "nullable": False},
        "email": {"data_type": "text", "unique": True},
        "created_at": {"data_type": "timestamp"}
    },
    primary_key="id"
)
def users():
    yield [{"id": 1, "email": "[email protected]", "created_at": "2024-01-01"}]

Resource with Pydantic Validation

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

@dlt.resource(columns=User)
def validated_users():
    # Data will be validated against User model
    yield [{"id": 1, "name": "Alice", "email": "[email protected]"}]

Parallelized Resource

@dlt.resource(parallelized=True)
def large_dataset():
    # This resource will be extracted in parallel with others
    for i in range(10000):
        yield {"id": i, "value": f"item_{i}"}

Dynamic Table Names

def table_name_func(item):
    return f"users_{item['country']}"

@dlt.resource(table_name=table_name_func)
def users_by_country():
    yield {"id": 1, "name": "Alice", "country": "US"}
    yield {"id": 2, "name": "Bob", "country": "UK"}
    # Creates tables (after default snake_case normalization): users_us, users_uk
