Function Signature
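A sketch of the signature, reconstructed from the parameters documented in this section (the type aliases are indicative, not verified verbatim against the library source):

```py
def run(
    data: Any,
    *,
    destination: TDestinationReferenceArg = None,
    staging: TDestinationReferenceArg = None,
    dataset_name: str = None,
    table_name: str = None,
    write_disposition: TWriteDispositionConfig = None,
    columns: Sequence[TColumnSchema] = None,
    schema: Schema = None,
    loader_file_format: TLoaderFileFormat = None,
    table_format: TTableFormat = None,
    schema_contract: TSchemaContract = None,
    refresh: TRefreshMode = None,
) -> LoadInfo: ...
```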
Description
Loads data from the `data` argument into the destination specified by `destination` and the dataset specified by `dataset_name`. This is a convenience function that creates or retrieves the current pipeline and runs it.

The method extracts data from the `data` argument, infers the schema, normalizes the data into a load package (i.e., `jsonl` or `parquet` files representing tables), and then loads such packages into the destination.
Execution Flow
- The `run` method first uses `sync_destination` to synchronize pipeline state and schemas with the destination (this can be disabled with the `restore_from_destination` configuration option)
- Ensures data from previous runs is fully processed; if not, normalizes and loads pending data items
- Extracts, normalizes, and loads new data from the `data` argument
Parameters
Data to be loaded to the destination. Can be supplied in several forms:
- A `list` or `Iterable` of any JSON-serializable objects, e.g. `dlt.run([1, 2, 3], table_name="numbers")`
- Any `Iterator` or a function that yields (generator), e.g. `dlt.run(range(1, 10), table_name="range")`
- A function or list of functions decorated with `@dlt.resource`, e.g. `dlt.run([chess_players(title="GM"), chess_games()])`
- A function or list of functions decorated with `@dlt.source`
dlt handles `bytes`, `datetime`, `decimal`, and `uuid` objects, so you can load binary data or documents containing dates.

**destination**

The name of the destination to which dlt will load the data, or a destination module imported from `dlt.destinations`. If not provided, the value passed to `dlt.pipeline` will be used.

**staging**

The name of the destination where dlt will stage the data before final loading, or a destination module imported from `dlt.destinations`.

**dataset_name**

The name of the dataset to which the data will be loaded. A dataset is a logical group of tables (e.g., a schema in relational databases or a folder grouping many files). If not provided, the value passed to `dlt.pipeline` will be used. If not provided at all, it defaults to the `pipeline_name`.

**table_name**

The name of the table to which the data should be loaded within the dataset. This argument is required for data that is a list/`Iterable`, or an `Iterator` without a `__name__` attribute. Otherwise, the behavior depends on the type of `data`:

- Generator functions: the function name is used as the table name; `table_name` overrides this default
- `@dlt.resource`: the resource contains the full table schema, including the table name; `table_name` will override this property (use with care!)
- `@dlt.source`: the source contains several resources, each with its own table schema; `table_name` will override all table names within the source and load data into a single table
**write_disposition**

Controls how to write data to a table. Accepts a shorthand string literal or a configuration dictionary.

Allowed shorthand string literals:

- `append`: always add new data at the end of the table (default)
- `replace`: replace existing data with new data
- `skip`: prevent data from loading
- `merge`: deduplicate and merge data based on `primary_key` and `merge_key` hints

A configuration dictionary additionally selects a strategy, e.g. `write_disposition={"disposition": "merge", "strategy": "scd2"}`.

Note: for `@dlt.resource`, the table schema value will be overwritten. For `@dlt.source`, the values in all resources will be overwritten.

**columns**

A list of column schemas: a typed dictionary describing column names, data types, write disposition, and performance hints that gives you full control over the created table schema.
**schema**

An explicit `Schema` object in which all table schemas will be grouped. By default, dlt takes the schema from the source (if passed in the `data` argument) or creates a default one itself.

**loader_file_format**

The file format the loader will use to create the load package. Not all file formats are compatible with all destinations. Defaults to the preferred file format of the selected destination. Common formats: `jsonl`, `parquet`, `insert_values`, `csv`.

**table_format**

The table format used by the destination to store tables. Can be `delta`, `iceberg`, `hive`, or `native`. Currently, you can select the table format on the filesystem and Athena destinations.

**schema_contract**

An override for the schema contract settings. This will replace the schema contract settings for all tables in the schema. Controls schema evolution behavior, such as allowing new tables, new columns, or data type changes.

**refresh**

Fully or partially reset sources before loading new data in this run. The following refresh modes are supported:

- `drop_sources`: drop tables and source and resource state for all sources currently being processed (note: schema history is erased)
- `drop_resources`: drop tables and resource state for all resources being processed; source-level state is not modified (note: schema history is erased)
- `drop_data`: wipe all data and resource state for all resources being processed; the schema is not modified
Returns
`LoadInfo`: information on the loaded data, including:

- The list of load package ids (`loads_ids`)
- Failed job statuses (`has_failed_jobs`, `load_packages`)
- Destination and dataset information
- Timing and metrics
dlt will not raise an exception if a single job terminally fails; such information is provided via `LoadInfo`.

Raises

`PipelineStepFailed`: raised when a problem occurs during the `extract`, `normalize`, or `load` steps.

Examples
Load a simple list
Load from a generator function
Load with merge write disposition
Load a dlt resource
Load with refresh to drop existing data
Load to Parquet files with Iceberg format
LoadInfo Methods
The returned `LoadInfo` object provides several useful methods.
See Also

- `pipeline` - Create a pipeline instance for more control
- `Pipeline.run` - The `Pipeline` instance method version