Function Signature
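No signature survives on this page; the sketch below is reconstructed from the parameters documented in the section that follows. Types and defaults are approximations, not the exact dlt declaration:

```python
from typing import Any, Optional

def pipeline(
    pipeline_name: Optional[str] = None,
    pipelines_dir: Optional[str] = None,
    pipeline_salt: Optional[str] = None,
    destination: Any = None,
    staging: Any = None,
    dataset_name: Optional[str] = None,
    import_schema_path: Optional[str] = None,
    export_schema_path: Optional[str] = None,
    full_refresh: Optional[bool] = None,  # deprecated, use dev_mode
    dev_mode: bool = False,
    refresh: Optional[str] = None,
    progress: Any = None,
    _impl_cls: Any = None,
) -> "Pipeline":
    """Reconstructed sketch; see the dlt API reference for exact types."""
    ...
```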
Description
Creates a new instance of a dlt pipeline, which moves data from a source (e.g., a REST API) to a destination (e.g., a database or data lake). When called without arguments, returns the most recently created pipeline instance.
The pipeline function allows you to configure the destination, dataset name, and various options that govern data loading. The created Pipeline object lets you load data from any source with the run method, or exercise more granular control with the extract, normalize, and load methods.
Parameters

pipeline_name
A name of the pipeline used to identify it in monitoring events and to restore its state and data schemas on subsequent runs. Defaults to the file name of the pipeline script with the dlt_ prefix added.

pipelines_dir
A working directory where pipeline state and temporary files will be stored. Defaults to a folder in the user home directory: ~/dlt/pipelines/.

pipeline_salt
A random value used for deterministic hashing during data anonymization. Defaults to a value derived from the pipeline name. The default value should not be used for cryptographic purposes.

destination
A name of the destination to which dlt will load the data, or a destination module imported from dlt.destination. Can be a string (destination name), a Destination instance, a callable returning a Destination, or None. May also be provided to the run method of the pipeline.

staging
A name of the destination where dlt will stage the data before final loading, or a destination module imported from dlt.destination. May also be provided to the run method of the pipeline.

dataset_name
A name of the dataset to which the data will be loaded. A dataset is a logical group of tables (e.g., a schema in relational databases or a folder grouping many files). May also be provided later to the run or load methods of the pipeline. If not provided at all, defaults to the pipeline_name.

import_schema_path
A path from which the schema yaml file will be imported on each pipeline run. Defaults to None, which disables importing.

export_schema_path
A path where the schema yaml file will be exported after every schema change. Defaults to None, which disables exporting.

full_refresh
Deprecated: use dev_mode instead.

dev_mode
When set to True, each instance of the pipeline with the pipeline_name starts from scratch when run and loads the data to a separate dataset. The datasets are identified by a dataset_name_ + datetime suffix. Use this setting when experimenting with your data to ensure you start fresh on each run.

refresh
Fully or partially reset sources during the pipeline run. When set here, the refresh is applied on each run of the pipeline. To apply refresh only once, pass it to pipeline.run or extract instead. The following refresh modes are supported:
- drop_sources: Drop tables and source and resource state for all sources currently being processed in the run or extract methods. (Note: schema history is erased.)
- drop_resources: Drop tables and resource state for all resources being processed. Source-level state is not modified. (Note: schema history is erased.)
- drop_data: Wipe all data and resource state for all resources being processed. The schema is not modified.
progress
A progress monitor that shows progress bars, console or log messages with current information on sources, resources, data items, etc. processed in the extract, normalize, and load stages. Pass a string with a collector name or configure your own by choosing from the dlt.progress module. Supported progress libraries: tqdm, enlighten, alive_progress, or log to write to the console/log.

_impl_cls
A class of the pipeline to use. Defaults to Pipeline. This parameter is intended for advanced use cases.

Returns
An instance of the Pipeline class or a subclass. Use the run method to load data, or use the extract, normalize, and load methods for more granular control.

Examples
Create a new pipeline
Retrieve the current pipeline
Create a pipeline with staging
Create a pipeline in development mode
Create a pipeline with progress monitoring
Related Documentation
- Write your first pipeline walkthrough
- Pipeline architecture and data loading steps
- List of supported destinations