What is dlt?
dlt (data load tool) is an open-source Python library that automates all your tedious data loading tasks. It's designed to be lightweight, flexible, and to work anywhere Python runs—from Google Colab notebooks to AWS Lambda functions, Airflow DAGs, your local laptop, or GPT-4 assisted development playgrounds.

dlt is a library, not a platform. It respects your existing workflows and integrates seamlessly with other libraries you already use.
Why dlt?
dlt eliminates the complexity of building and maintaining data pipelines by providing:
- Automatic schema inference - dlt infers schemas and data types from your data
- Nested data handling - Automatically normalizes complex, nested data structures
- Incremental loading - Load only new or changed data with built-in state management
- Multiple destinations - Support for 20+ popular databases, data warehouses, and vector stores
- Python-first design - Clean, Pythonic interfaces that feel natural
- No backend required - Everything runs in your Python environment
Key Features
Extract from Anywhere
Load data from REST APIs, SQL databases, cloud storage, Python data structures, and 5000+ sources via the dlt Hub.
Schema Evolution
Schemas automatically evolve as your data changes. No more broken pipelines from unexpected fields.
Incremental Loading
Efficiently load only new or changed data with automatic state tracking and cursor management.
Multiple Destinations
Load to DuckDB, PostgreSQL, BigQuery, Snowflake, Redshift, and many more with the same code.
Data Quality
Built-in data validation, contracts, and type checking ensure data quality.
Deploy Anywhere
Run on Airflow, serverless functions, notebooks, or any Python environment.
Quick Example
See the Quickstart Guide for a step-by-step walkthrough.

How dlt Works
dlt operates in three main stages:

Extract
Pull data from sources using Python generators, functions, or iterators. dlt handles pagination, rate limiting, and state management.
Normalize
Transform nested structures into relational tables, infer schemas, and apply data types automatically.
Load
Write the normalized data to your chosen destination, creating tables and migrating schemas as needed.
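To illustrate what the normalize stage produces: a nested list field becomes a child table named `parent__field`, linked back to its parent row. The tiny flattener below is a simplified re-implementation for illustration only, not dlt's internal code (dlt additionally adds linking keys such as `_dlt_id` and `_dlt_parent_id`):

```python
def normalize(rows: list[dict], table: str) -> dict[str, list[dict]]:
    """Split nested lists out of each row into child tables (simplified sketch)."""
    tables: dict[str, list[dict]] = {table: []}
    for row in rows:
        flat = {}
        for key, value in row.items():
            if isinstance(value, list):
                # Nested lists become rows in a "table__field" child table.
                child = f"{table}__{key}"
                tables.setdefault(child, []).extend(value)
            else:
                flat[key] = value
        tables[table].append(flat)
    return tables

tables = normalize(
    [{"id": 1, "customer": "Ada", "items": [{"sku": "A"}, {"sku": "B"}]}],
    table="orders",
)
print(sorted(tables))  # ['orders', 'orders__items']
```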
Core Concepts
Pipelines
A pipeline moves data from source to destination, managing state and configuration.
Sources
Decorated Python functions that define where and how to extract data.
Resources
Individual data endpoints within a source that yield data items.
Destinations
Where your data gets loaded—databases, warehouses, or file systems.
Use Cases
dlt is perfect for:
- API Integration - Quickly build pipelines to load data from REST APIs
- Database Replication - Sync data between databases with incremental loading
- Data Warehousing - Build ELT pipelines to populate your data warehouse
- LLM Applications - Prepare and load data for RAG systems and vector databases
- Analytics - Load data for analysis in notebooks or dashboards
- ETL Automation - Replace complex, brittle ETL scripts with maintainable code
Design Philosophy
dlt is built with these principles:
- Multiply, don’t add - We automate repetitive tasks so you focus on your data logic
- No black boxes - Everything is transparent, inspectable, and understandable
- Pythonic - APIs feel natural to Python developers
- LLM-native - Designed to work seamlessly with AI-assisted development
Getting Started
Ready to build your first pipeline? Check out the Quickstart Guide to get up and running in minutes.

Quickstart
Build your first pipeline in 5 minutes
Installation
Install dlt and optional dependencies
Core Concepts
Understand pipelines, sources, and resources
Examples
Browse real-world examples and tutorials
Community and Support
dlt has a thriving community of developers building the future of data loading together.
- Slack Community - Join thousands of users and get help from the community
- GitHub - Report issues, suggest features, or contribute code
- Documentation - Comprehensive guides and API references
- Examples - Real-world code examples for common use cases
dlt is production-ready and used by thousands of engineers worldwide. It’s maintained by dltHub Inc. and actively developed with frequent releases.