This tutorial takes approximately 5-10 minutes to complete. You’ll create a working pipeline, run it, and query the results.
What You’ll Build
You’ll create a pipeline that:
- Fetches player data from the Chess.com public API
- Automatically infers the schema from the JSON response
- Loads the data into DuckDB (a fast, embedded SQL database)
- Can be queried immediately using SQL
Prerequisites
- Python 3.9 or higher installed on your system
- Basic familiarity with Python
- A terminal or command prompt
Install dlt
First, install dlt using pip. We’ll also install DuckDB support, which is our destination database.
Create Your Pipeline Script
Create a new Python file called chess_pipeline.py and add the following code. Let’s break down what this code does:
- Create a pipeline: The dlt.pipeline() function creates a pipeline with a name, destination (DuckDB), and dataset name
- Fetch data: We loop through player usernames and fetch their data from the Chess.com API
- Load data: The pipeline.run() method automatically infers the schema, normalizes the JSON data, and loads it into the player table
- Print info: The run info contains metadata about what was loaded
Run Your Pipeline
Execute your pipeline script. You should see output indicating the pipeline ran successfully, including information about the loaded data.
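From your terminal:

```shell
python chess_pipeline.py
```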
Success! Your pipeline has loaded data into DuckDB. The database file chess_pipeline.duckdb was created in your current directory.
Query Your Data
Now let’s query the data you just loaded. Create a new file called query_data.py. Run the query script and you’ll see the player data you loaded, including usernames, titles, followers, and more.
Understand What Happened
When you ran your pipeline, dlt automatically:
- Inferred the schema - Examined the JSON structure and determined table columns and data types
- Created tables - Set up the player table in DuckDB with the appropriate schema
- Normalized data - Converted the nested JSON into relational tables (if there were nested structures, dlt would create child tables)
- Loaded data - Inserted the player records into the database
- Tracked state - Saved pipeline metadata for incremental loading in future runs
All of this happened with a single call to pipeline.run()!
Next Steps
Congratulations! You’ve built your first dlt pipeline. Here’s what you can explore next:
Core Concepts
Learn about pipelines, sources, resources, and destinations in depth
Incremental Loading
Load only new or changed data instead of full refreshes
Destinations
Explore 20+ supported destinations like BigQuery, Snowflake, and PostgreSQL
Verified Sources
Use pre-built sources for popular APIs and services
Common Patterns
Loading Different Data Types
dlt can load various data types beyond API responses, including plain lists of dicts, pandas DataFrames, and generator functions.
Using Different Destinations
Switch destinations by changing the destination parameter:
Incremental Loading
Load only new data on subsequent runs.
Troubleshooting
Import Error: No module named 'dlt'
Make sure you’ve installed dlt in your current Python environment. If you’re using a virtual environment, ensure it’s activated.
API Request Failed
The Chess.com API is public and doesn’t require authentication. If you get connection errors:
- Check your internet connection
- Verify the API is accessible: visit https://api.chess.com/pub/player/magnuscarlsen in your browser
- Check if you’re behind a proxy or firewall
DuckDB Database Locked
If you get a “database is locked” error, make sure:
- You’ve closed any previous connections to the database
- No other process is accessing the .duckdb file
- You’re using with pipeline.sql_client() as client: to ensure connections are properly closed
Schema Inference Issues
If dlt infers the wrong data types:
- Provide explicit hints using the columns parameter
- Define a custom schema
- See the schema documentation for details
Learn More
Join Slack Community
Get help from thousands of dlt users and the core team
Browse Examples
Explore real-world pipeline examples and use cases
Full Documentation
Deep dive into all dlt features and capabilities
GitHub Repository
View source code, report issues, and contribute