# Configuration Management

Manage dlt configuration with TOML files, environment variables, and secret providers. dlt provides a flexible, hierarchical configuration system.
## Configuration Sources

dlt resolves configuration from multiple sources in priority order:

1. Environment variables (highest priority)
2. `secrets.toml` (secrets only)
3. `config.toml` (non-secret values)
4. Code defaults (lowest priority)

Secret values (credentials, API keys) belong in `secrets.toml` or environment variables, never in `config.toml`.
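The priority order above can be sketched in plain Python. This is a simplified illustration of the resolution rule, not dlt's actual resolver; the dictionaries stand in for the parsed TOML files:

```python
import os

def resolve_value(key: str, secrets: dict, config: dict, default=None):
    """Check env vars, then secrets, then config, then the code default."""
    env_key = key.upper()
    if env_key in os.environ:
        return os.environ[env_key]
    if key in secrets:
        return secrets[key]
    if key in config:
        return config[key]
    return default

# config.toml supplies a value; an environment variable overrides it.
config = {"log_level": "INFO"}
secrets = {}
os.environ.pop("LOG_LEVEL", None)
assert resolve_value("log_level", secrets, config) == "INFO"
os.environ["LOG_LEVEL"] = "DEBUG"
assert resolve_value("log_level", secrets, config) == "DEBUG"
```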
## TOML Configuration Files

### Project Structure

```text
my_pipeline/
├── .dlt/
│   ├── config.toml      # Non-secret configuration
│   └── secrets.toml     # Secret values (gitignored)
├── my_pipeline.py
└── requirements.txt
```
### config.toml

Store non-sensitive configuration:

```toml
# Pipeline defaults (top-level keys must come before any table)
pipeline_name = "my_pipeline"

# Runtime configuration
[runtime]
log_level = "INFO"
request_timeout = 60
request_max_attempts = 5

# Destination configuration
[destination.postgres]
dataset_name = "analytics"

# Source configuration
[sources.my_api]
base_url = "https://api.example.com"
page_size = 100
start_date = "2024-01-01"

# Performance tuning
[extract]
workers = 5
max_parallel_items = 20

[extract.data_writer]
buffer_max_items = 10000
file_max_items = 100000

[normalize]
workers = 4

[normalize.data_writer]
file_max_items = 100000
file_max_bytes = 10485760  # 10MB

[load]
workers = 10
delete_completed_jobs = true
```
### secrets.toml

Store sensitive credentials (never commit this file):

```toml
# API credentials
[sources.my_api]
api_key = "sk_live_abc123..."
api_secret = "secret_xyz789..."

# Database credentials
[destination.postgres.credentials]
database = "analytics_db"
username = "data_loader"
password = "secure_password_here"
host = "db.example.com"
port = 5432

# BigQuery credentials
[destination.bigquery.credentials]
project_id = "my-gcp-project"
private_key = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----"
client_email = "[email protected]"

# Snowflake credentials
[destination.snowflake.credentials]
account = "my_account.us-east-1"
user = "LOADER_USER"
password = "snowflake_password"
warehouse = "LOADING_WH"
database = "ANALYTICS"
role = "LOADER_ROLE"
```

Always add secrets.toml to `.gitignore`:

```text
# .gitignore
.dlt/secrets.toml
.dlt/*.secrets.toml
```
## Environment Variables

Override any configuration value with environment variables.

### Naming Convention

Environment variables use double underscores (`__`) as separators:

```sh
# Format: SECTION__SUBSECTION__KEY
export SOURCES__MY_API__API_KEY="abc123"
export DESTINATION__POSTGRES__CREDENTIALS__PASSWORD="secret"
export RUNTIME__LOG_LEVEL="DEBUG"
export EXTRACT__WORKERS="10"
```
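The naming rule is mechanical enough to express as a small helper. This is an illustration of the convention, not part of dlt's API:

```python
def to_env_var(*sections: str) -> str:
    """Join config path segments into a dlt-style env var name:
    uppercase each segment and separate with double underscores."""
    return "__".join(s.upper() for s in sections)

assert to_env_var("sources", "my_api", "api_key") == "SOURCES__MY_API__API_KEY"
assert (
    to_env_var("destination", "postgres", "credentials", "password")
    == "DESTINATION__POSTGRES__CREDENTIALS__PASSWORD"
)
```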
### Examples

Source configuration:
```sh
# API credentials
export SOURCES__MY_API__API_KEY="sk_live_abc123"
export SOURCES__MY_API__BASE_URL="https://api.example.com"

# REST API configuration
export SOURCES__REST_API__ENDPOINT="https://api.example.com/v1"
export SOURCES__REST_API__PAGE_SIZE="100"
```
## Using Configuration in Code

### Access Configuration Values

```python
import dlt

# Configuration is automatically resolved from all sources
pipeline = dlt.pipeline(
    pipeline_name=dlt.config["pipeline_name"],  # From config.toml
    destination="postgres",  # Credentials from secrets.toml
    dataset_name=dlt.config.get("dataset_name", "default_dataset"),
)
```
### Define Custom Configuration

Create typed configuration classes:

```python
from dlt.common.configuration import resolve
from dlt.common.configuration.specs import BaseConfiguration, configspec
from dlt.common.typing import TSecretStrValue

@configspec
class MyApiConfig(BaseConfiguration):
    api_key: TSecretStrValue  # Secret value
    base_url: str = "https://api.example.com"  # Default value
    timeout: int = 30
    max_retries: int = 3

# Resolve configuration
config = resolve.resolve_configuration(MyApiConfig())
print(f"Using API at {config.base_url}")
```
### Source-Specific Configuration

Configure sources with `@dlt.source`:

```python
import dlt

@dlt.source
def my_api_source(
    api_key: str = dlt.secrets.value,
    base_url: str = dlt.config.value,
    page_size: int = 100,
):
    """Source with configuration"""

    @dlt.resource
    def users():
        # Use configuration values
        url = f"{base_url}/users?page_size={page_size}"
        headers = {"Authorization": f"Bearer {api_key}"}
        # ... fetch data
        yield data

    return users

# Configuration is automatically injected
pipeline.run(my_api_source())
```
## Configuration Hierarchy

### Section-Based Configuration

Organize configuration by pipeline name and source:

```toml
# Global source configuration
[sources.my_api]
base_url = "https://api.example.com"

# Pipeline-specific override
[sources.my_api.production_pipeline]
base_url = "https://production-api.example.com"

# Resource-specific configuration
[sources.my_api.resources.users]
page_size = 500

[sources.my_api.resources.orders]
page_size = 1000
```
### Configuration Precedence

For a pipeline named `production_pipeline`, configuration is resolved in this order:

1. Environment: `SOURCES__MY_API__PRODUCTION_PIPELINE__BASE_URL`
2. Secrets: `[sources.my_api.production_pipeline]`
3. Secrets: `[sources.my_api]`
4. Config: `[sources.my_api.production_pipeline]`
5. Config: `[sources.my_api]`
6. Code defaults
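The lookup order above can be sketched as a generator of `(provider, location)` candidates, checked most specific first. This illustrates the rule, not dlt's internals:

```python
def lookup_order(pipeline_name: str, sections: tuple, key: str):
    """Yield (provider, location) pairs, most specific first."""
    scoped = tuple(sections) + (pipeline_name,)
    # 1. Environment variable with the pipeline name in the path
    yield ("environ", "__".join(p.upper() for p in scoped + (key,)))
    # 2-5. Pipeline-scoped then global sections, secrets before config
    for provider in ("secrets.toml", "config.toml"):
        yield (provider, ".".join(scoped))
        yield (provider, ".".join(sections))
    # 6. Fall back to the code default
    yield ("defaults", key)

order = list(lookup_order("production_pipeline", ("sources", "my_api"), "base_url"))
assert order[0] == ("environ", "SOURCES__MY_API__PRODUCTION_PIPELINE__BASE_URL")
assert order[1] == ("secrets.toml", "sources.my_api.production_pipeline")
assert order[4] == ("config.toml", "sources.my_api")
assert order[-1] == ("defaults", "base_url")
```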
## Secret Providers

### Google Cloud Secret Manager

```python
from dlt.common.configuration.providers import GoogleSecretsProvider

# Use GCP Secret Manager
provider = GoogleSecretsProvider(
    project_id="my-gcp-project",
    credentials="path/to/service-account.json",
)

# Access secrets: key, type hint, pipeline name, then section path
api_key = provider.get_value(
    "api_key",
    str,
    "my_pipeline",
    "sources",
    "my_api",
)
```
### AWS Secrets Manager

```python
import json
import os

import boto3

# Fetch from AWS Secrets Manager
secrets_client = boto3.client("secretsmanager")
response = secrets_client.get_secret_value(SecretId="my-pipeline-secrets")
secrets = json.loads(response["SecretString"])

# Set as environment variables so dlt can resolve them
os.environ["SOURCES__MY_API__API_KEY"] = secrets["api_key"]
```
### Airflow Variables

When running in Airflow, use the `dlt_secrets_toml` variable:

```python
# Airflow automatically loads from Variable 'dlt_secrets_toml'
# No code changes needed!
pipeline = dlt.pipeline(
    pipeline_name="airflow_pipeline",
    destination="bigquery",
)
```
## Configuration Best Practices

### Separate secrets from configuration

Never store secrets in config.toml. Use secrets.toml or environment variables.

```toml
# ❌ Wrong - in config.toml
[sources.api]
api_key = "secret123"  # DON'T DO THIS

# ✅ Correct - in secrets.toml
[sources.api]
api_key = "secret123"  # Safe
```
### Use environment variables in CI/CD

Set secrets as environment variables in your deployment environment:

```yaml
# GitHub Actions
env:
  DESTINATION__BIGQUERY__CREDENTIALS: ${{ secrets.GCP_CREDENTIALS }}
  SOURCES__API__API_KEY: ${{ secrets.API_KEY }}
```
### Use defaults wisely

Provide sensible defaults in code, override as needed:

```python
@dlt.source
def my_source(
    base_url: str = "https://api.example.com",  # Default
    timeout: int = 30,  # Default
    api_key: str = dlt.secrets.value,  # Must be provided
):
    pass
```
### Document required configuration

Create a `config.toml.example` file:

```toml
# config.toml.example
[sources.my_api]
base_url = "https://api.example.com"
page_size = 100

[destination.postgres]
dataset_name = "analytics"
```

And a `secrets.toml.example`:

```toml
# secrets.toml.example
[sources.my_api]
api_key = "your_api_key_here"

[destination.postgres.credentials]
password = "your_password_here"
```
Configuration tips:

- Use environment variables for secrets in production
- Keep config.toml in version control
- Never commit secrets.toml
- Use typed configuration classes for validation
- Provide sensible defaults
- Document all required configuration

Security warnings:

- Never log secret values
- Don't print configuration objects (they may contain secrets)
- Always use .gitignore for secrets.toml
- Rotate credentials regularly
- Use secret managers in production
## Debugging Configuration

Find out where configuration values are loaded from:

```python
import logging

from dlt.common.configuration import resolve

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Configuration resolution will log the source of each value
config = resolve.resolve_configuration(MyApiConfig())
```

This will show output like:

```text
DEBUG:dlt.config:Resolved 'api_key' from Environment Variables
DEBUG:dlt.config:Resolved 'base_url' from config.toml [sources.my_api]
DEBUG:dlt.config:Using default for 'timeout'
```