Overview
Metaflow integrates with cloud secret management services to securely handle sensitive data such as:
API keys and tokens
Database credentials
OAuth secrets
Encryption keys
Service account credentials
Never hardcode secrets in your flow code or commit them to version control.
Supported Secret Managers
AWS Secrets Manager: native support for AWS Secrets Manager
Azure Key Vault: integration with Azure Key Vault
GCP Secret Manager: support for Google Cloud Secret Manager
AWS Secrets Manager
Configuration
Configure AWS Secrets Manager as your secrets backend:
# Set environment variables
export METAFLOW_DEFAULT_SECRETS_BACKEND_TYPE=aws-secrets-manager
export METAFLOW_DEFAULT_SECRETS_ROLE=arn:aws:iam::123456789:role/MetaflowSecretsRole
Accessing Secrets
from metaflow import FlowSpec, step, secrets

class SecureFlow(FlowSpec):

    @secrets(sources=['api-keys'])
    @step
    def start(self):
        import os
        # Each key of the JSON secret is injected as an
        # environment variable in this step
        api_key = os.environ['API_KEY']
        # Use the secret
        response = call_api(api_key=api_key)
        self.result = response
        self.next(self.end)

    @step
    def end(self):
        pass
Storing Secrets in AWS
# Store a simple secret
aws secretsmanager create-secret \
--name api-keys \
--secret-string '{"API_KEY":"sk-1234567890","API_SECRET":"secret123"}'
# Store from a file
aws secretsmanager create-secret \
--name db-credentials \
--secret-string file://credentials.json
# Update a secret
aws secretsmanager update-secret \
--secret-id api-keys \
--secret-string '{"API_KEY":"new-key"}'
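For JSON-valued secrets like the one above, each top-level key becomes an environment variable inside the decorated step. Roughly, the mapping works like this (a plain-Python sketch, not Metaflow's actual implementation):

```python
import json
import os

# Example secret string, as stored with the create-secret command above
secret_string = '{"API_KEY":"sk-1234567890","API_SECRET":"secret123"}'

# Sketch of the mapping: each top-level JSON key becomes an
# environment variable visible to the step's code
for key, value in json.loads(secret_string).items():
    os.environ[key] = value

print(os.environ["API_KEY"])     # sk-1234567890
print(os.environ["API_SECRET"])  # secret123
```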
Multiple Secret Sources
@secrets(sources=['api-keys', 'db-credentials', 'oauth-tokens'])
@step
def process(self):
    import os
    # Keys from all sources are injected as environment variables
    api_key = os.environ['API_KEY']
    db_password = os.environ['PASSWORD']
    oauth_token = os.environ['TOKEN']
    # Use the secrets
    connect_to_db(password=db_password)
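When a step lists several sources, the keys from all of them land in the same process environment, so keep key names unique across secrets. The merge can be pictured in plain Python (a sketch, not Metaflow internals):

```python
import json

# Example payloads of the three secrets used above
api_keys = json.loads('{"API_KEY":"sk-1234567890"}')
db_credentials = json.loads('{"PASSWORD":"secret123"}')
oauth_tokens = json.loads('{"TOKEN":"tok-abc"}')

# All keys end up in one flat namespace; a duplicate key in two
# secrets would clobber or conflict, so avoid reusing names
merged = {}
for source in (api_keys, db_credentials, oauth_tokens):
    merged.update(source)

print(sorted(merged))  # ['API_KEY', 'PASSWORD', 'TOKEN']
```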
Secret ARNs
You can also use specific secret ARNs:
@secrets(sources=[
    'arn:aws:secretsmanager:us-east-1:123456789:secret:api-keys-abc123',
    'arn:aws:secretsmanager:us-east-1:123456789:secret:db-creds-def456'
])
@step
def secure_step(self):
    import os
    # Keys from both secrets are available as environment variables
    api_key = os.environ['API_KEY']
Azure Key Vault
Configuration
# Set environment variable
export METAFLOW_AZURE_KEY_VAULT_PREFIX=https://myvault.vault.azure.net
Accessing Secrets
from metaflow import FlowSpec, step, secrets

class AzureSecureFlow(FlowSpec):

    @secrets(sources=['api-keys'])
    @step
    def start(self):
        import os
        # Key Vault secrets are plain strings; the value is injected
        # as an environment variable named after the secret
        api_key = os.environ['api-keys']
        self.next(self.end)

    @step
    def end(self):
        pass
Storing Secrets in Azure
# Create a secret in Azure Key Vault
az keyvault secret set \
--vault-name myvault \
--name api-keys \
--value '{"API_KEY":"sk-1234567890"}'
# Set from file
az keyvault secret set \
--vault-name myvault \
--name db-credentials \
--file credentials.json
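Key Vault secrets are single string values, so if you store JSON as in the first command above, the step has to parse it itself. A minimal sketch (the injected environment variable is simulated here):

```python
import json
import os

# Simulate the value a step would see for the 'api-keys' secret;
# in a real step this is injected by the @secrets decorator
os.environ["api-keys"] = '{"API_KEY":"sk-1234567890"}'

# Parse the JSON payload yourself to get at individual fields
api_key = json.loads(os.environ["api-keys"])["API_KEY"]
print(api_key)  # sk-1234567890
```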
Secret Versions
Azure Key Vault supports secret versioning:
# Pin a specific version by its full URL
@secrets(sources=[
    'https://myvault.vault.azure.net/secrets/api-keys/abc123'
])
@step
def versioned_secret(self):
    pass
GCP Secret Manager
Configuration
# Set environment variable
export METAFLOW_GCP_SECRET_MANAGER_PREFIX=projects/my-project/secrets
Accessing Secrets
from metaflow import FlowSpec, step, secrets

class GCPSecureFlow(FlowSpec):

    @secrets(sources=['api-keys'])
    @step
    def start(self):
        import json
        import os
        # The secret payload is injected as an environment variable
        # named after the secret; parse it if you stored JSON
        api_key = json.loads(os.environ['api-keys'])['API_KEY']
        self.next(self.end)

    @step
    def end(self):
        pass
Storing Secrets in GCP
# Create a secret
echo -n '{"API_KEY":"sk-1234567890"}' | \
gcloud secrets create api-keys \
--data-file=-
# Add a new version
echo -n '{"API_KEY":"new-key"}' | \
gcloud secrets versions add api-keys \
--data-file=-
# Access specific version
gcloud secrets versions access 1 --secret=api-keys
Environment Variables (Local Development)
For local development, you can use environment variables:
import os
from metaflow import FlowSpec, step

class LocalSecrets(FlowSpec):

    @step
    def start(self):
        # Get from environment
        api_key = os.getenv('API_KEY')
        if not api_key:
            raise ValueError("API_KEY not set")
        # Use the secret
        response = call_api(api_key=api_key)
        self.result = response
        self.next(self.end)

    @step
    def end(self):
        pass
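To fail fast when a variable is missing, the lookup can be wrapped in a small helper; `get_secret` here is an illustrative name, not a Metaflow API:

```python
import os

def get_secret(name: str) -> str:
    """Return an environment variable, raising a descriptive error if unset."""
    value = os.getenv(name)
    if value is None:
        raise ValueError(
            f"{name} is not set; export it locally or configure a secrets backend"
        )
    return value

# Example usage
os.environ["API_KEY"] = "sk-local-dev"  # simulate a locally exported secret
print(get_secret("API_KEY"))  # sk-local-dev
```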
Best Practices
Use the @secrets decorator
Always use the @secrets decorator instead of fetching secrets manually:
# Good
@secrets(sources=['api-keys'])
@step
def process(self):
    import os
    api_key = os.environ['API_KEY']

# Avoid - manual secret fetching
@step
def process(self):
    import boto3
    client = boto3.client('secretsmanager')
    secret = client.get_secret_value(SecretId='api-keys')
Avoid logging secrets
Be careful not to expose secrets in logs:
@secrets(sources=['api-keys'])
@step
def process(self):
    import os
    api_key = os.environ['API_KEY']
    # Bad - exposes the secret in logs
    print(f"Using API key: {api_key}")
    # Good - show only a short prefix
    print(f"Using API key: {api_key[:4]}...")
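The prefix-only trick above can be factored into a tiny reusable helper (`redact` is an illustrative name):

```python
def redact(secret: str, visible: int = 4) -> str:
    """Show only the first few characters of a secret for logging."""
    if len(secret) <= visible:
        # Too short to show a prefix safely; mask it entirely
        return "*" * len(secret)
    return secret[:visible] + "..."

print(redact("sk-1234567890"))  # sk-1...
print(redact("abc"))            # ***
```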
Rotate secrets regularly
Implement secret rotation:
@secrets(sources=['api-keys'])
@step
def process(self):
    import os
    # Secrets are fetched fresh for each task, so rotating the
    # secret in the secret manager automatically updates it
    # for new runs
    api_key = os.environ['API_KEY']
Use different secrets for environments
Separate secrets for dev, staging, and production:
import os

# Decorator arguments are evaluated at import time, before any task
# runs, so derive the environment from a variable (DEPLOY_ENV is an
# illustrative name) rather than from `current`
ENV = os.environ.get('DEPLOY_ENV', 'dev')

@secrets(sources=[f'api-keys-{ENV}'])
@step
def process(self):
    # Uses api-keys-dev, api-keys-prod, etc.
    pass
Scope secrets to steps
Only add secrets to steps that need them:
@step
def start(self):
    # No secrets needed here
    self.next(self.process)

@secrets(sources=['api-keys'])  # Only here
@step
def process(self):
    # Use secrets
    pass
Common Patterns
Database Connections
import psycopg2
from metaflow import FlowSpec, step, secrets

class DatabaseFlow(FlowSpec):

    @secrets(sources=['db-credentials'])
    @step
    def query(self):
        import os
        # Each key of the JSON secret is an environment variable
        # (note: generic names like USER can collide with variables
        # already present in the environment)
        conn = psycopg2.connect(
            host=os.environ['HOST'],
            port=os.environ['PORT'],
            database=os.environ['DATABASE'],
            user=os.environ['USER'],
            password=os.environ['PASSWORD']
        )
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users LIMIT 10")
        self.results = cursor.fetchall()
        cursor.close()
        conn.close()
        self.next(self.end)

    @step
    def end(self):
        pass
API Authentication
import requests
from metaflow import FlowSpec, step, secrets

class APIFlow(FlowSpec):

    @secrets(sources=['api-credentials'])
    @step
    def fetch_data(self):
        import os
        headers = {
            'Authorization': f"Bearer {os.environ['API_TOKEN']}",
            'X-API-Key': os.environ['API_KEY']
        }
        response = requests.get(
            'https://api.example.com/data',
            headers=headers
        )
        self.data = response.json()
        self.next(self.end)

    @step
    def end(self):
        pass
Service Account Keys
import json
from google.cloud import bigquery
from metaflow import FlowSpec, step, secrets

class BigQueryFlow(FlowSpec):

    @secrets(sources=['gcp-service-account'])
    @step
    def query_bigquery(self):
        import os
        # Assumes the secret stores the service account JSON under a
        # single key, e.g. {"SERVICE_ACCOUNT_JSON": "{...}"}
        sa_info = json.loads(os.environ['SERVICE_ACCOUNT_JSON'])
        # Build the client directly from the parsed credentials,
        # avoiding writing the key to disk
        client = bigquery.Client.from_service_account_info(sa_info)
        query = "SELECT * FROM `project.dataset.table` LIMIT 10"
        self.results = list(client.query(query))
        self.next(self.end)

    @step
    def end(self):
        pass
IAM Permissions
AWS IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789:secret:api-keys-*",
        "arn:aws:secretsmanager:us-east-1:123456789:secret:db-credentials-*"
      ]
    }
  ]
}
The trailing -* in each resource matches the random suffix that AWS Secrets Manager appends to secret ARNs.
Azure RBAC
# Grant Key Vault Secrets User role
az role assignment create \
--role "Key Vault Secrets User" \
--assignee < managed-identity-i d > \
--scope /subscriptions/ < sub-i d > /resourceGroups/ < r g > /providers/Microsoft.KeyVault/vaults/ < vault-nam e >
GCP IAM
# Grant Secret Manager Secret Accessor role
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:<sa-name>@my-project.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
Troubleshooting
Check IAM permissions:
# AWS - verify IAM role
aws sts get-caller-identity
aws secretsmanager describe-secret --secret-id api-keys

# Azure - check access
az keyvault secret show --vault-name myvault --name api-keys

# GCP - verify service account
gcloud secrets describe api-keys
Verify the secret exists and the name is correct:
# AWS
aws secretsmanager list-secrets

# Azure
az keyvault secret list --vault-name myvault

# GCP
gcloud secrets list