Overview

Metaflow integrates with cloud secret management services to securely handle sensitive data like:
  • API keys and tokens
  • Database credentials
  • OAuth secrets
  • Encryption keys
  • Service account credentials
Never hardcode secrets in your flow code or commit them to version control.

Supported Secret Managers

  • AWS Secrets Manager: native support
  • Azure Key Vault: integration with Azure Key Vault
  • GCP Secret Manager: support for Google Cloud Secret Manager

AWS Secrets Manager

Configuration

Configure AWS Secrets Manager as your secrets backend:
# Set environment variables
export METAFLOW_DEFAULT_SECRETS_BACKEND_TYPE=aws-secrets-manager
export METAFLOW_DEFAULT_SECRETS_ROLE=arn:aws:iam::123456789:role/MetaflowSecretsRole

Accessing Secrets

from metaflow import FlowSpec, step, secrets

class SecureFlow(FlowSpec):
    
    @secrets(sources=['api-keys'])
    @step
    def start(self):
        from metaflow import current
        
        # Access secret from AWS Secrets Manager, keyed by the source name
        api_key = current.secrets['api-keys']['API_KEY']
        
        # Use the secret (call_api stands in for your own client code)
        response = call_api(api_key=api_key)
        self.result = response
        
        self.next(self.end)
    
    @step
    def end(self):
        pass

Storing Secrets in AWS

# Store a simple secret
aws secretsmanager create-secret \
    --name api-keys \
    --secret-string '{"API_KEY":"sk-1234567890","API_SECRET":"secret123"}'

# Store from a file
aws secretsmanager create-secret \
    --name db-credentials \
    --secret-string file://credentials.json

# Update a secret
aws secretsmanager update-secret \
    --secret-id api-keys \
    --secret-string '{"API_KEY":"new-key"}'
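
Hand-written JSON inside a shell string is easy to get wrong (a missing quote or a trailing comma is enough). A minimal sketch of generating the `--secret-string` payload from Python instead, assuming a flat secret of string keys and string values; `build_secret_string` is a hypothetical helper name, not part of Metaflow or the AWS CLI:

```python
import json


def build_secret_string(values: dict) -> str:
    """Serialize key/value pairs into a JSON string suitable for --secret-string."""
    for key, value in values.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"secret entry {key!r} must map a string to a string")
    # Compact separators keep the payload small; sort_keys makes it reproducible.
    return json.dumps(values, separators=(",", ":"), sort_keys=True)


# Placeholder values taken from the CLI example above.
payload = build_secret_string({"API_KEY": "sk-1234567890", "API_SECRET": "secret123"})
```

The resulting string can be written to a file and passed with `--secret-string file://...`, which also keeps the value out of shell history.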

Multiple Secret Sources

@secrets(sources=['api-keys', 'db-credentials', 'oauth-tokens'])
@step
def process(self):
    from metaflow import current
    
    # Access from different secret sources
    api_key = current.secrets['api-keys']['API_KEY']
    db_password = current.secrets['db-credentials']['PASSWORD']
    oauth_token = current.secrets['oauth-tokens']['TOKEN']
    
    # Use the secrets
    connect_to_db(password=db_password)

Secret ARNs

You can also use specific secret ARNs:
@secrets(sources=[
    'arn:aws:secretsmanager:us-east-1:123456789:secret:api-keys-abc123',
    'arn:aws:secretsmanager:us-east-1:123456789:secret:db-creds-def456'
])
@step
def secure_step(self):
    from metaflow import current
    # Use a different name so the imported @secrets decorator is not shadowed
    all_secrets = current.secrets
    # Access secrets
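
When a flow mixes plain names and full ARNs, it can help to check which region and account an ARN points at before debugging permissions. The sketch below splits a Secrets Manager ARN into its parts using only string handling; `SecretArn` and `parse_secret_arn` are hypothetical names introduced for illustration:

```python
from typing import NamedTuple


class SecretArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    resource: str  # secret name, including the random suffix AWS appends


def parse_secret_arn(arn: str) -> SecretArn:
    """Split arn:<partition>:secretsmanager:<region>:<account>:secret:<name>."""
    parts = arn.split(":", 6)
    if (
        len(parts) != 7
        or parts[0] != "arn"
        or parts[2] != "secretsmanager"
        or parts[5] != "secret"
    ):
        raise ValueError(f"not a Secrets Manager secret ARN: {arn!r}")
    return SecretArn(
        partition=parts[1],
        region=parts[3],
        account_id=parts[4],
        resource=parts[6],
    )


# Placeholder ARN copied from the example above.
parsed = parse_secret_arn(
    "arn:aws:secretsmanager:us-east-1:123456789:secret:api-keys-abc123"
)
```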

Azure Key Vault

Configuration

# Set environment variables
export METAFLOW_AZURE_KEY_VAULT_PREFIX=https://myvault.vault.azure.net

Accessing Secrets

from metaflow import FlowSpec, step, secrets

class AzureSecureFlow(FlowSpec):
    
    @secrets(sources=['api-keys'])
    @step
    def start(self):
        from metaflow import current
        
        # Access secret from Azure Key Vault
        # Secrets are accessed by name
        api_key = current.secrets['api-keys']
        
        self.next(self.end)
    
    @step
    def end(self):
        pass

Storing Secrets in Azure

# Create a secret in Azure Key Vault
az keyvault secret set \
    --vault-name myvault \
    --name api-keys \
    --value '{"API_KEY":"sk-1234567890"}'

# Set from file
az keyvault secret set \
    --vault-name myvault \
    --name db-credentials \
    --file credentials.json

Secret Versions

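Azure Key Vault supports secret versioning. To pin a step to a specific version, pass the full secret identifier, including the version, as the source: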
# Specific version
@secrets(sources=[
    'https://myvault.vault.azure.net/secrets/api-keys/abc123'
])
@step
def versioned_secret(self):
    pass
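
A versioned identifier such as the one above has the shape `https://<vault>.vault.azure.net/secrets/<name>/<version>`. As a rough sketch, the pieces can be pulled apart with the standard library to confirm which vault, secret, and version a source refers to; `parse_key_vault_secret_id` is a hypothetical helper, not a Metaflow or Azure SDK function:

```python
from urllib.parse import urlparse


def parse_key_vault_secret_id(url: str) -> dict:
    """Return the vault URL, secret name, and optional version from an identifier."""
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    if (
        parsed.scheme != "https"
        or not parsed.netloc
        or len(segments) not in (2, 3)
        or segments[0] != "secrets"
    ):
        raise ValueError(f"not a Key Vault secret identifier: {url!r}")
    return {
        "vault_url": f"https://{parsed.netloc}",
        "name": segments[1],
        # An identifier without a version segment refers to the current version.
        "version": segments[2] if len(segments) == 3 else None,
    }


info = parse_key_vault_secret_id(
    "https://myvault.vault.azure.net/secrets/api-keys/abc123"
)
```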

GCP Secret Manager

Configuration

# Set environment variable
export METAFLOW_GCP_SECRET_MANAGER_PREFIX=projects/my-project/secrets

Accessing Secrets

from metaflow import FlowSpec, step, secrets

class GCPSecureFlow(FlowSpec):
    
    @secrets(sources=['api-keys'])
    @step
    def start(self):
        from metaflow import current
        
        # Access secret from GCP Secret Manager
        api_key = current.secrets['api-keys']['API_KEY']
        
        self.next(self.end)
    
    @step
    def end(self):
        pass

Storing Secrets in GCP

# Create a secret
echo -n '{"API_KEY":"sk-1234567890"}' | \
    gcloud secrets create api-keys \
    --data-file=-

# Add a new version
echo -n '{"API_KEY":"new-key"}' | \
    gcloud secrets versions add api-keys \
    --data-file=-

# Access specific version
gcloud secrets versions access 1 --secret=api-keys
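
GCP addresses a secret version by a resource name of the form `projects/<project>/secrets/<name>/versions/<version>`, where `latest` is an alias for the newest version. The sketch below shows how a short name and the configured prefix could combine into that form. How Metaflow itself joins the two is an assumption here, and `secret_version_path` is a hypothetical helper:

```python
def secret_version_path(prefix: str, name: str, version: str = "latest") -> str:
    """Build a fully qualified Secret Manager version resource name."""
    if name.startswith("projects/"):
        # Already fully qualified, so the prefix is not needed.
        base = name.rstrip("/")
    else:
        base = f"{prefix.rstrip('/')}/{name}"
    return f"{base}/versions/{version}"


# Prefix and secret name are the placeholder values used in this section.
latest = secret_version_path("projects/my-project/secrets", "api-keys")
pinned = secret_version_path("projects/my-project/secrets", "api-keys", "1")
```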

Environment Variables (Local Development)

For local development, you can use environment variables:
import os
from metaflow import FlowSpec, step

class LocalSecrets(FlowSpec):
    
    @step
    def start(self):
        # Get from environment
        api_key = os.getenv('API_KEY')
        
        if not api_key:
            raise ValueError("API_KEY not set")
        
        # Use the secret (call_api stands in for your own client code)
        response = call_api(api_key=api_key)
        self.result = response
        
        self.next(self.end)
    
    @step
    def end(self):
        pass
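
When a step needs several variables, checking them one at a time means a developer fixes one, reruns, and hits the next. A small sketch that reports every missing variable at once; `require_env` is a hypothetical helper, and the `environ` parameter exists only so the function can be exercised without touching the real environment:

```python
import os


def require_env(*names, environ=None):
    """Return the requested variables, or raise listing all that are unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the process environment.
values = require_env("API_KEY", environ={"API_KEY": "sk-1234567890"})
```

Inside a step, `require_env('API_KEY', 'API_SECRET')` would read from `os.environ` directly.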

Best Practices

Always use the @secrets decorator instead of fetching secrets manually:
# Good
@secrets(sources=['api-keys'])
@step
def process(self):
    api_key = current.secrets['api-keys']['API_KEY']

# Avoid - manual secret fetching
@step
def process(self):
    import boto3
    client = boto3.client('secretsmanager')
    secret = client.get_secret_value(SecretId='api-keys')

Be careful not to expose secrets in logs:
@secrets(sources=['api-keys'])
@step
def process(self):
    api_key = current.secrets['api-keys']['API_KEY']
    
    # Bad - exposes secret in logs
    print(f"Using API key: {api_key}")
    
    # Better - log only a short prefix, or nothing at all
    print(f"Using API key: {api_key[:4]}...")
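
Even a four-character prefix reveals part of the key. One alternative, sketched here under the assumption that the secret is a long random token, is to log a short hash so two runs can be compared without printing any of the original characters; `fingerprint` is a hypothetical helper name:

```python
import hashlib


def fingerprint(secret: str, length: int = 8) -> str:
    """Return a short, non-reversible label for a high-entropy secret."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:length]}"


# Placeholder key from the examples on this page.
label = fingerprint("sk-1234567890")
print(f"Using API key with fingerprint {label}")
```

This is not suitable for short or guessable values such as passwords, since a hash of a low-entropy secret can be brute-forced.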

Implement secret rotation:
@secrets(sources=['api-keys'])
@step
def process(self):
    # Secrets are fetched fresh each run
    # Rotating the secret in the secret manager
    # automatically updates it for new runs
    api_key = current.secrets['api-keys']['API_KEY']
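
Separate secrets for dev, staging, and production:
import os

# Decorator arguments are evaluated when the class is defined, before
# `current` is populated, so read the environment name from a variable.
# DEPLOY_ENV is a hypothetical variable name used for illustration.
DEPLOY_ENV = os.environ.get('DEPLOY_ENV', 'dev')

@secrets(sources=[f'api-keys-{DEPLOY_ENV}'])
@step
def process(self):
    # Uses api-keys-dev, api-keys-prod, etc.
    pass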

Only add secrets to steps that need them:
@step
def start(self):
    # No secrets needed here
    self.next(self.process)

@secrets(sources=['api-keys'])  # Only here
@step
def process(self):
    # Use secrets
    pass

Common Patterns

Database Connections

import psycopg2
from metaflow import FlowSpec, step, secrets

class DatabaseFlow(FlowSpec):
    
    @step
    def start(self):
        # Every flow needs a start step; hand off to the query step
        self.next(self.query)
    
    @secrets(sources=['db-credentials'])
    @step
    def query(self):
        from metaflow import current
        
        creds = current.secrets['db-credentials']
        
        conn = psycopg2.connect(
            host=creds['HOST'],
            port=creds['PORT'],
            database=creds['DATABASE'],
            user=creds['USER'],
            password=creds['PASSWORD']
        )
        
        # Close the cursor and connection even if the query fails
        try:
            with conn.cursor() as cursor:
                cursor.execute("SELECT * FROM users LIMIT 10")
                self.results = cursor.fetchall()
        finally:
            conn.close()
        
        self.next(self.end)
    
    @step
    def end(self):
        pass
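
A secret that is missing one field fails with a bare `KeyError` in the middle of the `connect` call. The sketch below checks the credential mapping up front and converts the port to an integer, assuming the same five keys used in the flow above; `connection_kwargs` and `REQUIRED_KEYS` are hypothetical names:

```python
REQUIRED_KEYS = ("HOST", "PORT", "DATABASE", "USER", "PASSWORD")


def connection_kwargs(creds: dict) -> dict:
    """Validate a credentials mapping and return keyword arguments for connect()."""
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        # Report key names only; never include the secret values in the message.
        raise KeyError("db-credentials is missing keys: " + ", ".join(missing))
    return {
        "host": creds["HOST"],
        "port": int(creds["PORT"]),  # JSON secrets often store the port as a string
        "database": creds["DATABASE"],
        "user": creds["USER"],
        "password": creds["PASSWORD"],
    }


# Placeholder credentials for illustration only.
kwargs = connection_kwargs(
    {"HOST": "db.internal", "PORT": "5432", "DATABASE": "app", "USER": "svc", "PASSWORD": "pw"}
)
```

The result could then be passed as `psycopg2.connect(**kwargs)`.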

API Authentication

import requests
from metaflow import FlowSpec, step, secrets

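class APIFlow(FlowSpec):
    
    @step
    def start(self):
        # Every flow needs a start step; hand off to the fetch step
        self.next(self.fetch_data)
    
    @secrets(sources=['api-credentials'])
    @step
    def fetch_data(self):
        from metaflow import current
        
        creds = current.secrets['api-credentials']
        
        headers = {
            'Authorization': f"Bearer {creds['API_TOKEN']}",
            'X-API-Key': creds['API_KEY']
        }
        
        response = requests.get(
            'https://api.example.com/data',
            headers=headers,
            timeout=30  # fail fast instead of hanging the step
        )
        response.raise_for_status()  # surface HTTP errors before parsing
        
        self.data = response.json()
        self.next(self.end)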
    
    @step
    def end(self):
        pass

Service Account Keys

import json
import os
import tempfile
from google.cloud import bigquery
from metaflow import FlowSpec, step, secrets

class BigQueryFlow(FlowSpec):
    
    @step
    def start(self):
        # Every flow needs a start step; hand off to the query step
        self.next(self.query_bigquery)
    
    @secrets(sources=['gcp-service-account'])
    @step
    def query_bigquery(self):
        from metaflow import current
        
        # Get service account JSON
        sa_json = current.secrets['gcp-service-account']
        
        # Write to temporary file
        with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
            json.dump(sa_json, f)
            sa_file = f.name
        
        try:
            # Use service account
            client = bigquery.Client.from_service_account_json(sa_file)
            
            query = "SELECT * FROM `project.dataset.table` LIMIT 10"
            self.results = list(client.query(query))
        finally:
            # Clean up the key file even if the query fails
            os.unlink(sa_file)
        
        self.next(self.end)
    
    @step
    def end(self):
        pass
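
The write-then-delete pattern for the key file can be wrapped in a context manager so the file is removed on every exit path. This is a minimal sketch, not the library's own mechanism; `temporary_json_file` is a hypothetical helper built only on the standard library:

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_json_file(payload: dict):
    """Write payload to a private temp file, yield its path, then delete it."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
        yield path
    finally:
        # Runs whether the body succeeded or raised.
        os.unlink(path)


# Placeholder payload standing in for a service account key.
with temporary_json_file({"type": "service_account"}) as key_path:
    with open(key_path) as handle:
        loaded = json.load(handle)
```

In the flow above, the body of the `with` block would hold the `from_service_account_json(key_path)` call and the query.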

IAM Permissions

AWS IAM Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789:secret:api-keys-*",
        "arn:aws:secretsmanager:us-east-1:123456789:secret:db-credentials-*"
      ]
    }
  ]
}
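
For more than a couple of secrets, the policy can be generated rather than edited by hand. The sketch below reproduces the document above from a list of secret names, keeping the same `-*` suffix wildcard that matches the random characters AWS appends to each secret ARN; `secrets_read_policy` is a hypothetical helper:

```python
import json


def secrets_read_policy(region: str, account_id: str, secret_names: list) -> dict:
    """Build a read-only IAM policy document scoped to the named secrets."""
    resources = [
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{name}-*"
        for name in secret_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resources,
            }
        ],
    }


# Region, account ID, and names are the placeholders used in the policy above.
policy = secrets_read_policy("us-east-1", "123456789", ["api-keys", "db-credentials"])
policy_json = json.dumps(policy, indent=2)
```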

Azure RBAC

# Grant Key Vault Secrets User role
az role assignment create \
    --role "Key Vault Secrets User" \
    --assignee <managed-identity-id> \
    --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>

GCP IAM

# Grant Secret Manager Secret Accessor role
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:<service-account-email>" \
    --role="roles/secretmanager.secretAccessor"

Troubleshooting

Check IAM permissions:
# AWS - verify IAM role
aws sts get-caller-identity
aws secretsmanager describe-secret --secret-id api-keys

# Azure - check access
az keyvault secret show --vault-name myvault --name api-keys

# GCP - check access to the secret
gcloud secrets describe api-keys

Verify that the secret exists and that its name is correct:
# AWS
aws secretsmanager list-secrets

# Azure
az keyvault secret list --vault-name myvault

# GCP
gcloud secrets list

Ensure secrets are valid JSON:
# Secrets should be valid JSON
{
  "API_KEY": "value",
  "API_SECRET": "secret"
}

# Not plain text
sk-1234567890
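
The same check can be scripted before a secret is stored. A minimal sketch, assuming the secret is expected to be a JSON object of key/value pairs as shown above; `parse_secret_payload` is a hypothetical helper, and its error messages deliberately leave out the raw value:

```python
import json


def parse_secret_payload(raw: str) -> dict:
    """Parse a secret string, rejecting anything that is not a JSON object."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # exc.msg describes the problem without echoing the secret itself.
        raise ValueError(f"secret is not valid JSON: {exc.msg}") from None
    if not isinstance(payload, dict):
        raise ValueError("secret must be a JSON object of key/value pairs")
    return payload


valid = parse_secret_payload('{"API_KEY": "value", "API_SECRET": "secret"}')
```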

