The LangSmith SDK provides utilities to anonymize sensitive data in traces. Use the create_anonymizer function (createAnonymizer in TypeScript) to automatically redact PII, secrets, and other sensitive information before it is sent to LangSmith.

How it works

The anonymizer:
  1. Extracts all string values from your data (inputs, outputs, metadata)
  2. Applies rules or custom functions to detect and replace sensitive patterns
  3. Reconstructs the data structure with redacted values
The shape of your data (nested objects and arrays) is preserved; only string values are modified.
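To make the three steps concrete, here is a simplified, standard-library-only sketch of the same extract, replace, and rebuild idea. The names extract_string_nodes and anonymize are hypothetical and chosen for illustration; this is not the SDK's actual implementation, just a model of the behavior described above.

```python
import copy
import re
from typing import Any, Callable, List, Tuple, Union

NodePath = Tuple[Union[str, int], ...]


def extract_string_nodes(data: Any, path: NodePath = ()) -> List[Tuple[NodePath, str]]:
    """Step 1: collect a (path, value) pair for every string in the structure."""
    if isinstance(data, str):
        return [(path, data)]
    nodes: List[Tuple[NodePath, str]] = []
    if isinstance(data, dict):
        for key, value in data.items():
            nodes.extend(extract_string_nodes(value, path + (key,)))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            nodes.extend(extract_string_nodes(value, path + (index,)))
    # Anything else (numbers, booleans, None) contributes no nodes.
    return nodes


def anonymize(data: Any, replace: Callable[[str], str]) -> Any:
    """Steps 2 and 3: transform each string, then write changes back by path."""
    result = copy.deepcopy(data)  # leave the caller's object untouched
    for path, value in extract_string_nodes(data):
        new_value = replace(value)
        if new_value == value:
            continue
        if not path:  # the input itself was a single string
            return new_value
        target = result
        for part in path[:-1]:
            target = target[part]
        target[path[-1]] = new_value
    return result


EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)
data = {"to": ["alice@example.com", 42], "meta": {"note": "cc bob@example.org", "ok": True}}
print(anonymize(data, lambda s: EMAIL.sub("[email]", s)))
```

The non-string values (42 and True) pass through untouched, and the original data object is not mutated because the rebuild works on a deep copy.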

Basic usage with regex patterns

Define patterns to match and redact:
from langsmith.anonymizer import create_anonymizer
import re

# Define rules for redaction
anonymizer = create_anonymizer(
    [
        {"pattern": re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "replace": "[email]"},
        {"pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "replace": "[ssn]"},
        {"pattern": re.compile(r"sk-[a-zA-Z0-9]{32,}"), "replace": "[api-key]"},
    ]
)

# Apply to data
data = {
    "message": "Contact user@example.com for API key sk-abc123xyz456abc123xyz456abc123xyz456",
    "ssn": "123-45-6789"
}

redacted = anonymizer(data)
print(redacted)
# Output:
# {
#   "message": "Contact [email] for API key [api-key]",
#   "ssn": "[ssn]"
# }

Using with traceable

Integrate anonymization into your tracing workflow:
from langsmith import traceable
from langsmith.anonymizer import create_anonymizer
import re

# Create anonymizer
anonymizer = create_anonymizer(
    [
        {"pattern": re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "replace": "[email]"},
        {"pattern": re.compile(r"\d{16}"), "replace": "[credit-card]"},
    ]
)

@traceable(
    process_inputs=anonymizer,
    process_outputs=anonymizer
)
def process_user_data(user_input: dict) -> dict:
    # Process data - sensitive info is redacted in traces
    response = {
        "message": f"Processed request from {user_input['email']}",
        "payment": user_input.get("card")
    }
    return response

# The trace records redacted inputs and outputs; result itself still holds the raw values
result = process_user_data({
    "email": "user@example.com",
    "card": "1234567890123456"
})
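One point worth making explicit: process_inputs and process_outputs change what is recorded in the trace, not what the function sees or returns, so result above should still contain the raw email and card number. The sketch below models that with a toy decorator. toy_traceable, redact, and recorded_runs are hypothetical stand-ins written for illustration, not langsmith.traceable, and mapping arguments to parameter names (giving {"user_input": {...}}) is an assumption about the shape of the inputs dict.

```python
import functools
import inspect
import re
from typing import Any, Callable, Dict, List

EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)

# Hypothetical in-memory stand-in for the runs a tracer would upload.
recorded_runs: List[Dict[str, Any]] = []


def redact(value: Any) -> Any:
    """Return a copy of value with emails in every nested string replaced."""
    if isinstance(value, str):
        return EMAIL.sub("[email]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def toy_traceable(process_inputs: Callable[[dict], dict], process_outputs: Callable[[Any], Any]):
    """Hypothetical tracing decorator, NOT langsmith.traceable."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map arguments to parameter names, e.g. {"user_input": {...}}.
            inputs = dict(inspect.signature(func).bind(*args, **kwargs).arguments)
            result = func(*args, **kwargs)  # the function works on raw data
            recorded_runs.append({
                "inputs": process_inputs(inputs),    # only the recorded copy
                "outputs": process_outputs(result),  # is redacted
            })
            return result  # the caller gets the unredacted value
        return wrapper
    return decorator


@toy_traceable(process_inputs=redact, process_outputs=redact)
def process_user_data(user_input: dict) -> dict:
    return {"message": f"Processed request from {user_input['email']}"}


result = process_user_data({"email": "user@example.com"})
print(result)             # raw value returned to the caller
print(recorded_runs[-1])  # redacted copy that would be logged
```

In practice this means redaction protects what leaves your process for LangSmith; any other logging of result in your own code still sees the raw values.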

Custom anonymizer function

Use a custom function for complex logic:
from langsmith.anonymizer import create_anonymizer

def custom_redactor(value: str, path: list) -> str:
    """Custom function to redact based on value and path."""
    # Redact based on field path
    if "password" in path or "secret" in path:
        return "[redacted]"
    
    # Redact phone numbers
    if len(value) == 10 and value.isdigit():
        return "[phone]"
    
    # Redact credit cards (basic check)
    if len(value) == 16 and value.isdigit():
        return "[credit-card]"
    
    return value

anonymizer = create_anonymizer(custom_redactor)

data = {
    "user": {
        "name": "Alice",
        "password": "super-secret-123",
        "phone": "5551234567"
    },
    "payment": "1234567890123456"
}

redacted = anonymizer(data)
print(redacted)
# Output:
# {
#   "user": {
#     "name": "Alice",
#     "password": "[redacted]",
#     "phone": "[phone]"
#   },
#   "payment": "[credit-card]"
# }
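Because a custom redactor is a plain function of (value, path), it can be unit-tested without tracing anything. The sketch below copies the custom_redactor logic above and checks it against hand-built paths; the path lists are assumptions about how keys are reported. The last two cases expose gaps in this particular redactor: the path check matches whole keys only, so a key such as db_password is not redacted, and a phone number written with dashes fails the isdigit() check.

```python
def custom_redactor(value: str, path: list) -> str:
    """Same logic as the custom_redactor example above."""
    if "password" in path or "secret" in path:
        return "[redacted]"
    if len(value) == 10 and value.isdigit():
        return "[phone]"
    if len(value) == 16 and value.isdigit():
        return "[credit-card]"
    return value


# (value, path, expected) triples with hand-built paths.
cases = [
    ("super-secret-123", ["user", "password"], "[redacted]"),
    ("5551234567", ["user", "phone"], "[phone]"),
    ("1234567890123456", ["payment"], "[credit-card]"),
    ("Alice", ["user", "name"], "Alice"),
    # Exact key match only: "db_password" is not "password", so this survives.
    ("hunter2", ["config", "db_password"], "hunter2"),
    # Formatted numbers are not all digits, so they also survive.
    ("555-123-4567", ["user", "phone"], "555-123-4567"),
]

for value, path, expected in cases:
    actual = custom_redactor(value, path)
    assert actual == expected, (value, path, actual)
print(f"{len(cases)} cases passed")
```

If those gaps matter for your data, a substring check on the keys (for example any("password" in str(p) for p in path)) is one possible tightening.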

Common patterns

Here are ready-to-use patterns for common sensitive data:
import re
from langsmith.anonymizer import create_anonymizer

# Common PII patterns
common_patterns = [
    # Email addresses
    {"pattern": re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "replace": "[email]"},
    
    # Phone numbers (US format)
    {"pattern": re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "replace": "[phone]"},
    
    # Social Security Numbers
    {"pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "replace": "[ssn]"},
    
    # Credit card numbers (basic)
    {"pattern": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"), "replace": "[credit-card]"},
    
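    # API keys (common formats). Keep the specific sk- rule first: the generic
    # token rule below is broad and also matches long identifiers and hashes.
    {"pattern": re.compile(r"sk-[a-zA-Z0-9]{32,}"), "replace": "[api-key]"},
    {"pattern": re.compile(r"[a-zA-Z0-9_-]{32,}"), "replace": "[token]"},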
    
    # IP addresses
    {"pattern": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "replace": "[ip]"},
    
    # URLs with credentials (keeps the original scheme, stops at whitespace and "/")
    {"pattern": re.compile(r"(https?://)[^:/\s@]+:[^@\s/]+@"), "replace": r"\1[credentials]@"},
]

anonymizer = create_anonymizer(common_patterns)
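Assuming the rules in a list are applied one after another in list order (which is how the hypothetical apply_rules helper below behaves), order matters once broad and specific patterns overlap. The sketch uses only the re module to show why the sk- rule sits before the generic token rule, and how that token rule also catches a harmless long identifier.

```python
import re


def apply_rules(text: str, rules: list) -> str:
    """Hypothetical helper: apply each rule in list order via re.sub."""
    for rule in rules:
        text = rule["pattern"].sub(rule["replace"], text)
    return text


api_key_rule = {"pattern": re.compile(r"sk-[a-zA-Z0-9]{32,}"), "replace": "[api-key]"}
token_rule = {"pattern": re.compile(r"[a-zA-Z0-9_-]{32,}"), "replace": "[token]"}

text = "key=sk-" + "a" * 40
print(apply_rules(text, [api_key_rule, token_rule]))  # key=[api-key]
print(apply_rules(text, [token_rule, api_key_rule]))  # key=[token]

# The broad rule also redacts a 37-character function name.
print(apply_rules("calling calculate_monthly_revenue_forecast_v2", [api_key_rule, token_rule]))
```

With the generic rule first, the key is still redacted but loses its specific label; in the last call, a non-secret identifier is redacted too, which is the over-redaction the best practices below warn about.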

Controlling traversal depth

Limit how deep the anonymizer traverses nested objects:
from langsmith.anonymizer import create_anonymizer
import re

anonymizer = create_anonymizer(
    [{"pattern": re.compile(r"secret", re.I), "replace": "[redacted]"}],
    max_depth=5  # Only traverse 5 levels deep
)

data = {
    "level1": {
        "level2": {
            "level3": {
                "level4": {
                    "level5": {
                        "level6": "secret data"  # Deeper than max_depth=5, so likely left unredacted
                    }
                }
            }
        }
    }
}

redacted = anonymizer(data)
print(redacted)
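Exactly how the SDK counts depth is an implementation detail. The standard-library sketch below assumes one plausible convention: the top-level object is depth 0, and containers at depth max_depth or deeper are not descended into. Under that assumption, max_depth=5 never reaches level6 while max_depth=6 does. collect_strings is a hypothetical helper for illustration only.

```python
from typing import Any, List


def collect_strings(data: Any, max_depth: int, depth: int = 0) -> List[str]:
    """Hypothetical helper: gather strings, skipping containers at depth >= max_depth."""
    if isinstance(data, str):
        return [data]
    if isinstance(data, dict):
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return []  # numbers, booleans, None
    if depth >= max_depth:
        return []  # container is too deep; its contents are never visited
    found: List[str] = []
    for child in children:
        found.extend(collect_strings(child, max_depth, depth + 1))
    return found


nested = {"level1": {"level2": {"level3": {"level4": {"level5": {"level6": "secret data"}}}}}}
print(collect_strings(nested, max_depth=5))  # []
print(collect_strings(nested, max_depth=6))  # ['secret data']
```

The trade-off is that a lower limit saves work on large payloads but silently skips anything nested beneath it, so set it from the deepest place sensitive strings can appear.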

Advanced: Custom node processor

For maximum control, implement a custom StringNodeProcessor:
from langsmith.anonymizer import create_anonymizer, StringNodeProcessor, StringNode

class CustomProcessor(StringNodeProcessor):
    def mask_nodes(self, nodes: list[StringNode]) -> list[StringNode]:
        """Process all string nodes at once.

        Return only the nodes whose value should change; nodes left out
        of the result keep their original value.
        """
        result = []
        for node in nodes:
            value = node["value"]
            path = node["path"]

            # Redact any string whose path mentions "email"
            if "email" in str(path).lower():
                result.append({"value": "[email]", "path": path})
            # Truncate very long strings
            elif len(value) > 100:
                result.append({"value": value[:50] + "...[truncated]", "path": path})

        return result

anonymizer = create_anonymizer(CustomProcessor())
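Because mask_nodes receives every string node in a single call, a processor can keep state across nodes. As one illustration, the hypothetical PseudonymizingProcessor below replaces each distinct email with a numbered label, so the same address maps to the same placeholder everywhere within one call. It is written without the SDK so it runs standalone; to use it with create_anonymizer you would presumably subclass StringNodeProcessor, as in the example above. Like that example, it returns only the nodes it changed.

```python
import re
from typing import Dict, List, TypedDict, Union


class Node(TypedDict):
    # Same {"value", "path"} shape as the StringNode objects above.
    value: str
    path: List[Union[str, int]]


EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)


class PseudonymizingProcessor:
    """Hypothetical processor: each distinct email gets a stable numbered label."""

    def mask_nodes(self, nodes: List[Node]) -> List[Node]:
        aliases: Dict[str, str] = {}  # fresh mapping for every batch

        def alias(match: re.Match) -> str:
            email = match.group(0).lower()  # treat case variants as one address
            if email not in aliases:
                aliases[email] = f"[email-{len(aliases) + 1}]"
            return aliases[email]

        changed: List[Node] = []
        for node in nodes:
            new_value = EMAIL.sub(alias, node["value"])
            if new_value != node["value"]:  # return only nodes that changed
                changed.append({"value": new_value, "path": node["path"]})
        return changed


nodes: List[Node] = [
    {"value": "From alice@example.com", "path": ["inputs", "from"]},
    {"value": "Reply to ALICE@example.com or bob@example.com", "path": ["outputs", "text"]},
    {"value": "no PII here", "path": ["outputs", "note"]},
]
for node in PseudonymizingProcessor().mask_nodes(nodes):
    print(node)
```

Stable labels keep traces readable (you can still tell that two messages involve the same person) without exposing the address itself.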

Best practices

1. Start with common patterns
   Use the ready-made patterns for emails, phones, SSNs, and API keys as a baseline.

2. Test your anonymizer
   Validate that sensitive data is actually being redacted:

   test_data = {"email": "user@example.com", "secret": "sk-" + "a" * 40}
   redacted = anonymizer(test_data)
   assert "user@example.com" not in str(redacted)
   assert test_data["secret"] not in str(redacted)

3. Be cautious with overly broad patterns
   Avoid patterns that might redact too much:

   # Bad: matches almost any string
   {"pattern": re.compile(r"[a-zA-Z0-9]+"), "replace": "[redacted]"}

   # Good: specific pattern
   {"pattern": re.compile(r"sk-[a-zA-Z0-9]{32,}"), "replace": "[api-key]"}

4. Use different labels for different types
   Make it clear what was redacted:

   "[email]", "[ssn]", "[api-key]", "[credit-card]"
   # Not just "[redacted]" for everything

5. Consider performance
   For high-volume tracing, compile patterns once rather than per call, avoid patterns prone to heavy backtracking (such as nested quantifiers), and set max_depth no deeper than your data requires.
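A small table of positive and negative samples per rule catches both failure modes above: secrets that slip through and harmless text that gets redacted. check_rules and SAMPLES are hypothetical names and the sample strings are made-up test values. Note the negative sample sk-abc123: it is too short for the sk-[a-zA-Z0-9]{32,} rule, which is exactly why test values should be as long as real ones.

```python
import re

RULES = [
    {"pattern": re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "replace": "[email]"},
    {"pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "replace": "[ssn]"},
    {"pattern": re.compile(r"sk-[a-zA-Z0-9]{32,}"), "replace": "[api-key]"},
]

# For each label: strings that must be caught, and strings that must survive untouched.
SAMPLES = {
    "[email]": (["alice@example.com"], ["alice at example dot com"]),
    "[ssn]": (["123-45-6789"], ["123-456-7890"]),
    "[api-key]": (["sk-" + "A1" * 20], ["sk-abc123"]),
}


def check_rules(rules: list, samples: dict) -> list:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for rule in rules:
        should_match, should_not_match = samples[rule["replace"]]
        for text in should_match:
            if rule["pattern"].sub(rule["replace"], text) == text:
                failures.append(f"{rule['replace']} missed {text!r}")
        for text in should_not_match:
            if rule["pattern"].sub(rule["replace"], text) != text:
                failures.append(f"{rule['replace']} wrongly matched {text!r}")
    return failures


print(check_rules(RULES, SAMPLES) or "all rules behave as expected")
```

Running a check like this in your test suite makes it easy to spot a rule that regresses when someone edits a pattern.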

Important notes

Anonymization happens client-side before data is sent to LangSmith. Once data is sent without anonymization, it cannot be retroactively redacted.
  • The anonymizer only processes string values - other types (numbers, booleans) are unchanged
  • Nested objects and arrays are preserved - structure is maintained
  • In TypeScript, regex patterns should use the g (global) flag to replace all occurrences; in Python, re.sub() already replaces every match by default
  • Python uses re.compile() while TypeScript uses regex literals
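A quick standard-library check of the Python behavior: Pattern.sub replaces every non-overlapping match unless you pass the optional count argument.

```python
import re

email = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)
text = "cc alice@example.com and bob@example.com"

print(email.sub("[email]", text))           # every match is replaced
print(email.sub("[email]", text, count=1))  # count=1 stops after the first match
```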
