PII Detection & Handling

Overview

The Secure MCP Gateway provides automatic detection and redaction of Personally Identifiable Information (PII) to protect sensitive user data. The PII handling system operates transparently:

Input: PII is detected and redacted before sending to MCP servers
Processing: MCP servers see only redacted/anonymized data
Output: PII is automatically restored in responses (de-anonymization)

Zero Trust for PII: Even trusted MCP servers never see original PII values, reducing data exposure risk.

How PII Redaction Works

Complete Flow

Step-by-Step Process

PII Detection

Input Analysis: Scan the request for PII entities.

The PII handler analyzes the input text using pattern matching and NLP models:

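from typing import List

# GuardrailViolation, PII_VIOLATION, and call_pii_api are not shown in this snippet.
async def detect_pii(content: str) -> List[GuardrailViolation]: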
    # Call Enkrypt PII API with mode="request"
    payload = {
        "text": content,
        "mode": "request",
        "key": "null"  # No existing mapping
    }
    
    result = await call_pii_api(payload)
    
    # If text changed, PII was detected
    if result["text"] != content:
        return [PII_VIOLATION]
    return []

Example:

Input: "Contact John Smith at [email protected] or 555-123-4567"

Detected:
- NAME: "John Smith"
- EMAIL: "[email protected]"  
- PHONE: "555-123-4567"
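The snippets in this guide call a `call_pii_api` helper that is not shown. A minimal sketch of what such a helper might look like follows, assuming the endpoint and `apikey` header from the curl example under Manual Testing. The `ENKRYPT_API_KEY` environment variable, the `build_pii_request` name, and the 10-second timeout are hypothetical choices for illustration, not the gateway's actual implementation.

```python
import asyncio
import json
import os
import urllib.request
from typing import Any, Dict

# Endpoint and header name are copied from the curl example under Manual Testing.
PII_URL = "https://api.enkryptai.com/guardrails/pii"


def build_pii_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it, so it can be inspected in tests."""
    return urllib.request.Request(
        PII_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def _send(request: urllib.request.Request) -> Dict[str, Any]:
    # Blocking HTTP call; the timeout value is an assumption.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


async def call_pii_api(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    # ENKRYPT_API_KEY is a hypothetical variable name for this sketch.
    request = build_pii_request(payload, os.environ["ENKRYPT_API_KEY"])
    return await asyncio.to_thread(_send, request)
```

`asyncio.to_thread` requires Python 3.9 or later. Keeping request construction separate from sending makes the payload shape easy to check without network access.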

PII Redaction

Token Replacement: Replace PII with anonymized tokens.

Each PII entity is replaced with a unique token:

from typing import Any, Dict

async def redact_pii(content: str) -> tuple[str, Dict[str, Any]]:
    payload = {
        "text": content,
        "mode": "request",
        "key": "null"
    }
    
    result = await call_pii_api(payload)
    
    redacted_text = result["text"]
    pii_key = result["key"]  # Unique key for this session
    
    return redacted_text, {"key": pii_key}

Example:

Original: "Contact John Smith at [email protected] or 555-123-4567"

Redacted: "Contact [NAME_1] at [EMAIL_1] or [PHONE_1]"

Mapping (stored with key "abc123xyz"):
{
  "NAME_1": "John Smith",
  "EMAIL_1": "[email protected]",
  "PHONE_1": "555-123-4567"
}
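To make the token-and-mapping idea concrete, here is a small local sketch. It is not how the Enkrypt API performs redaction (that happens server-side and is not described here); the two regular expressions and the `redact` function are simplified assumptions that only cover emails and one US phone format.

```python
import re
from typing import Dict, Tuple

# Simplified patterns for illustration; real detectors cover far more formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each match with a numbered token and record token -> original."""
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        seen: Dict[str, str] = {}  # original value -> token, per entity type

        def replace(match: "re.Match[str]") -> str:
            value = match.group(0)
            if value not in seen:
                token = f"{label}_{len(seen) + 1}"
                seen[value] = token
                mapping[token] = value
            return f"[{seen[value]}]"

        text = pattern.sub(replace, text)
    return text, mapping


redacted, mapping = redact("Contact jane@example.com or 555-123-4567, cc jane@example.com")
print(redacted)  # Contact [EMAIL_1] or [PHONE_1], cc [EMAIL_1]
```

Reusing the same token for a repeated value keeps references consistent within one request, which is what lets the server's reply refer back to the same entity.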

Protected Processing

Server Communication: Send redacted text to the MCP server.

The MCP server receives only anonymized data:

MCP Server receives: "Contact [NAME_1] at [EMAIL_1] or [PHONE_1]"

Server processes request without seeing actual PII

MCP Server responds: "Contact request sent to [EMAIL_1]. [NAME_1] will be notified at [PHONE_1]."

PII Restoration (De-anonymization)

Token Replacement (Reverse): Restore original PII in the response.

Using the stored mapping, tokens are replaced with original values:

from typing import Any, Dict

async def restore_pii(content: str, pii_mapping: Dict[str, Any]) -> str:
    pii_key = pii_mapping.get("key", "")
    if not pii_key:
        return content  # No PII to restore
    
    payload = {
        "text": content,
        "mode": "response",
        "key": pii_key  # Use same key from redaction
    }
    
    result = await call_pii_api(payload)
    return result["text"]

Example:

Server Response: "Contact request sent to [EMAIL_1]. [NAME_1] will be notified at [PHONE_1]."

Restored: "Contact request sent to [email protected]. John Smith will be notified at 555-123-4567."
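The reverse substitution can be sketched locally as well. This is an illustration under assumptions, not the API's implementation: it assumes tokens have the bracketed `[TYPE_N]` shape shown in the examples, and the `restore` function and `TOKEN` pattern are hypothetical names.

```python
import re
from typing import Dict

# Matches bracketed tokens such as [NAME_1] or [EMAIL_2].
TOKEN = re.compile(r"\[([A-Z_]+_\d+)\]")


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Swap known tokens back to their original values in a single pass.

    Tokens that are missing from the mapping are left untouched.
    """
    return TOKEN.sub(lambda m: mapping.get(m.group(1), m.group(0)), text)


mapping = {"NAME_1": "John Smith", "PHONE_1": "555-123-4567"}
print(restore("[NAME_1] will be notified at [PHONE_1]. Ref [EMAIL_9]", mapping))
# John Smith will be notified at 555-123-4567. Ref [EMAIL_9]
```

Doing the replacement in one regex pass, rather than looping over the mapping with `str.replace`, avoids re-substituting a restored value that happens to look like a token.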

Supported PII Types

Personal Information

Names:

Person names (first, last, full)
Organization names
Nicknames and aliases

Identifiers:

Social Security Numbers (SSN)
Tax IDs (EIN, ITIN)
National ID numbers
Passport numbers
Driver’s license numbers

Example:

Input: "John Q. Public, SSN 123-45-6789"
Redacted: "[NAME_1], SSN [SSN_1]"

Contact Information

Email Addresses:

Standard emails (user@example.com)
Subdomains (user@mail.example.com)
Plus addressing (user+tag@example.com)

Phone Numbers:

US format: (555) 123-4567
International: +1-555-123-4567
Extensions: 555-1234 x567

Physical Addresses:

Street addresses
Cities, states, ZIP codes
Country information
PO boxes

Example:

Input: "Email: [email protected], Phone: +1-555-0100, Address: 123 Main St, New York, NY 10001"
Redacted: "Email: [EMAIL_1], Phone: [PHONE_1], Address: [ADDRESS_1]"

Financial Information

Payment Cards:

Credit card numbers (Visa, MasterCard, Amex, Discover)
Debit card numbers
CVV codes
Expiration dates

Bank Details:

Account numbers
Routing numbers
IBAN codes
SWIFT codes

Example:

Input: "Card: 4532-1234-5678-9010, CVV: 123, Exp: 12/25"
Redacted: "Card: [CARD_1], CVV: [CVV_1], Exp: [DATE_1]"
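Digit sequences that merely look like card numbers are a common source of false positives. One widely used filter is the Luhn checksum, sketched below. Whether the gateway or the Enkrypt API applies this check is not stated here, so treat it as an optional pre-filter you might add yourself; `luhn_valid` and the 12-digit minimum are assumptions for this sketch.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:  # too short to be a payment card number
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```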

Network & System Information

IP Addresses:

IPv4 (192.168.1.1)
IPv6 (2001:0db8:85a3::8a2e:0370:7334)
Private IPs (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)

MAC Addresses:

Standard format (00:1B:44:11:3A:B7)
Cisco format (001B.4411.3AB7)
Windows format (00-1B-44-11-3A-B7)

Example:

Input: "Server IP: 192.168.1.100, MAC: 00:1B:44:11:3A:B7"
Redacted: "Server IP: [IP_1], MAC: [MAC_1]"
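The three MAC address notations listed above describe the same 48-bit value. A short sketch of normalizing them to one canonical form follows; `normalize_mac` is a hypothetical helper for illustration, not part of the gateway.

```python
import re
from typing import Optional


def normalize_mac(value: str) -> Optional[str]:
    """Return the MAC address as upper-case colon-separated pairs, or None."""
    hex_digits = re.sub(r"[.:-]", "", value)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", hex_digits):
        return None
    hex_digits = hex_digits.upper()
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


for raw in ("00:1B:44:11:3A:B7", "001B.4411.3AB7", "00-1B-44-11-3A-B7"):
    print(normalize_mac(raw))  # 00:1B:44:11:3A:B7 each time
```

Normalizing before comparison means the same device maps to the same value regardless of which notation appeared in the input.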

Temporal Information

Dates:

Birth dates
Event dates
Timestamps

Ages:

Exact ages
Age ranges (if specific)

Example:

Input: "DOB: 01/15/1990, Age: 35"
Redacted: "DOB: [DATE_1], Age: [AGE_1]"

Configuration

Enable PII Redaction

Per-Server Configuration:

{
  "server_name": "customer_service_server",
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": true  // Enable automatic redaction
    },
    "block": []  // Don't block on PII, just redact
  },
  "output_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": false  // De-anonymization happens automatically
    }
  }
}

Block on PII Detection (Optional)

Instead of redacting PII, you can block any request that contains it:

{
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": false  // Don't redact, just detect
    },
    "block": ["pii"]  // Block if PII is detected
  }
}

Use Case: Prevent users from accidentally including PII in public-facing tools.
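The two configurations above differ only in `pii_redaction` and `block`. The sketch below shows one way to read a policy and decide which behaviour applies. It is an assumption-laden illustration: `pii_mode` is a hypothetical helper, and giving `block` precedence over `pii_redaction` is a guess about the gateway's behaviour rather than documented fact. Note that strict JSON parsers such as Python's `json` module reject `//` comments, so the inline comments in the examples would need to be removed before parsing this way.

```python
import json
from typing import Any, Dict


def pii_mode(policy: Dict[str, Any]) -> str:
    """Classify an input guardrails policy as 'block', 'redact', or 'off'."""
    if not policy.get("enabled", False):
        return "off"
    # Assumption: blocking wins when both options are set.
    if "pii" in policy.get("block", []):
        return "block"
    if policy.get("additional_config", {}).get("pii_redaction", False):
        return "redact"
    return "off"


redact_policy = json.loads(
    '{"enabled": true, "additional_config": {"pii_redaction": true}, "block": []}'
)
block_policy = json.loads(
    '{"enabled": true, "additional_config": {"pii_redaction": false}, "block": ["pii"]}'
)
print(pii_mode(redact_policy))  # redact
print(pii_mode(block_policy))   # block
```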

Custom PII Entities

Configure which PII types to detect:

{
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": true,
      "pii_entities": [
        "EMAIL",
        "PHONE",
        "SSN",
        "CREDIT_CARD"
      ]
    }
  }
}

PII Mapping Security

Secure Storage

Ephemeral Mappings: PII mappings are stored only for the duration of the request/response cycle

The PII mapping is:

Generated server-side by Enkrypt API
Associated with a unique session key
Never logged or persisted
Automatically expires after use
Encrypted in transit (HTTPS)

Mapping Key Structure

# PII Mapping Example
{
  "key": "a1b2c3d4e5f6g7h8i9j0",  # Unique session key
  "mappings": {
    "NAME_1": "<encrypted_value>",
    "EMAIL_1": "<encrypted_value>",
    "PHONE_1": "<encrypted_value>"
  }
}

Security Properties:

Keys are cryptographically random (160+ bits of entropy)
Mappings are never exposed in logs
Keys cannot be reused across sessions
Server-side storage is encrypted at rest
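For a sense of what a key with 160 bits of entropy looks like, here is a minimal sketch using Python's `secrets` module. The real keys are generated server-side by the Enkrypt API and their format is not specified here; `new_pii_key` is a hypothetical name used only for illustration.

```python
import secrets


def new_pii_key(num_bytes: int = 20) -> str:
    """Return a random hex key; 20 bytes gives 160 bits of entropy."""
    return secrets.token_hex(num_bytes)


key = new_pii_key()
print(len(key))  # 40 hex characters for 20 random bytes
```

`secrets` draws from the operating system's cryptographically secure random source, unlike the `random` module, which is not suitable for security tokens.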

Advanced Use Cases

Scenario 1: Customer Support

Goal: Protect customer PII when using AI tools

{
  "server_name": "customer_support_ai",
  "description": "AI assistant for customer support",
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": true
    }
  }
}

Flow:

Agent: "Customer John Doe called about order #12345, email [email protected]"
  → AI sees: "Customer [NAME_1] called about order #12345, email [EMAIL_1]"
  → AI responds: "I've looked up [EMAIL_1]'s order #12345..."
  → Agent sees: "I've looked up [email protected]'s order #12345..."

Scenario 2: Data Analysis

Goal: Analyze customer data without exposing PII

{
  "server_name": "analytics_server",
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": true
    }
  }
}

Flow:

Analyst: "Analyze feedback from [email protected], [email protected]"
  → Analytics sees: "Analyze feedback from [EMAIL_1], [EMAIL_2]"
  → Analytics returns: "[EMAIL_1] satisfaction: 85%, [EMAIL_2] satisfaction: 92%"
  → Analyst sees: "[email protected] satisfaction: 85%, [email protected] satisfaction: 92%"

Scenario 3: Compliance (GDPR/CCPA)

Goal: Minimize PII exposure for compliance

{
  "server_name": "gdpr_compliant_server",
  "input_guardrails_policy": {
    "enabled": true,
    "policy_name": "GDPR Compliance Policy",
    "additional_config": {
      "pii_redaction": true,
      "pii_entities": ["EMAIL", "PHONE", "SSN", "NAME", "ADDRESS"]
    },
    "block": ["pii"]  // Optional: block instead of redact for strict compliance
  }
}

Limitations & Best Practices

Detection Accuracy

Not 100% Accurate: PII detection uses ML models with ~95-98% accuracy.

False Positives:

Generic names (“John Smith” in documentation)
Example emails ([email protected])
Sample phone numbers (555-0100)

False Negatives:

Obfuscated PII ([email protected] as “j dot doe at example”)
Non-standard formats
Context-dependent PII

Best Practice:

Test with sample data before production
Review redaction results periodically
Use additional guardrails (keyword detection) for critical PII

Performance Impact

Latency: PII redaction adds 50-150ms per request.

Optimization:

Enable only for servers handling user data
Use selective entity types (don’t detect all PII if unnecessary)
Cache redaction results for repeated inputs

Monitoring:

# Check PII redaction metrics
secure-mcp-gateway metrics --filter pii

# Output
pii_redactions_total: 1234
pii_redaction_latency_avg: 87ms
pii_restoration_latency_avg: 45ms

Context Preservation

Challenge: Redaction may break context for AI understanding.

Example:

Original: "Send meeting invite to [email protected] and [email protected]"
Redacted: "Send meeting invite to [EMAIL_1] and [EMAIL_2]"

The AI loses the context that both emails are from the same company.

Mitigation:

Use partial redaction for non-sensitive patterns
Provide domain whitelist (e.g., allow @company.com)
Include metadata hints (e.g., “[EMAIL_1 from company.com]”)
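The metadata-hint mitigation can be sketched as follows. This is a local illustration only: the source does not say that the gateway supports this option, and `redact_with_domain_hint`, the `allowed_domains` parameter, and the simplified email pattern are all hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Simplified email pattern; group 1 captures the domain.
EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def redact_with_domain_hint(
    text: str, allowed_domains: Set[str]
) -> Tuple[str, Dict[str, str]]:
    """Redact emails, keeping the domain visible when it is on the allow list."""
    mapping: Dict[str, str] = {}
    tokens: Dict[str, str] = {}  # address -> token

    def replace(match: "re.Match[str]") -> str:
        address, domain = match.group(0), match.group(1).lower()
        if address not in tokens:
            tokens[address] = f"EMAIL_{len(tokens) + 1}"
            mapping[tokens[address]] = address
        token = tokens[address]
        if domain in allowed_domains:
            return f"[{token} from {domain}]"
        return f"[{token}]"

    return EMAIL.sub(replace, text), mapping


text = "Send invite to a@company.com and b@company.com, cc c@other.org"
redacted, mapping = redact_with_domain_hint(text, {"company.com"})
print(redacted)
# Send invite to [EMAIL_1 from company.com] and [EMAIL_2 from company.com], cc [EMAIL_3]
```

A restoration step would need to recognize the extended `[EMAIL_1 from company.com]` form as well as the plain token, so this trade-off adds complexity on both sides of the round trip.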

Token Persistence

Problem: Tokens don’t persist across sessions.

Example:

Session 1:
  Input: "Email [email protected]"
  Redacted: "Email [EMAIL_1]"
  
Session 2:
  Input: "Email [email protected]"  
  Redacted: "Email [EMAIL_2]"  // Different token!

Why: Each session gets a unique PII key for security.

Best Practice: If consistency is needed, use custom identifiers instead of PII.
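One possible way to produce such a custom identifier is a keyed hash, so the same input always yields the same opaque ID without exposing the value itself. This is a sketch under assumptions, not a gateway feature: `stable_identifier`, the `cust_` prefix, and the 16-character truncation are hypothetical, and the secret must be stored and rotated by your own application.

```python
import hashlib
import hmac


def stable_identifier(value: str, secret: bytes) -> str:
    """Derive a stable, opaque identifier from a value using HMAC-SHA256."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return f"cust_{digest[:16]}"


secret = b"replace-with-a-real-secret"
first = stable_identifier("Jane@Example.com", secret)
second = stable_identifier("jane@example.com ", secret)
print(first == second)  # True: same ID across sessions
```

Because the identifier is derived rather than stored in a per-session mapping, it stays consistent between sessions, at the cost of not being reversible by the gateway.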

Testing PII Redaction

Manual Testing

# Test with sample PII
echo '{"text": "Contact John Doe at [email protected] or 555-123-4567"}' | \
  curl -X POST https://api.enkryptai.com/guardrails/pii \
    -H "apikey: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-

# Response
{
  "text": "Contact [NAME_1] at [EMAIL_1] or [PHONE_1]",
  "key": "abc123xyz789"
}

Integration Testing

# Test PII redaction in gateway
import asyncio
from secure_mcp_gateway.plugins.guardrails.enkrypt_provider import EnkryptPIIHandler

async def test_pii():
    handler = EnkryptPIIHandler(
        api_key="YOUR_API_KEY",
        base_url="https://api.enkryptai.com"
    )
    
    # Test redaction
    text = "Email me at [email protected]"
    redacted, mapping = await handler.redact_pii(text)
    
    print(f"Original: {text}")
    print(f"Redacted: {redacted}")
    print(f"Mapping Key: {mapping['key']}")
    
    # Test restoration
    response = "Message sent to [EMAIL_1]"
    restored = await handler.restore_pii(response, mapping)
    
    print(f"Response: {response}")
    print(f"Restored: {restored}")

asyncio.run(test_pii())

Automated Testing

Use the included test suite:

# Run PII tests
pytest tests/test_pii_handling.py -v

# Test with sample data
pytest tests/test_pii_handling.py::test_redact_email
pytest tests/test_pii_handling.py::test_redact_phone
pytest tests/test_pii_handling.py::test_restore_pii

Monitoring & Metrics

PII Metrics

Available Metrics:

pii_redactions_total - Total PII redaction operations
pii_detections_by_type - PII detections by entity type (EMAIL, PHONE, etc.)
pii_redaction_latency - Time to redact PII
pii_restoration_latency - Time to restore PII
pii_failures_total - Failed PII operations

Grafana Dashboard: The gateway includes a PII monitoring dashboard showing:

Redaction rate over time
PII types detected
Latency percentiles (p50, p95, p99)
Error rates
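If you export raw latency samples rather than relying on the dashboard, the percentiles above can be computed with the standard library. This is a minimal sketch; `latency_percentiles` is a hypothetical helper, and the dashboard may use a different interpolation method.

```python
import statistics
from typing import Dict, List


def latency_percentiles(samples_ms: List[float]) -> Dict[str, float]:
    """Return p50, p95, and p99 for a list of latency samples (needs >= 2)."""
    # quantiles(n=100) returns the 99 cut points between percentiles.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


samples = [float(ms) for ms in range(1, 101)]
print(latency_percentiles(samples))
```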

Logging

PII events are logged (with PII values masked):

{
  "event": "pii_redacted",
  "timestamp": "2025-01-15T10:30:00Z",
  "server_name": "customer_support",
  "entities_detected": ["EMAIL", "PHONE"],
  "entity_count": 2,
  "original_length": 150,
  "redacted_length": 135,
  "pii_key": "****xyz",  // Last 3 chars only
  "latency_ms": 87
}
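The key masking shown in the log entry can be sketched as below. `mask_key` is a hypothetical helper, assuming a fixed four-asterisk prefix followed by the last three characters as in the example.

```python
def mask_key(key: str, visible: int = 3) -> str:
    """Mask a PII key for logging, keeping only the last `visible` characters."""
    if len(key) <= visible:
        # Too short to reveal anything safely; mask everything.
        return "*" * len(key)
    # Fixed-width prefix so the log does not reveal the key length.
    return "****" + key[-visible:]


print(mask_key("abc123xyz"))  # ****xyz
```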

Next Steps

Security Testing

Test PII redaction with attack scenarios

Guardrail Types

Learn about other guardrail types

Configuration

Configure PII redaction for your servers

Compliance

GDPR/CCPA compliance guide
