Data Loss Prevention (DLP) scanning in AIP detects sensitive data in tool responses and redacts it before it can leave through the agent. This guide shows you how to configure DLP patterns and egress controls.

Overview

AIP’s DLP engine scans tool responses for sensitive patterns like:
  • API keys and secrets
  • Credit card numbers
  • Social Security Numbers
  • Email addresses
  • IP addresses
  • Custom sensitive data patterns

Configuration

Configure DLP in your policy’s dlp section:
apiVersion: aip.io/v1alpha1
kind: AgentPolicy
metadata:
  name: dlp-example
spec:
  mode: enforce
  allowed_tools:
    - read_file
  dlp:
    patterns:
      - name: "AWS Access Key"
        regex: "AKIA[A-Z0-9]{16}"
      - name: "Credit Card"
        regex: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"

Pattern Structure

dlp.patterns
array
Array of regex patterns to scan for in tool responses.
dlp.patterns[].name
string
required
Human-readable name for the pattern. Used in audit logs.
dlp.patterns[].regex
string
required
Regular expression to match sensitive data. Must be valid RE2 syntax.
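
The field rules above can be checked before a policy is deployed. The following is a minimal sketch, not AIP's own validator: `validate_patterns` is a hypothetical helper, and Python's `re` module stands in for RE2, so it accepts some constructs (backreferences, lookaround) that RE2 rejects.

```python
import re


def validate_patterns(patterns):
    """Return a list of problems found in a dlp.patterns list (empty if none)."""
    problems = []
    for i, entry in enumerate(patterns):
        # Both fields are documented as required strings.
        for field in ("name", "regex"):
            value = entry.get(field)
            if not isinstance(value, str) or not value:
                problems.append(f"patterns[{i}].{field} is required and must be a non-empty string")
        regex = entry.get("regex")
        if isinstance(regex, str) and regex:
            try:
                # Assumption: Python's engine as a rough stand-in for RE2.
                re.compile(regex)
            except re.error as exc:
                problems.append(f"patterns[{i}].regex does not compile: {exc}")
    return problems
```

An empty result means every entry has both required fields and a regex that compiles under Python's engine; a final check against a real RE2 implementation is still advisable.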

Built-In Pattern Library

Cloud Provider Secrets

dlp:
  patterns:
    # AWS Access Key
    - name: "AWS Access Key"
      regex: "AKIA[A-Z0-9]{16}"
    
    # AWS Secret Key (matches any 40-character base64-like run, so expect false positives)
    - name: "AWS Secret Key"
      regex: "[A-Za-z0-9/+=]{40}"
    
    # Google Cloud API Key
    - name: "GCP API Key"
      regex: "AIza[0-9A-Za-z\\-_]{35}"
    
    # GitHub Token
    - name: "GitHub Token"
      regex: "ghp_[0-9a-zA-Z]{36}"
    
    # Azure Storage Key
    - name: "Azure Storage Key"
      regex: "DefaultEndpointsProtocol=https;AccountName=[^;]+;AccountKey=[^;]+"

Financial Data

dlp:
  patterns:
    # Credit Card (Visa, Mastercard, and Amex prefixes and lengths; a regex cannot verify the Luhn checksum)
    - name: "Credit Card"
      regex: "\\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\\b"
    
    # SSN
    - name: "Social Security Number"
      regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    
    # IBAN
    - name: "IBAN"
      regex: "[A-Z]{2}\\d{2}[A-Z0-9]{4}\\d{7}([A-Z0-9]?){0,16}"
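
A regex can only check a card number's prefix and length. Where false positives matter, a Luhn checksum pass over each regex match is a common complement. The sketch below assumes such a post-filter runs outside AIP; `luhn_valid` and `find_card_numbers` are hypothetical names, and nothing in this guide states that AIP performs this check itself.

```python
import re

# Same credit card pattern as above, written as a Python raw string.
CARD_RE = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def find_card_numbers(text):
    """Return regex matches that also pass the Luhn checksum."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]
```

For example, `4111111111111111` (a widely used test number) passes both checks, while `4111111111111112` matches the regex but fails the checksum.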

Personally Identifiable Information (PII)

dlp:
  patterns:
    # Email addresses
    - name: "Email Address"
      regex: "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
    
    # Phone numbers (US)
    - name: "Phone Number"
      regex: "\\b(?:\\+1[-.\\s]?)?\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})\\b"
    
    # IP addresses
    - name: "IP Address"
      regex: "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"

How DLP Scanning Works

1. Agent calls a tool

The agent requests data:
{
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": {"path": "/home/user/.env"}
  }
}

2. AIP forwards the request

After policy checks pass, the request goes to the MCP server.

3. MCP server returns response

The server returns the file contents:
{
  "result": {
    "content": "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nAWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}

4. DLP engine scans the response

AIP scans the response content against all configured patterns.
Match found: AKIAIOSFODNN7EXAMPLE matches AKIA[A-Z0-9]{16}

5. Sensitive data is redacted

The response is modified:
{
  "result": {
    "content": "AWS_ACCESS_KEY_ID=[REDACTED: AWS Access Key]\nAWS_SECRET_ACCESS_KEY=[REDACTED: AWS Secret Key]"
  }
}
The match is logged to the audit trail (pattern name and match count, as shown under Audit Logging).
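
The scan-and-redact step can be sketched in a few lines. This is an illustration under stated assumptions, not AIP's implementation: `scan_and_redact` and `PATTERNS` are hypothetical names, Python's `re` stands in for RE2, and the event fields are borrowed from the audit log example in this guide.

```python
import re

# Hypothetical in-memory form of dlp.patterns: (name, compiled regex) pairs.
PATTERNS = [("AWS Access Key", re.compile(r"AKIA[A-Z0-9]{16}"))]


def scan_and_redact(text, patterns=PATTERNS):
    """Replace every match with a labelled placeholder and collect match events."""
    events = []
    for name, regex in patterns:
        label = f"[REDACTED: {name}]"
        # A callable replacement avoids backslash processing in the label.
        text, count = regex.subn(lambda _match, label=label: label, text)
        if count:
            events.append({
                "event_type": "dlp_match",
                "pattern": name,
                "match_count": count,
                "redacted": True,
            })
    return text, events
```

Calling `scan_and_redact("AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE")` returns the redacted string and a single event. The events carry only the pattern name and count, so the secret itself never reaches the log.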

v1alpha2: Enhanced DLP

v1alpha2 adds request-side DLP scanning:
apiVersion: aip.io/v1alpha2
kind: AgentPolicy
spec:
  dlp:
    scan_requests: true   # Scan arguments, not just responses
    scan_responses: true  # Scan tool outputs (default)
    patterns:
      - name: "AWS Key"
        regex: "AKIA[A-Z0-9]{16}"
Use case: Prevent agents from echoing secrets back in parameters.
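
Request-side scanning has to look inside nested tool arguments, not just one string. The sketch below shows one way to walk a JSON-like argument object and report where a pattern matches. `find_matches` is a hypothetical helper, and how AIP actually traverses arguments is not described in this guide.

```python
import re

AWS_KEY = re.compile(r"AKIA[A-Z0-9]{16}")


def find_matches(value, regex, path="arguments"):
    """Yield the location of every string inside `value` that contains a match."""
    if isinstance(value, str):
        if regex.search(value):
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_matches(item, regex, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_matches(item, regex, f"{path}[{index}]")
    # Numbers, booleans, and null are ignored in this sketch.
```

Given `{"url": "https://example.com", "headers": ["X-Key: AKIAIOSFODNN7EXAMPLE"]}`, the generator yields `arguments.headers[0]`.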

Custom Patterns

Example: Internal URLs

dlp:
  patterns:
    - name: "Internal URL"
      regex: "https?://[a-z0-9-]+\\.internal\\.company\\.com"

Example: Database Connection Strings

dlp:
  patterns:
    - name: "Postgres Connection String"
      regex: "postgresql://[^@]+@[^/]+/[^\\s]+"

Example: API Tokens

dlp:
  patterns:
    - name: "Bearer Token"
      regex: "Bearer [A-Za-z0-9\\-._~+/]+=*"

Performance Considerations

Regex Complexity

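Avoid nested quantifiers, which cause catastrophic backtracking in backtracking regex engines:
  • Bad: "(a+)+b" (exponential time in a backtracking engine)
  • Good: "^[a-z]+$" (linear time)
AIP uses RE2, which guarantees linear-time evaluation even for the first pattern. In exchange, RE2 does not support backreferences or lookaround assertions, so patterns that rely on them are not valid.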

Scanning Overhead

Typical DLP scanning adds:
  • Less than 5ms for responses under 10KB
  • Less than 50ms for responses under 100KB
  • Scales linearly with response size
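
These figures depend on the pattern set and the hardware, so it is worth measuring with representative payloads. The following sketch times a full scan over one payload. `time_scan` is a hypothetical helper, and because Python's `re` is a backtracking engine rather than RE2, the result indicates relative cost only, not AIP's actual overhead.

```python
import re
import time


def time_scan(regexes, payload, repeats=5):
    """Return the best-of-N time, in milliseconds, to scan `payload` with every regex."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for regex in regexes:
            # Exhaust the iterator so the whole payload is scanned.
            for _match in regex.finditer(payload):
                pass
        best = min(best, time.perf_counter() - start)
    return best * 1000.0
```

Running it at several payload sizes (for example 10 KB and 100 KB) shows whether a given pattern set scales as expected.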

Optimizing Performance

# Option 1: Limit patterns to high-risk tools only
tool_rules:
  - tool: read_file
    action: allow
    dlp:
      patterns:  # Tool-specific DLP (v1alpha2)
        - name: "AWS Key"
          regex: "AKIA[A-Z0-9]{16}"

# Option 2: Use more specific patterns
dlp:
  patterns:
    - name: "AWS Key in env file"
      regex: "AWS_ACCESS_KEY_ID=AKIA[A-Z0-9]{16}"  # More specific = faster

Redaction Strategies

Default: Full Redaction

AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
                  ^^^^^^^^^^^^^^^^^^^^
                  [REDACTED: AWS Access Key]

Partial Masking (v1alpha2)

dlp:
  patterns:
    - name: "Credit Card"
      regex: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"
      redaction: partial  # Show last 4 digits
Output:
Card: 1234-5678-9012-3456
      ****-****-****-3456
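
The masking shown above can be reproduced with a replacement function that keeps the last four digits and preserves separators. This is a sketch of the idea only; `mask_keep_last4` and `partially_redact` are hypothetical names, not part of AIP.

```python
import re

# Same credit card pattern as in the policy above, as a Python raw string.
CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def mask_keep_last4(match):
    """Mask every digit of the match except the last four; keep separators."""
    text = match.group(0)
    total = sum(ch.isdigit() for ch in text)
    seen = 0
    masked = []
    for ch in text:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total - 4 else "*")
        else:
            masked.append(ch)
    return "".join(masked)


def partially_redact(text):
    """Apply partial masking to every card number found in `text`."""
    return CARD_RE.sub(mask_keep_last4, text)
```

`partially_redact("Card: 1234-5678-9012-3456")` returns `Card: ****-****-****-3456`, matching the output above.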

Audit Logging

Every DLP match is logged:
{
  "timestamp": "2026-03-03T17:00:00Z",
  "event_type": "dlp_match",
  "tool": "read_file",
  "pattern": "AWS Access Key",
  "match_count": 1,
  "redacted": true,
  "arguments": {"path": "/home/user/.env"}
}
Query DLP violations:
jq 'select(.event_type == "dlp_match")' aip-audit.jsonl
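
For a quick summary rather than raw events, the same log can be aggregated per pattern. The sketch below assumes one JSON object per line with the `event_type`, `pattern`, and `match_count` fields shown above; `count_dlp_matches` is a hypothetical helper.

```python
import json
from collections import Counter


def count_dlp_matches(lines):
    """Sum match_count per pattern over dlp_match events in JSONL lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("event_type") == "dlp_match":
            totals[event.get("pattern", "<unknown>")] += event.get("match_count", 1)
    return totals
```

Any iterable of lines works, so an open file handle for `aip-audit.jsonl` can be passed in directly.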

Compliance Use Cases

SOC 2 Type II

Demonstrate technical controls for data access:
dlp:
  patterns:
    - name: "PII - Email"
      regex: "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
    - name: "PII - Phone"
      regex: "\\b\\d{3}-\\d{3}-\\d{4}\\b"
Audit logs prove data was redacted before leaving the system.

GDPR Article 32

Technical measures to ensure data security:
dlp:
  patterns:
    - name: "Personal Data - EU ID"
      regex: "[A-Z]{2}\\d{9}"

HIPAA

Protect Protected Health Information (PHI):
dlp:
  patterns:
    - name: "Medical Record Number"
      regex: "MRN-\\d{8}"
    - name: "Prescription Number"
      regex: "Rx#\\d{10}"

Testing DLP Patterns

Validate Regex

Test patterns before deploying:
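# Use grep to test regex. grep -E is POSIX ERE, not RE2, and has no \d;
# for patterns that use \d, try GNU grep -P or an RE2-based tool instead.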
echo "AKIAIOSFODNN7EXAMPLE" | grep -E "AKIA[A-Z0-9]{16}"
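
A pattern is easier to trust when it is tested against both strings it should catch and strings it should leave alone. The sketch below is one way to do that. `check_pattern` is a hypothetical helper, and Python's `re` is used as a stand-in for RE2, so a pattern that passes here should still be confirmed in monitor mode.

```python
import re


def check_pattern(regex, should_match, should_not_match):
    """Return a list of failures for the given positive and negative samples."""
    compiled = re.compile(regex)
    failures = []
    for sample in should_match:
        # search(), not match(): DLP patterns are searched anywhere in the text.
        if not compiled.search(sample):
            failures.append(f"expected a match: {sample!r}")
    for sample in should_not_match:
        if compiled.search(sample):
            failures.append(f"unexpected match: {sample!r}")
    return failures
```

For instance, the SSN pattern should match `SSN 123-45-6789` but not the phone number `555-123-4567`; an empty result confirms both.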

Monitor Mode Testing

Test DLP without blocking:
spec:
  mode: monitor  # Log matches but don't redact
  dlp:
    patterns:
      - name: "Test Pattern"
        regex: "..."
Review logs to validate patterns before enforcing.

Troubleshooting

Common issues:
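  • Escaping: In double-quoted YAML strings, use double backslashes: "\\d", not "\d".
  • Anchors: Patterns are searched for anywhere in the text, not matched against the whole string, so ^ and $ are rarely needed.
  • Case sensitivity: Patterns are case-sensitive by default. RE2 syntax supports a (?i) prefix for case-insensitive matching.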
Test with:
echo "test string" | grep -E "your-pattern"
If DLP scanning is slow:
  • Reduce number of patterns
  • Simplify regex complexity
  • Use tool-specific DLP (v1alpha2)
  • Increase response size limits if hitting truncation

Next Steps

Writing Policies

Integrate DLP into your policies

Audit Logging

Review DLP violations

Policy Schema

Complete DLP configuration reference

Error Codes

Understand DLP-related errors
