Data Loss Prevention (DLP) scanning in AIP detects and blocks sensitive data from leaving through agent responses. This guide shows you how to configure DLP patterns and egress controls.
Overview
AIP’s DLP engine scans tool responses for sensitive patterns like:
API keys and secrets
Credit card numbers
Social Security Numbers
Email addresses
IP addresses
Custom sensitive data patterns
Configuration
Configure DLP in your policy’s dlp section:
apiVersion: aip.io/v1alpha1
kind: AgentPolicy
metadata:
  name: dlp-example
spec:
  mode: enforce
  allowed_tools:
    - read_file
  dlp:
    patterns:
      - name: "AWS Access Key"
        regex: "AKIA[A-Z0-9]{16}"
      - name: "Credit Card"
        regex: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"
Pattern Structure
patterns: Array of regex patterns to scan for in tool responses.
name: Human-readable name for the pattern. Used in audit logs.
regex: Regular expression to match sensitive data. Must be valid RE2 syntax.
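The field rules above can be checked before a policy is deployed. The following is a minimal sketch, not AIP's own validator: `validate_patterns` is a hypothetical helper, and it compiles with Python's `re` module as a stand-in, which accepts some constructs that RE2 rejects.

```python
import re

def validate_patterns(patterns):
    """Return a list of error strings; an empty list means every entry looks valid."""
    errors = []
    for index, entry in enumerate(patterns):
        name = entry.get("name")
        regex = entry.get("regex")
        if not name:
            errors.append(f"patterns[{index}]: missing name")
        if not regex:
            errors.append(f"patterns[{index}]: missing regex")
            continue
        try:
            # Assumption: Python's engine approximates RE2 syntax for simple patterns.
            re.compile(regex)
        except re.error as exc:
            errors.append(f"patterns[{index}] ({name}): invalid regex: {exc}")
    return errors

patterns = [
    {"name": "AWS Access Key", "regex": r"AKIA[A-Z0-9]{16}"},
    {"name": "Broken", "regex": r"AKIA[A-Z0-9{16}"},  # unterminated character class
]
for error in validate_patterns(patterns):
    print(error)
```

Only the second entry is reported, because its character class is never closed.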
Built-In Pattern Library
Cloud Provider Secrets
dlp:
  patterns:
    # AWS Access Key
    - name: "AWS Access Key"
      regex: "AKIA[A-Z0-9]{16}"
    # AWS Secret Key (broad: matches any 40-character run of base64 characters, so expect false positives)
    - name: "AWS Secret Key"
      regex: "[A-Za-z0-9/+=]{40}"
    # Google Cloud API Key
    - name: "GCP API Key"
      regex: "AIza[0-9A-Za-z\\-_]{35}"
    # GitHub Token
    - name: "GitHub Token"
      regex: "ghp_[0-9a-zA-Z]{36}"
    # Azure Storage Key
    - name: "Azure Storage Key"
      regex: "DefaultEndpointsProtocol=https;AccountName=[^;]+;AccountKey=[^;]+"
Financial Data
dlp:
  patterns:
    # Credit Card (Visa, Mastercard, and Amex number formats; the regex does not verify the Luhn checksum)
    - name: "Credit Card"
      regex: "\\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\\b"
    # SSN
    - name: "Social Security Number"
      regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    # IBAN
    - name: "IBAN"
      regex: "[A-Z]{2}\\d{2}[A-Z0-9]{4}\\d{7}([A-Z0-9]?){0,16}"
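A regex can only check the shape of a card number. If you post-process matches outside AIP, a Luhn checksum filters out most random digit runs. The sketch below is an illustration under that assumption: `luhn_valid` and `find_cards` are hypothetical names, and nothing here implies AIP runs a checksum itself.

```python
import re

CARD_RE = re.compile(
    r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\b"
)

def luhn_valid(number: str) -> bool:
    """Standard Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0

def find_cards(text: str):
    """Return regex matches that also pass the Luhn checksum."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]

print(find_cards("test card 4111111111111111, order id 4111111111111112"))
```

Both numbers match the regex, but only the first passes the checksum.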
Personally Identifiable Information (PII)
dlp:
  patterns:
    # Email addresses
    - name: "Email Address"
      regex: "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
    # Phone numbers (US)
    - name: "Phone Number"
      regex: "\\b(?:\\+1[-.\\s]?)?\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})\\b"
    # IP addresses
    - name: "IP Address"
      regex: "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"
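The IP address pattern accepts any four groups of one to three digits, including out-of-range values such as 999.1.2.3. If you review matches offline, Python's standard `ipaddress` module can discard those. This is a sketch for triaging results, with `find_ipv4` as a hypothetical helper; it does not describe how AIP treats such matches.

```python
import ipaddress
import re

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

def find_ipv4(text: str):
    """Return regex matches that are also valid IPv4 addresses."""
    found = []
    for match in IP_RE.finditer(text):
        candidate = match.group(0)
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # shaped like an IP address, but an octet is out of range
        found.append(candidate)
    return found

print(find_ipv4("host 10.0.0.1 reported build 999.1.2.3"))
```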
How DLP Scanning Works
Agent calls a tool
The agent requests data:
{
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "/home/user/.env" }
  }
}
AIP forwards the request
After policy checks pass, the request goes to the MCP server.
MCP server returns response
The server returns the file contents:
{
  "result": {
    "content": "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nAWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}
DLP engine scans the response
AIP scans the response content against all configured patterns.
Match found: AKIAIOSFODNN7EXAMPLE matches AKIA[A-Z0-9]{16}
Sensitive data is redacted
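The response is modified (this example assumes a pattern named "AWS Secret Key" that matches the secret value is also configured):
{
  "result": {
    "content": "AWS_ACCESS_KEY_ID=[REDACTED: AWS Access Key]\nAWS_SECRET_ACCESS_KEY=[REDACTED: AWS Secret Key]"
  }
}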
The original match is logged to the audit trail.
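The scan-and-redact steps above can be sketched in a few lines. This is an illustration of the flow rather than AIP's implementation: `redact` is a hypothetical function, the audit record borrows field names from the audit log example in this guide, and Python's `re` stands in for RE2. Only the AWS Access Key pattern is configured here.

```python
import re

def redact(content, patterns):
    """Replace every match with a [REDACTED: <name>] label and collect audit records."""
    audit = []
    for pattern in patterns:
        label = f"[REDACTED: {pattern['name']}]"
        compiled = re.compile(pattern["regex"])
        # A function replacement avoids backslash processing in the label text.
        content, count = compiled.subn(lambda _m, label=label: label, content)
        if count:
            audit.append({
                "event_type": "dlp_match",
                "pattern": pattern["name"],
                "match_count": count,
                "redacted": True,
            })
    return content, audit

response = (
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)
patterns = [{"name": "AWS Access Key", "regex": r"AKIA[A-Z0-9]{16}"}]
redacted, audit = redact(response, patterns)
print(redacted)
print(audit)
```

The secret key line passes through unchanged because no pattern for it is configured in this sketch.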
v1alpha2: Enhanced DLP
v1alpha2 adds request-side DLP scanning:
apiVersion: aip.io/v1alpha2
kind: AgentPolicy
spec:
  dlp:
    scan_requests: true   # Scan arguments, not just responses
    scan_responses: true  # Scan tool outputs (default)
    patterns:
      - name: "AWS Key"
        regex: "AKIA[A-Z0-9]{16}"
Use case: Prevent agents from echoing secrets back in parameters.
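Request-side scanning has to look inside nested tool arguments, not just a single string. One way to picture it is a recursive walk over the arguments object. In this sketch, `scan_arguments` and the location strings it reports are hypothetical, and whether AIP walks arguments this way is an assumption.

```python
import re

def scan_arguments(arguments, patterns, path="arguments"):
    """Walk nested dicts and lists; return (location, pattern name) for each matching string."""
    hits = []
    if isinstance(arguments, dict):
        for key, value in arguments.items():
            hits.extend(scan_arguments(value, patterns, f"{path}.{key}"))
    elif isinstance(arguments, list):
        for index, value in enumerate(arguments):
            hits.extend(scan_arguments(value, patterns, f"{path}[{index}]"))
    elif isinstance(arguments, str):
        for pattern in patterns:
            if re.search(pattern["regex"], arguments):
                hits.append((path, pattern["name"]))
    return hits

patterns = [{"name": "AWS Key", "regex": r"AKIA[A-Z0-9]{16}"}]
call_arguments = {
    "url": "https://example.com/upload?key=AKIAIOSFODNN7EXAMPLE",
    "headers": ["Accept: application/json"],
}
print(scan_arguments(call_arguments, patterns))
```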
Custom Patterns
Example: Internal URLs
dlp:
  patterns:
    - name: "Internal URL"
      regex: "https?://[a-z0-9-]+\\.internal\\.company\\.com"
Example: Database Connection Strings
dlp:
  patterns:
    - name: "Postgres Connection String"
      regex: "postgresql://[^@]+@[^/]+/[^\\s]+"
Example: API Tokens
dlp:
  patterns:
    - name: "Bearer Token"
      regex: "Bearer [A-Za-z0-9\\-._~+/]+=*"
Regex Complexity
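Avoid patterns prone to catastrophic backtracking:
Bad: "(a+)+b" (exponential time in backtracking regex engines)
Good: "^[a-z]+$" (linear time)
AIP uses RE2, which guarantees linear-time evaluation even for the first pattern. The trade-off is that RE2 does not support backreferences or lookaround assertions.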
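Because RE2 rejects lookahead, lookbehind, and backreferences, a pattern copied from a backtracking engine may fail to load. The sketch below is a rough pre-check under that assumption. `re2_incompatible` is a hypothetical helper and deliberately a heuristic: it can misfire on escaped backslashes or on these sequences inside character classes.

```python
import re

# Constructs that RE2 rejects: lookahead, lookbehind, and numbered backreferences.
UNSUPPORTED = re.compile(r"\(\?=|\(\?!|\(\?<=|\(\?<!|\\[1-9]")

def re2_incompatible(pattern: str) -> bool:
    """Heuristically flag patterns that use features RE2 does not implement."""
    return UNSUPPORTED.search(pattern) is not None

for candidate in (r"AKIA[A-Z0-9]{16}", r"(?<=password=)\S+", r"(\w+)\s\1"):
    print(candidate, re2_incompatible(candidate))
```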
Scanning Overhead
Typical DLP scanning adds:
Less than 5ms for responses under 10KB
Less than 50ms for responses under 100KB
Scales linearly with response size
To reduce scanning overhead:
# Option 1: Limit patterns to high-risk tools only
tool_rules:
  - tool: read_file
    action: allow
    dlp:
      patterns:  # Tool-specific DLP (v1alpha2)
        - name: "AWS Key"
          regex: "AKIA[A-Z0-9]{16}"
# Option 2: Use more specific patterns
dlp:
  patterns:
    - name: "AWS Key in env file"
      regex: "AWS_ACCESS_KEY_ID=AKIA[A-Z0-9]{16}"  # More specific = faster
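To get a feel for how pattern count and payload size affect scan time, you can time a pattern set against a synthetic payload. This sketch uses Python's `re` module, so absolute numbers will differ from AIP's RE2-based engine; treat the result as a relative comparison only. `time_scan` is a hypothetical helper.

```python
import re
import time

def time_scan(regexes, payload, repeats=5):
    """Return the best-of-N time in milliseconds to scan payload with every regex."""
    compiled = [re.compile(regex) for regex in regexes]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for regex in compiled:
            for _match in regex.finditer(payload):
                pass  # consume every match, as a scanner would
        best = min(best, time.perf_counter() - start)
    return best * 1000.0

payload = "LOG line with no secrets in it\n" * 3000  # roughly 90 KB of text
broad = [r"AKIA[A-Z0-9]{16}"]
specific = [r"AWS_ACCESS_KEY_ID=AKIA[A-Z0-9]{16}"]
print(f"broad: {time_scan(broad, payload):.3f} ms")
print(f"specific: {time_scan(specific, payload):.3f} ms")
```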
Redaction Strategies
Default: Full Redaction
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
                  ^^^^^^^^^^^^^^^^^^^^
                  [REDACTED: AWS Access Key]
Partial Masking (v1alpha2)
dlp:
  patterns:
    - name: "Credit Card"
      regex: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"
      redaction: partial  # Show last 4 digits
Output:
Card: 1234-5678-9012-3456
      ****-****-****-3456
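The masking shown above keeps separators in place and reveals only the final four digits. A minimal sketch of that transformation follows. `mask_keep_last4` is a hypothetical helper, and the exact rules AIP applies for `redaction: partial` are an assumption based on the output above.

```python
import re

CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

def mask_keep_last4(value: str) -> str:
    """Replace every digit except the last four with '*', leaving separators untouched."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total_digits - 4 else "*")
        else:
            masked.append(ch)
    return "".join(masked)

line = "Card: 1234-5678-9012-3456"
print(CARD_RE.sub(lambda m: mask_keep_last4(m.group(0)), line))
```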
Audit Logging
Every DLP match is logged:
{
  "timestamp": "2026-03-03T17:00:00Z",
  "event_type": "dlp_match",
  "tool": "read_file",
  "pattern": "AWS Access Key",
  "match_count": 1,
  "redacted": true,
  "arguments": { "path": "/home/user/.env" }
}
Query DLP violations:
jq 'select(.event_type == "dlp_match")' aip-audit.jsonl
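For summaries beyond a simple filter, the same JSON Lines log can be aggregated in a short script. This sketch assumes each line is one JSON object with the fields shown in the sample event above; `dlp_match_counts` is a hypothetical helper.

```python
import json
from collections import Counter

def dlp_match_counts(lines):
    """Sum match_count per pattern across all dlp_match events in a JSONL stream."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event_type") == "dlp_match":
            counts[event.get("pattern", "unknown")] += event.get("match_count", 1)
    return counts

sample_log = [
    '{"event_type": "dlp_match", "pattern": "AWS Access Key", "match_count": 1}',
    '{"event_type": "tool_call", "tool": "read_file"}',
    '{"event_type": "dlp_match", "pattern": "AWS Access Key", "match_count": 2}',
]
print(dlp_match_counts(sample_log))
```

To run it against a real log, pass an open file handle instead of the sample list, for example `dlp_match_counts(open("aip-audit.jsonl"))`.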
Compliance Use Cases
SOC 2 Type II
Demonstrate technical controls for data access:
dlp:
  patterns:
    - name: "PII - Email"
      regex: "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
    - name: "PII - Phone"
      regex: "\\b\\d{3}-\\d{3}-\\d{4}\\b"
Audit logs prove data was redacted before leaving the system.
GDPR Article 32
Technical measures to ensure data security:
dlp:
  patterns:
    - name: "Personal Data - EU ID"
      regex: "[A-Z]{2}\\d{9}"
HIPAA
Protect Protected Health Information (PHI):
dlp:
  patterns:
    - name: "Medical Record Number"
      regex: "MRN-\\d{8}"
    - name: "Prescription Number"
      regex: "Rx#\\d{10}"
Testing DLP Patterns
Validate Regex
Test patterns before deploying:
# Use grep to test regex
echo "AKIAIOSFODNN7EXAMPLE" | grep -E "AKIA[A-Z0-9]{16}"
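For more than a one-off check, a small table of strings that should and should not match catches both missed secrets and false positives. The sketch below uses Python's `re` as a convenient stand-in. It is not RE2, so a pattern that passes here could still behave differently in AIP. `check_pattern` is a hypothetical helper.

```python
import re

def check_pattern(regex, should_match, should_not_match):
    """Return a list of failure messages; an empty list means the pattern behaved as expected."""
    compiled = re.compile(regex)
    failures = []
    for sample in should_match:
        if not compiled.search(sample):
            failures.append(f"expected match: {sample!r}")
    for sample in should_not_match:
        if compiled.search(sample):
            failures.append(f"unexpected match: {sample!r}")
    return failures

failures = check_pattern(
    r"AKIA[A-Z0-9]{16}",
    should_match=["AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"],
    should_not_match=["AKIA-too-short", "akiaiosfodnn7example"],
)
print(failures or "all samples behaved as expected")
```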
Monitor Mode Testing
Test DLP without blocking:
spec:
  mode: monitor  # Log matches but don't redact
  dlp:
    patterns:
      - name: "Test Pattern"
        regex: "..."
Review logs to validate patterns before enforcing.
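The difference between the two modes can be pictured as a single switch around the redaction step. In this sketch, `apply_dlp` is a hypothetical function and the SSN regex is only a placeholder for whatever pattern you are testing. It assumes monitor mode records matches while passing content through unchanged, as the comment in the policy above describes.

```python
import re

def apply_dlp(content, patterns, mode="enforce"):
    """Return (content, matches); content is rewritten only when mode is "enforce"."""
    matches = []
    for pattern in patterns:
        compiled = re.compile(pattern["regex"])
        count = sum(1 for _ in compiled.finditer(content))
        if count == 0:
            continue
        matches.append({"pattern": pattern["name"], "match_count": count})
        if mode == "enforce":
            label = f"[REDACTED: {pattern['name']}]"
            content = compiled.sub(lambda _m, label=label: label, content)
    return content, matches

patterns = [{"name": "Test Pattern", "regex": r"\b\d{3}-\d{2}-\d{4}\b"}]
text = "ssn 123-45-6789"
print(apply_dlp(text, patterns, mode="monitor"))
print(apply_dlp(text, patterns, mode="enforce"))
```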
Troubleshooting
Common issues:
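Escaping: Use double backslashes in double-quoted YAML strings: "\\d" not "\d"
Anchors: Patterns are searched anywhere in the text, not matched against the whole string. Don’t use ^ or $ unless needed.
Case sensitivity: Patterns are case-sensitive by default.
Test with: echo "test string" | grep -E "your-pattern"
Note that grep -E uses POSIX extended regular expressions, which have no \d shorthand; write [0-9] instead when testing this way.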
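The anchor and case-sensitivity issues are easy to reproduce. The sketch below uses Python's `re.search`, which, like the search semantics described above, looks for the pattern anywhere in the text. The `(?i)` inline flag is also valid RE2 syntax, though whether your policy should use it is a judgment call. `found` is a hypothetical helper.

```python
import re

PATTERN = r"AKIA[A-Z0-9]{16}"

def found(pattern: str, text: str) -> bool:
    """True when the pattern occurs anywhere in the text (search, not full match)."""
    return re.search(pattern, text) is not None

line = "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"
print(found(PATTERN, line))               # unanchored: finds the key inside the line
print(found("^" + PATTERN + "$", line))   # anchored: the line has extra text around the key
print(found(PATTERN, line.lower()))       # case-sensitive by default
print(found("(?i)" + PATTERN, line.lower()))  # inline flag makes it case-insensitive
```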
Next Steps
Writing Policies: Integrate DLP into your policies
Audit Logging: Review DLP violations
Policy Schema: Complete DLP configuration reference
Error Codes: Understand DLP-related errors