Why PII Detection?
PII detection helps you:- Identify sensitive data columns before processing
- Apply appropriate safeguards (hashing, encryption, removal)
- Comply with data minimization principles (GDPR Article 5)
- Avoid accidentally exposing PII in violation evidence
PII detection is advisory only — the scan proceeds regardless of findings, but you’ll be warned about sensitive data.
Detected PII Types
Yggdrasil scans for these PII categories using regex patterns:Personal Identifiers
- Email addresses:
[email protected] - Phone numbers: US and international formats
- Social Security Numbers (SSN):
123-45-6789 - Names: First/last name patterns
- Physical addresses: Street addresses
- Dates of birth: Various date formats
Financial Data
- Credit card numbers: 16-digit card patterns (Visa, MC, Amex)
- Bank account numbers: Common account formats
Government IDs
- Passport numbers: International passport formats
- National ID numbers: Country-specific formats
- Driver’s license numbers: US state formats
Technical Identifiers
- IP addresses: IPv4 and IPv6
- MAC addresses: Network hardware identifiers
Detection Process
Trigger PII scan
After uploading your CSV, click “Scan for PII” before proceeding to mapping confirmation.
Sampling
The system analyzes up to 20 sample rows per column to detect PII patterns without scanning the entire dataset.
Pattern matching
Each column is tested against PII regex patterns. Matches are masked for safe display:
- Emails:
u***@example.com - SSNs:
***-**-1234 - Credit cards:
****-****-****-1234 - Phones:
***-***-1234
Severity Levels
| Severity | PII Types | Risk |
|---|---|---|
| CRITICAL | SSN, credit card, passport, national ID | Immediate regulatory concern (GDPR Art. 9) |
| HIGH | Email, phone, address, date of birth, bank account | Regulated personal data (GDPR Art. 4) |
| MEDIUM | Name, IP address, MAC address | Identifiers requiring protection |
Confidence Scoring
Confidence indicates detection accuracy:- 90-100%: Strong pattern match (e.g., email regex)
- 70-89%: Likely PII (e.g., name patterns)
- 60-69%: Possible PII (e.g., generic number patterns)
- < 60%: Not reported (too uncertain)
Detection Output
Example PII finding:Remediation Suggestions
For each PII type, Yggdrasil suggests:- hash: One-way hash for pseudonymization (emails, account numbers)
- encrypt: Two-way encryption for reversible protection (credit cards, SSNs)
- remove: Delete the column if not needed for compliance checks
Yggdrasil does not automatically modify your data. Suggestions are advisory — you must apply them manually before uploading.
False Positives
Regex-based detection may produce false positives:- IP addresses detected in non-IP columns (e.g., version numbers like
1.2.3.4) - Phone numbers detected in numeric IDs
- Credit card patterns in transaction IDs
- High match % + high confidence: Likely true positive
- Low match % + medium confidence: Possibly false positive
What Happens with PII Findings?
PII findings are:- Stored in the
pii_findingstable withupload_id - Linked to the scan via
scan_idafter scan completion - Surfaced as warnings in the UI
- Not enforced: The scan proceeds even if PII is detected
Detection Patterns
Yggdrasil uses these fallback regex patterns:Phone (US/International)
SSN
Credit Card
IP Address (IPv4)
Disabling PII Detection
If you don’t need PII scanning:- Skip the “Scan for PII” step
- Proceed directly to column mapping confirmation
- No PII findings will be stored
Next Steps
After reviewing PII findings:- Apply remediation (hash/encrypt/remove columns) if needed
- Re-upload the sanitized dataset
- Confirm column mappings → Column Mapping
- Run the compliance scan → Compliance Scanning