
Overview

The pii_findings table stores PII (Personally Identifiable Information) detection results from uploaded datasets. Findings are initially linked only by upload_id, then associated with a scan_id once the scan has been created.

Schema

| Column | Type | Description |
| --- | --- | --- |
| id | UUID, PK | Unique finding identifier |
| scan_id | UUID, FK, nullable | Linked after scan creation |
| upload_id | UUID | Upload that was scanned for PII |
| column_name | text | Column containing PII |
| pii_type | text | Type of PII detected (see below) |
| severity | text | 'CRITICAL', 'HIGH', 'MEDIUM' |
| confidence | numeric | Detection confidence (0–1) |
| match_count | integer | Rows with potential PII |
| total_rows | integer | Total rows analyzed |
| sample_values | jsonb | Masked sample values |
| detection_query | text | Regex pattern used |
| violation_text | text | PII risk description |
| suggestion | text | Remediation: 'hash', 'encrypt', 'remove' |
| status | text | 'open', 'resolved', 'ignored' |
| created_at | timestamptz | Timestamp when finding was detected |
| resolved_at | timestamptz | Timestamp when resolved |
| resolved_by | UUID | User who resolved the finding |

PII Types Detected

The pii_type field identifies the category of PII found:

| PII Type | Description | Example Pattern |
| --- | --- | --- |
| email | Email addresses | user@example.com |
| phone | Phone numbers | (555) 123-4567 |
| ssn | Social Security Numbers | 123-45-6789 |
| name | Person names | John Doe |
| address | Physical addresses | 123 Main St, City, ST 12345 |
| credit_card | Credit card numbers | 4532-1234-5678-9010 |
| ip_address | IP addresses | 192.168.1.1 |
| date_of_birth | Dates of birth | 1990-01-01 |
| passport | Passport numbers | AB1234567 |
| national_id | National ID numbers | Country-specific formats |
| bank_account | Bank account numbers | Account number formats |

Upload ID vs Scan ID Linking

Why Two IDs?

PII detection runs at upload time (before a scan exists), so findings are initially stored with only upload_id. After a scan is created, the findings are linked to scan_id for persistent reference.

Linking Flow

  1. Upload Data → CSV uploaded, assigned upload_id
  2. Scan for PII → Findings created with upload_id, scan_id = null
  3. Create Scan → Scan created with id = scan_id
  4. Link Findings → Update findings: SET scan_id = :scan_id WHERE upload_id = :upload_id (the new scan's id and the original upload's id)
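
The linking step above can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite table in place of the real database; the helper name link_findings and the extra scan_id IS NULL guard are assumptions, not part of the documented flow.

```python
import sqlite3
import uuid


def link_findings(conn: sqlite3.Connection, upload_id: str, scan_id: str) -> int:
    """Attach every still-unlinked finding from one upload to a new scan."""
    cur = conn.execute(
        # The IS NULL guard (an assumption) avoids re-linking findings
        # that already belong to a scan.
        "UPDATE pii_findings SET scan_id = ? "
        "WHERE upload_id = ? AND scan_id IS NULL",
        (scan_id, upload_id),
    )
    conn.commit()
    return cur.rowcount  # number of findings that were linked


# Demo table with only the columns this step needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pii_findings ("
    "id TEXT PRIMARY KEY, scan_id TEXT, upload_id TEXT NOT NULL)"
)

# Steps 1-2: findings exist with an upload_id and scan_id = NULL.
upload_id = str(uuid.uuid4())
for _ in range(2):
    conn.execute(
        "INSERT INTO pii_findings (id, scan_id, upload_id) VALUES (?, NULL, ?)",
        (str(uuid.uuid4()), upload_id),
    )

# Steps 3-4: a scan is created, then the findings are linked to it.
scan_id = str(uuid.uuid4())
linked = link_findings(conn, upload_id, scan_id)
```

Calling link_findings a second time with the same arguments links nothing further, because no rows with a NULL scan_id remain for that upload.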

Sample Values JSONB

The sample_values field stores masked examples of detected PII.

Example Sample Values

[
  "j***@example.com",
  "s***@company.org",
  "a***@domain.net"
]

Sample values are:
  • Limited to 3-5 examples to avoid exposing sensitive data
  • Masked to show pattern while protecting privacy
  • Used in UI to help users confirm detection accuracy
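
One way the masking described above might work for email values is sketched here. The function names and the limit of three samples are illustrative assumptions; the actual masking rules are not documented on this page.

```python
def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "***"  # not email-shaped: reveal nothing
    return f"{local[0]}***@{domain}"


def masked_samples(values: list[str], limit: int = 3) -> list[str]:
    """Mask at most `limit` matched values for storage in sample_values."""
    return [mask_email(v) for v in values[:limit]]


samples = masked_samples(
    ["jane@example.com", "sam@company.org", "alex@domain.net", "extra@example.com"]
)
# samples == ["j***@example.com", "s***@company.org", "a***@domain.net"]
```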

Remediation Suggestions

The suggestion field provides a recommended remediation action:

| Suggestion | When to Use | Description |
| --- | --- | --- |
| hash | SSN, credit cards, account numbers | One-way hash (SHA-256) for lookup without reversibility |
| encrypt | Names, addresses, DOB | Reversible encryption for authorized access |
| remove | Email (if not needed), phone | Delete column if not required for compliance scan |

PII Detection is Informational

PII findings are surfaced as warnings, but the scan proceeds regardless. The user is informed of potential PII exposure and can take remediation action before running the compliance scan.
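
As an illustration of the hash suggestion, the sketch below applies SHA-256 to a column value using Python's standard library. Both function names are hypothetical. The keyed variant is an addition, not something this page prescribes: low-entropy values such as SSNs can be recovered from a plain hash by enumeration, so a secret key is commonly mixed in.

```python
import hashlib
import hmac


def sha256_hex(value: str) -> str:
    """Plain one-way SHA-256 digest; equal inputs give equal digests."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_sha256_hex(value: str, key: bytes) -> str:
    """HMAC-SHA-256 variant (assumption): still supports lookup by
    equality, but the digest cannot be recomputed without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


digest = sha256_hex("123-45-6789")
keyed = keyed_sha256_hex("123-45-6789", key=b"demo-key-not-for-production")
```

Either digest is a 64-character hex string that can replace the original value while still allowing exact-match lookups.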

Detection Method

PII detection is regex-based and samples 20 rows from each column. This provides:
  • Fast detection without scanning entire dataset
  • Low false positive rate due to pattern-based matching
  • High recall for common PII formats (email, phone, SSN, credit card)

Example Detection Query

# Email detection
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# SSN detection
\b\d{3}-\d{2}-\d{4}\b

# Credit card detection
\b(?:\d{4}[- ]?){3}\d{4}\b
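
The sketch below shows how the three patterns above might be applied to a 20-row sample of one column. It is an illustration under assumptions: taking the first 20 rows, computing confidence as match_count / total_rows, and the detect_column helper itself are not confirmed by this page.

```python
import re

# Patterns copied from the detection queries above.
PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}
SAMPLE_SIZE = 20


def detect_column(values):
    """Return one finding per PII type that matches any sampled row."""
    # Assumption: the sample is the first 20 rows; the real sampling
    # strategy is not documented.
    sample = values[:SAMPLE_SIZE]
    findings = []
    for pii_type, pattern in PATTERNS.items():
        match_count = sum(1 for v in sample if pattern.search(str(v)))
        if match_count:
            findings.append(
                {
                    "pii_type": pii_type,
                    "match_count": match_count,
                    "total_rows": len(sample),
                    # Assumption: confidence is the share of matching rows.
                    "confidence": match_count / len(sample),
                    "detection_query": pattern.pattern,
                }
            )
    return findings


findings = detect_column(["jane@example.com", "n/a", "sam@company.org", "123-45-6789"])
```

For this four-row input the sketch yields two findings: email with match_count 2 and ssn with match_count 1, each with total_rows 4.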

Status Lifecycle

| Status | Description |
| --- | --- |
| open | PII detected, awaiting action |
| resolved | User has taken remediation action (hash, encrypt, remove) |
| ignored | User acknowledges but chooses not to remediate |
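
A status transition could be sketched as below, again against an in-memory SQLite table. The close_finding helper is hypothetical, and two behaviors are assumptions rather than documented rules: only open findings may change status, and resolved_at and resolved_by are set only when the new status is resolved.

```python
import sqlite3
from datetime import datetime, timezone

CLOSED_STATUSES = ("resolved", "ignored")


def close_finding(conn, finding_id, status, user_id=None):
    """Move an open finding to 'resolved' or 'ignored'.

    Returns True if a row was updated, False if the finding was missing
    or no longer open.
    """
    if status not in CLOSED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    resolved = status == "resolved"
    cur = conn.execute(
        "UPDATE pii_findings "
        "SET status = ?, resolved_at = ?, resolved_by = ? "
        "WHERE id = ? AND status = 'open'",
        (
            status,
            datetime.now(timezone.utc).isoformat() if resolved else None,
            user_id if resolved else None,
            finding_id,
        ),
    )
    conn.commit()
    return cur.rowcount == 1


# Demo table with only the lifecycle columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pii_findings ("
    "id TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'open', "
    "resolved_at TEXT, resolved_by TEXT)"
)
conn.execute("INSERT INTO pii_findings (id) VALUES ('finding-1')")
changed = close_finding(conn, "finding-1", "resolved", user_id="user-1")
```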

Relationships

  • Foreign key to scans (via scan_id, nullable): linked after scan creation
  • upload_id: references the in-memory upload store
  • Foreign key to auth.users (via resolved_by): tracks who resolved the finding

Example Query

Get All Open PII Findings for a Scan

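SELECT
  column_name,
  pii_type,
  severity,
  match_count,
  total_rows,
  suggestion
FROM pii_findings
WHERE scan_id = 'scan-uuid'
  AND status = 'open'
-- severity is text, so rank it explicitly; a plain DESC sort would list MEDIUM first
ORDER BY
  CASE severity
    WHEN 'CRITICAL' THEN 1
    WHEN 'HIGH' THEN 2
    WHEN 'MEDIUM' THEN 3
    ELSE 4
  END,
  match_count DESC;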
