Yggdrasil supports custom PDF upload to extract structured, enforceable compliance rules from any regulatory document — including industry-specific regulations (HIPAA, PCI-DSS, GLBA) or internal company policies.

How It Works

The custom PDF extraction pipeline uses Gemini 2.5 Flash to parse regulatory text and generate rules with compound boolean logic.
1

Upload PDF

Upload your regulatory document through the audit wizard. Supported inputs and limits:
  • Standard PDFs (text-based)
  • Scanned PDFs (OCR is applied automatically)
  • Maximum file size: 10MB
  • Maximum pages: 100
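A client might check the documented limits before submitting a file. The sketch below is a minimal illustration under stated assumptions: `checkUploadLimits` is a hypothetical helper, not part of Yggdrasil, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes (the service may define it differently). The page count is assumed to be known already, for example from a local PDF parser.

```typescript
// Hypothetical pre-upload check mirroring the documented limits.
// Assumption: "10MB" is read as 10 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 10 * 1024 * 1024;
const MAX_PAGES = 100;

interface UploadCheck {
  ok: boolean;
  errors: string[];
}

function checkUploadLimits(sizeBytes: number, pageCount: number): UploadCheck {
  const errors: string[] = [];
  if (sizeBytes > MAX_FILE_BYTES) {
    const mb = (sizeBytes / 1024 / 1024).toFixed(1);
    errors.push(`File is ${mb}MB; the limit is 10MB.`);
  }
  if (pageCount > MAX_PAGES) {
    errors.push(`Document has ${pageCount} pages; the limit is ${MAX_PAGES}.`);
  }
  return { ok: errors.length === 0, errors };
}

// Usage
console.log(checkUploadLimits(2_500_000, 40).ok); // true
console.log(checkUploadLimits(12 * 1024 * 1024, 120).errors); // two messages: size and page count
```

Both limits are treated as inclusive (a file of exactly 10MB or 100 pages passes), which is itself an assumption.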
2

Text Extraction

Yggdrasil uses unpdf (a serverless-compatible PDF parser) to extract text from the document. The extracted text is then split into sections (chunks) small enough to fit within Gemini’s context window.
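The chunking step can be sketched as packing paragraphs into size-bounded chunks. This is an illustrative sketch, not Yggdrasil's implementation: `chunkText` is hypothetical, paragraphs are assumed to be separated by blank lines, and `maxChars` is a character budget used as a rough proxy for the token limit a real context window imposes.

```typescript
// Hypothetical chunker: splits text on blank lines and packs whole paragraphs
// into chunks of at most maxChars characters.
function chunkText(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    // An oversized paragraph is hard-split so no chunk exceeds the budget.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > maxChars) {
      chunks.push(current); // current is non-empty here
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Usage
const sample =
  "Section 3.1\nAll wires must be logged.\n\nSection 3.2\nWire transfers exceeding $50,000 must be reviewed.";
console.log(chunkText(sample, 80).length); // 2
```

Keeping paragraphs whole means a section and its policy text tend to land in the same chunk, which helps preserve `policy_section` references.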
3

AI Rule Extraction

Gemini 2.5 Flash analyzes the document and identifies:
  • Enforceable clauses — Statements that can be validated against data
  • Thresholds — Numeric limits (e.g., “transactions exceeding $10,000”)
  • Conditions — Boolean logic (e.g., “if amount > $10K AND type = WIRE”)
  • Severity — Risk level (CRITICAL, HIGH, MEDIUM)
  • Policy excerpts — Exact quotes from the document
The AI generates structured rules in the following format:
{
  "rule_id": "CUSTOM-001",
  "name": "High-Value Wire Transfer",
  "type": "single_transaction",
  "severity": "HIGH",
  "threshold": 50000,
  "conditions": {
    "AND": [
      { "field": "type", "operator": "==", "value": "WIRE" },
      { "field": "amount", "operator": ">", "value": 50000 }
    ]
  },
  "policy_excerpt": "Wire transfers exceeding $50,000 must be reviewed.",
  "policy_section": "Section 3.2"
}
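For readers consuming these rules in code, the format above can be described with TypeScript types. This is a sketch: field names come from the example, while the full operator union and which fields are optional are assumptions. Yggdrasil validates rules with Zod (see Quality Assurance); these plain types are an illustrative equivalent, not its schema.

```typescript
// Type sketch of the extracted-rule format. Operator set is partly assumed.
type Operator = ">" | ">=" | "<" | "<=" | "==" | "!=" | "IN" | "BETWEEN";
type Scalar = string | number;

interface LeafCondition {
  field: string;
  operator: Operator;
  value: Scalar | Scalar[];
}

// Conditions nest: a node is an AND group, an OR group, or a leaf.
type Condition = LeafCondition | { AND: Condition[] } | { OR: Condition[] };

interface ExtractedRule {
  rule_id: string;
  name: string;
  type:
    | "single_transaction"
    | "aggregation"
    | "velocity"
    | "structuring"
    | "dormant_reactivation"
    | "round_amount";
  severity: "CRITICAL" | "HIGH" | "MEDIUM";
  threshold?: number;
  time_window?: number; // hours, for temporal rules
  conditions: Condition;
  policy_excerpt: string;
  policy_section: string;
}

// The example rule above, now type-checked.
const example: ExtractedRule = {
  rule_id: "CUSTOM-001",
  name: "High-Value Wire Transfer",
  type: "single_transaction",
  severity: "HIGH",
  threshold: 50000,
  conditions: {
    AND: [
      { field: "type", operator: "==", value: "WIRE" },
      { field: "amount", operator: ">", value: 50000 },
    ],
  },
  policy_excerpt: "Wire transfers exceeding $50,000 must be reviewed.",
  policy_section: "Section 3.2",
};
```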
4

Signal Specificity Validation

Each extracted rule is scored using the Signal Specificity Framework:
  • Single-signal rules (e.g., “amount > $10K”) are rejected unless they meet domain-specific thresholds
  • Multi-signal rules (e.g., “amount > $10K AND type = WIRE AND dest_country = offshore”) are accepted when their combined score reaches the minimum
  • Minimum combined specificity: 2.0
This filtering step minimizes false positives by ensuring rules combine multiple signals.
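Once each rule has a specificity score, the filter itself is a simple partition against the minimum. A minimal sketch, assuming scores are already computed: `partitionBySpecificity` is a hypothetical helper, and the `min` parameter stands in for the domain-specific adjustments mentioned above, which this page does not specify. The sample scores are the ones worked out in the Examples section.

```typescript
// Hypothetical post-scoring filter against the documented minimum of 2.0.
const MIN_SPECIFICITY = 2.0;

interface ScoredRule {
  rule_id: string;
  specificity: number;
}

function partitionBySpecificity(rules: ScoredRule[], min: number = MIN_SPECIFICITY) {
  const accepted: ScoredRule[] = [];
  const rejected: ScoredRule[] = [];
  for (const rule of rules) {
    (rule.specificity >= min ? accepted : rejected).push(rule);
  }
  return { accepted, rejected };
}

// Usage
const { accepted, rejected } = partitionBySpecificity([
  { rule_id: "CUSTOM-001", specificity: 0.5 }, // amount > $10K only
  { rule_id: "CUSTOM-002", specificity: 1.5 }, // amount + type
  { rule_id: "CUSTOM-003", specificity: 2.5 }, // amount + type + destination
]);
console.log(accepted.map((r) => r.rule_id)); // CUSTOM-003 only
```

With a strict cutoff the 1.5 "borderline" rule is rejected; a domain that tolerates it would pass a lower `min`.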
5

Rule Review & Activation

All extracted rules are displayed for review:
  • Toggle individual rules on/off
  • Edit thresholds or conditions (advanced)
  • View policy excerpts and severity assignments
Only active rules are executed during the scan.
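The review step amounts to per-rule state plus a filter at scan time. The sketch below assumes a hypothetical `active` flag on each rule and hypothetical helpers `toggleRule`, `setThreshold`, and `rulesForScan`; it illustrates the behavior described above, not Yggdrasil's UI or API.

```typescript
// Hypothetical review state: the reviewer toggles `active` and may edit thresholds.
interface ReviewableRule {
  rule_id: string;
  threshold?: number;
  active: boolean;
}

function toggleRule(rules: ReviewableRule[], ruleId: string): ReviewableRule[] {
  return rules.map((r) => (r.rule_id === ruleId ? { ...r, active: !r.active } : r));
}

function setThreshold(rules: ReviewableRule[], ruleId: string, threshold: number): ReviewableRule[] {
  return rules.map((r) => (r.rule_id === ruleId ? { ...r, threshold } : r));
}

// Only active rules are handed to the scan.
function rulesForScan(rules: ReviewableRule[]): ReviewableRule[] {
  return rules.filter((r) => r.active);
}

// Usage
let rules: ReviewableRule[] = [
  { rule_id: "CUSTOM-001", threshold: 50000, active: true },
  { rule_id: "CUSTOM-002", threshold: 10000, active: true },
];
rules = toggleRule(rules, "CUSTOM-002"); // reviewer disables CUSTOM-002
rules = setThreshold(rules, "CUSTOM-001", 75000); // reviewer raises a threshold
console.log(rulesForScan(rules).map((r) => r.rule_id)); // CUSTOM-001 only
```

The helpers return new arrays instead of mutating, so the originally extracted rules remain available if a reviewer wants to undo an edit.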

Signal Specificity Framework

The Signal Specificity Framework is a scoring system that evaluates rule quality based on how many independent signals are combined.

Signal Types

Signal Type | Examples | Specificity Score
Behavioral | Transaction type, action, event | 1.0 per signal
Temporal | Time window, frequency, velocity | 1.0 per signal
Relational | Account relationships, cross-entity patterns | 1.0 per signal
Threshold | Numeric limits (amount, count, age) | 0.5 per threshold

Scoring Rules

  • Single condition: Score = specificity of that signal
  • AND conditions: Score = sum of all signal specificities
  • OR conditions: Score = max specificity among branches
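The three scoring rules map directly onto a recursive function over the condition tree: a leaf scores the weight of its signal type, AND sums its children, and OR takes the maximum. The weights come from the table above; how a field is classified into a signal type is not documented here, so `FIELD_SIGNAL` below is a hypothetical mapping covering only the fields used in the examples.

```typescript
// Sketch of the Signal Specificity scoring rules.
type SignalType = "behavioral" | "temporal" | "relational" | "threshold";

const SIGNAL_WEIGHT: Record<SignalType, number> = {
  behavioral: 1.0,
  temporal: 1.0,
  relational: 1.0,
  threshold: 0.5,
};

// Hypothetical field classification (assumption, not the engine's real mapping).
const FIELD_SIGNAL: Record<string, SignalType> = {
  amount: "threshold",
  type: "behavioral",
  dest_country: "relational",
};

type Leaf = { field: string; operator: string; value: unknown };
type Condition = Leaf | { AND: Condition[] } | { OR: Condition[] };

function specificity(cond: Condition): number {
  if ("AND" in cond) {
    return cond.AND.reduce((sum, c) => sum + specificity(c), 0); // AND: sum
  }
  if ("OR" in cond) {
    return Math.max(0, ...cond.OR.map((c) => specificity(c))); // OR: max
  }
  const signal = FIELD_SIGNAL[cond.field];
  return signal ? SIGNAL_WEIGHT[signal] : 0; // unknown fields add nothing
}

// Usage with the rules from the Examples section
const amountOnly: Condition = { field: "amount", operator: ">", value: 10000 };
const wire: Condition = {
  AND: [amountOnly, { field: "type", operator: "==", value: "WIRE" }],
};
const offshoreWire: Condition = {
  AND: [
    amountOnly,
    { field: "type", operator: "==", value: "WIRE" },
    { field: "dest_country", operator: "IN", value: ["offshore_jurisdictions"] },
  ],
};
console.log(specificity(amountOnly), specificity(wire), specificity(offshoreWire)); // 0.5 1.5 2.5
```

Taking the maximum for OR reflects that an OR rule is only as specific as its broadest branch would allow.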

Examples

Rule:
{ "field": "amount", "operator": ">", "value": 10000 }
Specificity Score: 0.5 (threshold only)
Result: Rejected (below the minimum specificity of 2.0)
Reason: This rule would fire on any transaction over $10K, producing too many false positives.
Rule:
{
  "AND": [
    { "field": "amount", "operator": ">", "value": 10000 },
    { "field": "type", "operator": "==", "value": "WIRE" }
  ]
}
Specificity Score: 0.5 (threshold) + 1.0 (behavioral) = 1.5
Result: Borderline (may be rejected depending on domain)
Reason: Still too broad: any wire transfer over $10K would trigger.
Rule:
{
  "AND": [
    { "field": "amount", "operator": ">", "value": 10000 },
    { "field": "type", "operator": "==", "value": "WIRE" },
    { "field": "dest_country", "operator": "IN", "value": ["offshore_jurisdictions"] }
  ]
}
Specificity Score: 0.5 (threshold) + 1.0 (behavioral) + 1.0 (relational) = 2.5
Result: Accepted
Reason: Combines multiple signals (amount + type + destination), reducing false positives.
Rule (Velocity-based):
{
  "type": "velocity",
  "threshold": 5,
  "time_window": 24,
  "conditions": { "field": "amount", "operator": ">", "value": 8000 }
}
Specificity Score: 0.5 (threshold) + 1.0 (temporal: 24h window) + 1.0 (behavioral: velocity pattern) = 2.5
Result: Accepted
Reason: Velocity rules inherently combine temporal and behavioral signals.
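The velocity arithmetic can be reproduced by scoring a rule in two parts: its `conditions` tree, plus signals implied by the rule type. This is one possible reading, not the documented algorithm. The page counts a single 0.5 threshold even though the rule has two numeric limits (`threshold: 5` and `amount > 8000`); this sketch attributes it to the amount condition and does not score `threshold`. `FIELD_WEIGHT`, `conditionScore`, and `ruleScore` are hypothetical.

```typescript
// One reading of rule-level scoring: condition score + 1.0 temporal if the rule
// has a time_window + 1.0 behavioral for a velocity pattern. Assumption only.
type Leaf = { field: string; operator: string; value: unknown };
type Condition = Leaf | { AND: Condition[] } | { OR: Condition[] };

interface RuleLike {
  type: string;
  threshold?: number; // not scored in this reading
  time_window?: number; // hours
  conditions: Condition;
}

// Hypothetical per-field weights: numeric limit 0.5, behavioral/relational 1.0.
const FIELD_WEIGHT: Record<string, number> = { amount: 0.5, type: 1.0, dest_country: 1.0 };

function conditionScore(c: Condition): number {
  if ("AND" in c) return c.AND.reduce((s, x) => s + conditionScore(x), 0);
  if ("OR" in c) return Math.max(0, ...c.OR.map((x) => conditionScore(x)));
  return FIELD_WEIGHT[c.field] ?? 0;
}

function ruleScore(rule: RuleLike): number {
  let score = conditionScore(rule.conditions);
  if (rule.time_window !== undefined) score += 1.0; // temporal signal
  if (rule.type === "velocity") score += 1.0; // behavioral: velocity pattern
  return score;
}

// Usage: the velocity rule from the example above
const velocityRule: RuleLike = {
  type: "velocity",
  threshold: 5,
  time_window: 24,
  conditions: { field: "amount", operator: ">", value: 8000 },
};
console.log(ruleScore(velocityRule)); // 2.5
```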

Supported Rule Types

The extraction engine can generate the following rule types:
Rule Type | Description | Example
single_transaction | Evaluate conditions per record | “Flag transactions > $10K”
aggregation | Sum values within a time window | “Flag accounts with total volume > $25K in 24h”
velocity | Count occurrences within a time window | “Flag 5+ transactions in 24h”
structuring | Detect sub-threshold patterns | “Flag 3+ transactions between $8K and $10K in 24h”
dormant_reactivation | Detect dormant account activity | “Flag dormant accounts (90d) with transaction > $5K”
round_amount | Detect round-dollar patterns | “Flag 3+ round amounts ($X,000) in 30d”
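The two windowed rule types in the table differ only in what they measure inside the window: `velocity` counts matching records, `aggregation` sums them. The sketch below evaluates both with a per-account sliding window. The record shape (`account_id`, `amount`, `timestamp` in epoch milliseconds) and `flagAccounts` are hypothetical; following the table's examples, velocity flags at count ≥ threshold ("5+") and aggregation at sum > threshold ("> $25K"). The other rule types are not covered.

```typescript
// Hypothetical sliding-window evaluator for velocity (count) and aggregation (sum).
interface Txn {
  account_id: string;
  amount: number;
  timestamp: number; // epoch milliseconds
}

type WindowMode = "velocity" | "aggregation";

function flagAccounts(
  txns: Txn[],
  mode: WindowMode,
  threshold: number,
  windowHours: number,
  matches: (t: Txn) => boolean = () => true,
): string[] {
  const windowMs = windowHours * 60 * 60 * 1000;
  const byAccount: Record<string, Txn[]> = {};
  for (const t of txns) {
    if (!matches(t)) continue;
    if (!byAccount[t.account_id]) byAccount[t.account_id] = [];
    byAccount[t.account_id].push(t);
  }

  const flagged: string[] = [];
  for (const account of Object.keys(byAccount)) {
    const list = byAccount[account].slice().sort((a, b) => a.timestamp - b.timestamp);
    let start = 0;
    let sum = 0;
    for (let end = 0; end < list.length; end++) {
      sum += list[end].amount;
      // Drop records from the left until the window spans at most windowHours.
      while (list[end].timestamp - list[start].timestamp > windowMs) {
        sum -= list[start].amount;
        start++;
      }
      const count = end - start + 1;
      const hit = mode === "velocity" ? count >= threshold : sum > threshold;
      if (hit) {
        flagged.push(account);
        break;
      }
    }
  }
  return flagged;
}

// Usage
const H = 60 * 60 * 1000;
const txns: Txn[] = [
  ...[0, 1, 2, 3, 5].map((h) => ({ account_id: "acct-A", amount: 9000, timestamp: h * H })),
  ...[0, 30, 60, 90, 120].map((h) => ({ account_id: "acct-B", amount: 9000, timestamp: h * H })),
  ...[0, 2].map((h) => ({ account_id: "acct-C", amount: 15000, timestamp: h * H })),
];
console.log(flagAccounts(txns, "velocity", 5, 24, (t) => t.amount > 8000)); // acct-A only
console.log(flagAccounts(txns, "aggregation", 25000, 24)); // acct-A and acct-C
```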

What Gets Extracted

For each rule, Gemini extracts:
  • rule_id — Unique identifier (e.g., CUSTOM-001)
  • name — Human-readable name (e.g., “High-Value Wire Transfer”)
  • type — Rule execution type (single_transaction, aggregation, velocity, etc.)
  • severity — Risk level (CRITICAL, HIGH, MEDIUM)
  • threshold — Numeric limit (e.g., 10000 for $10K)
  • time_window — Hours for temporal rules (e.g., 24 for “within 24 hours”)
  • conditions (compound boolean logic), built from:
      • AND — All conditions must match
      • OR — At least one condition must match
      • Leaf conditions — Field, operator, value triples
Example:
{
  "AND": [
    { "field": "amount", "operator": ">=", "value": 10000 },
    { "field": "type", "operator": "IN", "value": ["WIRE", "TRANSFER"] }
  ]
}
  • policy_excerpt — Exact quote from the PDF
  • policy_section — Section/article reference (e.g., “Section 3.2”, “Article 15”)
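To show how an extracted condition tree is applied to a record, here is a minimal recursive evaluator: AND requires every child to match, OR requires at least one, and a leaf compares one field. It is a sketch, not Yggdrasil's rule engine. Operators other than those shown on this page (`>`, `>=`, `==`, `IN`, `BETWEEN`) are assumed, `BETWEEN` is assumed to take an inclusive `[low, high]` pair, and a record missing the field is assumed never to match.

```typescript
// Minimal evaluator for the AND / OR / leaf condition tree (illustrative only).
type Value = string | number;
type Leaf = { field: string; operator: string; value: Value | Value[] };
type Condition = Leaf | { AND: Condition[] } | { OR: Condition[] };
type Row = Record<string, Value>;

const isNum = (v: unknown): v is number => typeof v === "number";

function evaluate(cond: Condition, record: Row): boolean {
  if ("AND" in cond) return cond.AND.every((c) => evaluate(c, record));
  if ("OR" in cond) return cond.OR.some((c) => evaluate(c, record));

  const actual = record[cond.field];
  const expected = cond.value;
  if (actual === undefined) return false; // a missing field never matches

  switch (cond.operator) {
    case "==":
      return actual === expected;
    case "!=":
      return actual !== expected;
    case ">":
      return isNum(actual) && isNum(expected) && actual > expected;
    case ">=":
      return isNum(actual) && isNum(expected) && actual >= expected;
    case "<":
      return isNum(actual) && isNum(expected) && actual < expected;
    case "<=":
      return isNum(actual) && isNum(expected) && actual <= expected;
    case "IN":
      return Array.isArray(expected) && expected.includes(actual);
    case "BETWEEN": {
      if (!isNum(actual) || !Array.isArray(expected)) return false;
      const [low, high] = expected;
      return isNum(low) && isNum(high) && actual >= low && actual <= high;
    }
    default:
      throw new Error(`Unsupported operator: ${cond.operator}`);
  }
}

// Usage: the example condition above
const example: Condition = {
  AND: [
    { field: "amount", operator: ">=", value: 10000 },
    { field: "type", operator: "IN", value: ["WIRE", "TRANSFER"] },
  ],
};
console.log(evaluate(example, { amount: 10000, type: "TRANSFER" })); // true
console.log(evaluate(example, { amount: 20000, type: "ACH" })); // false
```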

Quality Assurance

To ensure high-quality rule extraction:
1

Validation Against Schema

All extracted rules are validated against Zod schemas to ensure:
  • Valid JSON structure
  • Required fields present (rule_id, name, type, severity, conditions)
  • Supported operators (>=, ==, IN, BETWEEN, etc.)
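The schema check can be approximated without a library to show what it catches. Yggdrasil uses Zod for this; the hand-written validator below is a stand-in for illustration, not its schema. Required fields come from the list above; the operator whitelist beyond `>=`, `==`, `IN`, and `BETWEEN` is an assumption, and `validateRule` is a hypothetical name.

```typescript
// Plain-TypeScript stand-in for schema validation of an extracted rule.
const REQUIRED_FIELDS = ["rule_id", "name", "type", "severity", "conditions"];
const SUPPORTED_OPERATORS = [">", ">=", "<", "<=", "==", "!=", "IN", "BETWEEN"]; // partly assumed

function checkCondition(node: unknown, path: string, errors: string[]): void {
  if (typeof node !== "object" || node === null || Array.isArray(node)) {
    errors.push(`${path}: expected an object`);
    return;
  }
  const obj = node as Record<string, unknown>;
  for (const key of ["AND", "OR"]) {
    if (key in obj) {
      const children = obj[key];
      if (!Array.isArray(children) || children.length === 0) {
        errors.push(`${path}.${key}: expected a non-empty array`);
      } else {
        children.forEach((child, i) => checkCondition(child, `${path}.${key}[${i}]`, errors));
      }
      return; // a node is either a group or a leaf, not both
    }
  }
  if (typeof obj.field !== "string") errors.push(`${path}.field: expected a string`);
  const op = obj.operator;
  if (typeof op !== "string" || !SUPPORTED_OPERATORS.includes(op)) {
    errors.push(`${path}.operator: unsupported operator ${String(op)}`);
  }
  if (!("value" in obj)) errors.push(`${path}.value: missing`);
}

// Returns a list of problems; an empty list means the rule passed.
function validateRule(raw: string): string[] {
  let rule: unknown;
  try {
    rule = JSON.parse(raw);
  } catch {
    return ["invalid JSON"];
  }
  if (typeof rule !== "object" || rule === null) return ["rule must be a JSON object"];
  const obj = rule as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (!(field in obj)) errors.push(`missing required field: ${field}`);
  }
  if ("conditions" in obj) checkCondition(obj.conditions, "conditions", errors);
  return errors;
}

// Usage
const validRule = JSON.stringify({
  rule_id: "CUSTOM-001",
  name: "High-Value Wire Transfer",
  type: "single_transaction",
  severity: "HIGH",
  conditions: {
    AND: [
      { field: "type", operator: "==", value: "WIRE" },
      { field: "amount", operator: ">", value: 50000 },
    ],
  },
});
console.log(validateRule(validRule).length); // 0
```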
2

Specificity Scoring

Rules are scored using the Signal Specificity Framework:
  • Minimum combined specificity: 2.0
  • Single-threshold rules rejected
  • Compound conditions preferred
3

Human Review

All extracted rules are displayed for review before execution:
  • View policy excerpts
  • Toggle rules on/off
  • Edit thresholds (advanced users)

Example: HIPAA Privacy Rule Extraction

Section 164.502(a) — Uses and Disclosures of Protected Health Information

(1) A covered entity may not use or disclose protected health information, 
except as permitted or required by this subpart.

(2) A covered entity must obtain an individual's authorization for any use 
or disclosure of PHI that is not for treatment, payment, or healthcare operations.
Specificity Score: 1.0 (behavioral: PHI) + 1.0 (behavioral: purpose check) + 1.0 (behavioral: authorization) = 3.0

Limitations

  • Ambiguous text: If policy language is vague (e.g., “reasonable measures”), Gemini may skip extraction
  • Non-enforceable clauses: Aspirational statements (e.g., “strive to protect”) are not converted to rules
  • Complex logic: Nested OR conditions with 5+ branches may be simplified or split into multiple rules
  • Context window: PDFs over 100 pages may be truncated; consider uploading specific sections

When to Use Custom PDF vs Prebuilt

Scenario | Recommended Approach
You need AML, GDPR, or SOC2 compliance | Use prebuilt frameworks (faster, includes historical fines)
You have HIPAA, PCI-DSS, GLBA, or other industry regulations | Use custom PDF upload
You’re enforcing internal company policies | Use custom PDF upload
You need to customize rule thresholds | Use prebuilt + manual editing or custom PDF
You want to test a new regulation before production | Use custom PDF with a sample dataset

Next Steps

  • Start an Audit: Upload your first PDF and extract rules
  • Rule Engine: Learn how rules are evaluated
  • Confidence Scoring: Understand how violations are scored
  • Explainability: See how violation explanations are generated
