Overview
The quality controls module provides deterministic validation and filtering functions that run after LLM generation. These checks enforce required fields, normalize names, detect duplicates, flag suspicious outputs, and verify profile grounding.
Key Principles:
Deterministic: All checks are rule-based (no LLM calls)
Non-destructive: QC functions never raise exceptions; they only drop invalid items and report
Actionable: Reports include flags and fix descriptions for debugging
Source: src/utils/quality_controls.py
from src.utils.quality_controls import run_extraction_qc

cleaned, report = run_extraction_qc(
    entity_type="people",
    entities=[{"name": "Abdul Rahman Ahmed", "nationality": "Yemeni"}],
    domain="guantanamo",
    min_name_len=2,
)
Run deterministic QC on a batch of extracted entities.
Parameters:
entity_type: Entity type: "people", "organizations", "locations", or "events".
entities (List[Dict[str, Any]], required): List of entity dictionaries from LLM extraction.
domain: Domain name for loading schema and equivalence groups.
min_name_len: Minimum name length. Names shorter than this are flagged (but not dropped).

Returns a (cleaned, report) tuple:
cleaned: Validated and deduplicated entities.
report: Summary of QC actions (see below).
class ExtractionQCReport(BaseModel):
    input_count: int
    dropped_missing_required: int
    deduped: int
    output_count: int
    flags: List[str]
    fixes: Dict[str, Any]
input_count: Number of entities before QC.
dropped_missing_required: Entities dropped for missing required fields.
deduped: Duplicate entities removed (exact dedup + variant consolidation).
output_count: Number of entities after QC.
flags: Issues detected:
zero_entities: No entities extracted
missing_required:{field}: Required field missing
short_name:{name}: Name shorter than min_name_len
high_drop_rate: >50% of entities dropped
many_duplicates: >50% of entities were duplicates
many_low_quality_names: Multiple generic/descriptive names
fixes: Corrections applied:
normalized_names: Count of names normalized
collapsed_variants: Count of name variants consolidated
QC Pipeline Steps
The extraction QC pipeline runs these checks in order:
Required Field Check
Drops entities missing required fields (derived from Pydantic schema). Required fields by type:
people: name
organizations: name
locations: name
events: title, description, event_type, start_date
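This check can be sketched as follows (illustrative helper; the real implementation derives the field lists from the Pydantic schemas, and `drop_missing_required` is a name invented here):

```python
from typing import Any, Dict, List, Tuple

# Required fields per entity type, as listed above.
REQUIRED_FIELDS = {
    "people": ["name"],
    "organizations": ["name"],
    "locations": ["name"],
    "events": ["title", "description", "event_type", "start_date"],
}

def drop_missing_required(
    entity_type: str, entities: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split entities into (kept, dropped) based on required fields."""
    kept, dropped = [], []
    for e in entities:
        if all(e.get(f) for f in REQUIRED_FIELDS[entity_type]):
            kept.append(e)
        else:
            dropped.append(e)
    return kept, dropped

kept, dropped = drop_missing_required(
    "people", [{"name": "Ahmed"}, {"nationality": "Yemeni"}]  # second has no name
)
print(len(kept), len(dropped))  # 1 1
```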
Name Normalization
Normalizes names/titles:
Strip leading/trailing whitespace
Collapse multiple whitespace to single space
Unicode NFC normalization
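The three steps can be reproduced with the standard library (an illustrative sketch, not the module's exact code):

```python
import re
import unicodedata

def normalize_name_sketch(name: str) -> str:
    """Illustrative re-implementation of the normalization steps above."""
    name = unicodedata.normalize("NFC", name)  # Unicode NFC normalization
    name = re.sub(r"\s+", " ", name)           # collapse runs of whitespace
    return name.strip()                        # strip leading/trailing whitespace

print(normalize_name_sketch("  Abdul  Rahman\tAhmed "))  # "Abdul Rahman Ahmed"
```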
Short Name Flagging
Flags (but does not drop) names shorter than min_name_len.
Exact Deduplication
Removes exact duplicates within the same article based on:
people: (name,)
organizations: (name, type)
locations: (name, type)
events: (title, start_date)
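Exact dedup with these key tuples can be sketched as (illustrative helper; `dedup_exact` and `DEDUP_KEYS` are names invented here):

```python
from typing import Any, Dict, List

# Key fields per entity type, as listed above.
DEDUP_KEYS = {
    "people": ("name",),
    "organizations": ("name", "type"),
    "locations": ("name", "type"),
    "events": ("title", "start_date"),
}

def dedup_exact(entity_type: str, entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each key tuple, preserving input order."""
    seen = set()
    kept = []
    for e in entities:
        key = tuple(e.get(f) for f in DEDUP_KEYS[entity_type])
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept

orgs = [
    {"name": "ICRC", "type": "humanitarian"},
    {"name": "ICRC", "type": "humanitarian"},  # exact duplicate -> removed
    {"name": "ICRC", "type": "ngo"},           # different type -> kept
]
print(len(dedup_exact("organizations", orgs)))  # 2
```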
Variant Consolidation
Collapses name variants for organizations and locations:
Acronym matching (e.g., “ICRC” ↔ “International Committee of the Red Cross”)
Substring containment
Domain equivalence groups
Keeps the more canonical name (proper nouns over descriptions) and adds the other to aliases.
Low-Quality Name Detection
Flags if multiple entities have generic/descriptive names (e.g., “the camp”, “the facility”).
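A minimal version of this heuristic might look like the following; the article-prefix pattern is hypothetical (the module's actual pattern list may differ):

```python
import re

# Hypothetical heuristic: names that are just an article plus one word
# ("the camp", "the facility") are treated as generic/descriptive.
GENERIC_NAME_RE = re.compile(r"^(the|a|an)\s+\w+$", re.IGNORECASE)

def count_generic_names(names):
    """Count names matching the generic-name pattern."""
    return sum(1 for n in names if GENERIC_NAME_RE.match(n.strip()))

names = ["the camp", "the facility", "International Committee of the Red Cross"]
print(count_generic_names(names))  # 2
```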
Example: Variant Consolidation
from src.utils.quality_controls import run_extraction_qc

entities = [
    {"name": "International Committee of the Red Cross", "type": "humanitarian"},
    {"name": "ICRC", "type": "humanitarian"},
    {"name": "Red Cross", "type": "humanitarian"},
]

cleaned, report = run_extraction_qc(
    entity_type="organizations",
    entities=entities,
    domain="guantanamo",
)
# Output:
# cleaned = [
# {
# "name": "International Committee of the Red Cross",
# "type": "humanitarian",
# "aliases": ["ICRC", "Red Cross"]
# }
# ]
# report.deduped = 2
# report.fixes = {"collapsed_variants": 2}
Entity Relevance Filtering
filter_entities_by_article_relevance
from src.utils.quality_controls import filter_entities_by_article_relevance

kept, report = filter_entities_by_article_relevance(
    entity_type="organizations",
    entities=[
        {"name": "Red Cross", "aliases": ["ICRC"]},
        {"name": "Fictional Org"},  # Hallucinated
    ],
    article_text="The ICRC visited the camp in 2005.",
    domain="guantanamo",
    require_mention=True,
)
# kept = [{"name": "Red Cross", "aliases": ["ICRC"]}]
# report.dropped = 1
Filter out entities whose names don’t appear in the source article text.
Parameters:
entity_type: Entity type: "people", "organizations", "locations", or "events".
entities (List[Dict[str, Any]], required): Entities to validate.
article_text: Full article text to search for mentions.
domain: Domain name for loading equivalence groups.
require_mention: Whether to enforce mention validation. If False, returns all entities.
Mention Detection Strategy
For each entity, builds a “needle set” from:
Canonical name
Aliases (from within-article dedup)
Computed acronym (for orgs/locations)
Domain equivalence group variants
If any needle appears in the article text (case-insensitive), the entity is kept.
Special handling for short needles (≤3 chars):
Wraps the needle in word-boundary anchors (\b...\b) to avoid substring false positives
Example: “UN” matches “The UN peacekeepers” but not “under”
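The matching strategy can be sketched as (hypothetical helper illustrating the rules above, not the module's internal code):

```python
import re

def needle_in_text(needle: str, text: str) -> bool:
    """Case-insensitive mention check; word boundaries for short needles."""
    if len(needle) <= 3:
        # "UN" should match the standalone token, not the "un" in "under".
        return re.search(rf"\b{re.escape(needle)}\b", text, re.IGNORECASE) is not None
    return needle.lower() in text.lower()

text = "The UN peacekeepers went under the bridge."
print(needle_in_text("UN", text))              # True  (standalone token)
print(needle_in_text("UN", "going underway"))  # False (inside a word)
```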
Profile QC
run_profile_qc
from src.utils.quality_controls import run_profile_qc

profile = {
    "text": "Abdul Rahman Ahmed was detained at Guantanamo Bay. ^[art-001]",
    "tags": ["detention"],
    "confidence": 0.85,
}

fixed_profile, report = run_profile_qc(
    profile=profile,
    min_text_len=50,
    min_tags=1,
    require_citations=True,
)
Run deterministic QC on a generated profile dictionary.
Parameters:
profile: Profile dictionary with text, tags, and confidence fields.
min_text_len: Minimum text length in characters. Profiles shorter than this fail QC.
min_tags: Minimum number of tags. If the profile has fewer, tags default to ["needs-review"].
require_citations: Whether to flag profiles with no citations.

Returns a (fixed_profile, report) tuple:
fixed_profile: Profile with fixes applied (confidence clamped, tags defaulted).
report: Summary of QC checks (see below).
ProfileQCReport
class ProfileQCReport(BaseModel):
    text_length: int
    citation_count: int
    tag_count: int
    confidence: Optional[float]
    passed: bool
    flags: List[str]
    fixes: Dict[str, Any]
text_length: Length of profile text in characters.
citation_count: Number of citations found (format: ^[article_id]).
tag_count: Number of tags on the profile.
confidence: Confidence score (0.0 to 1.0) after clamping/defaulting.
passed: Whether the profile passed QC. False only if the text is too short.
flags: Issues detected:
text_too_short: Text below min_text_len
no_citations: No citation markers found
tags_below_minimum: Fewer tags than min_tags
confidence_missing_or_invalid: Confidence not a number
confidence_clamped: Confidence outside [0.0, 1.0]
fixes: Corrections applied:
tags_defaulted: Tags set to ["needs-review"]
confidence_set_default: Confidence set to 0.0
confidence_clamped: Confidence clamped to [0.0, 1.0]
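The confidence and tag fixes can be sketched as (illustrative only; `apply_profile_fixes` is a name invented here, not the module's API):

```python
from typing import Any, Dict, Tuple

def apply_profile_fixes(
    profile: Dict[str, Any], min_tags: int = 1
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Clamp confidence to [0.0, 1.0] and default tags, recording each fix."""
    fixes: Dict[str, Any] = {}
    conf = profile.get("confidence")
    if not isinstance(conf, (int, float)) or isinstance(conf, bool):
        profile["confidence"] = 0.0  # confidence_set_default
        fixes["confidence_set_default"] = 0.0
    elif not 0.0 <= conf <= 1.0:
        profile["confidence"] = min(max(float(conf), 0.0), 1.0)  # confidence_clamped
        fixes["confidence_clamped"] = profile["confidence"]
    if len(profile.get("tags") or []) < min_tags:
        profile["tags"] = ["needs-review"]  # tags_defaulted
        fixes["tags_defaulted"] = True
    return profile, fixes

fixed, fixes = apply_profile_fixes({"text": "...", "tags": [], "confidence": 1.4})
print(fixed["confidence"], fixed["tags"])  # 1.0 ['needs-review']
```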
Citation Pattern
Citations are detected using the regex pattern:
CITATION_RE = re.compile(r"\^\[([^\]\s]+)\]")
Format: ^[article_id] where article_id is non-empty and contains no whitespace.
Example:
Abdul Rahman Ahmed was detained at Guantanamo Bay in 2002. ^[art-001]
He was released in 2009. ^[art-045]
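Applied to the example above, the pattern extracts the article IDs:

```python
import re

# The citation pattern from above: ^[article_id], no whitespace inside the brackets.
CITATION_RE = re.compile(r"\^\[([^\]\s]+)\]")

text = (
    "Abdul Rahman Ahmed was detained at Guantanamo Bay in 2002. ^[art-001]\n"
    "He was released in 2009. ^[art-045]"
)
print(CITATION_RE.findall(text))  # ['art-001', 'art-045']
```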
Profile Grounding Verification
verify_profile_grounding
from src.utils.quality_controls import verify_profile_grounding

profile_text = """
Abdul Rahman Ahmed was detained in 2002. ^[art-001]
He was released in 2009. ^[art-045]
"""

article_texts = {
    "art-001": "Abdul Rahman Ahmed, a Yemeni national, was captured in 2002...",
    "art-045": "Ahmed was released from Guantanamo Bay in January 2009...",
}

report = verify_profile_grounding(
    profile_text=profile_text,
    article_texts=article_texts,
    model_type="gemini",
    max_article_chars=12000,
    max_claim_chars=600,
    min_grounding_score=0.7,
)

print(f"Grounding score: {report.grounding_score:.1%}")  # 100%
print(f"Passed: {report.passed}")  # True
Verify that profile claims are supported by their cited sources.
This function makes LLM calls (one per cited article). Use sparingly in production.
Parameters:
profile_text: Profile text with citation markers.
article_texts: Mapping of article IDs to full article text.
model_type: LLM backend: "gemini" (cloud) or "ollama" (local).
max_article_chars: Maximum article text length to send to the LLM (longer articles are truncated).
max_claim_chars: Maximum claim text length (longer claims are truncated).
min_grounding_score: Minimum grounding score to pass QC (0.0 to 1.0).

Returns: Detailed verification report (see below).
GroundingReport
class GroundingReport(BaseModel):
    profile_text_hash: str
    total_citations: int
    verified: int
    unverified: int
    missing_source: int
    grounding_score: Optional[float]
    passed: bool
    flags: List[str]
    verifications: List[ClaimVerification]
profile_text_hash: SHA-256 hash of the profile text (for caching verification results).
total_citations: Number of citations in the profile.
verified: Citations with SUPPORTED or PARTIAL support.
unverified: Citations with NOT_SUPPORTED or UNCLEAR support.
missing_source: Citations whose source article was not provided.
grounding_score: Ratio of verified to total citations (0.0 to 1.0).
passed: Whether grounding_score ≥ min_grounding_score.
flags: Issues detected:
no_citations: Profile has no citations
missing_sources: Some cited articles not provided
unsupported_claims: At least one claim marked NOT_SUPPORTED
low_grounding_score: Score below threshold
llm_count_mismatch: LLM returned wrong number of verifications
verification_error: LLM call failed
verifications: Per-claim verification details.
ClaimVerification
class ClaimVerification(BaseModel):
    article_id: str
    citation: str
    claim: str
    support_level: SupportLevel
    reasoning: Optional[str]
article_id: Article ID from the citation.
citation: Original citation marker (e.g., ^[art-001]).
claim: Text span supported by the citation.
support_level: SupportLevel enum value:
SUPPORTED: Source clearly supports the claim
PARTIAL: Source partially supports the claim
NOT_SUPPORTED: Source does not support the claim
UNCLEAR: Source is ambiguous or insufficient
MISSING_SOURCE: Source article not available
reasoning: Brief explanation of the support level.
Example: Grounding Verification
from src.utils.quality_controls import verify_profile_grounding

profile_text = """
Mohamedou Ould Slahi was detained at Guantanamo Bay. ^[slahi-memoir]
He wrote a bestselling memoir about his experience. ^[slahi-memoir]
He was released in 2016. ^[missing-article]
"""

article_texts = {
    "slahi-memoir": "Mohamedou Ould Slahi published Guantanamo Diary in 2015...",
    # "missing-article" not provided
}

report = verify_profile_grounding(
    profile_text=profile_text,
    article_texts=article_texts,
    min_grounding_score=0.7,
)

print(f"Total citations: {report.total_citations}")      # 3
print(f"Verified: {report.verified}")                    # 2
print(f"Missing source: {report.missing_source}")        # 1
print(f"Grounding score: {report.grounding_score:.1%}")  # 66.7%
print(f"Passed: {report.passed}")                        # False (< 0.7)
print(f"Flags: {report.flags}")                          # ["missing_sources", "low_grounding_score"]

for v in report.verifications:
    print(f"  {v.citation}: {v.support_level} - {v.reasoning}")
Utility Functions
normalize_name
from src.utils.quality_controls import normalize_name

name = normalize_name(" Abdul Rahman Ahmed ")
print(name)  # "Abdul Rahman Ahmed"
Normalize an entity name:
Strip leading/trailing whitespace
Collapse multiple whitespace to single space
Unicode NFC normalization
Automatically called by run_extraction_qc() — you typically don’t need to call this directly.
Constants
Default Thresholds
from src.constants import (
    QC_MIN_NAME_LENGTH,
    PROFILE_QC_MIN_TEXT_LENGTH,
    PROFILE_QC_MIN_TAG_COUNT,
)

print(QC_MIN_NAME_LENGTH)          # 2
print(PROFILE_QC_MIN_TEXT_LENGTH)  # 100
print(PROFILE_QC_MIN_TAG_COUNT)    # 2
These constants define default QC thresholds used throughout the pipeline.
Integration Example
Complete Extraction Pipeline
from src.utils.quality_controls import (
    run_extraction_qc,
    filter_entities_by_article_relevance,
)
from src.engine.extractors import extract_entities

# 1. Extract entities (LLM call)
raw_entities = extract_entities(
    text=article_text,
    entity_type="organizations",
    domain="guantanamo",
)

# 2. Run extraction QC
cleaned, qc_report = run_extraction_qc(
    entity_type="organizations",
    entities=raw_entities,
    domain="guantanamo",
)
if qc_report.flags:
    print(f"QC flags: {qc_report.flags}")

# 3. Filter by article relevance
kept, relevance_report = filter_entities_by_article_relevance(
    entity_type="organizations",
    entities=cleaned,
    article_text=article_text,
    domain="guantanamo",
)

print(f"Extracted: {qc_report.input_count}")
print(f"After QC: {qc_report.output_count}")
print(f"After relevance: {relevance_report.output_count}")
See Also