Overview
The profiles module provides functions for generating narrative profiles of entities using LLM reflection, updating profiles with new article evidence, and tracking profile version history.
Core Data Models
EntityProfile
Structured profile model returned by LLM generation:
class EntityProfile(BaseModel):
    text: str  # Comprehensive narrative profile
    tags: List[str] = []  # Relevant keywords/tags
    confidence: float  # 0-1 confidence score
    sources: List[str] = []  # Article IDs used as evidence
VersionedProfile
Container for profile version history:
class VersionedProfile(BaseModel):
    current_version: int = 1
    versions: List[ProfileVersion] = []

    # Method signatures only; bodies omitted
    def add_version(
        self,
        profile_data: Dict,
        trigger_article_id: Optional[str] = None
    ) -> ProfileVersion: ...

    def get_version(self, version_number: int) -> Optional[ProfileVersion]: ...

    def get_latest(self) -> Optional[ProfileVersion]: ...
current_version (int, default: 1) - The current active version number
versions (List[ProfileVersion], default: []) - List of historical profile snapshots
Methods
add_version() - Append a new version to history
new_version = versioned_profile.add_version(
    profile_data={"text": "...", "tags": [...], ...},
    trigger_article_id="doc123"
)
get_version() - Retrieve a specific version by number
get_latest() - Get the most recent version
ProfileVersion
A single snapshot in the version history:
class ProfileVersion(BaseModel):
    version_number: int
    profile_data: Dict  # Complete profile snapshot (deep copied)
    created_at: datetime
    trigger_article_id: Optional[str] = None
version_number (int) - Sequential version number (1, 2, 3, …)
profile_data (Dict) - Deep copy of the complete profile at this version (prevents mutation)
created_at (datetime) - Timestamp when this version was created (auto-set to datetime.now(UTC))
trigger_article_id (Optional[str], default: None) - Article ID that triggered this profile update
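The snapshot behaviour described above can be illustrated with a standard-library stand-in. This is a minimal sketch, not the module's implementation: SketchVersion and SketchVersionedProfile are hypothetical names, dataclasses stand in for the Pydantic models, and numbering versions from the length of the list is an assumption.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class SketchVersion:
    """Hypothetical stand-in for ProfileVersion."""
    version_number: int
    profile_data: Dict
    created_at: datetime
    trigger_article_id: Optional[str] = None


@dataclass
class SketchVersionedProfile:
    """Hypothetical stand-in for VersionedProfile."""
    current_version: int = 1
    versions: List[SketchVersion] = field(default_factory=list)

    def add_version(self, profile_data: Dict,
                    trigger_article_id: Optional[str] = None) -> SketchVersion:
        # Assumption: versions are numbered 1, 2, 3, ... in append order.
        number = len(self.versions) + 1
        snapshot = SketchVersion(
            version_number=number,
            profile_data=deepcopy(profile_data),  # isolate from later edits
            created_at=datetime.now(timezone.utc),
            trigger_article_id=trigger_article_id,
        )
        self.versions.append(snapshot)
        self.current_version = number
        return snapshot

    def get_version(self, version_number: int) -> Optional[SketchVersion]:
        for version in self.versions:
            if version.version_number == version_number:
                return version
        return None

    def get_latest(self) -> Optional[SketchVersion]:
        return self.versions[-1] if self.versions else None


# Usage: mutating the live profile must not change an earlier snapshot.
sketch_history = SketchVersionedProfile()
live_profile = {"text": "first draft", "tags": ["military"]}
sketch_history.add_version(live_profile, trigger_article_id="doc123")
live_profile["tags"].append("JTF-GTMO")
sketch_history.add_version(live_profile, trigger_article_id="doc456")
```

Because each snapshot is a deep copy, version 1 keeps its original tag list even though the live dictionary was modified before version 2 was recorded.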
Profile Creation
create_profile()
Create an initial profile for an entity using LLM reflection and iterative improvement.
from src.engine.profiles import create_profile
profile_dict, versioned_profile, improvement_history = create_profile(
    entity_type: str,
    entity_name: str,
    article_text: str,
    article_id: str,
    model_type: str = "gemini",
    domain: str = "guantanamo"
) -> Tuple[Dict, VersionedProfile, list]
entity_type (str) - Entity type in singular form: "person", "organization", "location", "event"
entity_name (str) - Canonical name of the entity
article_text (str) - Full article text to use as evidence
article_id (str) - Unique article identifier (stored in sources)
model_type (str, default: "gemini") - Either "gemini" (cloud) or "ollama" (local)
domain (str, default: "guantanamo") - Domain configuration (determines prompts and schemas)
Returns a 3-tuple (Tuple[Dict, VersionedProfile, list]):
profile_dict - Profile dictionary with text, tags, confidence, sources
versioned_profile - VersionedProfile container (with initial version if versioning enabled)
improvement_history - List of reflection iterations with pass/fail results
Example
profile, versioned, history = create_profile(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_text=article_content,
    article_id="doc123",
    model_type="gemini",
    domain="guantanamo"
)
print(profile["text"])
# "Geoffrey Miller is a retired United States Army major general who..."
print(f"Confidence: {profile['confidence']:.2f}")
print(f"Tags: {profile['tags']}")
print(f"Reflection iterations: {len(history)}")
Profile creation uses iterative LLM reflection (max 3 iterations by default). Each iteration validates and improves the previous attempt. The improvement_history contains the reflection reasoning for each pass.
create_profile_with_outcome()
Safe variant of create_profile() that never raises. On error it returns a fallback profile together with a PhaseOutcome describing the failure.
from src.engine.profiles import create_profile_with_outcome
profile_dict, versioned_profile, history, outcome = create_profile_with_outcome(
    entity_type: str,
    entity_name: str,
    article_text: str,
    article_id: str,
    model_type: str = "gemini",
    domain: str = "guantanamo"
) -> Tuple[Dict, VersionedProfile, list, PhaseOutcome]
Parameters are identical to create_profile(), but returns a 4-tuple with a PhaseOutcome object that includes:
- Success/failure status
- QC report (text length, citation count, tag count)
- QC fixes applied
- Error details (if failed)
On failure, returns a minimal fallback profile with outcome.fallback populated.
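The "never raises" contract can be pictured as a thin wrapper around a call that may fail. The sketch below only illustrates that pattern: SketchOutcome and call_with_outcome are hypothetical names, and the real PhaseOutcome carries more fields (such as the QC report) than shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class SketchOutcome:
    """Hypothetical, reduced stand-in for PhaseOutcome."""
    success: bool
    error: Optional[str] = None
    fallback: Optional[Dict[str, Any]] = None


def call_with_outcome(
    generate: Callable[[], Dict[str, Any]],
    fallback: Dict[str, Any],
) -> Tuple[Dict[str, Any], SketchOutcome]:
    """Run generate(); on any exception return the fallback instead of raising."""
    try:
        return generate(), SketchOutcome(success=True)
    except Exception as exc:  # deliberately broad: the caller must never see a raise
        outcome = SketchOutcome(
            success=False,
            error=f"{type(exc).__name__}: {exc}",
            fallback=fallback,
        )
        return fallback, outcome


def failing_generation() -> Dict[str, Any]:
    raise RuntimeError("LLM unavailable")


minimal_profile = {"text": "", "tags": [], "confidence": 0.0, "sources": ["doc123"]}
failed_profile, failed_outcome = call_with_outcome(failing_generation, minimal_profile)
ok_profile, ok_outcome = call_with_outcome(lambda: {"text": "ok"}, minimal_profile)
```

Callers can then branch on the outcome's success flag rather than wrapping every call in try/except.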
Profile Updates
update_profile()
Update an existing profile with new article evidence using LLM merging.
from src.engine.profiles import update_profile
updated_profile, versioned_profile, history = update_profile(
    entity_type: str,
    entity_name: str,
    existing_profile: Dict,
    versioned_profile: VersionedProfile,
    new_article_text: str,
    new_article_id: str,
    model_type: str = "gemini",
    domain: str = "guantanamo"
) -> Tuple[Dict, VersionedProfile, list]
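entity_type (str) - Entity type in singular form
entity_name (str) - Canonical name of the entity
existing_profile (Dict) - Current profile dictionary to update
versioned_profile (VersionedProfile) - Version history container (mutated with new version)
new_article_text (str) - Article text with new evidence
new_article_id (str) - Article ID (added to sources list)
model_type (str, default: "gemini") - LLM mode ("gemini" or "ollama")
domain (str, default: "guantanamo") - Domain configuration (determines prompts and schemas)
Returns a 3-tuple (Tuple[Dict, VersionedProfile, list]):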
updated_profile - Merged profile dictionary
versioned_profile - Updated version history (new version appended if ENABLE_PROFILE_VERSIONING=True)
history - Reflection iteration history from this update
Example
existing = {
    "text": "Geoffrey Miller is a retired US Army major general.",
    "tags": ["military", "JTF-GTMO"],
    "confidence": 0.85,
    "sources": ["doc123"]
}
versioned = VersionedProfile()
versioned.add_version(existing, trigger_article_id="doc123")
# New article with additional information
new_article = "In 2003, Major General Geoffrey Miller implemented new interrogation policies..."
updated, versioned, history = update_profile(
    entity_type="person",
    entity_name="Geoffrey Miller",
    existing_profile=existing,
    versioned_profile=versioned,
    new_article_text=new_article,
    new_article_id="doc456",
    model_type="gemini"
)
print(updated["text"])
# "Geoffrey Miller is a retired US Army major general who served as
# commander of JTF-GTMO. In 2003, he implemented new interrogation policies..."
print(updated["sources"])
# ["doc123", "doc456"]
print(versioned.current_version)
# 2
The LLM receives both the existing profile and new article text, then generates a merged profile that incorporates new information while preserving established facts.
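The narrative merge itself is done by the LLM, but the bookkeeping around it is deterministic. As a rough sketch of how the sources list in the example above could grow from ["doc123"] to ["doc123", "doc456"], the hypothetical helper below appends the new article ID once and leaves the input untouched. Whether update_profile() deduplicates in this way is an assumption.

```python
from typing import Dict, List


def merge_sources(existing_profile: Dict, new_article_id: str) -> List[str]:
    """Return the profile's sources plus new_article_id, without duplicates."""
    # Copy first so the caller's profile dictionary is not mutated.
    sources = list(existing_profile.get("sources", []))
    if new_article_id not in sources:
        sources.append(new_article_id)
    return sources


existing_example = {"text": "...", "sources": ["doc123"]}
merged_once = merge_sources(existing_example, "doc456")
merged_again = merge_sources({"sources": merged_once}, "doc456")
```

Processing the same article twice leaves the list unchanged, which keeps reruns of an ingestion job from inflating the evidence count.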
update_profile_with_outcome()
Safe variant of update_profile() that never raises. On error it returns the existing profile as the fallback.
updated_profile, versioned_profile, history, outcome = update_profile_with_outcome(
    entity_type: str,
    entity_name: str,
    existing_profile: Dict,
    versioned_profile: VersionedProfile,
    new_article_text: str,
    new_article_id: str,
    model_type: str = "gemini",
    domain: str = "guantanamo"
) -> Tuple[Dict, VersionedProfile, list, PhaseOutcome]
Parameters are identical to update_profile(), but the function returns a 4-tuple whose last element is a PhaseOutcome carrying the QC report and any error details.
Version History Control
Profile versioning is controlled by the ENABLE_PROFILE_VERSIONING constant:
from src.constants import ENABLE_PROFILE_VERSIONING
if ENABLE_PROFILE_VERSIONING:
    # Version snapshots are created on every profile create/update
    versioned_profile.add_version(profile_data, trigger_article_id)
else:
    # Versioning disabled - only current profile stored
    pass
When enabled, each create_profile() and update_profile() call appends a deep copy of the profile to the version history:
# In entity database:
entity = {
    "name": "Geoffrey Miller",
    "profile": {"text": "...", ...},  # Current profile
    "profile_versions": {  # Version history
        "current_version": 3,
        "versions": [
            {
                "version_number": 1,
                "profile_data": {...},
                "created_at": "2024-01-15T10:30:00Z",
                "trigger_article_id": "doc123"
            },
            {
                "version_number": 2,
                "profile_data": {...},
                "created_at": "2024-01-16T14:20:00Z",
                "trigger_article_id": "doc456"
            },
            # ...
        ]
    }
}
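Once the history is stored in this shape, it can be inspected with plain dictionary access. The helpers below are a hypothetical sketch that relies only on the keys shown above (profile_versions, versions, version_number, trigger_article_id); they are not part of the profiles module.

```python
from typing import Dict, List, Optional, Tuple


def find_version(entity: Dict, version_number: int) -> Optional[Dict]:
    """Return the stored snapshot with the given number, or None."""
    history = entity.get("profile_versions") or {}
    for version in history.get("versions", []):
        if version["version_number"] == version_number:
            return version
    return None


def trigger_trail(entity: Dict) -> List[Tuple[int, Optional[str]]]:
    """List (version_number, trigger_article_id) pairs in stored order."""
    history = entity.get("profile_versions") or {}
    return [
        (version["version_number"], version.get("trigger_article_id"))
        for version in history.get("versions", [])
    ]


sample_entity = {
    "name": "Geoffrey Miller",
    "profile": {"text": "second draft"},
    "profile_versions": {
        "current_version": 2,
        "versions": [
            {"version_number": 1, "profile_data": {"text": "first draft"},
             "created_at": "2024-01-15T10:30:00Z", "trigger_article_id": "doc123"},
            {"version_number": 2, "profile_data": {"text": "second draft"},
             "created_at": "2024-01-16T14:20:00Z", "trigger_article_id": "doc456"},
        ],
    },
}
trail = trigger_trail(sample_entity)
```

The trail makes it easy to see which article caused each revision when auditing how a profile evolved.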
Reflection & Iterative Improvement
Both create_profile() and update_profile() use iterative LLM reflection via iterative_improve():
from src.utils.llm import iterative_improve, GenerationMode
result = iterative_improve(
    initial_text="{}",  # Start with empty JSON
    generation_messages=[
        {"role": "system", "content": generation_prompt},
        {"role": "user", "content": "Create a profile for..."}
    ],
    reflection_prompt="Validate the profile. Check for...",
    response_model=EntityProfile,
    max_iterations=3,
    mode=GenerationMode.CLOUD  # or GenerationMode.LOCAL
)
final_profile = result["text"]
history = result["reflection_history"]
Each iteration:
- Generates a profile candidate
- Reflects on quality/completeness
- Returns {"valid": bool, "reasoning": str}
- If invalid, regenerates with reflection feedback
- Stops at first valid result or max iterations
The improvement_history returned contains all reflection passes:
[
    {"valid": False, "reasoning": "Profile lacks specific dates and roles"},
    {"valid": True, "reasoning": "Profile now includes complete timeline and position details"}
]
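The generate, reflect, regenerate cycle listed above can be sketched as a small loop. This is not the code of iterative_improve(): reflect_until_valid is a hypothetical name, and the two callables are stubs standing in for the LLM calls so the control flow can be run offline.

```python
from typing import Callable, Dict, List, Optional, Tuple


def reflect_until_valid(
    generate: Callable[[Optional[str]], str],
    reflect: Callable[[str], Dict],
    max_iterations: int = 3,
) -> Tuple[str, List[Dict]]:
    """Regenerate with reflection feedback until valid or out of iterations."""
    history: List[Dict] = []
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_iterations):
        candidate = generate(feedback)   # feedback is None on the first pass
        verdict = reflect(candidate)     # {"valid": bool, "reasoning": str}
        history.append(verdict)
        if verdict["valid"]:
            break
        feedback = verdict["reasoning"]  # feed the critique into the next attempt
    return candidate, history


def stub_generate(feedback: Optional[str]) -> str:
    # Stub: only adds a date once the reflection has asked for one.
    if feedback is None:
        return "Miller implemented new interrogation policies."
    return "In 2003, Miller implemented new interrogation policies."


def stub_reflect(candidate: str) -> Dict:
    # Stub rule: a profile is "valid" when it mentions at least one digit.
    if any(ch.isdigit() for ch in candidate):
        return {"valid": True, "reasoning": "Profile includes a date"}
    return {"valid": False, "reasoning": "Profile lacks specific dates"}


final_text, reflection_history = reflect_until_valid(stub_generate, stub_reflect)
_, exhausted_history = reflect_until_valid(
    lambda feedback: "no dates here", stub_reflect, max_iterations=3
)
```

With these stubs the first candidate is rejected and the second accepted, so the history has two entries. A generator that never satisfies the reflection stops after max_iterations and returns its last candidate.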
Profile Quality Control
All profiles pass through deterministic QC after generation:
from src.utils.quality_controls import run_profile_qc
profile, qc_report = run_profile_qc(profile=profile_dict)
# qc_report contains:
# - text_length: int
# - citation_count: int
# - tag_count: int
# - flags: List[str] # e.g., ["short_profile", "missing_citations"]
# - fixes: List[str] # Applied corrections
# - passed: bool
See Quality Controls for QC details.
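To make the shape of the report concrete, here is a rough sketch of a deterministic check that produces the same fields. It is not run_profile_qc(): the function name, the 100-character threshold, the tag deduplication fix, and the assumption that citations look like ^[article_id] (the form used in the fallback profile text) are all illustrative choices.

```python
import re
from typing import Dict, List, Tuple

# Assumption: inline citations are written as ^[article_id].
CITATION_PATTERN = re.compile(r"\^\[[^\]]+\]")


def sketch_profile_qc(profile: Dict, min_text_length: int = 100) -> Tuple[Dict, Dict]:
    """Return a (possibly fixed) copy of the profile and a QC report."""
    text = profile.get("text", "")
    tags = list(profile.get("tags", []))
    flags: List[str] = []
    fixes: List[str] = []

    # Hypothetical fix: drop duplicate tags while keeping first-seen order.
    deduped_tags = list(dict.fromkeys(tags))
    if deduped_tags != tags:
        fixes.append("deduplicated_tags")

    citation_count = len(CITATION_PATTERN.findall(text))
    if len(text) < min_text_length:
        flags.append("short_profile")
    if citation_count == 0:
        flags.append("missing_citations")

    fixed_profile = dict(profile)  # shallow copy; the input is left untouched
    fixed_profile["tags"] = deduped_tags
    report = {
        "text_length": len(text),
        "citation_count": citation_count,
        "tag_count": len(deduped_tags),
        "flags": flags,
        "fixes": fixes,
        "passed": not flags,
    }
    return fixed_profile, report


qc_sample = {"text": "Short text.^[doc123]", "tags": ["military", "military", "JTF-GTMO"]}
qc_fixed, qc_sample_report = sketch_profile_qc(qc_sample)
```

Because every check is a pure function of the profile, the same input always yields the same report, which is what distinguishes this stage from the LLM reflection pass.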
Prompt Configuration
Prompts are loaded from domain configs:
from src.config_loader import get_domain_config
config = get_domain_config("guantanamo")
# Profile generation prompt
generation_template = config.load_profile_prompt("generation")
system_prompt = generation_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_id="doc123"
)
# Reflection validation prompt
reflection_template = config.load_profile_prompt("reflection")
reflection_prompt = reflection_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_id="doc123"
)
# Profile update prompt
update_template = config.load_profile_prompt("update")
update_prompt = update_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller"
)
Prompt files live in configs/<domain>/prompts/profile_generation.txt, profile_reflection.txt, profile_update.txt.
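The file layout and the str.format() placeholders can be tied together in a few lines. The sketch below is illustrative only: prompt_path and render_prompt are hypothetical helpers, the template string is made up, and how load_profile_prompt() actually resolves files is not shown in this page.

```python
from pathlib import Path

# File names as listed above, keyed by the prompt kind passed to the loader.
PROMPT_FILES = {
    "generation": "profile_generation.txt",
    "reflection": "profile_reflection.txt",
    "update": "profile_update.txt",
}


def prompt_path(domain: str, kind: str, root: Path = Path("configs")) -> Path:
    """Build configs/<domain>/prompts/<file> for a prompt kind."""
    return root / domain / "prompts" / PROMPT_FILES[kind]


def render_prompt(template: str, **fields: str) -> str:
    """Fill {placeholders}; str.format raises KeyError if a field is missing."""
    return template.format(**fields)


reflection_file = prompt_path("guantanamo", "reflection")
rendered = render_prompt(
    "Validate the {entity_type} profile for {entity_name} against {article_id}.",
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_id="doc123",
)
```

A template that references a placeholder the caller does not supply fails loudly with KeyError, which surfaces mismatches between a domain's prompt files and the calling code early.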
Error Handling
Profile generation failures are handled gracefully:
try:
    profile, versioned, history = create_profile(...)
except ProfileGenerationError as e:
    # Fallback to minimal profile
    fallback_profile = handle_profile_error(
        entity_name, entity_type, article_id, e, "generation"
    )
Fallback profiles have minimal structure:
{
    "text": f"Profile for {entity_name} could not be generated.^[{article_id}]",
    "tags": [],
    "confidence": 0.0,
    "sources": [article_id]
}
Use *_with_outcome() variants to avoid exceptions and get structured error reporting.
Source Location
src/engine/profiles.py