Profile Generation & Versioning

Overview

The profiles module provides functions for generating narrative profiles of entities using LLM reflection, updating profiles with new article evidence, and tracking profile version history.

Core Data Models

EntityProfile

Structured profile model returned by LLM generation:

from typing import List

from pydantic import BaseModel

class EntityProfile(BaseModel):
    text: str  # Comprehensive narrative profile
    tags: List[str] = []  # Relevant keywords/tags
    confidence: float  # 0-1 confidence score
    sources: List[str] = []  # Article IDs used as evidence
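To make the field contract concrete, here is a dependency-free check of a profile dict against the EntityProfile fields above. validate_entity_profile is a hypothetical helper, not part of the profiles module. It assumes only what the field comments state: confidence lies between 0 and 1, and tags and sources are lists of strings that may be omitted.

```python
from typing import Any, Dict, List


def validate_entity_profile(profile: Dict[str, Any]) -> List[str]:
    """Return the problems found in a profile dict; an empty list means it passed.

    Hypothetical helper: mirrors the EntityProfile field comments only.
    """
    problems: List[str] = []
    if not isinstance(profile.get("text"), str):
        problems.append("text must be a string")
    # tags and sources default to [] in EntityProfile, so a missing key is fine
    for key in ("tags", "sources"):
        value = profile.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    confidence = profile.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        problems.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be between 0 and 1")
    return problems


print(validate_entity_profile(
    {"text": "Geoffrey Miller ...", "tags": ["military"], "confidence": 0.85, "sources": ["doc123"]}
))  # []
print(validate_entity_profile({"text": "x", "confidence": 1.5}))
# ['confidence must be between 0 and 1']
```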

VersionedProfile

Container for profile version history:

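from typing import Dict, List, Optional

from pydantic import BaseModel

class VersionedProfile(BaseModel):
    current_version: int = 1
    versions: List["ProfileVersion"] = []  # ProfileVersion is defined below

    # Method signatures only; bodies omitted
    def add_version(
        self,
        profile_data: Dict,
        trigger_article_id: Optional[str] = None
    ) -> "ProfileVersion": ...

    def get_version(self, version_number: int) -> Optional["ProfileVersion"]: ...

    def get_latest(self) -> Optional["ProfileVersion"]: ...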

current_version (int, default: 1): The current active version number.

versions (List[ProfileVersion], default: []): List of historical profile snapshots.

Methods

add_version() - Append a new version to history:

new_version = versioned_profile.add_version(
    profile_data={"text": "...", "tags": ["military"], "confidence": 0.85, "sources": ["doc123"]},
    trigger_article_id="doc123"
)

get_version() - Retrieve a specific version by number
get_latest() - Get the most recent version

ProfileVersion

A single snapshot in the version history:

from datetime import datetime
from typing import Dict, Optional

from pydantic import BaseModel

class ProfileVersion(BaseModel):
    version_number: int
    profile_data: Dict  # Complete profile snapshot (deep copied)
    created_at: datetime
    trigger_article_id: Optional[str] = None

version_number (int, required): Sequential version number (1, 2, 3, …).

profile_data (Dict, required): Deep copy of the complete profile at this version, so later changes to the live profile cannot alter the snapshot.

created_at (datetime, required): Timestamp when this version was created (auto-set to datetime.now(UTC)).

trigger_article_id (Optional[str], default: None): ID of the article that triggered this profile update.
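The version semantics above (deep-copied snapshots, sequential numbers, latest-version lookup) can be sketched with standard-library dataclasses. This is a minimal sketch, not the module's implementation: it swaps pydantic BaseModel for dataclasses to stay dependency-free, and it assumes add_version() numbers snapshots as len(versions) + 1 and sets current_version to the new number.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class ProfileVersion:
    version_number: int
    profile_data: Dict[str, Any]
    created_at: datetime
    trigger_article_id: Optional[str] = None


@dataclass
class VersionedProfile:
    current_version: int = 1
    versions: List[ProfileVersion] = field(default_factory=list)

    def add_version(
        self, profile_data: Dict[str, Any], trigger_article_id: Optional[str] = None
    ) -> ProfileVersion:
        # Assumption: numbering is sequential and starts at 1 for the first snapshot
        version = ProfileVersion(
            version_number=len(self.versions) + 1,
            # deepcopy: later edits to the caller's dict do not leak into the snapshot
            profile_data=copy.deepcopy(profile_data),
            created_at=datetime.now(timezone.utc),
            trigger_article_id=trigger_article_id,
        )
        self.versions.append(version)
        self.current_version = version.version_number
        return version

    def get_version(self, version_number: int) -> Optional[ProfileVersion]:
        return next((v for v in self.versions if v.version_number == version_number), None)

    def get_latest(self) -> Optional[ProfileVersion]:
        return self.versions[-1] if self.versions else None


vp = VersionedProfile()
profile = {"text": "Geoffrey Miller is a retired US Army major general.", "tags": ["military"]}
vp.add_version(profile, trigger_article_id="doc123")
profile["tags"].append("JTF-GTMO")  # mutate the live profile after snapshotting
vp.add_version(profile, trigger_article_id="doc456")
print(vp.current_version)                      # 2
print(vp.get_version(1).profile_data["tags"])  # ['military']
```

The deep copy is what keeps version 1 unchanged even though the same dict was mutated before version 2 was recorded.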

Profile Creation

create_profile()

Create an initial profile for an entity using LLM reflection and iterative improvement.

from src.engine.profiles import create_profile

# Signature:
# create_profile(
#     entity_type: str,
#     entity_name: str,
#     article_text: str,
#     article_id: str,
#     model_type: str = "gemini",
#     domain: str = "guantanamo",
# ) -> Tuple[Dict, VersionedProfile, list]
#
# Usage:
# profile_dict, versioned_profile, improvement_history = create_profile(...)

entity_type (str, required): Singular entity type: "person", "organization", "location", "event".

entity_name (str, required): Canonical name of the entity.

article_text (str, required): Full article text to use as evidence.

article_id (str, required): Unique article identifier (stored in sources).

model_type (str, default: "gemini"): Either "gemini" (cloud) or "ollama" (local).

domain (str, default: "guantanamo"): Domain configuration (determines prompts and schemas).

Returns (Tuple[Dict, VersionedProfile, list]): a 3-tuple of

profile_dict - Profile dictionary with text, tags, confidence, sources
versioned_profile - VersionedProfile container (with an initial version if versioning is enabled)
improvement_history - List of reflection iterations with pass/fail results

Example

profile, versioned, history = create_profile(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_text=article_content,
    article_id="doc123",
    model_type="gemini",
    domain="guantanamo"
)

print(profile["text"])
# "Geoffrey Miller is a retired United States Army major general who..."

print(f"Confidence: {profile['confidence']:.2f}")
print(f"Tags: {profile['tags']}")
print(f"Reflection iterations: {len(history)}")

Profile creation uses iterative LLM reflection (at most 3 iterations by default). Each iteration checks the current candidate and, if it fails validation, regenerates it using the reflection feedback. The improvement_history records the reflection reasoning for each pass.

create_profile_with_outcome()

A non-raising variant of create_profile(): instead of propagating exceptions, it returns a PhaseOutcome and, on error, a fallback profile.

from src.engine.profiles import create_profile_with_outcome

# Signature:
# create_profile_with_outcome(
#     entity_type: str,
#     entity_name: str,
#     article_text: str,
#     article_id: str,
#     model_type: str = "gemini",
#     domain: str = "guantanamo",
# ) -> Tuple[Dict, VersionedProfile, list, PhaseOutcome]
#
# Usage:
# profile_dict, versioned_profile, history, outcome = create_profile_with_outcome(...)

It takes the same parameters as create_profile() but returns a 4-tuple whose last element is a PhaseOutcome object containing:

Success/failure status
QC report (text length, citation count, tag count)
QC fixes applied
Error details (if failed)

On failure, it returns a minimal fallback profile and populates outcome.fallback.
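The never-raise contract can be illustrated with a small wrapper. PhaseOutcome, minimal_fallback and create_with_outcome below are hypothetical stand-ins (the real PhaseOutcome fields and QC wiring may differ, and the real function returns a VersionedProfile rather than None on failure); the fallback shape is copied from the Error Handling section.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class PhaseOutcome:  # hypothetical stand-in for the real PhaseOutcome
    success: bool
    qc_report: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    fallback: Optional[Dict[str, Any]] = None


def minimal_fallback(entity_name: str, article_id: str) -> Dict[str, Any]:
    # Same shape as the fallback profile described under Error Handling
    return {
        "text": f"Profile for {entity_name} could not be generated.^[{article_id}]",
        "tags": [],
        "confidence": 0.0,
        "sources": [article_id],
    }


def create_with_outcome(
    generate: Callable[[], Tuple[Dict[str, Any], Any, List[Any]]],
    entity_name: str,
    article_id: str,
) -> Tuple[Dict[str, Any], Any, List[Any], PhaseOutcome]:
    """Call `generate` and turn any exception into a failed PhaseOutcome."""
    try:
        profile, versioned, history = generate()
        return profile, versioned, history, PhaseOutcome(success=True)
    except Exception as exc:  # never propagate: the caller always gets a 4-tuple
        fallback = minimal_fallback(entity_name, article_id)
        outcome = PhaseOutcome(success=False, error=repr(exc), fallback=fallback)
        # None keeps the sketch standalone; the real function returns a VersionedProfile
        return fallback, None, [], outcome


def failing_generation():
    raise RuntimeError("LLM timeout")


profile, _, history, outcome = create_with_outcome(failing_generation, "Geoffrey Miller", "doc123")
print(outcome.success)  # False
print(profile["text"])  # Profile for Geoffrey Miller could not be generated.^[doc123]
```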

Profile Updates

update_profile()

Update an existing profile with new article evidence using LLM merging.

from src.engine.profiles import update_profile

# Signature:
# update_profile(
#     entity_type: str,
#     entity_name: str,
#     existing_profile: Dict,
#     versioned_profile: VersionedProfile,
#     new_article_text: str,
#     new_article_id: str,
#     model_type: str = "gemini",
#     domain: str = "guantanamo",
# ) -> Tuple[Dict, VersionedProfile, list]
#
# Usage:
# updated_profile, versioned_profile, history = update_profile(...)

entity_type (str, required): Singular entity type.

entity_name (str, required): Canonical name of the entity.

existing_profile (Dict, required): Current profile dictionary to update.

versioned_profile (VersionedProfile, required): Version history container; mutated in place when a new version is appended.

new_article_text (str, required): Article text with new evidence.

new_article_id (str, required): Article ID (added to the sources list).

model_type (str, default: "gemini"): LLM mode, "gemini" or "ollama".

domain (str, default: "guantanamo"): Domain configuration.

Returns (Tuple[Dict, VersionedProfile, list]):

updated_profile - Merged profile dictionary
versioned_profile - Updated version history (new version appended if ENABLE_PROFILE_VERSIONING=True)
history - Reflection iteration history from this update

Example

existing = {
    "text": "Geoffrey Miller is a retired US Army major general.",
    "tags": ["military", "JTF-GTMO"],
    "confidence": 0.85,
    "sources": ["doc123"]
}

versioned = VersionedProfile()
versioned.add_version(existing, trigger_article_id="doc123")

# New article with additional information
new_article = "In 2003, Major General Geoffrey Miller implemented new interrogation policies..."

updated, versioned, history = update_profile(
    entity_type="person",
    entity_name="Geoffrey Miller",
    existing_profile=existing,
    versioned_profile=versioned,
    new_article_text=new_article,
    new_article_id="doc456",
    model_type="gemini"
)

print(updated["text"])
# "Geoffrey Miller is a retired US Army major general who served as 
#  commander of JTF-GTMO. In 2003, he implemented new interrogation policies..."

print(updated["sources"])
# ["doc123", "doc456"]

print(versioned.current_version)
# 2

The LLM receives both the existing profile and new article text, then generates a merged profile that incorporates new information while preserving established facts.
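The text merge itself is done by the LLM, but the sources bookkeeping shown in the example (["doc123"] becoming ["doc123", "doc456"]) is deterministic. A minimal sketch, assuming the new article ID is appended and duplicates are dropped in first-seen order; merge_sources and finalize_update are hypothetical names, not functions of the profiles module.

```python
from typing import Any, Dict, Iterable, List


def merge_sources(*source_lists: Iterable[str]) -> List[str]:
    """Concatenate source ID lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(sid for sources in source_lists for sid in sources))


def finalize_update(
    merged: Dict[str, Any], existing: Dict[str, Any], new_article_id: str
) -> Dict[str, Any]:
    # Hypothetical post-processing: keep every earlier source even if the LLM's
    # merged output omitted it, and record the new article exactly once.
    result = dict(merged)
    result["sources"] = merge_sources(
        existing.get("sources", []), merged.get("sources", []), [new_article_id]
    )
    return result


existing = {"text": "Geoffrey Miller is a retired US Army major general.", "sources": ["doc123"]}
merged = {"text": "... In 2003, he implemented new interrogation policies...", "sources": ["doc456"]}
print(finalize_update(merged, existing, "doc456")["sources"])  # ['doc123', 'doc456']
```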

update_profile_with_outcome()

A non-raising variant of update_profile(): on error, it returns the existing profile as the fallback instead of raising.

# Signature:
# update_profile_with_outcome(
#     entity_type: str,
#     entity_name: str,
#     existing_profile: Dict,
#     versioned_profile: VersionedProfile,
#     new_article_text: str,
#     new_article_id: str,
#     model_type: str = "gemini",
#     domain: str = "guantanamo",
# ) -> Tuple[Dict, VersionedProfile, list, PhaseOutcome]
#
# Usage:
# updated_profile, versioned_profile, history, outcome = update_profile_with_outcome(...)

It takes the same parameters as update_profile() but also returns a PhaseOutcome carrying the QC report and any error details.

Version History Control

Profile versioning is controlled by the ENABLE_PROFILE_VERSIONING constant:

from src.constants import ENABLE_PROFILE_VERSIONING

if ENABLE_PROFILE_VERSIONING:
    # Version snapshots are created on every profile create/update
    versioned_profile.add_version(profile_data, trigger_article_id)
else:
    # Versioning disabled - only current profile stored
    pass

When enabled, each create_profile() and update_profile() call appends a deep copy of the profile to the version history:

# In entity database:
entity = {
    "name": "Geoffrey Miller",
    "profile": {"text": "...", ...},  # Current profile
    "profile_versions": {              # Version history
        "current_version": 3,
        "versions": [
            {
                "version_number": 1,
                "profile_data": {...},
                "created_at": "2024-01-15T10:30:00Z",
                "trigger_article_id": "doc123"
            },
            {
                "version_number": 2,
                "profile_data": {...},
                "created_at": "2024-01-16T14:20:00Z",
                "trigger_article_id": "doc456"
            },
            # ...
        ]
    }
}
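The storage layout above can be produced by a small helper that honors the versioning flag. This sketch assumes the flag is a plain module-level boolean and that timestamps are stored as UTC ISO-8601 strings with a Z suffix, as in the example; record_profile is a hypothetical name, not part of src.engine.profiles.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Dict, Optional

ENABLE_PROFILE_VERSIONING = True  # stand-in for src.constants.ENABLE_PROFILE_VERSIONING


def record_profile(
    entity: Dict[str, Any],
    profile: Dict[str, Any],
    trigger_article_id: Optional[str],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Store `profile` on `entity` and, if versioning is on, append a snapshot."""
    entity["profile"] = profile
    if not ENABLE_PROFILE_VERSIONING:
        return entity  # versioning disabled: only the current profile is kept
    history = entity.setdefault("profile_versions", {"current_version": 0, "versions": []})
    number = len(history["versions"]) + 1
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    history["versions"].append({
        "version_number": number,
        "profile_data": copy.deepcopy(profile),  # snapshot, not a live reference
        "created_at": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "trigger_article_id": trigger_article_id,
    })
    history["current_version"] = number
    return entity


entity = {"name": "Geoffrey Miller"}
record_profile(entity, {"text": "v1"}, "doc123", now=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc))
record_profile(entity, {"text": "v2"}, "doc456", now=datetime(2024, 1, 16, 14, 20, tzinfo=timezone.utc))
print(entity["profile_versions"]["current_version"])            # 2
print(entity["profile_versions"]["versions"][0]["created_at"])  # 2024-01-15T10:30:00Z
```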

Reflection & Iterative Improvement

Both create_profile() and update_profile() use iterative LLM reflection via iterative_improve():

from src.utils.llm import iterative_improve, GenerationMode

result = iterative_improve(
    initial_text="{}",  # Start with empty JSON
    generation_messages=[
        {"role": "system", "content": generation_prompt},
        {"role": "user", "content": "Create a profile for..."}
    ],
    reflection_prompt="Validate the profile. Check for...",
    response_model=EntityProfile,
    max_iterations=3,
    mode=GenerationMode.CLOUD  # or GenerationMode.LOCAL
)

final_profile = result["text"]
history = result["reflection_history"]

Each iteration:

Generates a profile candidate
Reflects on its quality and completeness, producing {"valid": bool, "reasoning": str}
If the candidate is invalid, regenerates it with the reflection feedback

The loop stops at the first valid result or after max_iterations.

The improvement_history returned contains all reflection passes:

[
    {"valid": False, "reasoning": "Profile lacks specific dates and roles"},
    {"valid": True, "reasoning": "Profile now includes complete timeline and position details"}
]
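The generate/reflect loop described above can be sketched with the LLM calls injected as plain callables. reflect_loop is a hypothetical illustration of the control flow, not the signature of iterative_improve(); it assumes each reflection verdict has the {"valid": bool, "reasoning": str} shape and that the reasoning is passed back as feedback for the next attempt.

```python
from typing import Any, Callable, Dict, List, Optional


def reflect_loop(
    generate: Callable[[Optional[str]], str],
    reflect: Callable[[str], Dict[str, Any]],
    max_iterations: int = 3,
) -> Dict[str, Any]:
    """Generate, reflect, and regenerate with feedback until valid or out of iterations."""
    history: List[Dict[str, Any]] = []
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_iterations):
        candidate = generate(feedback)  # feedback is None on the first pass
        verdict = reflect(candidate)    # expected: {"valid": bool, "reasoning": str}
        history.append(verdict)
        if verdict["valid"]:
            break                       # stop at the first valid candidate
        feedback = verdict["reasoning"] # steer the next generation
    return {"text": candidate, "reflection_history": history}


# Stubbed LLM: the first draft lacks dates, the second includes one
drafts = iter(['{"text": "Miller is a general."}', '{"text": "Miller commanded JTF-GTMO from 2002."}'])
result = reflect_loop(
    generate=lambda feedback: next(drafts),
    reflect=lambda text: {
        "valid": "2002" in text,
        "reasoning": "Includes dates" if "2002" in text else "Profile lacks specific dates",
    },
)
print(len(result["reflection_history"]))  # 2
```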

Profile Quality Control

All profiles pass through deterministic QC after generation:

from src.utils.quality_controls import run_profile_qc

profile, qc_report = run_profile_qc(profile=profile_dict)

# qc_report contains:
# - text_length: int
# - citation_count: int  
# - tag_count: int
# - flags: List[str]  # e.g., ["short_profile", "missing_citations"]
# - fixes: List[str]  # Applied corrections
# - passed: bool

See Quality Controls for QC details.
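For intuition about what a deterministic QC pass can compute, here is a standalone sketch producing a report with the same keys. Several details are assumptions rather than the behavior of run_profile_qc(): citations are counted as ^[article_id] markers (the form used by fallback profiles), the 200-character short_profile threshold is invented, the only fix is tag normalization, and passed simply means no flags were raised.

```python
import re
from typing import Any, Dict, List, Tuple

CITATION = re.compile(r"\^\[([^\]]+)\]")  # assumed ^[article_id] citation marker
MIN_TEXT_LENGTH = 200                    # hypothetical "short_profile" threshold


def sketch_profile_qc(profile: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    profile = dict(profile)  # shallow copy; the caller's dict is left untouched
    fixes: List[str] = []
    flags: List[str] = []

    # Fix: strip whitespace, drop empty tags and duplicates (first-seen order)
    tags = profile.get("tags", [])
    normalized = list(dict.fromkeys(t.strip() for t in tags if t.strip()))
    if normalized != tags:
        profile["tags"] = normalized
        fixes.append("normalized_tags")

    text = profile.get("text", "")
    citations = CITATION.findall(text)
    if len(text) < MIN_TEXT_LENGTH:
        flags.append("short_profile")
    if not citations:
        flags.append("missing_citations")

    report = {
        "text_length": len(text),
        "citation_count": len(citations),
        "tag_count": len(profile.get("tags", [])),
        "flags": flags,
        "fixes": fixes,
        "passed": not flags,
    }
    return profile, report


fixed, report = sketch_profile_qc(
    {"text": "Geoffrey Miller commanded JTF-GTMO.^[doc123]", "tags": ["military", "military ", ""]}
)
print(report["citation_count"], report["tag_count"], report["flags"])  # 1 1 ['short_profile']
```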

Prompt Configuration

Prompts are loaded from domain configs:

from src.config_loader import get_domain_config

config = get_domain_config("guantanamo")

# Profile generation prompt
generation_template = config.load_profile_prompt("generation")
system_prompt = generation_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_id="doc123"
)

# Reflection validation prompt
reflection_template = config.load_profile_prompt("reflection")
reflection_prompt = reflection_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller",
    article_id="doc123"
)

# Profile update prompt
update_template = config.load_profile_prompt("update")
update_prompt = update_template.format(
    entity_type="person",
    entity_name="Geoffrey Miller"
)

Prompt files live in configs/<domain>/prompts/: profile_generation.txt, profile_reflection.txt, and profile_update.txt.
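A standalone loader for that directory layout might look like the sketch below. load_profile_prompt here is a hypothetical free function that takes the configs directory explicitly; the real config.load_profile_prompt() method may resolve paths differently.

```python
from pathlib import Path
import tempfile

PROMPT_KINDS = ("generation", "reflection", "update")


def load_profile_prompt(configs_dir: Path, domain: str, kind: str) -> str:
    """Read configs/<domain>/prompts/profile_<kind>.txt (hypothetical standalone loader)."""
    if kind not in PROMPT_KINDS:
        raise ValueError(f"unknown prompt kind: {kind!r}")
    path = configs_dir / domain / "prompts" / f"profile_{kind}.txt"
    return path.read_text(encoding="utf-8")


# Demo against a throwaway directory that mimics the layout
with tempfile.TemporaryDirectory() as tmp:
    prompts = Path(tmp) / "guantanamo" / "prompts"
    prompts.mkdir(parents=True)
    (prompts / "profile_update.txt").write_text(
        "Update the {entity_type} profile for {entity_name}.", encoding="utf-8"
    )
    template = load_profile_prompt(Path(tmp), "guantanamo", "update")
    print(template.format(entity_type="person", entity_name="Geoffrey Miller"))
    # Update the person profile for Geoffrey Miller.
```

Because the templates are filled with str.format(), any literal braces in a prompt file (for example an embedded JSON example) must be written as {{ and }}, or format() will treat them as placeholders.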

Error Handling

Callers of create_profile() can catch ProfileGenerationError and fall back to a minimal profile:

try:
    profile, versioned, history = create_profile(...)
except ProfileGenerationError as e:
    # Fallback to minimal profile
    fallback_profile = handle_profile_error(
        entity_name, entity_type, article_id, e, "generation"
    )

Fallback profiles have minimal structure:

{
    "text": f"Profile for {entity_name} could not be generated.^[{article_id}]",
    "tags": [],
    "confidence": 0.0,
    "sources": [article_id]
}

Use *_with_outcome() variants to avoid exceptions and get structured error reporting.

Source Location

~/workspace/source/src/engine/profiles.py
