Overview

The Synthesis Engine transforms raw research data into structured intelligence reports using Claude 4 Sonnet (primary) with Gemini 2.0 Flash (fallback).

Architecture

Synthesis Engines

AnthropicSynthesisEngine (Primary)

Uses Claude 4 Sonnet for high-quality intelligence reports:
backend/synthesis/anthropic_engine.py
class AnthropicSynthesisEngine:
    """Synthesizes person intelligence reports using Claude as a Gemini fallback."""

    def __init__(self, settings: Settings):
        self._settings = settings
        self._client = None

    @property
    def configured(self) -> bool:
        return bool(self._settings.anthropic_api_key)

    def _get_client(self):
        if self._client is None:
            import anthropic

            self._client = anthropic.AsyncAnthropic(
                api_key=self._settings.anthropic_api_key,
                timeout=30.0,
            )
        return self._client

GeminiSynthesisEngine (Fallback)

Uses Gemini 2.0 Flash when Claude is unavailable:
backend/synthesis/engine.py
class GeminiSynthesisEngine:
    """Synthesizes person intelligence reports using Gemini 2.0 Flash."""

    def __init__(self, settings: Settings):
        self._settings = settings
        self._client = None

    @property
    def configured(self) -> bool:
        return bool(self._settings.gemini_api_key)

    def _get_client(self):
        if self._client is None:
            from google import genai

            self._client = genai.Client(api_key=self._settings.gemini_api_key)
        return self._client

Synthesis Prompt

The engine uses a detailed system prompt optimized for intelligence analysis:
backend/synthesis/anthropic_engine.py
SYNTHESIS_PROMPT = """\
You are an elite person intelligence analyst building a comprehensive dossier. \
Given raw data about a person, synthesize the MOST DETAILED and THOROUGH report possible. \
Extract EVERY fact, detail, connection, and data point from the sources.

Person name: {person_name}

Raw data sources:
{raw_data}

Produce a JSON object with EXACTLY these fields (no extra fields):
{{
  "summary": "A thorough 4-6 sentence profile. Include their full name, current role, \
key accomplishments, notable affiliations, and anything that makes them distinctive. \
Be specific with numbers, dates, and details. This is the intel briefing a field agent \
would receive before meeting this person.",
  "title": "their current job title or primary role",
  "company": "their current company or organization",
  "work_history": [
    {{"role": "Job Title", "company": "Company Name", "period": "2020-present"}}
  ],
  "education": [
    {{"school": "University Name", "degree": "BS Computer Science"}}
  ],
  "social_profiles": {{
    "linkedin": "full linkedin URL or null",
    "twitter": "full twitter URL or @handle or null",
    "instagram": "full instagram URL or @handle or null",
    "github": "full github URL or null",
    "website": "full website URL or null"
  }},
  "notable_activity": ["Be specific: 'Published paper on X at Y conference (2024)', \
not vague 'Has published papers'. Include dates, numbers, specifics."],
  "conversation_hooks": ["Highly specific talking points that show deep knowledge. \
Reference their actual projects, recent posts, interests. e.g. 'Ask about their recent \
talk at PyCon on async patterns' not generic 'Ask about their work'"],
  "risk_flags": ["Any red flags, controversies, lawsuits, data breaches, or concerning \
associations. Empty array if genuinely none."]
}}

Rules:
- MAXIMIZE detail. Extract every fact from the raw data. Do not summarize away specifics.
- Only include information supported by the raw data. Do not fabricate.
- If a field has no data, use empty string, empty array, or null.
- Conversation hooks must be SPECIFIC and reference actual projects/posts/interests.
- The summary should read like a classified intelligence briefing, not a LinkedIn bio.
- Notable activity items should each be a complete, specific fact with context.
- Return ONLY valid JSON, no markdown fencing, no explanation.
"""

Synthesis Request

Data Aggregation

The engine builds a structured data block from multiple sources:
backend/synthesis/anthropic_engine.py
def _build_raw_data_block(self, request: SynthesisRequest) -> str:
    sections: list[str] = []

    if request.face_search_urls:
        sections.append("== Face Search URLs ==")
        for url in request.face_search_urls:
            sections.append(f"  - {url}")

    if request.enrichment_snippets:
        sections.append("== Enrichment Results ==")
        for snippet in request.enrichment_snippets:
            sections.append(f"  {snippet}")

    if request.social_profiles:
        sections.append("== Known Social Profiles ==")
        for sp in request.social_profiles:
            line = f"  - {sp.platform}: {sp.url}"
            if sp.username:
                line += f" ({sp.username})"
            if sp.bio:
                line += f" — {sp.bio}"
            sections.append(line)

    if request.raw_agent_data:
        for agent_name, data in request.raw_agent_data.items():
            sections.append(f"== {agent_name} Agent Data ==")
            sections.append(f"  {data}")

    if not sections:
        sections.append("No data available. Return empty/null fields.")

    return "\n".join(sections)

Claude Synthesis

API Call

backend/synthesis/anthropic_engine.py
async def synthesize(self, request: SynthesisRequest) -> SynthesisResult:
    """Synthesize enrichment data into a structured person report."""
    logger.info("AnthropicSynthesisEngine.synthesize person={}", request.person_name)

    if not self.configured:
        return SynthesisResult(
            person_name=request.person_name,
            success=False,
            error="Anthropic API key not configured (ANTHROPIC_API_KEY missing)",
        )

    try:
        raw_data = self._build_raw_data_block(request)
        prompt = SYNTHESIS_PROMPT.format(
            person_name=request.person_name,
            raw_data=raw_data,
        )

        client = self._get_client()
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}],
        )

        # Extract text from response (skip thinking blocks if present)
        response_text = ""
        for block in response.content:
            if hasattr(block, "text"):
                response_text = block.text
                break
        if not response_text:
            return SynthesisResult(
                person_name=request.person_name,
                success=False,
                error="Claude returned empty response",
            )

        dossier = self._parse_response(response_text, request.person_name)

        return SynthesisResult(
            person_name=request.person_name,
            summary=dossier.summary,
            occupation=dossier.title,
            organization=dossier.company,
            dossier=dossier,
            confidence_score=0.75,
        )
    # except clauses for JSON and API errors are shown under Error Handling below
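
The block-scanning loop above can be exercised in isolation. A sketch using stand-in objects (`SimpleNamespace` instances stand in for the SDK's content block types):

```python
from types import SimpleNamespace

def extract_text(blocks) -> str:
    # Return the text of the first block carrying a .text attribute,
    # skipping thinking blocks, which expose no text payload.
    for block in blocks:
        if hasattr(block, "text"):
            return block.text
    return ""

blocks = [
    SimpleNamespace(thinking="internal reasoning"),  # no .text -> skipped
    SimpleNamespace(text='{"summary": "Analyst"}'),
]
response_text = extract_text(blocks)
```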

Response Parsing

backend/synthesis/anthropic_engine.py
def _parse_response(self, text: str, person_name: str) -> DossierReport:
    """Parse Claude JSON response into a DossierReport."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.split("\n")
        lines = lines[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        cleaned = "\n".join(lines)

    data = json.loads(cleaned)

    work_history = [
        WorkHistoryEntry(
            role=entry.get("role", ""),
            company=entry.get("company", ""),
            period=entry.get("period") or None,
        )
        for entry in data.get("work_history", [])
        if entry.get("role") or entry.get("company")
    ]

    education = [
        EducationEntry(
            school=entry.get("school", ""),
            degree=entry.get("degree") or None,
        )
        for entry in data.get("education", [])
        if entry.get("school")
    ]

    sp_data = data.get("social_profiles", {})
    social_profiles = SocialProfiles(
        linkedin=sp_data.get("linkedin"),
        twitter=sp_data.get("twitter"),
        instagram=sp_data.get("instagram"),
        github=sp_data.get("github"),
        website=sp_data.get("website"),
    )

    return DossierReport(
        summary=data.get("summary", ""),
        title=data.get("title"),
        company=data.get("company"),
        work_history=work_history,
        education=education,
        social_profiles=social_profiles,
        notable_activity=data.get("notable_activity", []),
        conversation_hooks=data.get("conversation_hooks", []),
        risk_flags=data.get("risk_flags", []),
    )
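
The fence-stripping step can be isolated and tested on its own. A standalone sketch mirroring the logic above:

```python
import json

def strip_fences(text: str) -> str:
    # Drop a leading ``` / ```json fence and its matching closing fence,
    # leaving bare JSON for json.loads; pass untouched text through.
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.split("\n")[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    return cleaned

raw = "```json\n{\"summary\": \"Analyst\"}\n```"
data = json.loads(strip_fences(raw))
```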

Streaming Research Endpoint

The synthesis engine integrates with the streaming research endpoint:
backend/main.py
@app.get("/api/research/{person_name}/stream")
async def stream_research(person_name: str, image_url: str | None = None):
    """SSE endpoint: stream research results as they arrive.

    Events:
      - init: {person_id, live_session_id, live_url} — sent first
      - result: AgentResult JSON — sent per agent
      - complete: {} — sent when all agents finish
    """
    if not deep_researcher:
        raise HTTPException(
            status_code=503,
            detail="Browser Use API key not configured — streaming unavailable",
        )

    async def event_generator():
        # ... person creation and research streaming

        # 4. Run synthesis on collected data and push dossier to Convex
        if person_id and synthesis_engine:
            try:
                from synthesis.models import SynthesisRequest

                synth_request = SynthesisRequest(
                    person_name=person_name,
                    enrichment_snippets=all_snippets[:50],
                    social_profiles=[],
                    raw_agent_data=agent_data,
                )
                synth_result = await synthesis_engine.synthesize(synth_request)
                if synth_result.success and synth_result.dossier:
                    dossier = synth_result.dossier
                    await db_gateway.update_person(person_id, {
                        "status": "enriched",
                        "summary": synth_result.summary,
                        "occupation": synth_result.occupation,
                        "organization": synth_result.organization,
                        "dossier": dossier.model_dump(),
                    })
                    yield {
                        "event": "dossier",
                        "data": _json.dumps(dossier.to_frontend_dict()),
                    }
            except Exception as exc:
                logger.error("Synthesis failed during stream: {}", exc)

    return EventSourceResponse(event_generator())
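
On the consumer side, each SSE frame carries an `event:` name and a `data:` JSON payload, per the docstring above. A sketch of client-side frame parsing (`parse_sse_frame` is an illustrative helper, not part of the codebase):

```python
import json

def parse_sse_frame(frame: str) -> tuple[str, dict]:
    # Split one SSE frame into its event name and decoded JSON payload.
    event, payload = "message", {}
    for line in frame.strip().split("\n"):
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            payload = json.loads(line.split(":", 1)[1].strip())
    return event, payload

event, payload = parse_sse_frame('event: dossier\ndata: {"summary": "Analyst"}')
```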

Dossier Report Structure

Data Models

The synthesis engine produces structured dossier reports:
class WorkHistoryEntry(BaseModel):
    role: str
    company: str
    period: str | None = None

class EducationEntry(BaseModel):
    school: str
    degree: str | None = None

class SocialProfiles(BaseModel):
    linkedin: str | None = None
    twitter: str | None = None
    instagram: str | None = None
    github: str | None = None
    website: str | None = None

class DossierReport(BaseModel):
    summary: str
    title: str | None = None
    company: str | None = None
    work_history: list[WorkHistoryEntry] = []
    education: list[EducationEntry] = []
    social_profiles: SocialProfiles
    notable_activity: list[str] = []
    conversation_hooks: list[str] = []
    risk_flags: list[str] = []

Frontend Serialization

def to_frontend_dict(self) -> dict:
    """Convert to frontend-friendly format."""
    return {
        "summary": self.summary,
        "title": self.title,
        "company": self.company,
        "workHistory": [wh.model_dump() for wh in self.work_history],
        "education": [ed.model_dump() for ed in self.education],
        "socialProfiles": self.social_profiles.model_dump(),
        "notableActivity": self.notable_activity,
        "conversationHooks": self.conversation_hooks,
        "riskFlags": self.risk_flags,
    }
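
The frontend keys are simply the camelCase form of the snake_case field names. A small helper illustrating the mapping (illustrative, not in the codebase):

```python
def to_camel(snake: str) -> str:
    # "work_history" -> "workHistory"; single words pass through unchanged.
    head, *rest = snake.split("_")
    return head + "".join(part.title() for part in rest)
```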

Configuration

Synthesis engines are initialized with fallback support:
backend/main.py
# Primary synthesis engine (Claude)
synthesis_engine = AnthropicSynthesisEngine(settings) if settings.anthropic_api_key else None

# Fallback synthesis engine (Gemini)
synthesis_fallback = GeminiSynthesisEngine(settings) if settings.gemini_api_key else None

# Pass both to pipeline
pipeline = CapturePipeline(
    # ... other components
    synthesis_engine=synthesis_engine,
    synthesis_fallback=synthesis_fallback,
)

Error Handling

JSON Parsing Errors

backend/synthesis/anthropic_engine.py
try:
    dossier = self._parse_response(response_text, request.person_name)
except json.JSONDecodeError as e:
    logger.error("Failed to parse Claude response as JSON: {}", e)
    return SynthesisResult(
        person_name=request.person_name,
        success=False,
        error=f"Claude response was not valid JSON: {e}",
    )

API Errors

backend/synthesis/anthropic_engine.py
except Exception as e:
    logger.error("Anthropic synthesis failed: {}", e)
    return SynthesisResult(
        person_name=request.person_name,
        success=False,
        error=f"Synthesis error: {e}",
    )

Performance Characteristics

Claude 4 Sonnet: ~2-3s for a typical dossier
Gemini 2.0 Flash: ~1-2s (faster, but less detailed)
Max tokens: 8192 (Claude) / auto (Gemini)
Timeout: 30s per request

Quality Differences

Feature             | Claude 4 Sonnet      | Gemini 2.0 Flash
Detail Level        | Very high            | Medium
Conversation Hooks  | Specific, actionable | Generic
Risk Flags          | Thorough             | Basic
Parsing Reliability | 95%+                 | 85%+
Speed               | Slower               | Faster
Claude 4 Sonnet is recommended for production use due to higher detail and reliability.

Next Steps

Agent Orchestration

Learn how research data is gathered

Convex Integration

See how dossiers are stored in real-time
