Overview
The Synthesis Engine transforms raw research data into structured intelligence reports using Claude 4 Sonnet (primary) with Gemini 2.0 Flash (fallback).
Architecture
Synthesis Engines
AnthropicSynthesisEngine (Primary)
Uses Claude 4 Sonnet for high-quality intelligence reports:

backend/synthesis/anthropic_engine.py

```python
class AnthropicSynthesisEngine:
    """Synthesizes person intelligence reports using Claude 4 Sonnet."""

    def __init__(self, settings: Settings):
        self._settings = settings
        self._client = None

    @property
    def configured(self) -> bool:
        return bool(self._settings.anthropic_api_key)

    def _get_client(self):
        if self._client is None:
            import anthropic

            self._client = anthropic.AsyncAnthropic(
                api_key=self._settings.anthropic_api_key,
                timeout=30.0,
            )
        return self._client
```
GeminiSynthesisEngine (Fallback)
Uses Gemini 2.0 Flash when Claude is unavailable:

backend/synthesis/engine.py

```python
class GeminiSynthesisEngine:
    """Synthesizes person intelligence reports using Gemini 2.0 Flash."""

    def __init__(self, settings: Settings):
        self._settings = settings
        self._client = None

    @property
    def configured(self) -> bool:
        return bool(self._settings.gemini_api_key)

    def _get_client(self):
        if self._client is None:
            from google import genai

            self._client = genai.Client(api_key=self._settings.gemini_api_key)
        return self._client
```
Synthesis Prompt
The engine uses a detailed system prompt optimized for intelligence analysis:

backend/synthesis/anthropic_engine.py

```python
SYNTHESIS_PROMPT = """\
You are an elite person intelligence analyst building a comprehensive dossier. \
Given raw data about a person, synthesize the MOST DETAILED and THOROUGH report possible. \
Extract EVERY fact, detail, connection, and data point from the sources.

Person name: {person_name}

Raw data sources:
{raw_data}

Produce a JSON object with EXACTLY these fields (no extra fields):
{{
  "summary": "A thorough 4-6 sentence profile. Include their full name, current role, \
key accomplishments, notable affiliations, and anything that makes them distinctive. \
Be specific with numbers, dates, and details. This is the intel briefing a field agent \
would receive before meeting this person.",
  "title": "their current job title or primary role",
  "company": "their current company or organization",
  "work_history": [
    {{"role": "Job Title", "company": "Company Name", "period": "2020-present"}}
  ],
  "education": [
    {{"school": "University Name", "degree": "BS Computer Science"}}
  ],
  "social_profiles": {{
    "linkedin": "full linkedin URL or null",
    "twitter": "full twitter URL or @handle or null",
    "instagram": "full instagram URL or @handle or null",
    "github": "full github URL or null",
    "website": "full website URL or null"
  }},
  "notable_activity": ["Be specific: 'Published paper on X at Y conference (2024)', \
not vague 'Has published papers'. Include dates, numbers, specifics."],
  "conversation_hooks": ["Highly specific talking points that show deep knowledge. \
Reference their actual projects, recent posts, interests. e.g. 'Ask about their recent \
talk at PyCon on async patterns' not generic 'Ask about their work'"],
  "risk_flags": ["Any red flags, controversies, lawsuits, data breaches, or concerning \
associations. Empty array if genuinely none."]
}}

Rules:
- MAXIMIZE detail. Extract every fact from the raw data. Do not summarize away specifics.
- Only include information supported by the raw data. Do not fabricate.
- If a field has no data, use empty string, empty array, or null.
- Conversation hooks must be SPECIFIC and reference actual projects/posts/interests.
- The summary should read like a classified intelligence briefing, not a LinkedIn bio.
- Notable activity items should each be a complete, specific fact with context.
- Return ONLY valid JSON, no markdown fencing, no explanation.
"""
```
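Note the doubled braces in the prompt: `{{` and `}}` escape Python's `str.format`, so the JSON schema renders as literal braces while `{person_name}` and `{raw_data}` are substituted. A tiny illustration (the template here is a hypothetical miniature, not the real prompt):

```python
# Doubled braces render as literal braces after .format(); single-brace
# placeholders are substituted. SYNTHESIS_PROMPT relies on the same rule.
template = 'Person name: {person_name}\n{{"summary": "..."}}'
prompt = template.format(person_name="Ada Lovelace")
print(prompt)
# Person name: Ada Lovelace
# {"summary": "..."}
```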
Synthesis Request
Data Aggregation
The engine builds a structured data block from multiple sources:

backend/synthesis/anthropic_engine.py

```python
def _build_raw_data_block(self, request: SynthesisRequest) -> str:
    sections: list[str] = []
    if request.face_search_urls:
        sections.append("== Face Search URLs ==")
        for url in request.face_search_urls:
            sections.append(f" - {url}")
    if request.enrichment_snippets:
        sections.append("== Enrichment Results ==")
        for snippet in request.enrichment_snippets:
            sections.append(f" {snippet}")
    if request.social_profiles:
        sections.append("== Known Social Profiles ==")
        for sp in request.social_profiles:
            line = f" - {sp.platform}: {sp.url}"
            if sp.username:
                line += f" ({sp.username})"
            if sp.bio:
                line += f" — {sp.bio}"
            sections.append(line)
    if request.raw_agent_data:
        for agent_name, data in request.raw_agent_data.items():
            sections.append(f"== {agent_name} Agent Data ==")
            sections.append(f" {data}")
    if not sections:
        sections.append("No data available. Return empty/null fields.")
    return "\n".join(sections)
```
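To see the shape of the resulting block, the same logic can be exercised in isolation. The stand-in request class below is hypothetical (the real `SynthesisRequest` is a Pydantic model with more fields); only two source types are shown:

```python
from dataclasses import dataclass, field

@dataclass
class FakeRequest:
    """Hypothetical stand-in for SynthesisRequest, for illustration only."""
    face_search_urls: list[str] = field(default_factory=list)
    enrichment_snippets: list[str] = field(default_factory=list)

def build_raw_data_block(request) -> str:
    # Trimmed copy of the section-building logic above (two sources only).
    sections: list[str] = []
    if request.face_search_urls:
        sections.append("== Face Search URLs ==")
        sections.extend(f" - {url}" for url in request.face_search_urls)
    if request.enrichment_snippets:
        sections.append("== Enrichment Results ==")
        sections.extend(f" {s}" for s in request.enrichment_snippets)
    if not sections:
        sections.append("No data available. Return empty/null fields.")
    return "\n".join(sections)

block = build_raw_data_block(FakeRequest(enrichment_snippets=["Works at Acme Corp"]))
print(block)
```

An empty request produces the explicit "No data available" sentinel, which tells the model to return empty/null fields rather than fabricate.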
Claude Synthesis
API Call
backend/synthesis/anthropic_engine.py
```python
async def synthesize(self, request: SynthesisRequest) -> SynthesisResult:
    """Synthesize enrichment data into a structured person report."""
    logger.info("AnthropicSynthesisEngine.synthesize person={}", request.person_name)
    if not self.configured:
        return SynthesisResult(
            person_name=request.person_name,
            success=False,
            error="Anthropic API key not configured (ANTHROPIC_API_KEY missing)",
        )
    try:
        raw_data = self._build_raw_data_block(request)
        prompt = SYNTHESIS_PROMPT.format(
            person_name=request.person_name,
            raw_data=raw_data,
        )
        client = self._get_client()
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}],
        )

        # Extract text from response (skip thinking blocks if present)
        response_text = ""
        for block in response.content:
            if hasattr(block, "text"):
                response_text = block.text
                break
        if not response_text:
            return SynthesisResult(
                person_name=request.person_name,
                success=False,
                error="Claude returned empty response",
            )

        dossier = self._parse_response(response_text, request.person_name)
        return SynthesisResult(
            person_name=request.person_name,
            summary=dossier.summary,
            occupation=dossier.title,
            organization=dossier.company,
            dossier=dossier,
            confidence_score=0.75,
        )
    # except clauses omitted here; see "Error Handling" below
```
Response Parsing
backend/synthesis/anthropic_engine.py
```python
def _parse_response(self, text: str, person_name: str) -> DossierReport:
    """Parse Claude JSON response into a DossierReport."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.split("\n")
        lines = lines[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    data = json.loads(cleaned)

    work_history = [
        WorkHistoryEntry(
            role=entry.get("role", ""),
            company=entry.get("company", ""),
            period=entry.get("period") or None,
        )
        for entry in data.get("work_history", [])
        if entry.get("role") or entry.get("company")
    ]
    education = [
        EducationEntry(
            school=entry.get("school", ""),
            degree=entry.get("degree") or None,
        )
        for entry in data.get("education", [])
        if entry.get("school")
    ]
    sp_data = data.get("social_profiles", {})
    social_profiles = SocialProfiles(
        linkedin=sp_data.get("linkedin"),
        twitter=sp_data.get("twitter"),
        instagram=sp_data.get("instagram"),
        github=sp_data.get("github"),
        website=sp_data.get("website"),
    )
    return DossierReport(
        summary=data.get("summary", ""),
        title=data.get("title"),
        company=data.get("company"),
        work_history=work_history,
        education=education,
        social_profiles=social_profiles,
        notable_activity=data.get("notable_activity", []),
        conversation_hooks=data.get("conversation_hooks", []),
        risk_flags=data.get("risk_flags", []),
    )
```
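The defensive fence stripping matters because models sometimes wrap JSON in markdown despite the "no markdown fencing" rule in the prompt. That cleanup step can be exercised in isolation; the helper below is a standalone re-implementation of the same logic, shown for illustration:

```python
import json

def strip_markdown_fence(text: str) -> str:
    # Mirrors the cleanup in _parse_response: drop a leading "```" or
    # "```json" line, and a trailing "```" line, if the model added them.
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.split("\n")[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    return cleaned

raw = '```json\n{"summary": "Example person."}\n```'
data = json.loads(strip_markdown_fence(raw))
print(data["summary"])  # Example person.
```

Unfenced responses pass through untouched, so the cleanup is safe to apply unconditionally.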
Streaming Research Endpoint
The synthesis engine integrates with the streaming research endpoint:

backend/main.py

```python
@app.get("/api/research/{person_name}/stream")
async def stream_research(person_name: str, image_url: str | None = None):
    """SSE endpoint: stream research results as they arrive.

    Events:
    - init: {person_id, live_session_id, live_url} — sent first
    - result: AgentResult JSON — sent per agent
    - complete: {} — sent when all agents finish
    """
    if not deep_researcher:
        raise HTTPException(
            status_code=503,
            detail="Browser Use API key not configured — streaming unavailable",
        )

    async def event_generator():
        # ... person creation and research streaming

        # 4. Run synthesis on collected data and push dossier to Convex
        if person_id and synthesis_engine:
            try:
                from synthesis.models import SynthesisRequest
                from synthesis.models import SocialProfile as SynthSocialProfile

                synth_request = SynthesisRequest(
                    person_name=person_name,
                    enrichment_snippets=all_snippets[:50],
                    social_profiles=[],
                    raw_agent_data=agent_data,
                )
                synth_result = await synthesis_engine.synthesize(synth_request)
                if synth_result.success and synth_result.dossier:
                    dossier = synth_result.dossier
                    await db_gateway.update_person(person_id, {
                        "status": "enriched",
                        "summary": synth_result.summary,
                        "occupation": synth_result.occupation,
                        "organization": synth_result.organization,
                        "dossier": dossier.model_dump(),
                    })
                    yield {
                        "event": "dossier",
                        "data": _json.dumps(dossier.to_frontend_dict()),
                    }
            except Exception as exc:
                logger.error("Synthesis failed during stream: {}", exc)

    return EventSourceResponse(event_generator())
```
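On the client side, the stream arrives as standard server-sent events (`event:` and `data:` lines separated by blank lines). A hedged sketch of the parsing that a browser's `EventSource` performs automatically; the `parse_sse` helper and sample payload are illustrative, not part of the codebase:

```python
def parse_sse(stream_text: str):
    """Yield (event, data) pairs from a raw SSE payload."""
    event, data = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif not line:
            # A blank line terminates one event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []

sample = 'event: dossier\ndata: {"summary": "..."}\n\n'
events = list(parse_sse(sample))
```

A frontend would listen for the `dossier` event name documented above and JSON-decode its `data` payload.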
Dossier Report Structure
Data Models
The synthesis engine produces structured dossier reports:

```python
class WorkHistoryEntry(BaseModel):
    role: str
    company: str
    period: str | None = None


class EducationEntry(BaseModel):
    school: str
    degree: str | None = None


class SocialProfiles(BaseModel):
    linkedin: str | None = None
    twitter: str | None = None
    instagram: str | None = None
    github: str | None = None
    website: str | None = None


class DossierReport(BaseModel):
    summary: str
    title: str | None = None
    company: str | None = None
    work_history: list[WorkHistoryEntry] = []
    education: list[EducationEntry] = []
    social_profiles: SocialProfiles
    notable_activity: list[str] = []
    conversation_hooks: list[str] = []
    risk_flags: list[str] = []
```
Frontend Serialization
```python
def to_frontend_dict(self) -> dict:
    """Convert to frontend-friendly format."""
    return {
        "summary": self.summary,
        "title": self.title,
        "company": self.company,
        "workHistory": [wh.model_dump() for wh in self.work_history],
        "education": [ed.model_dump() for ed in self.education],
        "socialProfiles": self.social_profiles.model_dump(),
        "notableActivity": self.notable_activity,
        "conversationHooks": self.conversation_hooks,
        "riskFlags": self.risk_flags,
    }
```
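The renaming is spelled out field by field to keep the frontend contract explicit, but it follows a plain snake_case to camelCase rule. A generic helper (illustrative only, not used in the codebase) would be:

```python
def to_camel(name: str) -> str:
    # "work_history" -> "workHistory": first word kept, rest title-cased.
    head, *rest = name.split("_")
    return head + "".join(part.title() for part in rest)

print(to_camel("work_history"))  # workHistory
```

Keeping the mapping explicit in `to_frontend_dict` trades a little repetition for an at-a-glance view of exactly what the frontend receives.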
Configuration
Synthesis engines are initialized with fallback support:

backend/main.py

```python
# Primary synthesis engine (Claude)
synthesis_engine = AnthropicSynthesisEngine(settings) if settings.anthropic_api_key else None

# Fallback synthesis engine (Gemini)
synthesis_fallback = GeminiSynthesisEngine(settings) if settings.gemini_api_key else None

# Pass both to pipeline
pipeline = CapturePipeline(
    # ... other components
    synthesis_engine=synthesis_engine,
    synthesis_fallback=synthesis_fallback,
)
```
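This section does not show the pipeline's internal wiring, but a caller holding both engines would typically try the primary and fall back on failure. A hedged sketch of that flow (the function name and control flow are assumptions, not the real `CapturePipeline` code):

```python
import asyncio

async def synthesize_with_fallback(request, primary, fallback):
    # Try the primary (Claude) engine first; use the fallback (Gemini)
    # when the primary is missing or its result reports failure.
    result = None
    if primary is not None:
        result = await primary.synthesize(request)
        if result.success:
            return result
    if fallback is not None:
        return await fallback.synthesize(request)
    if result is not None:
        return result  # no fallback configured; surface the primary's failure
    raise RuntimeError("No synthesis engine configured")
```

Because both engines return `SynthesisResult` with a `success` flag instead of raising, the fallback decision reduces to a simple attribute check.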
Error Handling
JSON Parsing Errors
backend/synthesis/anthropic_engine.py
```python
try:
    dossier = self._parse_response(response_text, request.person_name)
except json.JSONDecodeError as e:
    logger.error("Failed to parse Claude response as JSON: {}", e)
    return SynthesisResult(
        person_name=request.person_name,
        success=False,
        error=f"Claude response was not valid JSON: {e}",
    )
```
API Errors
backend/synthesis/anthropic_engine.py
```python
except Exception as e:
    logger.error("Anthropic synthesis failed: {}", e)
    return SynthesisResult(
        person_name=request.person_name,
        success=False,
        error=f"Synthesis error: {e}",
    )
```
Performance Characteristics
| Metric | Value |
|---|---|
| Claude 4 Sonnet latency | ~2-3s for a typical dossier |
| Gemini 2.0 Flash latency | ~1-2s (faster, less detailed) |
| Max tokens | 8192 (Claude) / auto (Gemini) |
| Timeout | 30s per request |
Quality Differences
| Feature | Claude 4 Sonnet | Gemini 2.0 Flash |
|---|---|---|
| Detail Level | Very high | Medium |
| Conversation Hooks | Specific, actionable | Generic |
| Risk Flags | Thorough | Basic |
| Parsing Reliability | 95%+ | 85%+ |
| Speed | Slower | Faster |
Claude 4 Sonnet is recommended for production use due to higher detail and reliability.
Next Steps
- Agent Orchestration: learn how research data is gathered
- Convex Integration: see how dossiers are stored in real-time