
Introduction

JARVIS uses a sophisticated multi-agent swarm architecture to gather comprehensive intelligence about individuals in real-time. The system employs specialized agents that run in parallel, each focused on specific platforms or data sources.

Architecture

The agent system consists of two main components:

1. ResearchOrchestrator (Two-Phase)

A traditional orchestrator that runs static agents in parallel:
backend/agents/orchestrator.py
class ResearchOrchestrator:
    """Two-phase orchestrator: static agents + dynamic URL scrapers."""
    
    def __init__(self, settings: Settings, *, pool_size: int = 10):
        self._settings = settings
        self._exa = ExaEnrichmentClient(settings)
        
        # Create shared inbox pool for login-wall bypass
        self._inbox_pool: InboxPool | None = None
        if settings.agentmail_api_key:
            mail_client = AgentMailClient(api_key=settings.agentmail_api_key)
            self._inbox_pool = InboxPool(mail_client, pool_size=pool_size)
        
        # Initialize static agents with shared inbox pool
        self._static_agents: list[BaseBrowserAgent] = [
            LinkedInAgent(settings, inbox_pool=self._inbox_pool),
            TwitterAgent(settings, inbox_pool=self._inbox_pool),
            InstagramAgent(settings, inbox_pool=self._inbox_pool),
            GoogleAgent(settings, inbox_pool=self._inbox_pool),
            OsintAgent(settings, inbox_pool=self._inbox_pool),
            SocialAgent(settings, inbox_pool=self._inbox_pool),
        ]
  • Phase 1: Static agents (LinkedIn, Twitter, Instagram, Google, OSINT, Social) + Exa search run concurrently
  • Phase 2: Dynamic URL scrapers for high-value URLs discovered by Exa/Google
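
The two-phase flow can be sketched with toy stand-ins (the agent and scraper coroutines below are illustrative, not the real classes):

```python
import asyncio

async def toy_agent(name: str) -> dict:
    # Stand-in for a static agent run (LinkedIn, Twitter, ...).
    await asyncio.sleep(0.01)
    return {"agent": name, "urls": [f"https://example.com/{name}"]}

async def toy_scraper(url: str) -> dict:
    # Stand-in for a dynamic URL scraper spawned in phase 2.
    await asyncio.sleep(0.01)
    return {"url": url, "scraped": True}

async def two_phase(agent_names: list[str]) -> list[dict]:
    # Phase 1: run all static agents concurrently.
    phase1 = await asyncio.gather(*(toy_agent(n) for n in agent_names))
    # Phase 2: scrape every URL the static agents discovered.
    urls = [u for r in phase1 for u in r["urls"]]
    phase2 = await asyncio.gather(*(toy_scraper(u) for u in urls))
    return phase1 + phase2

results = asyncio.run(two_phase(["linkedin", "twitter"]))
```

Phase 2 cannot start until phase 1 finishes, since its inputs are the URLs phase 1 discovered; within each phase, everything runs in parallel.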

2. DeepResearcher (Four-Phase)

An advanced pipeline that streams results as they complete:
backend/agents/deep_researcher.py
class DeepResearcher:
    """Multi-phase deep research pipeline that streams results.
    
    Phase 0: Exa + SixtyFour enrich-lead in parallel (~3s)
    Phase 1: Platform + OSINT skills in parallel (~20-35s, up to 15 concurrent)
    Phase 2: Deep URL extraction + SixtyFour deep-search + dark web (~30-60s)
    Phase 3: Verification loop — retry failed skills with account creation (~30-90s)
    """
Phase 0: Fast Initial Discovery

Runs Exa multi-query search immediately (yields results in ~1s) and starts SixtyFour enrichment in the background.

Phase 1: Platform Skills

Launches 15+ concurrent Browser Use Cloud SDK skills:

  • Social: TikTok, GitHub, Instagram, LinkedIn, Facebook, YouTube, Reddit, Pinterest, Linktree
  • OSINT: Background checks, SEC filings, company employees, YC data, ancestry records
  • Domain-matched: Discovers skills based on Exa/SixtyFour URLs

Phase 2: Deep Extraction

Navigates to uncovered URLs and runs high-impact freeform tasks:

  • Court records (CourtListener, UniCourt)
  • Political donations (FEC.gov)
  • Academic papers (Google Scholar, Semantic Scholar)
  • Podcast appearances (ListenNotes)
  • Crunchbase profiles
  • Dark web / HIBP breach checks

Phase 3: Verification & Retry

Retries failed skills with autonomous account creation using AgentMail disposable emails.
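
The phase ordering (yield the fast Exa pass immediately, then stream slower skills as each one finishes) can be sketched with plain asyncio; the delays and result names below are toy stand-ins, not the real pipeline:

```python
import asyncio
from typing import AsyncGenerator

async def pipeline() -> AsyncGenerator[str, None]:
    # Phase 0: start slow enrichment in the background, yield the fast pass now.
    enrichment = asyncio.create_task(asyncio.sleep(0.05, result="sixtyfour"))
    yield "exa_deep"
    # Phase 1: stream each skill back the moment it finishes.
    skills = [
        asyncio.create_task(asyncio.sleep(0.01, result=f"skill_{i}"))
        for i in range(3)
    ]
    for finished in asyncio.as_completed(skills):
        yield await finished
    # Later phases follow the same pattern; collect the background task here.
    yield await enrichment

async def collect() -> list[str]:
    return [r async for r in pipeline()]

phases = asyncio.run(collect())
```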

    Parallel Execution

    Orchestrator Approach

    backend/agents/orchestrator.py
    async def research_person(self, request: ResearchRequest) -> OrchestratorResult:
        # Launch Exa + all static agents in parallel
        exa_task = asyncio.create_task(self._exa.enrich_person(exa_req), name="exa")
        
        tasks: dict[str, asyncio.Task] = {"exa": exa_task}
        for agent in self._static_agents:
            task = asyncio.create_task(agent.run(request), name=agent.agent_name)
            tasks[agent.agent_name] = task
        
        # Wait for all with the overall timeout
        done, pending = await asyncio.wait(
            tasks.values(),
            timeout=timeout,
        )
        
        # Cancel any stragglers
        for task in pending:
            task.cancel()
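
The launch/wait/cancel pattern is easy to demonstrate in isolation; the agents below are toy coroutines, not the real ones:

```python
import asyncio

async def toy_agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def run_with_deadline(timeout: float) -> list[str]:
    # Name each task so results can be attributed, as the orchestrator does.
    tasks = {
        name: asyncio.create_task(toy_agent(name, delay), name=name)
        for name, delay in [("fast", 0.01), ("slow", 5.0)]
    }
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    # Stragglers past the deadline are cancelled, not awaited.
    for task in pending:
        task.cancel()
    return sorted(t.result() for t in done)

finished = asyncio.run(run_with_deadline(0.1))
```

Only the fast agent's result survives the 0.1s deadline; the slow one is cancelled rather than allowed to stall the whole run.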
    

    DeepResearcher Streaming

    backend/agents/deep_researcher.py
    async def research(
        self,
        request: ResearchRequest,
    ) -> AsyncGenerator[AgentResult, None]:
        """Stream research results as they complete across all phases."""
        
        # Phase 0: Yield Exa results IMMEDIATELY (~1s)
        exa_urls, exa_snippets = await self._exa_pass(person, company, seen_urls)
        if exa_urls:
            yield AgentResult(
                agent_name="exa_deep",
                status=AgentStatus.SUCCESS,
                snippets=exa_snippets,
                urls_found=exa_urls,
            )
        
        # Phase 1: Stream skill results as they complete
        async for result in self._phase1(person, company, exa_urls, ...):
            yield result
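
The `seen_urls` set threaded through `_exa_pass` and the later phases is what keeps the pipeline from re-visiting the same pages. A minimal sketch of that dedup step (the helper name here is hypothetical, not from the codebase):

```python
def take_unseen(found: list[str], seen_urls: set[str]) -> list[str]:
    # Keep only URLs no earlier phase has surfaced, and record them
    # so later phases skip them too.
    fresh = [u for u in found if u not in seen_urls]
    seen_urls.update(fresh)
    return fresh

seen: set[str] = set()
phase0 = take_unseen(["https://a.example", "https://b.example"], seen)
phase1 = take_unseen(["https://b.example", "https://c.example"], seen)
```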
    

    Agent Base Class

    All specialized agents inherit from BaseBrowserAgent:
    backend/agents/browser_agent.py
    class BaseBrowserAgent(ABC):
        """Abstract base class for browser-based research agents.
        
        Handles timeout enforcement, error isolation, structured logging,
        and Browser Use cloud session management with persistent auth.
        """
        
        agent_name: str = "base"
        
        @abstractmethod
        async def _run_task(self, request: ResearchRequest) -> AgentResult:
            """Subclass-specific research logic."""
            ...
        
        async def run(self, request: ResearchRequest) -> AgentResult:
            """Execute the agent with timeout and error isolation."""
            timeout = request.timeout_seconds or DEFAULT_TIMEOUT_SECONDS
            try:
                result = await asyncio.wait_for(
                    self._run_task(request),
                    timeout=timeout,
                )
                return result
            except TimeoutError:
                return AgentResult(
                    agent_name=self.agent_name,
                    status=AgentStatus.TIMEOUT,
                    error=f"Agent timed out after {timeout}s",
                )
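
A toy subclass shows how `run()` isolates failures: timeouts and exceptions become status results instead of propagating and taking down the swarm. This is a self-contained sketch, not the real base class:

```python
import asyncio

class ToyAgent:
    """Self-contained stand-in mirroring the run()/_run_task() split."""

    agent_name = "toy"

    async def _run_task(self, delay: float) -> str:
        # Subclass-specific work; here it just sleeps.
        await asyncio.sleep(delay)
        return "ok"

    async def run(self, delay: float, timeout: float) -> str:
        # Errors and timeouts become return values, never raised upward,
        # so one slow agent cannot break the whole run.
        try:
            return await asyncio.wait_for(self._run_task(delay), timeout=timeout)
        except (TimeoutError, asyncio.TimeoutError):
            return f"timeout after {timeout}s"
        except Exception as exc:
            return f"error: {exc}"

fast = asyncio.run(ToyAgent().run(0.01, timeout=1.0))
slow = asyncio.run(ToyAgent().run(5.0, timeout=0.05))
```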
    

    Key Features

    Shared Inbox Pool

    Pre-warmed AgentMail disposable email addresses eliminate API latency for login-wall bypass:
    backend/agents/orchestrator.py
    if settings.agentmail_api_key:
        mail_client = AgentMailClient(api_key=settings.agentmail_api_key)
        self._inbox_pool = InboxPool(mail_client, pool_size=pool_size)
        logger.info("orchestrator: inbox pool created, pool_size={}", pool_size)
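
The pool itself can be sketched as a pre-filled asyncio.Queue: inboxes exist before any agent asks, so hitting a login wall never waits on inbox-creation latency. The class and addresses below are illustrative, not the real InboxPool:

```python
import asyncio

class ToyInboxPool:
    """Pre-warmed pool sketch: addresses are created up front."""

    def __init__(self, pool_size: int) -> None:
        self._queue: asyncio.Queue[str] = asyncio.Queue()
        for i in range(pool_size):
            self._queue.put_nowait(f"agent{i}@example.test")

    async def acquire(self) -> str:
        # Blocks only if every pre-warmed inbox is checked out.
        return await self._queue.get()

    def release(self, inbox: str) -> None:
        # Return the address for the next agent to reuse.
        self._queue.put_nowait(inbox)

async def demo() -> str:
    pool = ToyInboxPool(pool_size=2)
    inbox = await pool.acquire()
    pool.release(inbox)
    return inbox

first_inbox = asyncio.run(demo())
```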
    

    Concurrent Session Management

    DeepResearcher uses semaphores to limit concurrent Browser Use sessions:
    backend/agents/deep_researcher.py
    MAX_CONCURRENT_SESSIONS = 25
    
    class DeepResearcher:
        def __init__(self, settings: Settings):
            self._semaphore = asyncio.Semaphore(MAX_CONCURRENT_SESSIONS)
        
        async def _run_skill_with_semaphore(self, skill_name: str, task_str: str):
            async with self._semaphore:
                return await self._cloud.run_skill(skill_name, task_str, ...)
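
The effect of the semaphore is easy to verify in a toy run: however many skills are launched, the number executing at once never exceeds the limit (scaled down here for the demo):

```python
import asyncio

MAX_CONCURRENT = 3  # scaled down from the real 25 for the demo

async def run_limited(n_tasks: int) -> int:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    active = 0
    peak = 0

    async def skill(i: int) -> None:
        nonlocal active, peak
        async with sem:  # at most MAX_CONCURRENT bodies run at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)
            active -= 1

    await asyncio.gather(*(skill(i) for i in range(n_tasks)))
    return peak

peak = asyncio.run(run_limited(10))
```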
    

    Domain Coverage

    Domains handled by dedicated agents (no dynamic scraping needed):
    backend/agents/orchestrator.py
    COVERED_DOMAINS = frozenset({
        "linkedin.com", "www.linkedin.com",
        "twitter.com", "x.com", "www.x.com",
        "instagram.com", "www.instagram.com",
        "google.com", "www.google.com",
        "youtube.com", "www.youtube.com",
        "facebook.com", "www.facebook.com",
    })
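
A lookup against this set should compare hostnames, not raw URLs. A small sketch using the standard library (the function name is ours, not from the codebase; the set is abbreviated):

```python
from urllib.parse import urlsplit

COVERED_DOMAINS = frozenset({
    "linkedin.com", "www.linkedin.com",
    "twitter.com", "x.com", "www.x.com",
    "instagram.com", "www.instagram.com",
})

def is_covered(url: str) -> bool:
    # Compare the lowercased hostname (scheme, port, and path stripped).
    host = (urlsplit(url).hostname or "").lower()
    return host in COVERED_DOMAINS

covered = is_covered("https://www.linkedin.com/in/someone")
uncovered = is_covered("https://blog.example.com/post")
```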
    

    Data Models

    AgentResult

    backend/agents/models.py
    class AgentResult(BaseModel):
        """Result from a single research agent run."""
        
        agent_name: str
        status: AgentStatus = AgentStatus.PENDING
        profiles: list[SocialProfile] = Field(default_factory=list)
        snippets: list[str] = Field(default_factory=list)
        urls_found: list[str] = Field(default_factory=list)
        error: str | None = None
        confidence: float = Field(default=1.0, ge=0.0, le=1.0)
        duration_seconds: float = 0.0
    

    SocialProfile

    backend/agents/models.py
    class SocialProfile(BaseModel):
        """A social media profile discovered by an agent."""
        
        platform: str
        url: str
        username: str | None = None
        display_name: str | None = None
        bio: str | None = None
        followers: int | None = None
        following: int | None = None
        location: str | None = None
        verified: bool = False
        raw_data: dict | None = None
    

    Usage Example

    Using ResearchOrchestrator

    from agents.orchestrator import ResearchOrchestrator
    from agents.models import ResearchRequest
    from config import Settings
    
    settings = Settings()
    orchestrator = ResearchOrchestrator(settings, pool_size=10)
    
    request = ResearchRequest(
        person_name="Elon Musk",
        company="Tesla",
        timeout_seconds=90.0,
    )
    
    result = await orchestrator.research_person(request)
    print(f"Found {len(result.all_profiles)} profiles")
    print(f"Success: {result.success}")
    print(f"Duration: {result.total_duration_seconds:.1f}s")
    

    Using DeepResearcher

    from agents.deep_researcher import DeepResearcher
    from agents.models import ResearchRequest
    from config import Settings
    
    settings = Settings()
    researcher = DeepResearcher(settings)
    
    request = ResearchRequest(
        person_name="Elon Musk",
        company="Tesla",
    )
    
    # Stream results as they complete
    async for result in researcher.research(request):
        print(f"{result.agent_name}: {len(result.snippets)} snippets")
        for snippet in result.snippets:
            print(f"  - {snippet[:100]}")
    

    Performance

    Orchestrator Timings

    • Phase 1: 15-30s (6-7 static agents + Exa)
    • Phase 2: 10-20s (up to 3 dynamic scrapers)
    • Total: 25-50s typical

    DeepResearcher Timings

    • Phase 0: ~1s (Exa fast pass, immediate results)
    • Phase 1: 20-35s (15+ skills in parallel)
    • Phase 2: 30-60s (deep extraction + wow tasks)
    • Phase 3: 30-90s (verification with account creation)
    • Total: 80-180s comprehensive

    Next Steps

    Browser Use Integration

    Learn about Browser Use SDK and Cloud API integration

    LinkedIn Agent

    LinkedIn-specific agent implementation

    Twitter Agent

    Twitter/X scraping with twscrape

    Deep Researcher

    Advanced four-phase research pipeline
