
Introduction

JARVIS uses a sophisticated multi-agent swarm architecture to gather comprehensive intelligence about individuals in real-time. The system employs specialized agents that run in parallel, each focused on specific platforms or data sources.

Architecture

The agent system consists of two main components:

1. ResearchOrchestrator (Two-Phase)

A traditional orchestrator that runs static agents in parallel:
backend/agents/orchestrator.py
class ResearchOrchestrator:
    """Two-phase orchestrator: static agents + dynamic URL scrapers."""
    
    def __init__(self, settings: Settings, *, pool_size: int = 10):
        self._settings = settings
        self._exa = ExaEnrichmentClient(settings)
        
        # Create shared inbox pool for login-wall bypass
        self._inbox_pool: InboxPool | None = None
        if settings.agentmail_api_key:
            mail_client = AgentMailClient(api_key=settings.agentmail_api_key)
            self._inbox_pool = InboxPool(mail_client, pool_size=pool_size)
        
        # Initialize static agents with shared inbox pool
        self._static_agents: list[BaseBrowserAgent] = [
            LinkedInAgent(settings, inbox_pool=self._inbox_pool),
            TwitterAgent(settings, inbox_pool=self._inbox_pool),
            InstagramAgent(settings, inbox_pool=self._inbox_pool),
            GoogleAgent(settings, inbox_pool=self._inbox_pool),
            OsintAgent(settings, inbox_pool=self._inbox_pool),
            SocialAgent(settings, inbox_pool=self._inbox_pool),
        ]
  • Phase 1: Static agents (LinkedIn, Twitter, Instagram, Google, OSINT, Social) + Exa search run concurrently
  • Phase 2: Dynamic URL scrapers for high-value URLs discovered by Exa/Google
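
The two-phase flow can be sketched with toy stand-ins (the agent and scraper coroutines below are illustrative, not the real classes):

```python
import asyncio

async def toy_agent(name: str) -> dict:
    # Stand-in for a static agent run (LinkedIn, Twitter, ...).
    await asyncio.sleep(0.01)
    return {"agent": name, "urls": [f"https://example.com/{name}"]}

async def toy_scraper(url: str) -> dict:
    # Stand-in for a dynamic URL scraper spawned in phase 2.
    await asyncio.sleep(0.01)
    return {"url": url, "scraped": True}

async def two_phase(agent_names: list[str]) -> list[dict]:
    # Phase 1: run all static agents concurrently.
    phase1 = await asyncio.gather(*(toy_agent(n) for n in agent_names))
    # Phase 2: scrape every URL the static agents discovered.
    urls = [u for r in phase1 for u in r["urls"]]
    phase2 = await asyncio.gather(*(toy_scraper(u) for u in urls))
    return phase1 + phase2

results = asyncio.run(two_phase(["linkedin", "twitter"]))
```

Phase 2 cannot start until phase 1 finishes, since its inputs are the URLs phase 1 discovered; within each phase, everything runs in parallel.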

2. DeepResearcher (Four-Phase)

An advanced pipeline that streams results as they complete:
backend/agents/deep_researcher.py
class DeepResearcher:
    """Multi-phase deep research pipeline that streams results.
    
    Phase 0: Exa + SixtyFour enrich-lead in parallel (~3s)
    Phase 1: Platform + OSINT skills in parallel (~20-35s, up to 15 concurrent)
    Phase 2: Deep URL extraction + SixtyFour deep-search + dark web (~30-60s)
    Phase 3: Verification loop — retry failed skills with account creation (~30-90s)
    """
Phase 0: Fast Initial Discovery

Runs Exa multi-query search immediately (yields results in ~1s) and starts SixtyFour enrichment in the background.

Phase 1: Platform Skills

Launches 15+ concurrent Browser Use Cloud SDK skills:

  • Social: TikTok, GitHub, Instagram, LinkedIn, Facebook, YouTube, Reddit, Pinterest, Linktree
  • OSINT: Background checks, SEC filings, company employees, YC data, ancestry records
  • Domain-matched: Discovers skills based on Exa/SixtyFour URLs

Phase 2: Deep Extraction

Navigates to uncovered URLs and runs high-impact freeform tasks:

  • Court records (CourtListener, UniCourt)
  • Political donations (FEC.gov)
  • Academic papers (Google Scholar, Semantic Scholar)
  • Podcast appearances (ListenNotes)
  • Crunchbase profiles
  • Dark web / HIBP breach checks

Phase 3: Verification & Retry

Retries failed skills with autonomous account creation using AgentMail disposable emails.
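
The phase ordering (yield the fast Exa pass immediately, then stream slower skills as each one finishes) can be sketched with plain asyncio; the delays and result names below are toy stand-ins, not the real pipeline:

```python
import asyncio
from typing import AsyncGenerator

async def pipeline() -> AsyncGenerator[str, None]:
    # Phase 0: start slow enrichment in the background, yield the fast pass now.
    enrichment = asyncio.create_task(asyncio.sleep(0.05, result="sixtyfour"))
    yield "exa_deep"
    # Phase 1: stream each skill back the moment it finishes.
    skills = [
        asyncio.create_task(asyncio.sleep(0.01, result=f"skill_{i}"))
        for i in range(3)
    ]
    for finished in asyncio.as_completed(skills):
        yield await finished
    # Later phases follow the same pattern; collect the background task here.
    yield await enrichment

async def collect() -> list[str]:
    return [r async for r in pipeline()]

phases = asyncio.run(collect())
```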

    Parallel Execution

    Orchestrator Approach

    backend/agents/orchestrator.py
    async def research_person(self, request: ResearchRequest) -> OrchestratorResult:
        # Launch Exa + all static agents in parallel
        exa_task = asyncio.create_task(self._exa.enrich_person(exa_req), name="exa")
        
        tasks: dict[str, asyncio.Task] = {"exa": exa_task}
        for agent in self._static_agents:
            task = asyncio.create_task(agent.run(request), name=agent.agent_name)
            tasks[agent.agent_name] = task
        
        # Wait for all with the overall timeout
        done, pending = await asyncio.wait(
            tasks.values(),
            timeout=timeout,
        )
        
        # Cancel any stragglers
        for task in pending:
            task.cancel()
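
The launch/wait/cancel pattern is easy to demonstrate in isolation; the agents below are toy coroutines, not the real ones:

```python
import asyncio

async def toy_agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def run_with_deadline(timeout: float) -> list[str]:
    # Name each task so results can be attributed, as the orchestrator does.
    tasks = {
        name: asyncio.create_task(toy_agent(name, delay), name=name)
        for name, delay in [("fast", 0.01), ("slow", 5.0)]
    }
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    # Stragglers past the deadline are cancelled, not awaited.
    for task in pending:
        task.cancel()
    return sorted(t.result() for t in done)

finished = asyncio.run(run_with_deadline(0.1))
```

Only the fast agent's result survives the 0.1s deadline; the slow one is cancelled rather than allowed to stall the whole run.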
    

    DeepResearcher Streaming

    backend/agents/deep_researcher.py
    async def research(
        self,
        request: ResearchRequest,
    ) -> AsyncGenerator[AgentResult, None]:
        """Stream research results as they complete across all phases."""
        
        # Phase 0: Yield Exa results IMMEDIATELY (~1s)
        exa_urls, exa_snippets = await self._exa_pass(person, company, seen_urls)
        if exa_urls:
            yield AgentResult(
                agent_name="exa_deep",
                status=AgentStatus.SUCCESS,
                snippets=exa_snippets,
                urls_found=exa_urls,
            )
        
        # Phase 1: Stream skill results as they complete
        async for result in self._phase1(person, company, exa_urls, ...):
            yield result
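
The `seen_urls` set threaded through `_exa_pass` and the later phases is what keeps the pipeline from re-visiting the same pages. A minimal sketch of that dedup step (the helper name here is hypothetical, not from the codebase):

```python
def take_unseen(found: list[str], seen_urls: set[str]) -> list[str]:
    # Keep only URLs no earlier phase has surfaced, and record them
    # so later phases skip them too.
    fresh = [u for u in found if u not in seen_urls]
    seen_urls.update(fresh)
    return fresh

seen: set[str] = set()
phase0 = take_unseen(["https://a.example", "https://b.example"], seen)
phase1 = take_unseen(["https://b.example", "https://c.example"], seen)
```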
    

    Agent Base Class

    All specialized agents inherit from BaseBrowserAgent:
    backend/agents/browser_agent.py
    class BaseBrowserAgent(ABC):
        """Abstract base class for browser-based research agents.
        
        Handles timeout enforcement, error isolation, structured logging,
        and Browser Use cloud session management with persistent auth.
        """
        
        agent_name: str = "base"
        
        @abstractmethod
        async def _run_task(self, request: ResearchRequest) -> AgentResult:
            """Subclass-specific research logic."""
            ...
        
        async def run(self, request: ResearchRequest) -> AgentResult:
            """Execute the agent with timeout and error isolation."""
            timeout = request.timeout_seconds or DEFAULT_TIMEOUT_SECONDS
            try:
                result = await asyncio.wait_for(
                    self._run_task(request),
                    timeout=timeout,
                )
                return result
            except TimeoutError:
                return AgentResult(
                    agent_name=self.agent_name,
                    status=AgentStatus.TIMEOUT,
                    error=f"Agent timed out after {timeout}s",
                )
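
A toy subclass shows how `run()` isolates failures: timeouts and exceptions become status results instead of propagating and taking down the swarm. This is a self-contained sketch, not the real base class:

```python
import asyncio

class ToyAgent:
    """Self-contained stand-in mirroring the run()/_run_task() split."""

    agent_name = "toy"

    async def _run_task(self, delay: float) -> str:
        # Subclass-specific work; here it just sleeps.
        await asyncio.sleep(delay)
        return "ok"

    async def run(self, delay: float, timeout: float) -> str:
        # Errors and timeouts become return values, never raised upward,
        # so one slow agent cannot break the whole run.
        try:
            return await asyncio.wait_for(self._run_task(delay), timeout=timeout)
        except (TimeoutError, asyncio.TimeoutError):
            return f"timeout after {timeout}s"
        except Exception as exc:
            return f"error: {exc}"

fast = asyncio.run(ToyAgent().run(0.01, timeout=1.0))
slow = asyncio.run(ToyAgent().run(5.0, timeout=0.05))
```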
    

    Key Features

    Shared Inbox Pool

    Pre-warmed AgentMail disposable email addresses eliminate API latency for login-wall bypass:
    backend/agents/orchestrator.py
    if settings.agentmail_api_key:
        mail_client = AgentMailClient(api_key=settings.agentmail_api_key)
        self._inbox_pool = InboxPool(mail_client, pool_size=pool_size)
        logger.info("orchestrator: inbox pool created, pool_size={}", pool_size)
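
The pool itself can be sketched as a pre-filled asyncio.Queue: inboxes exist before any agent asks, so hitting a login wall never waits on inbox-creation latency. The class and addresses below are illustrative, not the real InboxPool:

```python
import asyncio

class ToyInboxPool:
    """Pre-warmed pool sketch: addresses are created up front."""

    def __init__(self, pool_size: int) -> None:
        self._queue: asyncio.Queue[str] = asyncio.Queue()
        for i in range(pool_size):
            self._queue.put_nowait(f"agent{i}@example.test")

    async def acquire(self) -> str:
        # Blocks only if every pre-warmed inbox is checked out.
        return await self._queue.get()

    def release(self, inbox: str) -> None:
        # Return the address for the next agent to reuse.
        self._queue.put_nowait(inbox)

async def demo() -> str:
    pool = ToyInboxPool(pool_size=2)
    inbox = await pool.acquire()
    pool.release(inbox)
    return inbox

first_inbox = asyncio.run(demo())
```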
    

    Concurrent Session Management

    DeepResearcher uses semaphores to limit concurrent Browser Use sessions:
    backend/agents/deep_researcher.py
    MAX_CONCURRENT_SESSIONS = 25
    
    class DeepResearcher:
        def __init__(self, settings: Settings):
            self._semaphore = asyncio.Semaphore(MAX_CONCURRENT_SESSIONS)
        
        async def _run_skill_with_semaphore(self, skill_name: str, task_str: str):
            async with self._semaphore:
                return await self._cloud.run_skill(skill_name, task_str, ...)
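
The effect of the semaphore is easy to verify in a toy run: however many skills are launched, the number executing at once never exceeds the limit (scaled down here for the demo):

```python
import asyncio

MAX_CONCURRENT = 3  # scaled down from the real 25 for the demo

async def run_limited(n_tasks: int) -> int:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    active = 0
    peak = 0

    async def skill(i: int) -> None:
        nonlocal active, peak
        async with sem:  # at most MAX_CONCURRENT bodies run at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)
            active -= 1

    await asyncio.gather(*(skill(i) for i in range(n_tasks)))
    return peak

peak = asyncio.run(run_limited(10))
```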
    

    Domain Coverage

    Domains handled by dedicated agents (no dynamic scraping needed):
    backend/agents/orchestrator.py
    COVERED_DOMAINS = frozenset({
        "linkedin.com", "www.linkedin.com",
        "twitter.com", "x.com", "www.x.com",
        "instagram.com", "www.instagram.com",
        "google.com", "www.google.com",
        "youtube.com", "www.youtube.com",
        "facebook.com", "www.facebook.com",
    })
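
A lookup against this set should compare hostnames, not raw URLs. A small sketch using the standard library (the function name is ours, not from the codebase; the set is abbreviated):

```python
from urllib.parse import urlsplit

COVERED_DOMAINS = frozenset({
    "linkedin.com", "www.linkedin.com",
    "twitter.com", "x.com", "www.x.com",
    "instagram.com", "www.instagram.com",
})

def is_covered(url: str) -> bool:
    # Compare the lowercased hostname (scheme, port, and path stripped).
    host = (urlsplit(url).hostname or "").lower()
    return host in COVERED_DOMAINS

covered = is_covered("https://www.linkedin.com/in/someone")
uncovered = is_covered("https://blog.example.com/post")
```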
    

    Data Models

    AgentResult

    backend/agents/models.py
    class AgentResult(BaseModel):
        """Result from a single research agent run."""
        
        agent_name: str
        status: AgentStatus = AgentStatus.PENDING
        profiles: list[SocialProfile] = Field(default_factory=list)
        snippets: list[str] = Field(default_factory=list)
        urls_found: list[str] = Field(default_factory=list)
        error: str | None = None
        confidence: float = Field(default=1.0, ge=0.0, le=1.0)
        duration_seconds: float = 0.0
    

    SocialProfile

    backend/agents/models.py
    class SocialProfile(BaseModel):
        """A social media profile discovered by an agent."""
        
        platform: str
        url: str
        username: str | None = None
        display_name: str | None = None
        bio: str | None = None
        followers: int | None = None
        following: int | None = None
        location: str | None = None
        verified: bool = False
        raw_data: dict | None = None
    

    Usage Example

    Using ResearchOrchestrator

    from agents.orchestrator import ResearchOrchestrator
    from agents.models import ResearchRequest
    from config import Settings
    
    settings = Settings()
    orchestrator = ResearchOrchestrator(settings, pool_size=10)
    
    request = ResearchRequest(
        person_name="Elon Musk",
        company="Tesla",
        timeout_seconds=90.0,
    )
    
    result = await orchestrator.research_person(request)
    print(f"Found {len(result.all_profiles)} profiles")
    print(f"Success: {result.success}")
    print(f"Duration: {result.total_duration_seconds:.1f}s")
    

    Using DeepResearcher

    from agents.deep_researcher import DeepResearcher
    from agents.models import ResearchRequest
    from config import Settings
    
    settings = Settings()
    researcher = DeepResearcher(settings)
    
    request = ResearchRequest(
        person_name="Elon Musk",
        company="Tesla",
    )
    
    # Stream results as they complete
    async for result in researcher.research(request):
        print(f"{result.agent_name}: {len(result.snippets)} snippets")
        for snippet in result.snippets:
            print(f"  - {snippet[:100]}")
    

    Performance

    Orchestrator Timings

    • Phase 1: 15-30s (6-7 static agents + Exa)
    • Phase 2: 10-20s (up to 3 dynamic scrapers)
    • Total: 25-50s typical

    DeepResearcher Timings

    • Phase 0: ~1s (Exa fast pass, immediate results)
    • Phase 1: 20-35s (15+ skills in parallel)
    • Phase 2: 30-60s (deep extraction + wow tasks)
    • Phase 3: 30-90s (verification with account creation)
    • Total: 80-180s comprehensive

    Next Steps

    Browser Use Integration

    Learn about Browser Use SDK and Cloud API integration

    LinkedIn Agent

    LinkedIn-specific agent implementation

    Twitter Agent

    Twitter/X scraping with twscrape

    Deep Researcher

    Advanced four-phase research pipeline
