
Overview

This document records the major architectural decisions made during JARVIS development, along with the reasoning behind each choice.

ADR-001: Python FastAPI Backend

Status: Accepted
Context: Need to choose a backend framework for the API and orchestration layer.
Decision: Use Python with FastAPI instead of Node.js/Express or other alternatives.
Rationale:
  • Browser Use SDK is Python - Core dependency requires Python
  • ML/AI ecosystem - Better integration with face detection (mediapipe), computer vision (OpenCV), and AI libraries
  • Async support - FastAPI has excellent async/await support for concurrent operations
  • Type safety - Pydantic provides runtime type validation
  • Developer experience - Auto-generated OpenAPI docs, interactive testing
Alternatives Considered:
  • Node.js/Express - Would require wrapping Python Browser Use via subprocess
  • Go - Limited ML/AI library support
Consequences:
  • ✅ Native integration with Browser Use agents
  • ✅ Rich ML/AI ecosystem
  • ✅ Excellent async performance
  • ❌ Two-language stack (Python + TypeScript)

ADR-002: Convex for Real-time Data

Status: Accepted
Context: Need real-time updates to stream intelligence data to the frontend as agents complete research.
Decision: Use Convex as the real-time database instead of WebSockets + MongoDB Change Streams.
Rationale:
  • Zero boilerplate - Real-time subscriptions out of the box
  • Automatic reconnection - Handles network issues gracefully
  • Delta updates - Only changed data is sent to clients
  • TypeScript integration - Generated types for database schema
  • Serverless - No infrastructure to manage
  • Free tier - Sufficient for hackathon and initial development
Alternatives Considered:
  • WebSockets + MongoDB - More complex, requires connection management
  • Firebase Realtime Database - Vendor lock-in, less flexible schema
  • Supabase - Good option, but less optimized for real-time
Consequences:
  • ✅ Instant real-time updates with zero setup
  • ✅ Great developer experience
  • ❌ Vendor-specific (requires Convex account)
  • ❌ Less control over data layer

ADR-003: Gemini 2.0 Flash for Synthesis

Status: Accepted
Context: Need to choose an LLM for report synthesis from multiple data sources.
Decision: Use Gemini 2.0 Flash instead of GPT-4 for report synthesis.
Rationale:
  • Cost - ~30x cheaper than GPT-4 ($0.001 vs $0.03 per 1K tokens)
  • Speed - 2x faster than GPT-4 (~0.8s vs ~2s)
  • Quality - Sufficient for structured synthesis tasks
  • Long context - 1M token context window for large documents
Alternatives Considered:
  • GPT-4 - Better quality, but much more expensive
  • Claude 3.5 Sonnet - Good middle ground, but slower than Gemini
  • GPT-3.5 - Cheaper but lower quality
Consequences:
  • ✅ Significantly lower costs
  • ✅ Faster response times
  • ✅ Can process more data per request
  • ❌ Slightly lower quality than GPT-4 (acceptable trade-off)
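At the per-token prices cited above, the savings compound quickly across reports. A back-of-envelope sketch (the prices are the assumed $/1K-token figures from the comparison, not a live rate card):

```python
# Assumed prices per 1K tokens (illustrative figures from the comparison above)
GEMINI_FLASH = 0.001
GPT4 = 0.03

def synthesis_cost(tokens: int, price_per_1k: float) -> float:
    """Cost in dollars of one synthesis call at a given token count."""
    return tokens / 1000 * price_per_1k

# A 50K-token synthesis pass (multi-source report) per model
gemini = synthesis_cost(50_000, GEMINI_FLASH)  # 0.05
gpt4 = synthesis_cost(50_000, GPT4)            # 1.50
```

At hackathon scale the absolute numbers are small, but the 1M-token context window means single requests can be large enough for this to matter.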

ADR-004: Two-Tier Research Architecture

Status: Accepted
Context: Need to balance speed (fast initial results) with depth (comprehensive research).
Decision: Implement two-tier research: fast API enrichment (Tier 1) followed by deep browser research (Tier 2).
Architecture:
Tier 1 (< 1s): Exa API → LinkedIn URL, company, title

Tier 2 (10-60s): Browser Use agents → Full profiles
Rationale:
  • Perceived performance - Users see results within 1 second
  • Context for agents - Tier 1 data improves Tier 2 accuracy
  • Fallback - If Tier 2 fails, Tier 1 results are still useful
  • Streaming UX - Data appears incrementally, not all at once
Alternatives Considered:
  • Single tier (browser only) - Too slow, no early feedback
  • API-only - Not comprehensive enough
Consequences:
  • ✅ Fast time-to-first-result
  • ✅ Better user experience
  • ✅ More resilient (partial results always available)
  • ❌ More complex orchestration logic
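The two tiers chain naturally as coroutines — tier 1 returns almost immediately and its output seeds tier 2. A minimal sketch with stand-in functions (all names and return values here are hypothetical):

```python
import asyncio

async def tier1_enrich(name: str) -> dict:
    # Fast API lookup (Exa in the real pipeline); sub-second
    return {"name": name, "linkedin_url": "https://linkedin.com/in/example"}

async def tier2_deep_research(context: dict) -> dict:
    # Slow browser agents, seeded with tier-1 context for better accuracy
    await asyncio.sleep(0)  # stands in for 10-60s of agent work
    return {**context, "profile": "full profile data"}

async def research(name: str) -> dict:
    fast = await tier1_enrich(name)         # stream this to the UI immediately
    deep = await tier2_deep_research(fast)  # tier 2 builds on tier 1
    return deep

result = asyncio.run(research("Ada Lovelace"))
```

Because tier 1's result is published before tier 2 starts, a tier-2 failure still leaves the user with something useful — the fallback property described above.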

ADR-005: MediaPipe for Face Detection

Status: Accepted
Context: Need to detect faces in images from camera capture.
Decision: Use MediaPipe Face Detection instead of dlib, MTCNN, or commercial APIs.
Rationale:
  • Speed - 5-10ms per frame (10x faster than alternatives)
  • Accuracy - 95%+ detection rate
  • Easy installation - Single pip install, no compilation
  • Free - No API costs
  • Cross-platform - Works on Linux, macOS, Windows
Alternatives Considered:
  • dlib - 50-100ms, complex C++ dependencies
  • MTCNN - 100ms, lower accuracy
  • Azure Face API - Costs, requires network call
Consequences:
  • ✅ Fast detection enables real-time processing
  • ✅ Simple deployment
  • ✅ No API costs
  • ❌ Slightly lower accuracy than commercial APIs (acceptable)
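One practical detail when consuming MediaPipe's output: it reports bounding boxes in coordinates relative to the frame (values in [0, 1]), so cropping requires converting to pixels. A small pure-Python helper (the dict here mirrors the fields of MediaPipe's relative bounding box; it is a stand-in, not the library's own type):

```python
def to_pixel_box(rel_box: dict, frame_w: int, frame_h: int) -> tuple:
    """Convert a MediaPipe-style relative bounding box to pixel coordinates.

    rel_box carries xmin, ymin, width, height, all relative to frame size.
    """
    x = int(rel_box["xmin"] * frame_w)
    y = int(rel_box["ymin"] * frame_h)
    w = int(rel_box["width"] * frame_w)
    h = int(rel_box["height"] * frame_h)
    return (x, y, w, h)

# A face centered in a 640x480 frame
box = to_pixel_box({"xmin": 0.25, "ymin": 0.25, "width": 0.5, "height": 0.5}, 640, 480)
```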

ADR-006: Parallel Agent Execution

Status: Accepted
Context: Multiple agents need to research a person across different platforms.
Decision: Run all agents in parallel using asyncio.gather() instead of sequential execution.
Implementation:
results = await asyncio.gather(
    linkedin_agent.run(),
    twitter_agent.run(),
    instagram_agent.run(),
    return_exceptions=True  # Continue on failure
)
Rationale:
  • Performance - 3-5x speedup (20s vs 60-100s for 3 agents)
  • Resilience - return_exceptions=True prevents one failure from blocking others
  • Better UX - Results stream in as they complete
Alternatives Considered:
  • Sequential - Too slow, blocks on failures
  • Queue-based - More complex, no performance benefit
Consequences:
  • ✅ Dramatically faster research
  • ✅ Continues on individual agent failures
  • ❌ Higher concurrent load on external APIs
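With return_exceptions=True, failures come back as exception objects mixed into the results list, so the caller has to partition them afterward. A minimal sketch with stand-in agent coroutines:

```python
import asyncio

async def ok_agent():
    return {"source": "linkedin", "data": "profile"}

async def failing_agent():
    raise RuntimeError("blocked by captcha")

async def run_all():
    results = await asyncio.gather(
        ok_agent(), failing_agent(), return_exceptions=True
    )
    # Exceptions are returned, not raised; separate them from real results
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures

successes, failures = asyncio.run(run_all())
```

This is the mechanism behind "continues on individual agent failures": one captcha-blocked agent shows up as an entry in failures while the rest of the results are unaffected.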

ADR-007: Streaming Results via Convex

Status: Accepted
Context: Agents complete at different times. Users should see results as they arrive.
Decision: Stream individual agent results to Convex immediately; don't wait for all agents.
Implementation:
async def _run_agent(self, agent):
    result = await agent.run()
    # Stream immediately
    await self.convex.mutation("intelFragments:create", {
        "personId": self.person_id,
        "source": agent.SOURCE_NAME,
        "content": result,
    })
    return result
Rationale:
  • Better UX - Data appears incrementally
  • Demo impact - Visually impressive to see data streaming in
  • Resilience - Partial results preserved even if pipeline fails
Alternatives Considered:
  • Batch updates - Slower, less impressive demo
  • Polling - Inefficient, higher latency
Consequences:
  • ✅ Excellent user experience
  • ✅ More resilient to failures
  • ✅ Great for live demos
  • ❌ More Convex mutations (within free tier)

ADR-008: Laminar for Observability

Status: Accepted
Context: Need to trace LLM calls and debug agent behavior.
Decision: Use Laminar for LLM observability instead of generic APM tools.
Rationale:
  • LLM-specific - Captures prompts, responses, tokens, costs
  • Agent tracing - Tracks multi-step agent workflows
  • Accuracy verification - Detect hallucinations
  • Simple integration - Single decorator (@observe)
  • Hackathon credits - $150 in free credits
Alternatives Considered:
  • DataDog/New Relic - Generic, doesn’t capture LLM specifics
  • LangSmith - Good, but Laminar has better Python SDK
  • Custom logging - Too much work, no visualization
Consequences:
  • ✅ Deep visibility into LLM behavior
  • ✅ Easy debugging of agent failures
  • ✅ Cost tracking
  • ❌ Another service dependency

ADR-009: MongoDB for Persistent Storage

Status: Accepted
Context: Need to store raw images, capture metadata, and archival person records.
Decision: Use MongoDB Atlas (free tier) for persistent storage alongside Convex.
Rationale:
  • Document-oriented - Natural fit for person records (varying schemas)
  • GridFS - Built-in support for storing large files (images)
  • Free tier - 512MB storage, sufficient for development
  • Convex complement - Convex for real-time, MongoDB for archival
Alternatives Considered:
  • PostgreSQL - Relational model less suitable for varying schemas
  • S3 + PostgreSQL - More complex, requires managing two services
  • Only Convex - No good solution for large binary storage
Consequences:
  • ✅ Flexible schema for person records
  • ✅ Built-in binary storage
  • ✅ Free tier sufficient
  • ❌ Two database systems to manage

ADR-010: Next.js for Frontend

Status: Accepted
Context: Need to build an interactive corkboard UI with real-time updates.
Decision: Use Next.js 14 with App Router instead of Create React App or other frameworks.
Rationale:
  • React Server Components - Better performance
  • Built-in routing - No need for React Router
  • Vercel deployment - One-click deploy, free hosting
  • TypeScript - First-class TypeScript support
  • Convex integration - Excellent React hooks
Alternatives Considered:
  • Create React App - Deprecated, no SSR
  • Remix - Good, but less mature ecosystem
  • Svelte/SvelteKit - Smaller ecosystem
Consequences:
  • ✅ Fast development with great DX
  • ✅ Excellent performance
  • ✅ Easy deployment
  • ❌ Learning curve for App Router

ADR-011: Timeouts for All Agents

Status: Accepted
Context: Browser agents can hang or take arbitrarily long.
Decision: Set aggressive timeouts (3 minutes max) on all agent operations.
Implementation:
try:
    results = await asyncio.wait_for(
        asyncio.gather(*agents),
        timeout=180  # 3 minutes
    )
except asyncio.TimeoutError:
    # Timed out: fall back to the partial results already streamed to Convex
    results = []
Rationale:
  • Bounded latency - Guarantee response within 3 minutes
  • Better UX - Users don’t wait forever
  • Resource efficiency - Don’t waste compute on stuck operations
  • Partial results - Streaming means we have data even on timeout
Alternatives Considered:
  • No timeout - Risk of hung operations
  • Per-agent timeout - More complex, less predictable
Consequences:
  • ✅ Predictable performance
  • ✅ Better resource utilization
  • ❌ May miss some data if agent is slow
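One subtlety: asyncio.wait_for cancels the whole gather on timeout, discarding any results that completed locally. When the completed subset matters in-process (beyond what was already streamed to Convex), asyncio.wait exposes it directly. A runnable sketch with stand-in agents and a short timeout:

```python
import asyncio

async def fast_agent():
    return "fast result"

async def slow_agent():
    await asyncio.sleep(10)  # will not finish within the timeout
    return "slow result"

async def run_with_timeout():
    tasks = [asyncio.create_task(fast_agent()), asyncio.create_task(slow_agent())]
    # Unlike wait_for + gather, wait() hands back whatever finished in time
    done, pending = await asyncio.wait(tasks, timeout=0.1)
    for task in pending:
        task.cancel()  # don't leave stuck agents running
    return [t.result() for t in done]

partial = asyncio.run(run_with_timeout())
```

The per-agent-timeout alternative was rejected above; this keeps the single global deadline while still recovering the fast agents' output.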

ADR-012: Loguru for Logging

Status: Accepted
Context: Need structured logging for debugging and monitoring.
Decision: Use Loguru instead of standard library logging.
Rationale:
  • Better DX - Simpler API than stdlib logging
  • Structured logging - Native support for .bind()
  • Colored output - Easier to read during development
  • Async-safe - Works well with FastAPI
  • Exception capture - logger.exception() includes full traceback
Alternatives Considered:
  • Standard library logging - More boilerplate, less intuitive
  • structlog - Good, but more complex setup
Consequences:
  • ✅ Excellent developer experience
  • ✅ Rich context in logs
  • ✅ Easy debugging
  • ❌ Non-standard (but popular) library

ADR-013: Ruff for Code Quality

Status: Accepted
Context: Need linting and formatting for Python code.
Decision: Use Ruff for both linting and formatting instead of Black + flake8 + isort.
Rationale:
  • Speed - 10-100x faster than alternatives (written in Rust)
  • All-in-one - Replaces multiple tools
  • Compatible - Black-compatible formatting
  • Modern - Supports latest Python features
Alternatives Considered:
  • Black + flake8 + isort - Slower, multiple tools
  • Pylint - Slower, more opinionated
Consequences:
  • ✅ Fast linting/formatting
  • ✅ Single tool to learn
  • ✅ Great performance in CI
  • ❌ Relatively new (but rapidly maturing)
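A minimal pyproject.toml fragment showing the all-in-one setup (the rule selection is illustrative, not the project's actual config; E/F are pycodestyle and Pyflakes checks, I is isort-style import sorting):

```toml
[tool.ruff]
line-length = 88
target-version = "py311"

[tool.ruff.lint]
# E/F: pycodestyle + Pyflakes; I: import sorting (replaces isort)
select = ["E", "F", "I"]

[tool.ruff.format]
# Defaults are Black-compatible
quote-style = "double"
```

One file configures what previously took three tools and three config sections.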

ADR-014: Browser Use Cloud Sessions

Status: Accepted
Context: Social platforms require authentication. Managing auth in headless browsers is complex.
Decision: Use Browser Use Cloud with a persistent profile for authenticated sessions.
Setup:
BROWSER_USE_API_KEY=your-key
BROWSER_USE_PROFILE_ID=your-profile

# One-time: Login interactively
browseruse login
# Sessions persist in cloud profile
Rationale:
  • Persistent auth - Login once, reuse across agents
  • No credential management - Browser handles cookies/tokens
  • Anti-detection - Real browser fingerprint
  • Hackathon credits - $100 free credits
Alternatives Considered:
  • Local Selenium - Complex auth management
  • API keys - Not available for all platforms
  • Manual login per run - Too slow
Consequences:
  • ✅ Reliable authenticated sessions
  • ✅ No credential storage in code
  • ✅ Better anti-bot evasion
  • ❌ Requires Browser Use Cloud account
  • ❌ Sessions shared across all agents

Summary

These architectural decisions shaped JARVIS into:
  • Fast - Two-tier architecture, parallel agents, streaming results
  • Reliable - Timeouts, graceful degradation, comprehensive logging
  • Observable - Laminar tracing, structured logging
  • Developer-friendly - Type safety, great tooling, clear patterns

Future ADRs

Potential future decisions to document:
  • Caching strategy (Redis vs in-memory)
  • Multi-region deployment
  • Rate limiting implementation
  • Background job processing (Celery vs native async)

Contributing

When making significant architectural changes:
  1. Create a new ADR in this document
  2. Follow the format:
    • Status (Proposed/Accepted/Deprecated)
    • Context
    • Decision
    • Rationale
    • Alternatives Considered
    • Consequences
  3. Include in PR description
  4. Discuss in PR review
