Overview
This document records the major architectural decisions made during JARVIS development, along with the reasoning behind each choice.

ADR-001: Python FastAPI Backend
Status: Accepted
Context: Need to choose a backend framework for the API and orchestration layer.
Decision: Use Python with FastAPI instead of Node.js/Express or other alternatives.
Rationale:
- Browser Use SDK is Python - Core dependency requires Python
- ML/AI ecosystem - Better integration with face detection (mediapipe), computer vision (OpenCV), and AI libraries
- Async support - FastAPI has excellent async/await support for concurrent operations
- Type safety - Pydantic provides runtime type validation
- Developer experience - Auto-generated OpenAPI docs, interactive testing
Alternatives Considered:
- Node.js/Express - Would require wrapping Python Browser Use via subprocess
- Go - Limited ML/AI library support
Consequences:
- ✅ Native integration with Browser Use agents
- ✅ Rich ML/AI ecosystem
- ✅ Excellent async performance
- ❌ Two-language stack (Python + TypeScript)
ADR-002: Convex for Real-time Data
Status: Accepted
Context: Need real-time updates to stream intelligence data to the frontend as agents complete research.
Decision: Use Convex for real-time database instead of WebSockets + MongoDB Change Streams.
Rationale:
- Zero boilerplate - Real-time subscriptions out of the box
- Automatic reconnection - Handles network issues gracefully
- Delta updates - Only changed data is sent to clients
- TypeScript integration - Generated types for database schema
- Serverless - No infrastructure to manage
- Free tier - Sufficient for hackathon and initial development
Alternatives Considered:
- WebSockets + MongoDB - More complex, requires connection management
- Firebase Realtime Database - Vendor lock-in, less flexible schema
- Supabase - Good option, but less optimized for real-time
Consequences:
- ✅ Instant real-time updates with zero setup
- ✅ Great developer experience
- ❌ Vendor-specific (requires Convex account)
- ❌ Less control over data layer
ADR-003: Gemini 2.0 Flash for Synthesis
Status: Accepted
Context: Need to choose an LLM for report synthesis from multiple data sources.
Decision: Use Gemini 2.0 Flash instead of GPT-4 for report synthesis.
Rationale:
- Cost - 25x cheaper than GPT-4 ($0.03 per 1K tokens)
- Speed - 2x faster than GPT-4 (~0.8s vs ~2s)
- Quality - Sufficient for structured synthesis tasks
- Long context - 1M token context window for large documents
Alternatives Considered:
- GPT-4 - Better quality, but much more expensive
- Claude 3.5 Sonnet - Good middle ground, but slower than Gemini
- GPT-3.5 - Cheaper but lower quality
Consequences:
- ✅ Significantly lower costs
- ✅ Faster response times
- ✅ Can process more data per request
- ❌ Slightly lower quality than GPT-4 (acceptable trade-off)
ADR-004: Two-Tier Research Architecture
Status: Accepted
Context: Need to balance speed (fast initial results) with depth (comprehensive research).
Decision: Implement two-tier research: fast API enrichment (Tier 1) followed by deep browser research (Tier 2).
Rationale:
- Perceived performance - Users see results within 1 second
- Context for agents - Tier 1 data improves Tier 2 accuracy
- Fallback - If Tier 2 fails, Tier 1 results are still useful
- Streaming UX - Data appears incrementally, not all at once
Alternatives Considered:
- Single tier (browser only) - Too slow, no early feedback
- API-only - Not comprehensive enough
Consequences:
- ✅ Fast time-to-first-result
- ✅ Better user experience
- ✅ More resilient (partial results always available)
- ❌ More complex orchestration logic
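The two-tier flow can be sketched as plain asyncio, with Tier 1 returned (and streamed to the UI) immediately and Tier 2 seeded with its output; the function names and simulated delays here are illustrative, not the real orchestration code:

```python
import asyncio

async def tier1_enrich(name: str) -> dict:
    # Tier 1: fast API lookups (hypothetical); returns within ~1 second
    await asyncio.sleep(0.01)
    return {"name": name, "source": "tier1"}

async def tier2_deep_research(name: str, context: dict) -> dict:
    # Tier 2: slow browser agents, seeded with Tier 1 context for accuracy
    await asyncio.sleep(0.05)
    return {**context, "source": "tier2"}

async def research(name: str) -> dict:
    fast = await tier1_enrich(name)  # stream this to the frontend immediately
    try:
        return await tier2_deep_research(name, fast)
    except Exception:
        return fast  # fallback: Tier 1 results are still useful on their own
```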
ADR-005: MediaPipe for Face Detection
Status: Accepted
Context: Need to detect faces in images from camera capture.
Decision: Use MediaPipe Face Detection instead of dlib, MTCNN, or commercial APIs.
Rationale:
- Speed - 5-10ms per frame (10x faster than alternatives)
- Accuracy - 95%+ detection rate
- Easy installation - Single pip install, no compilation
- Free - No API costs
- Cross-platform - Works on Linux, macOS, Windows
Alternatives Considered:
- dlib - 50-100ms, complex C++ dependencies
- MTCNN - 100ms, lower accuracy
- Azure Face API - Costs, requires network call
Consequences:
- ✅ Fast detection enables real-time processing
- ✅ Simple deployment
- ✅ No API costs
- ❌ Slightly lower accuracy than commercial APIs (acceptable)
ADR-006: Parallel Agent Execution
Status: Accepted
Context: Multiple agents need to research a person across different platforms.
Decision: Run all agents in parallel using `asyncio.gather()` instead of sequential execution.
Rationale:
- Performance - 3-5x speedup (20s vs 60-100s for 3 agents)
- Resilience - `return_exceptions=True` prevents one failure from blocking others
- Better UX - Results stream in as they complete
Alternatives Considered:
- Sequential - Too slow, blocks on failures
- Queue-based - More complex, no performance benefit
Consequences:
- ✅ Dramatically faster research
- ✅ Continues on individual agent failures
- ❌ Higher concurrent load on external APIs
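A minimal sketch of the `asyncio.gather()` pattern described above, with `return_exceptions=True` so a single failed agent surfaces as an exception object in the results rather than cancelling its siblings (the agent function is a stand-in, not the real agent code):

```python
import asyncio

async def run_agent(platform: str) -> str:
    # stand-in for a real browser agent run
    if platform == "broken":
        raise RuntimeError(f"{platform} agent failed")
    await asyncio.sleep(0.01)
    return f"{platform}: done"

async def run_all(platforms: list[str]) -> list:
    # return_exceptions=True: one failure doesn't block the others;
    # failed agents appear as exception instances in the result list
    return await asyncio.gather(
        *(run_agent(p) for p in platforms), return_exceptions=True
    )
```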
ADR-007: Streaming Results via Convex
Status: Accepted
Context: Agents complete at different times. Users should see results as they arrive.
Decision: Stream individual agent results to Convex immediately; don't wait for all agents.
Rationale:
- Better UX - Data appears incrementally
- Demo impact - Visually impressive to see data streaming in
- Resilience - Partial results preserved even if pipeline fails
Alternatives Considered:
- Batch updates - Slower, less impressive demo
- Polling - Inefficient, higher latency
Consequences:
- ✅ Excellent user experience
- ✅ More resilient to failures
- ✅ Great for live demos
- ❌ More Convex mutations (within free tier)
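The streaming pattern can be sketched with `asyncio.as_completed`, which yields each agent's result the moment it finishes; `push` here stands in for the Convex mutation that writes one result (names and delays are illustrative):

```python
import asyncio

async def agent(name: str, delay: float) -> str:
    # stand-in for an agent that finishes after `delay` seconds
    await asyncio.sleep(delay)
    return name

async def stream_results(push) -> None:
    tasks = [
        asyncio.create_task(agent("fast", 0.01)),
        asyncio.create_task(agent("slow", 0.05)),
    ]
    # as_completed yields results in completion order, not launch order
    for fut in asyncio.as_completed(tasks):
        push(await fut)  # write to Convex immediately; don't wait for the rest
```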
ADR-008: Laminar for Observability
Status: Accepted
Context: Need to trace LLM calls and debug agent behavior.
Decision: Use Laminar for LLM observability instead of generic APM tools.
Rationale:
- LLM-specific - Captures prompts, responses, tokens, costs
- Agent tracing - Tracks multi-step agent workflows
- Accuracy verification - Detect hallucinations
- Simple integration - Single decorator (`@observe`)
- Hackathon credits - $150 in free credits
Alternatives Considered:
- DataDog/New Relic - Generic, doesn't capture LLM specifics
- LangSmith - Good, but Laminar has better Python SDK
- Custom logging - Too much work, no visualization
Consequences:
- ✅ Deep visibility into LLM behavior
- ✅ Easy debugging of agent failures
- ✅ Cost tracking
- ❌ Another service dependency
ADR-009: MongoDB for Persistent Storage
Status: Accepted
Context: Need to store raw images, capture metadata, and archival person records.
Decision: Use MongoDB Atlas (free tier) for persistent storage alongside Convex.
Rationale:
- Document-oriented - Natural fit for person records (varying schemas)
- GridFS - Built-in support for storing large files (images)
- Free tier - 512MB storage, sufficient for development
- Convex complement - Convex for real-time, MongoDB for archival
Alternatives Considered:
- PostgreSQL - Relational model less suitable for varying schemas
- S3 + PostgreSQL - More complex, requires managing two services
- Only Convex - No good solution for large binary storage
Consequences:
- ✅ Flexible schema for person records
- ✅ Built-in binary storage
- ✅ Free tier sufficient
- ❌ Two database systems to manage
ADR-010: Next.js for Frontend
Status: Accepted
Context: Need to build an interactive corkboard UI with real-time updates.
Decision: Use Next.js 14 with App Router instead of Create React App or other frameworks.
Rationale:
- React Server Components - Better performance
- Built-in routing - No need for React Router
- Vercel deployment - One-click deploy, free hosting
- TypeScript - First-class TypeScript support
- Convex integration - Excellent React hooks
Alternatives Considered:
- Create React App - Deprecated, no SSR
- Remix - Good, but less mature ecosystem
- Svelte/SvelteKit - Smaller ecosystem
Consequences:
- ✅ Fast development with great DX
- ✅ Excellent performance
- ✅ Easy deployment
- ❌ Learning curve for App Router
ADR-011: Timeouts for All Agents
Status: Accepted
Context: Browser agents can hang or take arbitrarily long.
Decision: Set aggressive timeouts (3 minutes max) on all agent operations.
Rationale:
- Bounded latency - Guarantee a response within 3 minutes
- Better UX - Users don’t wait forever
- Resource efficiency - Don’t waste compute on stuck operations
- Partial results - Streaming means we have data even on timeout
Alternatives Considered:
- No timeout - Risk of hung operations
- Per-agent timeout - More complex, less predictable
Consequences:
- ✅ Predictable performance
- ✅ Better resource utilization
- ❌ May miss some data if agent is slow
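The timeout wrapper can be sketched with `asyncio.wait_for`; the 180-second ceiling comes from this ADR, while the helper and its fallback-to-None behavior are illustrative (in practice the caller would fall back to whatever partial results already streamed in):

```python
import asyncio

AGENT_TIMEOUT_S = 180  # 3-minute ceiling from this ADR

async def hung_agent() -> str:
    # simulates an agent that never finishes
    await asyncio.sleep(10_000)
    return "never reached"

async def run_with_timeout(coro, timeout: float = AGENT_TIMEOUT_S):
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # the stuck task is cancelled; partial streamed results remain usable
        return None
```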
ADR-012: Loguru for Logging
Status: Accepted
Context: Need structured logging for debugging and monitoring.
Decision: Use Loguru instead of standard library logging.
Rationale:
- Better DX - Simpler API than stdlib logging
- Structured logging - Native support for `.bind()`
- Colored output - Easier to read during development
- Async-safe - Works well with FastAPI
- Exception capture - `logger.exception()` includes the full traceback
Alternatives Considered:
- Standard library logging - More boilerplate, less intuitive
- structlog - Good, but more complex setup
Consequences:
- ✅ Excellent developer experience
- ✅ Rich context in logs
- ✅ Easy debugging
- ❌ Non-standard (but popular) library
ADR-013: Ruff for Code Quality
Status: Accepted
Context: Need linting and formatting for Python code.
Decision: Use Ruff for both linting and formatting instead of Black + flake8 + isort.
Rationale:
- Speed - 10-100x faster than alternatives (written in Rust)
- All-in-one - Replaces multiple tools
- Compatible - Black-compatible formatting
- Modern - Supports latest Python features
Alternatives Considered:
- Black + flake8 + isort - Slower, multiple tools
- Pylint - Slower, more opinionated
Consequences:
- ✅ Fast linting/formatting
- ✅ Single tool to learn
- ✅ Great performance in CI
- ❌ Relatively new (but rapidly maturing)
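A minimal `pyproject.toml` sketch of the all-in-one setup; the specific rule selection and line length here are illustrative, not the project's actual configuration:

```toml
[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
# E/F roughly cover flake8's defaults, I covers isort;
# Black-compatible formatting comes from `ruff format`, no extra config needed
select = ["E", "F", "I"]
```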
ADR-014: Browser Use Cloud Sessions
Status: Accepted
Context: Social platforms require authentication. Managing auth in headless browsers is complex.
Decision: Use Browser Use Cloud with a persistent profile for authenticated sessions.
Rationale:
- Persistent auth - Login once, reuse across agents
- No credential management - Browser handles cookies/tokens
- Anti-detection - Real browser fingerprint
- Hackathon credits - $100 free credits
Alternatives Considered:
- Local Selenium - Complex auth management
- API keys - Not available for all platforms
- Manual login per run - Too slow
Consequences:
- ✅ Reliable authenticated sessions
- ✅ No credential storage in code
- ✅ Better anti-bot evasion
- ❌ Requires Browser Use Cloud account
- ❌ Sessions shared across all agents
Summary
These architectural decisions shaped JARVIS into a system that is:
- Fast - Two-tier architecture, parallel agents, streaming results
- Reliable - Timeouts, graceful degradation, comprehensive logging
- Observable - Laminar tracing, structured logging
- Developer-friendly - Type safety, great tooling, clear patterns
Future ADRs
Potential future decisions to document:
- Caching strategy (Redis vs in-memory)
- Multi-region deployment
- Rate limiting implementation
- Background job processing (Celery vs native async)
Contributing
When making significant architectural changes:
- Create a new ADR in this document
- Follow the format:
- Status (Proposed/Accepted/Deprecated)
- Context
- Decision
- Rationale
- Alternatives Considered
- Consequences
- Include in PR description
- Discuss in PR review