Performance Overview
JARVIS is designed for real-time intelligence gathering, requiring careful optimization across multiple layers:
- Capture Pipeline - Fast frame extraction and face detection
- Agent Swarm - Parallel execution with timeout management
- LLM Calls - Efficient model selection and prompt engineering
- Real-time Updates - Streaming results via Convex
Architecture Optimizations
Two-Tier Research Strategy
JARVIS uses a two-tier approach to balance speed and depth:
- Tier 1 provides context for Tier 2 agents (faster searches)
- Results stream to frontend as they arrive (perceived performance)
- Tier 1 completes in less than 1s, giving immediate feedback
Parallel Agent Execution
Agents run in parallel with `asyncio.gather`:
backend/agents/orchestrator.py
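A minimal sketch of this pattern, assuming stdlib `asyncio` only; the agent names, the `run_agent` signature, and the 20s timeout are illustrative, not the actual contents of orchestrator.py:

```python
import asyncio

AGENT_TIMEOUT = 20  # seconds; illustrative value

async def run_agent(name: str, query: str) -> dict:
    # Placeholder for a real agent (LinkedIn, Twitter, ...).
    await asyncio.sleep(0.01)
    return {"agent": name, "query": query}

async def run_swarm(query: str) -> list:
    # Each agent gets its own timeout so one hung search cannot
    # stall the whole swarm; gather runs them all in parallel.
    tasks = [
        asyncio.wait_for(run_agent(name, query), timeout=AGENT_TIMEOUT)
        for name in ("linkedin", "twitter", "github")
    ]
    # return_exceptions=True keeps one timeout from cancelling the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if isinstance(r, dict)]

results = asyncio.run(run_swarm("Jane Doe"))
```

Total wall time is the slowest agent's time, not the sum, which is where the 3x speedup below comes from.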
- Sequential: 60s (20s × 3 agents)
- Parallel: 20s (longest agent)
- 3x speedup in this example
Timeout Management
Prevent slow operations from blocking the pipeline: wrap each external call in `asyncio.wait_for` so a hung search cannot stall the swarm.
LLM Performance
Model Selection
JARVIS uses different models for different tasks:
| Task | Model | Reason |
|---|---|---|
| Vision (face ID) | GPT-4 Vision | Best accuracy for face recognition |
| Report synthesis | Gemini 2.0 Flash | 25x cheaper, 2x faster than GPT-4 |
| Agent prompts | Gemini 2.0 Flash | Fast, cheap, good for structured tasks |
Cost comparison:
- GPT-4: $0.03
- GPT-4 Vision: $0.01
- Gemini 2.0 Flash: $0.001
Latency comparison:
- GPT-4 Vision: ~2s
- Gemini 2.0 Flash: ~0.8s
Prompt Engineering
Optimize prompts for speed and accuracy:
- Shorter prompts = fewer input tokens = faster, cheaper responses
- Clear structure = more consistent output = fewer retries
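A sketch of what such a prompt might look like; the field names, the JSON schema, and the 2000-character truncation are assumptions for illustration, not JARVIS's actual prompts:

```python
# A compact, structured prompt: short instructions plus an explicit
# output schema keep input tokens low and the output parseable.
AGENT_PROMPT = """Summarize the person below as JSON.
Keys: name, role, company, confidence (0-1).
Return only JSON, no prose.

{context}"""

def build_prompt(context: str) -> str:
    # Truncate context so the input stays small and cheap.
    return AGENT_PROMPT.format(context=context[:2000])

prompt = build_prompt("Jane Doe, staff engineer at Acme.")
```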
Response Streaming
For long responses, use streaming to reduce perceived latency:
Database Performance
Convex Real-time Subscriptions
Convex provides real-time updates without polling overhead:
frontend/app/page.tsx
- No polling overhead
- Delta-only updates (only changed data sent)
- Automatic reconnection handling
MongoDB Optimization
Optimize MongoDB queries:
Capture Pipeline Performance
Frame Extraction
Optimize video frame extraction:
backend/capture/frame_extractor.py
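A sketch of how such an extraction command could be assembled, assuming ffmpeg's `fps` filter and `-q:v` JPEG quality flag; the function name and defaults are illustrative, not the actual frame_extractor.py:

```python
import shlex

def ffmpeg_frame_cmd(video_path: str, out_dir: str,
                     fps: int = 1, quality: int = 2) -> list:
    # fps controls how many frames per second are sampled;
    # -q:v in the 2-5 range trades JPEG size against accuracy.
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        "-q:v", str(quality),
        f"{out_dir}/frame_%04d.jpg",
    ]

cmd = ffmpeg_frame_cmd("clip.mp4", "/tmp/frames")
# To run: subprocess.run(cmd, check=True), ideally with out_dir inside
# a tempfile.TemporaryDirectory() so frames are cleaned up automatically.
print(shlex.join(cmd))
```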
- Lower FPS = fewer frames = faster processing
- Quality 2-5 balances size and accuracy
- Use tmpdir for automatic cleanup
Face Detection
MediaPipe is optimized for speed:
backend/identification/detector.py
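A sketch of MediaPipe's face detection API under assumed settings (the short-range model and 0.5 confidence threshold are illustrative, not the actual detector.py); `to_pixel_box` is a hypothetical helper for converting MediaPipe's relative coordinates:

```python
def to_pixel_box(rel_box, width: int, height: int):
    # MediaPipe returns relative coordinates (0-1); convert to pixels.
    # Hypothetical helper, not part of MediaPipe itself.
    x, y, w, h = rel_box
    return (int(x * width), int(y * height), int(w * width), int(h * height))

def detect_faces(bgr_frame):
    # Deferred imports keep the pure helper above usable without mediapipe.
    import mediapipe as mp
    import numpy as np

    with mp.solutions.face_detection.FaceDetection(
        model_selection=0,            # short-range model, fastest
        min_detection_confidence=0.5,
    ) as detector:
        # MediaPipe expects a contiguous RGB array.
        rgb = np.ascontiguousarray(bgr_frame[:, :, ::-1])
        result = detector.process(rgb)
        boxes = []
        for det in result.detections or []:
            bb = det.location_data.relative_bounding_box
            boxes.append((bb.xmin, bb.ymin, bb.width, bb.height))
        return boxes
```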
Per-frame detection latency:
- MediaPipe: ~5-10ms
- MTCNN: ~100ms
- dlib: ~50-100ms
Browser Automation Performance
Headless Browsers
Always use headless mode for speed:
Persistent Sessions
Reuse browser sessions when possible:
- Cold start: ~2-3s (launch browser)
- Warm start: ~0.5s (reuse browser)
- 4-6x speedup for multiple requests
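The reuse pattern above can be sketched as a lazy singleton; the `launch` callable is injected here so the sketch stays self-contained, and in production it would be something like a Playwright helper calling `chromium.launch(headless=True)`:

```python
_browser = None

def get_browser(launch):
    # Lazy singleton: pay the cold-start cost once, reuse afterwards.
    global _browser
    if _browser is None:
        _browser = launch()  # cold start (~2-3s with a real browser)
    return _browser          # warm start (~0.5s)

# Demonstrate reuse with a stub launcher that counts cold starts.
launches = []

def stub_launch():
    launches.append("cold")
    return object()

first = get_browser(stub_launch)
second = get_browser(stub_launch)
```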
Caching Strategies
Cache Person Lookups
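A minimal in-memory TTL cache sketch; the one-hour TTL and the `fetch_person` stand-in are assumptions, not JARVIS's actual cache:

```python
import time

_cache: dict = {}
TTL_SECONDS = 3600  # assumption: identities rarely change within an hour

def cached_lookup(person_id: str, fetch):
    # Serve a fresh cached result when available; otherwise fetch and store.
    now = time.monotonic()
    hit = _cache.get(person_id)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]
    value = fetch(person_id)
    _cache[person_id] = (now, value)
    return value

calls = []

def fetch_person(pid: str) -> dict:
    calls.append(pid)  # stand-in for an expensive PimEyes/Exa round trip
    return {"id": pid}

a = cached_lookup("p1", fetch_person)
b = cached_lookup("p1", fetch_person)  # served from cache; no second fetch
```

The same pattern applies to Exa results, keyed by the query string instead of a person ID.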
Cache Exa Results
Monitoring Performance
Track Metrics
Use Laminar traces to identify bottlenecks:
- Sort by duration to find slowest operations
- Filter by `tags:performance`
- Compare across different inputs
Performance Benchmarks
Typical JARVIS performance:
| Operation | Target | Typical |
|---|---|---|
| Frame extraction (30s video) | Less than 2s | 0.8s |
| Face detection (per frame) | Less than 50ms | 10ms |
| PimEyes search | Less than 5s | 3s |
| Vision LLM extraction | Less than 2s | 1.2s |
| Tier 1 enrichment (Exa) | Less than 1s | 0.3s |
| Agent (LinkedIn) | Less than 10s | 5s |
| Agent (Twitter) | Less than 10s | 3s |
| Report synthesis | Less than 5s | 2s |
| Total (1 person) | Less than 30s | 15s |
Production Optimizations
Connection Pooling
Reuse connections to external services:
Worker Processes
Scale horizontally with multiple workers:
Background Tasks
Offload slow operations to background tasks:
Best Practices
Profile before optimizing
Always measure performance before optimizing:
Use the Laminar dashboard to identify actual bottlenecks.
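For quick local measurements, a stdlib timing context manager works alongside Laminar; the stage label and stand-in workload are illustrative:

```python
import time
from contextlib import contextmanager

timings: dict = {}

@contextmanager
def timed(label: str):
    # Record how long a pipeline stage takes before deciding to optimize it.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

with timed("frame_extraction"):
    sum(range(10_000))  # stand-in for the real stage
```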
Optimize the critical path first
Focus on operations that block user experience:
- Capture → Face detection (user waits)
- Face detection → Identity (user waits)
- Identity → Tier 1 enrichment (user sees first results)
- Background: Tier 2 agents, synthesis
Stream results as they arrive
Don’t wait for all agents to complete:
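This can be sketched with stdlib `asyncio.as_completed`, which yields each result the moment its agent finishes; the agent names and delays are illustrative:

```python
import asyncio

async def agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for a real search
    return name

async def stream_results() -> list:
    # Push each agent's result forward the moment it finishes,
    # instead of waiting for the slowest agent.
    tasks = [asyncio.create_task(agent("twitter", 0.01)),
             asyncio.create_task(agent("linkedin", 0.05))]
    arrived = []
    for fut in asyncio.as_completed(tasks):
        result = await fut
        arrived.append(result)  # in production: write to Convex here
    return arrived

order = asyncio.run(stream_results())  # fastest agent arrives first
```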
Set aggressive timeouts
Don’t let slow operations block the pipeline:
Use the right model for the job
- Fast, cheap tasks: Gemini 2.0 Flash
- Vision tasks: GPT-4 Vision
- Complex reasoning: GPT-4
- Don’t use GPT-4 for everything
Troubleshooting
Slow Pipeline
- Check Laminar traces to identify bottleneck
- Verify network latency to APIs
- Check if hitting rate limits
- Profile with `cProfile` for CPU-bound operations:
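A self-contained `cProfile` sketch; `hot_path` is a stand-in for whatever function the traces point at:

```python
import cProfile
import io
import pstats

def hot_path():
    # Stand-in for a suspected CPU-bound operation.
    return sum(i * i for i in range(50_000))

profiler = cProfile.Profile()
profiler.enable()
hot_path()
profiler.disable()

buf = io.StringIO()
# Sort by cumulative time to surface the slowest call chains.
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```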
High Memory Usage
- Check for memory leaks with `objgraph`:
- Limit concurrent operations:
Rate Limiting
If hitting API rate limits:
- Add exponential backoff:
- Use account pools (PimEyes)
- Cache results aggressively
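The exponential backoff step above can be sketched with stdlib code; the retry count, base delay, and use of `RuntimeError` as a stand-in for a rate-limit error are assumptions:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base: float = 0.5):
    # Retry with exponentially growing, jittered delays: ~0.5s, ~1s, ~2s, ...
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for a rate-limit (429) error
            if attempt == max_retries - 1:
                raise
            delay = base * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Demonstrate with a stub that fails twice, then succeeds.
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky, base=0.001)  # tiny base keeps the demo fast
```

The jitter spreads retries out so many workers hitting the same limit do not retry in lockstep.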
Next Steps
- Observability - Monitor performance with Laminar
- Testing - Write performance tests
- Architecture - Understand system design
- Deployment - Deploy for production