
Observability Stack

JARVIS uses multiple layers of observability:
  • Laminar - LLM call tracing and accuracy verification
  • Loguru - Structured application logging
  • FastAPI - Built-in request/response logging
  • Metrics - Performance and throughput monitoring

Logging

Loguru Configuration

JARVIS uses Loguru for structured logging:
from loguru import logger

# Log at different levels
logger.debug("Processing frame {} for capture {}", frame_idx, capture_id)
logger.info("Face detected with confidence {:.2f}", confidence)
logger.warning("API key not configured for service: {}", service_name)
logger.error("Failed to process capture {}: {}", capture_id, error)

Log Levels

Configure log level via environment variable:
SPECTER_LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR, CRITICAL

Structured Logging

Loguru supports structured logging with context:
logger.bind(capture_id=capture_id, person_id=person_id).info(
    "Pipeline completed: {} faces detected", face_count
)

Log Output

Development mode shows colorized console output:
2026-03-05 14:23:45.123 | INFO     | pipeline:process:142 - Processing capture cap_abc123
2026-03-05 14:23:45.456 | INFO     | identification:detect:67 - Face detected: confidence=0.95
2026-03-05 14:23:46.789 | INFO     | agents:orchestrator:203 - Starting agent swarm for person_xyz789

Laminar Tracing

Laminar provides observability for LLM calls and agent behavior.

Setup

  1. Get a Laminar API key: sign up at lmnr.ai and create a project.
  2. Configure the environment variable:
     LMNR_PROJECT_API_KEY=your-api-key
  3. Initialize Laminar. It is initialized automatically at app startup:
backend/observability/laminar.py
from lmnr import Laminar
from loguru import logger

from config import get_settings

def initialize_laminar(settings):
    if not settings.laminar_api_key:
        logger.warning("Laminar tracing disabled - no API key")
        return False
    
    try:
        Laminar.initialize(project_api_key=settings.laminar_api_key)
        logger.info("Laminar tracing initialized")
        return True
    except Exception as exc:
        logger.error("Failed to initialize Laminar: {}", exc)
        return False

Tracing Functions

Use the @observe decorator to trace functions:
from lmnr import observe

@observe()
async def identify_person(image_bytes: bytes) -> dict:
    """Traced: PimEyes search + Vision LLM extraction."""
    # PimEyes search
    results = await pimeyes_search(image_bytes)
    
    # Vision LLM extraction
    identity = await vision_extract(results)
    
    return identity

@observe()
async def synthesize_report(person_id: str) -> dict:
    """Traced: Report synthesis from all intel fragments."""
    fragments = await get_fragments(person_id)
    report = await llm_synthesize(fragments)
    return report

Custom Trace Decorator

JARVIS provides a custom @traced decorator, defined in backend/observability/laminar.py, that combines logging with Laminar tracing:
from observability.laminar import traced

@traced(
    name="linkedin_research",
    metadata={"agent": "linkedin"},
    tags=["agent", "research"],
)
async def research_linkedin(person_name: str) -> dict:
    """Research person on LinkedIn."""
    # Automatically logs duration and traces to Laminar
    result = await agent.run()
    return result
The @traced decorator:
  • Logs start and end with duration
  • Sends spans to Laminar (if configured)
  • Handles both sync and async functions
  • Includes metadata and tags for filtering

Viewing Traces

Access the Laminar dashboard at app.lmnr.ai to:
  • View LLM call traces with full prompts and responses
  • Analyze token usage and costs
  • Debug failed requests
  • Track accuracy and hallucinations
  • Monitor agent execution flows
(Screenshot: Laminar tracing dashboard)

Example Trace

A typical pipeline trace shows:
Capture Pipeline (3.2s)
├── Frame Extraction (0.1s)
├── Face Detection (0.3s)
├── PimEyes Search (1.2s)
│   ├── Browser Navigation (0.8s)
│   └── Screenshot Capture (0.4s)
├── Vision LLM Extraction (0.9s)
│   ├── API Call: gpt-4-vision (0.8s, 1,234 tokens)
│   └── JSON Parsing (0.1s)
└── Agent Swarm (2.1s)
    ├── LinkedIn Agent (1.8s)
    ├── Twitter Agent (1.2s)
    └── Exa Enrichment (0.3s)

Performance Metrics

Built-in Metrics

The /health endpoint includes performance metrics:
curl http://localhost:8000/health
{
  "status": "ok",
  "service": "jarvis",
  "uptime_seconds": 3600,
  "services": {
    "convex": true,
    "openai": true,
    "browser_use": true
  }
}

Custom Metrics

Track custom metrics with the traced decorator:
from observability.laminar import traced
import time

@traced(
    name="face_detection",
    metadata={"model": "mediapipe"},
)
async def detect_faces(image: bytes) -> list:
    start = time.time()
    faces = await detector.detect(image)
    duration = time.time() - start
    
    logger.info(
        "Face detection completed: {} faces in {:.2f}s",
        len(faces), duration
    )
    return faces

Error Tracking

Exception Logging

Loguru automatically captures exception context:
try:
    result = await risky_operation()
except Exception as exc:
    logger.exception("Operation failed for {}", capture_id)
    # Logs full traceback automatically

Error Context

Add context to errors:
logger.bind(
    capture_id=capture_id,
    person_id=person_id,
    agent="linkedin",
).error("Agent failed: {}", error)

Sentry Integration (Optional)

For production deployments, integrate Sentry:
import sentry_sdk

sentry_sdk.init(
    dsn="your-sentry-dsn",
    environment=settings.environment,
    traces_sample_rate=1.0,
)

Debugging Tools

FastAPI Debug Mode

Enable debug mode for development:
app = FastAPI(
    title="JARVIS API",
    debug=True,  # Enable in development
)

Interactive API Docs

FastAPI provides interactive API documentation out of the box. Use it to:
  • Test API endpoints interactively
  • View request/response schemas
  • Debug authentication issues

Request Logging

Log all incoming requests:
from fastapi import Request
import time

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.time()
    
    logger.info(
        "Request: {} {}",
        request.method,
        request.url.path,
    )
    
    response = await call_next(request)
    duration = time.time() - start
    
    logger.info(
        "Response: {} {} - {} in {:.2f}s",
        request.method,
        request.url.path,
        response.status_code,
        duration,
    )
    
    return response

Production Monitoring

Health Checks

Implement comprehensive health checks:
from fastapi import HTTPException

@app.get("/health")
async def health_check():
    """Comprehensive health check with service status."""
    return {
        "status": "ok",
        "service": "jarvis",
        "timestamp": time.time(),
        "services": settings.service_flags(),
    }

@app.get("/health/ready")
async def readiness_check():
    """Check if service is ready to accept requests."""
    # Check critical dependencies
    if not settings.openai_api_key:
        raise HTTPException(503, "OpenAI not configured")
    
    return {"status": "ready"}

Alerting

Set up alerts for:
  • API errors (>5% error rate)
  • High latency (>5s p95)
  • Service unavailability
  • Rate limit exhaustion
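As a stdlib-only illustration, the first two thresholds could be evaluated over a window of recent requests. The (status_code, duration_s) record shape and the nearest-rank p95 are assumptions for the sketch:

```python
import math


def check_alerts(requests, error_rate_limit=0.05, p95_limit=5.0):
    """Return alert names triggered by a window of (status_code, duration_s) records."""
    alerts = []
    if not requests:
        return alerts

    # Error rate: fraction of 5xx responses in the window.
    errors = sum(1 for status, _ in requests if status >= 500)
    if errors / len(requests) > error_rate_limit:
        alerts.append("error_rate")

    # Nearest-rank p95: smallest duration >= 95% of the sample.
    durations = sorted(d for _, d in requests)
    p95 = durations[math.ceil(0.95 * len(durations)) - 1]
    if p95 > p95_limit:
        alerts.append("latency_p95")
    return alerts
```

For example, a window with 10% server errors and a slow tail would trigger both alerts, while a healthy window triggers none.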

Monitoring Dashboard

Create a monitoring dashboard with:
  1. Request metrics: throughput, latency, errors
  2. LLM metrics: token usage, cost, latency
  3. Agent metrics: success rate, duration
  4. System metrics: CPU, memory, disk

Best Practices

Always use structured fields instead of string interpolation:
# Good
logger.info("Processing {}", capture_id)

# Better
logger.bind(capture_id=capture_id).info("Processing capture")

# Bad
logger.info(f"Processing {capture_id}")
Use @observe or @traced on all functions that:
  • Make LLM API calls
  • Call external services
  • Are part of the critical path
  • Have significant latency
Add relevant IDs to every log message:
logger.bind(
    capture_id=capture_id,
    person_id=person_id,
    frame_idx=frame_idx,
).info("Face detected")
For very frequent operations, sample logs:
import random

if random.random() < 0.1:  # Log 10% of requests
    logger.debug("High-frequency operation")

Troubleshooting

Laminar Not Receiving Traces

  1. Verify API key is set: echo $LMNR_PROJECT_API_KEY
  2. Check initialization logs: grep "Laminar" logs.txt
  3. Test with a simple trace:
    from lmnr import observe
    
    @observe()
    def test():
        return "hello"
    
    test()
    
  4. View traces at app.lmnr.ai

Missing Log Context

If log messages are missing context:
# Wrap operations in context manager
with logger.contextualize(capture_id=capture_id):
    await process_capture()
    # All logs in this block include capture_id

Performance Impact

If observability is impacting performance:
  1. Reduce log level in production: SPECTER_LOG_LEVEL=WARNING
  2. Sample traces: only trace 10% of requests
  3. Disable debug mode: debug=False in FastAPI
  4. Use non-blocking logging: pass enqueue=True when adding Loguru sinks
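Trace sampling (item 2) can be sketched as a wrapper that applies a tracing decorator to only a fraction of calls. Here `noop` stands in for lmnr's observe() or the project's traced decorator; the helper itself is illustrative:

```python
import random


def sampled(decorator, rate=0.1):
    """Apply `decorator` to roughly `rate` of calls; pass through otherwise."""
    def wrap(func):
        traced_func = decorator(func)

        def wrapper(*args, **kwargs):
            # Per-call coin flip: traced path or the plain function.
            chosen = traced_func if random.random() < rate else func
            return chosen(*args, **kwargs)

        return wrapper
    return wrap


# Hypothetical usage with a no-op decorator standing in for observe():
def noop(func):
    return func


@sampled(noop, rate=0.1)
def handler(x):
    return x * 2
```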

Next Steps

  • Laminar Tracing - Deep dive into Laminar integration
  • Performance - Optimize for production performance
  • Deployment - Deploy with production monitoring
  • Troubleshooting - Debug common issues
