Supermemory provides persistent, searchable memory for the JARVIS pipeline. When a person is researched once, their dossier is stored in Supermemory, allowing future encounters to skip expensive web agent research and retrieve cached intelligence instantly.

Why Supermemory?

Web agent research is expensive and slow:
  • Browser Use agents take 30-60 seconds per person
  • Exa API costs $0.01-0.05 per search
  • PimEyes reverse image search has strict rate limits
Supermemory acts as a smart cache:
  • Store complete dossiers after first enrichment
  • Hybrid search (semantic + keyword) finds matches even with name variations
  • Metadata filtering by source and timestamp
  • Automatic relevance scoring to avoid false positives

Configuration

Get your API key from supermemory.ai/settings:
.env
SUPERMEMORY_API_KEY=sm_live_abc123...
The client automatically initializes:
backend/memory/supermemory_client.py
import json
import os

import httpx
from loguru import logger

class SuperMemoryClient:
    def __init__(self, api_key: str | None = None):
        self._api_key = api_key or os.environ.get("SUPERMEMORY_API_KEY", "")
        if not self._api_key:
            logger.warning("SUPERMEMORY_API_KEY not set")

        self._client = httpx.AsyncClient(
            timeout=30,
            headers={
                "Authorization": f"Bearer {self._api_key}",
                "Content-Type": "application/json",
            },
        )

    async def close(self) -> None:
        """Close the underlying HTTP client."""
        await self._client.aclose()

    async def __aenter__(self) -> "SuperMemoryClient":
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.close()

Core Operations

Store Dossier

Persist a complete person dossier to Supermemory:
backend/memory/supermemory_client.py
async def store_dossier(
    self,
    person_name: str,
    dossier_data: dict,
) -> str | None:
    """Persist a dossier to Supermemory.
    
    Returns the document ID on success, or None on failure.
    """
    content = json.dumps(
        {"person_name": person_name, "dossier": dossier_data},
        default=str,
    )
    
    payload = {
        "content": content,
        "containerTags": ["specter-dossiers"],
        "customId": f"specter-{person_name.lower().replace(' ', '-')}",
        "metadata": {
            "person_name": person_name,
            "source": "specter-pipeline",
        },
    }
    
    try:
        resp = await self._client.post(
            "https://api.supermemory.ai/v3/documents",
            json=payload
        )
        resp.raise_for_status()
        data = resp.json()
        doc_id = data.get("id")
        logger.info(f"SuperMemory store OK | person={person_name} id={doc_id}")
        return doc_id
    except Exception as exc:
        logger.error(f"SuperMemory store failed | person={person_name} err={exc}")
        return None
Key features:
  • Custom ID: Deterministic ID (specter-{name}) ensures re-storing overwrites old data
  • Container Tags: Namespace all JARVIS dossiers as specter-dossiers
  • Metadata: Store searchable metadata for filtering

Search Person

Look up a cached dossier by person name:
backend/memory/supermemory_client.py
async def search_person(self, name: str) -> dict | None:
    """Look up a cached dossier by person name.
    
    Returns the parsed dossier dict if a high-confidence match is found,
    otherwise None.
    """
    payload = {
        "q": name,
        "containerTag": "specter-dossiers",
        "searchMode": "hybrid",  # Semantic + keyword
        "limit": 3,
        "threshold": 0.6,  # Minimum similarity score
        "filters": {
            "AND": [
                {"key": "source", "value": "specter-pipeline"},
            ],
        },
    }
    
    try:
        resp = await self._client.post(
            "https://api.supermemory.ai/v4/search",
            json=payload
        )
        resp.raise_for_status()
        data = resp.json()
        results = data.get("results", [])
        
        if not results:
            logger.debug(f"SuperMemory search miss | name={name}")
            return None
        
        top = results[0]
        raw = top.get("memory") or top.get("chunk") or ""
        dossier = self._parse_dossier(raw, name)
        
        if dossier:
            logger.info(
                f"SuperMemory cache hit | name={name} "
                f"similarity={top.get('similarity', 0):.2f}"
            )
        
        return dossier
    except Exception as exc:
        logger.error(f"SuperMemory search failed | name={name} err={exc}")
        return None
Search features:
  • Hybrid search: Combines semantic embedding similarity with keyword matching
  • Similarity threshold: Only returns results above 0.6 score to avoid false positives
  • Fuzzy matching: Handles name variations (“John Smith” vs “J. Smith”)
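The hybrid scoring itself happens server-side, but a rough local intuition for why name variations still clear the threshold: even plain character-level string similarity rates related names well above unrelated ones. This sketch uses Python's difflib and is purely illustrative, not Supermemory's actual scoring:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Plain character-level similarity in [0, 1] (illustration only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(f"John Smith vs J. Smith: {similarity('John Smith', 'J. Smith'):.2f}")
print(f"John Smith vs Jane Doe: {similarity('John Smith', 'Jane Doe'):.2f}")
```

Semantic embeddings go further than this, matching "J. Smith, OpenAI researcher" against a dossier that never uses that exact phrasing.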

Parse Dossier

Extract the dossier from Supermemory’s response format:
backend/memory/supermemory_client.py
def _parse_dossier(self, raw: str, name: str) -> dict | None:
    """Extract a dossier dict from a SuperMemory memory/chunk string."""
    try:
        obj = json.loads(raw)
        if isinstance(obj, dict) and "dossier" in obj:
            return obj["dossier"]
        if isinstance(obj, dict):
            return obj
    except (json.JSONDecodeError, TypeError):
        pass
    
    # SuperMemory may return summarized text instead of raw JSON
    if raw and name.lower() in raw.lower():
        return {"raw_memory": raw}
    
    return None

Pipeline Integration

Supermemory is checked before running expensive web agents:
backend/orchestration/pipeline.py
from backend.memory.supermemory_client import SuperMemoryClient
from backend.agents.orchestrator import ResearchOrchestrator

async def enrich_person(
    person_name: str,
    photo_url: str,
    memory: SuperMemoryClient,
    orchestrator: ResearchOrchestrator,
) -> dict:
    # 1. Check Supermemory cache
    cached = await memory.search_person(person_name)
    if cached:
        logger.info(f"Cache hit for {person_name}, skipping web research")
        return {
            "person_name": person_name,
            "photo_url": photo_url,
            "dossier": cached,
            "source": "supermemory_cache",
        }
    
    # 2. Cache miss — run full research pipeline
    logger.info(f"Cache miss for {person_name}, starting web research")
    research_result = await orchestrator.research_person(
        person_name=person_name,
        photo_url=photo_url,
    )
    
    # 3. Store result in Supermemory for future use
    if research_result.get("dossier"):
        await memory.store_dossier(
            person_name=person_name,
            dossier_data=research_result["dossier"],
        )
    
    return research_result

Example: Complete Flow

import asyncio
from backend.memory.supermemory_client import SuperMemoryClient

async def main():
    async with SuperMemoryClient() as memory:
        # First encounter: cache miss
        print("First lookup...")
        result1 = await memory.search_person("Alice Smith")
        print(f"Result: {result1}")  # None
        
        # Store dossier
        print("Storing dossier...")
        dossier = {
            "summary": "AI researcher at OpenAI. Stanford PhD.",
            "title": "Research Scientist",
            "company": "OpenAI",
            "work_history": [
                {
                    "role": "Research Scientist",
                    "company": "OpenAI",
                    "period": "2022-present"
                }
            ],
            "social_profiles": {
                "linkedin": "https://linkedin.com/in/alicesmith",
                "github": "https://github.com/alicesmith",
            },
        }
        doc_id = await memory.store_dossier("Alice Smith", dossier)
        print(f"Stored with ID: {doc_id}")
        
        # Second encounter: cache hit
        print("Second lookup...")
        result2 = await memory.search_person("Alice Smith")
        print(f"Result: {result2['summary']}")  # Cache hit!
        
        # Fuzzy match: slight name variation
        print("Fuzzy match...")
        result3 = await memory.search_person("A. Smith")
        print(f"Result: {result3['summary'] if result3 else 'No match'}")  # May still match!

if __name__ == "__main__":
    asyncio.run(main())
Output:
First lookup...
Result: None
Storing dossier...
Stored with ID: sm_doc_xyz123
Second lookup...
Result: AI researcher at OpenAI. Stanford PhD.
Fuzzy match...
Result: AI researcher at OpenAI. Stanford PhD.

Performance Benefits

Without Supermemory

# Every person requires full research
await exa.search(person_name)           # ~2s
await browser_agents.research(urls)     # ~45s
await synthesize_dossier(fragments)     # ~5s
# Total: ~52 seconds per person

With Supermemory

# First encounter
cached = await memory.search_person(name)  # ~0.3s, miss
# ... run full research ~52s ...
await memory.store_dossier(name, dossier)  # ~0.2s

# Future encounters
cached = await memory.search_person(name)  # ~0.3s, HIT!
# Total: 0.3 seconds (99.4% faster)
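The break-even math behind those numbers is a one-liner: expected per-encounter latency is a hit-rate-weighted average of the hit and miss paths. Timings here are the illustrative estimates above, not measurements:

```python
# Illustrative timings from the sections above
RESEARCH_S = 52.0  # full web-agent research on a cache miss
LOOKUP_S = 0.3     # Supermemory search
STORE_S = 0.2      # writing the dossier back


def expected_latency(hit_rate: float) -> float:
    """Average per-encounter latency at a given cache hit rate."""
    miss_path = LOOKUP_S + RESEARCH_S + STORE_S  # lookup, research, store
    hit_path = LOOKUP_S
    return hit_rate * hit_path + (1 - hit_rate) * miss_path


for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {expected_latency(rate):.1f}s avg")
```

Even a modest hit rate cuts average latency dramatically, because the miss path is two orders of magnitude slower than a lookup.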

Container Management

Supermemory uses container tags to namespace data:
# All JARVIS dossiers use the same tag
_CONTAINER_TAG = "specter-dossiers"

# Store with tag
payload = {
    "content": dossier_json,
    "containerTags": [_CONTAINER_TAG],
    ...
}

# Search within tag
search_payload = {
    "q": person_name,
    "containerTag": _CONTAINER_TAG,  # Only search JARVIS data
    ...
}
This prevents cross-contamination if you use Supermemory for other projects.

Error Handling

Supermemory failures are non-blocking:
async def safe_supermemory_lookup(memory: SuperMemoryClient, name: str) -> dict | None:
    try:
        return await memory.search_person(name)
    except httpx.TimeoutException:
        logger.warning(f"Supermemory timeout for {name}, proceeding without cache")
        return None
    except httpx.HTTPStatusError as exc:
        logger.error(f"Supermemory HTTP {exc.response.status_code} for {name}")
        return None
    except Exception as exc:
        logger.error(f"Supermemory unexpected error for {name}: {exc}")
        return None
If Supermemory is down, JARVIS falls back to full research without crashing.
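If transient failures are common, a small retry layer fits the same non-blocking pattern. A sketch with arbitrary attempt and backoff values (not part of the shipped client):

```python
import asyncio


async def lookup_with_retry(lookup, name: str, attempts: int = 3,
                            backoff_s: float = 0.5):
    """Retry an async cache lookup a few times, then give up with None.

    `lookup` is any async callable such as memory.search_person; the
    attempt count and backoff are illustrative defaults.
    """
    for attempt in range(attempts):
        try:
            return await lookup(name)
        except Exception:
            if attempt + 1 == attempts:
                return None  # exhausted: caller falls back to full research
            await asyncio.sleep(backoff_s * (attempt + 1))


# usage: cached = await lookup_with_retry(memory.search_person, "Alice Smith")
```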

Best Practices

Use deterministic custom IDs to enable idempotent updates:
def _custom_id(person_name: str) -> str:
    """Deterministic document ID so re-storing overwrites."""
    return f"specter-{person_name.strip().lower().replace(' ', '-')}"
This ensures re-enriching a person updates their dossier instead of creating duplicates.
Adjust the similarity threshold based on your accuracy needs:
  • 0.8+: High precision, fewer false positives (may miss slight name variations)
  • 0.6-0.7: Balanced (recommended)
  • 0.4-0.5: High recall, more fuzzy matches (risk of wrong person)
# Strict matching (assumes search_person is extended to accept a
# threshold parameter; the version shown above hardcodes 0.6)
results = await memory.search_person(name, threshold=0.85)

# Fuzzy matching
results = await memory.search_person(name, threshold=0.5)
Use metadata filters to segment data:
# Filter by source (the enriched_date filter assumes you also store an
# enriched_date field in metadata at write time)
payload = {
    "q": name,
    "filters": {
        "AND": [
            {"key": "source", "value": "specter-pipeline"},
            {"key": "enriched_date", "operator": ">", "value": "2024-01-01"},
        ]
    }
}
This lets you invalidate old dossiers or separate dev/prod data.
Use the async context manager for proper cleanup:
async with SuperMemoryClient() as memory:
    result = await memory.search_person("Alice Smith")
    # Client automatically closes on exit
This ensures HTTP connections are properly closed even if exceptions occur.

Monitoring

Track Supermemory cache hit rates:
from collections import Counter

cache_stats = Counter()

async def enrich_with_stats(person_name: str, memory: SuperMemoryClient):
    cached = await memory.search_person(person_name)
    
    if cached:
        cache_stats["hit"] += 1
        return cached
    else:
        cache_stats["miss"] += 1
        # ... run full research ...

# Log stats periodically (guard against division by zero before any lookups)
total = sum(cache_stats.values())
if total:
    logger.info(
        f"Supermemory stats: {cache_stats['hit']} hits, "
        f"{cache_stats['miss']} misses "
        f"({cache_stats['hit'] / total * 100:.1f}% hit rate)"
    )

API Reference

SuperMemoryClient

store_dossier
async method
Persist a dossier to Supermemory.
Parameters:
  • person_name (str): Full name of the person
  • dossier_data (dict): Complete dossier dictionary
Returns: Document ID (str) or None on failure

search_person
async method
Look up a cached dossier by person name.
Parameters:
  • name (str): Person name to search for
Returns: Dossier dict or None if not found

close
async method
Close the HTTP client connection.
Returns: None

Next: Backend Architecture

Learn about the FastAPI backend orchestrating the pipeline
