POST /api/capture

Upload an image or video file for face detection, identification, and enrichment processing.

Authentication

No authentication required (for hackathon demo).

Request

file (file, required)
Image or video file to process. Supports JPEG, PNG, MP4, and other common formats.

source (string, default: "manual_upload")
Source identifier for tracking. Common values:
  • manual_upload - Web interface upload
  • glasses_stream - Meta glasses camera
  • telegram - Telegram bot
  • api_identify - Programmatic identification

person_name (string, optional)
Pre-identified person name to associate with the capture.

Response

Returns a queued capture object.
capture_id (string)
Unique identifier for this capture session.

filename (string)
Original filename of the uploaded file.

content_type (string)
MIME type of the uploaded file (e.g., image/jpeg, video/mp4).

status (string)
Always queued on a successful upload; processing happens asynchronously.

source (string)
Echo of the source parameter.

Example Request

curl -X POST https://api.jarvis.local/api/capture \
  -F "[email protected]" \
  -F "source=manual_upload" \
  -F "person_name=John Smith"
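
The same request can be built from Python. A minimal sketch, assuming the documented field names; the helper below only constructs the multipart form fields, and the `requests` call shown in the comment is illustrative.

```python
# Build the multipart form fields for POST /api/capture.
# Field names (file, source, person_name) come from the docs above;
# the helper itself is a hypothetical client-side convenience.

def build_capture_request(file_bytes, filename, content_type,
                          source="manual_upload", person_name=None):
    """Return (files, data) suitable for a multipart/form-data POST."""
    files = {"file": (filename, file_bytes, content_type)}
    data = {"source": source}
    if person_name is not None:
        data["person_name"] = person_name
    return files, data

files, data = build_capture_request(
    b"\xff\xd8\xff", "photo.jpg", "image/jpeg", person_name="John Smith")
# With the requests library installed:
#   requests.post("https://api.jarvis.local/api/capture",
#                 files=files, data=data)
```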

Example Response

{
  "capture_id": "cap_a1b2c3d4e5f6",
  "filename": "photo.jpg",
  "content_type": "image/jpeg",
  "status": "queued",
  "source": "manual_upload"
}

Status Codes

  • 200 - File queued for processing
  • 400 - Invalid file format or missing required fields
  • 413 - File too large (typically >10MB)
  • 500 - Server error during upload

Processing Pipeline

After upload, the capture goes through:
  1. Detection - MediaPipe face detection extracts face bounding boxes
  2. Embedding - ArcFace generates 512-dimensional face embeddings
  3. Identification - Face search using PimEyes and reverse image search
  4. Enrichment - Exa API fast-pass research
  5. Deep Research - Browser Use agent swarm (LinkedIn, Twitter, Google, Crunchbase)
  6. Synthesis - Claude/Gemini generates comprehensive dossier
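
The stage ordering above can be sketched as a simple sequential driver. The stage names match the docs; the stub outputs are hypothetical placeholders, not the actual MediaPipe/ArcFace/Exa service implementations.

```python
# Illustrative sketch of the capture processing pipeline ordering.
# Each real stage is an external service; here they are stubbed so the
# control flow (strict stage order, results threaded forward) is visible.

PIPELINE_STAGES = [
    "detection",       # MediaPipe face bounding boxes
    "embedding",       # ArcFace 512-dimensional embeddings
    "identification",  # PimEyes / reverse image search
    "enrichment",      # Exa API fast-pass research
    "deep_research",   # Browser Use agent swarm
    "synthesis",       # Claude/Gemini dossier generation
]

def run_pipeline(capture_id, stages=PIPELINE_STAGES):
    """Run each stage in order, accumulating results in a dict."""
    results = {"capture": capture_id}
    for stage in stages:
        # A real implementation would pass prior results into each stage.
        results[stage] = f"<{stage} output for {capture_id}>"
    return results
```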

Notes

  • Processing is asynchronous; use WebSocket or Convex subscriptions to receive real-time updates
  • Multiple faces in a single image will create separate person records
  • Video files are sampled at 1fps for face detection
  • Failed identifications will still create a person record with partial data
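
The 1 fps video sampling noted above can be sketched as timestamp-based selection. This is a client-side illustration only; the real sampler runs server-side and is not part of the public API.

```python
# Keep at most one frame per elapsed second of video, given frame
# timestamps in milliseconds. Mirrors the "sampled at 1fps" note above.

def sample_at_1fps(frame_timestamps_ms):
    """Return the subset of timestamps spaced at least 1000 ms apart."""
    kept, next_cutoff = [], 0
    for ts in frame_timestamps_ms:
        if ts >= next_cutoff:
            kept.append(ts)
            next_cutoff = ts + 1000
    return kept

sample_at_1fps([0, 33, 66, 999, 1000, 1033, 2500])
# → [0, 1000, 2500]
```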

POST /api/capture/frame

Process a single frame from a live video stream (optimized for glasses/camera streaming).

Authentication

No authentication required.

Request

frame (string, required)
Base64-encoded JPEG image data.

timestamp (integer, required)
Client-side timestamp in milliseconds since epoch.

source (string, default: "glasses_stream")
Source identifier for tracking.

target (boolean, default: false)
Set to true when the user is explicitly targeting someone for identification (e.g., center-frame focus).
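
Building the JSON body can be sketched in Python. The field names match the docs above; the helper assumes you already have JPEG bytes in hand.

```python
import base64
import time

# Construct the request body for POST /api/capture/frame.
# Field names (frame, timestamp, source, target) come from the docs.

def build_frame_payload(jpeg_bytes, source="glasses_stream",
                        target=False, timestamp_ms=None):
    """Return a dict ready to send as application/json."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # ms since epoch
    return {
        "frame": base64.b64encode(jpeg_bytes).decode("ascii"),
        "timestamp": timestamp_ms,
        "source": source,
        "target": target,
    }

payload = build_frame_payload(b"\xff\xd8\xff\xe0",
                              timestamp_ms=1709654400000)
# Send with Content-Type: application/json, e.g.
#   requests.post("https://api.jarvis.local/api/capture/frame", json=payload)
```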

Response

capture_id (string)
Unique identifier for this frame.

detections (array)
Array of face detections in the frame.

new_persons (integer)
Number of new person records created from this frame.

timestamp (integer)
Echo of the request timestamp.

source (string)
Echo of the source parameter.

Example Request

curl -X POST https://api.jarvis.local/api/capture/frame \
  -H "Content-Type: application/json" \
  -d '{
    "frame": "/9j/4AAQSkZJRgABAQEA...",
    "timestamp": 1709654400000,
    "source": "glasses_stream",
    "target": false
  }'

Example Response

{
  "capture_id": "frame_abc123xyz",
  "detections": [
    {
      "bbox": [120, 80, 280, 240],
      "confidence": 0.94,
      "track_id": 1
    },
    {
      "bbox": [400, 100, 560, 260],
      "confidence": 0.88,
      "track_id": 2
    }
  ],
  "new_persons": 1,
  "timestamp": 1709654400000,
  "source": "glasses_stream"
}
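
A typical client-side use of this response is picking the most confident detection. A minimal sketch, assuming the bbox format is [x1, y1, x2, y2] as the example values suggest (the docs do not state the format explicitly).

```python
# Select the highest-confidence face from a /api/capture/frame response
# and derive its box size. bbox format [x1, y1, x2, y2] is an assumption.

response = {
    "detections": [
        {"bbox": [120, 80, 280, 240], "confidence": 0.94, "track_id": 1},
        {"bbox": [400, 100, 560, 260], "confidence": 0.88, "track_id": 2},
    ],
    "new_persons": 1,
}

best = max(response["detections"], key=lambda d: d["confidence"])
x1, y1, x2, y2 = best["bbox"]
width, height = x2 - x1, y2 - y1
# best["track_id"] == 1, width == 160, height == 160
```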

Status Codes

  • 200 - Frame processed successfully
  • 400 - Invalid base64 data or missing required fields
  • 500 - Server error during processing

Tracking Behavior

  • YOLO assigns persistent track_id values across frames
  • Same person tracked across frames shares the same track_id
  • Identification is triggered once per track (not every frame)
  • Setting target=true prioritizes that frame for identification
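
The once-per-track rule above can be sketched as a small dedup helper. One plausible reading (an assumption, not confirmed by the docs) is that `target=true` forces identification even for an already-seen track; the class below is a hypothetical illustration of that behavior.

```python
# Trigger identification once per track_id, with an explicit-target
# override. Illustrative only; the real logic runs server-side.

class TrackIdentifier:
    def __init__(self):
        self.identified = set()  # track_ids already sent to identification

    def should_identify(self, track_id, target=False):
        """True if this frame should trigger identification for the track."""
        if target or track_id not in self.identified:
            self.identified.add(track_id)
            return True
        return False

ti = TrackIdentifier()
ti.should_identify(1)               # → True  (first sighting of track 1)
ti.should_identify(1)               # → False (already identified)
ti.should_identify(1, target=True)  # → True  (explicit targeting override)
```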

Performance

  • Average processing time: 50-100ms per frame
  • Supports 10-30 fps streaming
  • Face detection is cached for 500ms per track to reduce load
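
The 500 ms per-track detection cache mentioned above can be sketched as a TTL map. Time is passed in explicitly (milliseconds) to keep the example deterministic; the real cache lives server-side.

```python
# Per-track detection cache with a 500 ms time-to-live, mirroring the
# performance note above. Hypothetical sketch, not the server code.

class DetectionCache:
    TTL_MS = 500

    def __init__(self):
        self._cache = {}  # track_id -> (stored_at_ms, detection)

    def get(self, track_id, now_ms):
        """Return the cached detection if still fresh, else None."""
        entry = self._cache.get(track_id)
        if entry and now_ms - entry[0] < self.TTL_MS:
            return entry[1]  # fresh: reuse cached detection
        return None          # stale or missing: re-run detection

    def put(self, track_id, detection, now_ms):
        self._cache[track_id] = (now_ms, detection)

cache = DetectionCache()
cache.put(1, {"bbox": [120, 80, 280, 240]}, now_ms=0)
cache.get(1, now_ms=400)  # → cached detection (within 500 ms)
cache.get(1, now_ms=600)  # → None (expired)
```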