POST /api/capture
Upload an image or video file for face detection, identification, and enrichment processing.
Authentication
No authentication required (for hackathon demo).
Request
Image or video file to process. Supports JPEG, PNG, MP4, and other common formats.
Source identifier for tracking. Common values:
- manual_upload - Web interface upload
- glasses_stream - Meta glasses camera
- telegram - Telegram bot
- api_identify - Programmatic identification
Optional. Pre-identified person name to associate with the capture.
Response
Returns a queued capture object.
Unique identifier for this capture session.
Original filename of the uploaded file.
MIME type of the uploaded file (e.g., image/jpeg, video/mp4).
Always returns queued on successful upload. Processing happens asynchronously.
Echo of the source parameter.
Example Request
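As a rough illustration, an upload could be assembled with the Python standard library as below. This is a minimal sketch, not the service's documented client: the base URL and the form field name "file" are assumptions made for illustration, and "source" is inferred from the response description above.

```python
import mimetypes
import urllib.request
import uuid


def build_capture_request(base_url, filename, data, source="manual_upload"):
    """Build (but do not send) a multipart/form-data POST to /api/capture.

    Assumptions: the form field names "file" and "source" are hypothetical;
    the actual names are not stated in this reference.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # Text part carrying the source identifier.
    source_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="source"\r\n'
        "\r\n"
        f"{source}\r\n"
    ).encode("utf-8")

    # File part: headers, a blank line, the raw bytes, then CRLF.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = source_part + file_header + data + b"\r\n" + closing

    return urllib.request.Request(
        f"{base_url}/api/capture",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to urllib.request.urlopen would send it. Building the request separately from sending it keeps the sketch checkable without a running server.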
Example Response
Status Codes
- 200 - File queued for processing
- 400 - Invalid file format or missing required fields
- 413 - File too large (typically >10MB)
- 500 - Server error during upload
Processing Pipeline
After upload, the capture goes through:
- Detection - MediaPipe face detection extracts face bounding boxes
- Embedding - ArcFace generates 512-dimensional face embeddings
- Identification - Face search using PimEyes and reverse image search
- Enrichment - Exa API fast-pass research
- Deep Research - Browser Use agent swarm (LinkedIn, Twitter, Google, Crunchbase)
- Synthesis - Claude/Gemini generates comprehensive dossier
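The stage order above can be modelled on the client side, for example to show progress while a capture is processed. The sketch below assumes the stages run strictly in the listed order; the reference does not say whether any of them overlap, and the names Stage and remaining_stages are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages in the order listed above (names are illustrative)."""
    DETECTION = 1
    EMBEDDING = 2
    IDENTIFICATION = 3
    ENRICHMENT = 4
    DEEP_RESEARCH = 5
    SYNTHESIS = 6


def remaining_stages(current):
    """Return the stages that still have to run after `current`, in order."""
    return [stage for stage in Stage if stage > current]
```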
Notes
- Processing is asynchronous; use WebSocket or Convex subscriptions to receive real-time updates
- Multiple faces in a single image will create separate person records
- Video files are sampled at 1fps for face detection
- Failed identifications will still create a person record with partial data
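To make the 1fps sampling note concrete, the following sketch computes which frame indices would be kept for a video with a given native frame rate. It illustrates the arithmetic only; how the server actually selects frames is not described here, and sample_frame_indices is a hypothetical helper.

```python
def sample_frame_indices(total_frames, video_fps, sample_fps=1.0):
    """Return the indices of frames kept when sampling at `sample_fps`.

    Assumption: the first frame is always kept and later frames are chosen
    by truncating n * (video_fps / sample_fps) to an integer index.
    """
    if total_frames <= 0 or video_fps <= 0 or sample_fps <= 0:
        return []
    # Never step by less than one frame, or indices would repeat.
    step = max(1.0, video_fps / sample_fps)
    indices = []
    n = 0
    while True:
        index = int(n * step)
        if index >= total_frames:
            break
        indices.append(index)
        n += 1
    return indices
```

Under these assumptions, a 3-second clip at 30 fps (90 frames) yields frames 0, 30, and 60.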
POST /api/capture/frame
Process a single frame from a live video stream (optimized for glasses/camera streaming).
Authentication
No authentication required.
Request
Base64-encoded JPEG image data.
Client-side timestamp in milliseconds since epoch.
Source identifier for tracking.
Set to true when the user is explicitly targeting someone for identification (e.g., center-frame focus).
Response
Unique identifier for this frame.
Array of face detections in the frame.
Number of new person records created from this frame.
Echo of the request timestamp.
Echo of the source parameter.
Example Request
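A request body for this endpoint might be built as follows. This is a sketch under stated assumptions: the body is assumed to be JSON, the field name "frame" is hypothetical, and "timestamp" is inferred from the response description; only "source" and "target" are named in this reference.

```python
import base64
import json
import time


def build_frame_payload(jpeg_bytes, source="glasses_stream", target=False, now_ms=None):
    """Build a JSON-serializable payload for /api/capture/frame.

    Assumptions: "frame" is a hypothetical field name for the image data,
    and "timestamp" is inferred rather than documented.
    """
    if now_ms is None:
        # Client-side timestamp in milliseconds since epoch.
        now_ms = int(time.time() * 1000)
    return {
        "frame": base64.b64encode(jpeg_bytes).decode("ascii"),
        "timestamp": now_ms,
        "source": source,
        "target": target,
    }


def encode_frame_payload(payload):
    """Serialize the payload to UTF-8 JSON bytes ready to send as a body."""
    return json.dumps(payload).encode("utf-8")
```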
Example Response
Status Codes
- 200 - Frame processed successfully
- 400 - Invalid base64 data or missing required fields
- 500 - Server error during processing
Tracking Behavior
- YOLO assigns persistent track_id values across frames
- The same person tracked across frames shares the same track_id
- Identification is triggered once per track (not every frame)
- Setting target=true prioritizes that frame for identification
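The once-per-track rule and the target priority can be pictured with a small sketch. It is not the server's implementation: TrackIdentificationGate and order_for_identification are hypothetical names, and treating "prioritizes" as "handled first" is an assumption.

```python
class TrackIdentificationGate:
    """Allow identification only the first time a track_id is seen."""

    def __init__(self):
        self._seen = set()

    def should_identify(self, track_id):
        """Return True once per track_id, False on every later call."""
        if track_id in self._seen:
            return False
        self._seen.add(track_id)
        return True


def order_for_identification(frames):
    """Move frames with target=True to the front, keeping arrival order otherwise.

    Each frame is a dict with a boolean "target" key (illustrative shape).
    sorted() is stable, so frames with equal priority keep their order.
    """
    return sorted(frames, key=lambda frame: not frame["target"])
```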
Performance
- Average processing time: 50-100ms per frame
- Supports 10-30 fps streaming
- Face detection is cached for 500ms per track to reduce load
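The 500ms per-track cache can be sketched as a small time-to-live map keyed by track_id. The class below is an illustrative assumption about how such a cache could behave, not a description of the server code; timestamps are passed in explicitly so the behavior is easy to check.

```python
class DetectionCache:
    """Keep the latest detection per track_id for a fixed time-to-live."""

    def __init__(self, ttl_ms=500):
        self.ttl_ms = ttl_ms
        self._entries = {}  # track_id -> (stored_at_ms, detection)

    def put(self, track_id, detection, now_ms):
        """Store or overwrite the detection for a track."""
        self._entries[track_id] = (now_ms, detection)

    def get(self, track_id, now_ms):
        """Return the cached detection, or None if missing or expired."""
        entry = self._entries.get(track_id)
        if entry is None:
            return None
        stored_at_ms, detection = entry
        if now_ms - stored_at_ms >= self.ttl_ms:
            # Expired: drop the entry so detection runs again for this track.
            del self._entries[track_id]
            return None
        return detection
```

At 30 fps frames arrive about every 33ms, so under these assumptions a cached detection would be reused for roughly 15 consecutive frames of the same track before being recomputed.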