# System Overview

## Architecture Components

### 1. Content Source Syncing
Background services continuously fetch content from external sources.

#### Gmail Sync (sync_gmail.ts)
**Purpose:** Fetch and index emails from the Gmail API

**Process:**

- OAuth 2.0 authentication with Google
- Incremental sync using the history API
- Store raw emails as markdown in `~/.rowboat/gmail_sync/`
- Track sync state to avoid reprocessing

**Location:** `core/src/knowledge/sync_gmail.ts`
#### Calendar Sync (sync_calendar.ts)
**Purpose:** Fetch calendar events from the Google Calendar API

**Process:**

- OAuth 2.0 authentication with Google
- Fetch events from the primary calendar
- Store as markdown in `~/.rowboat/calendar_sync/`
- Track processed events by ID

**Location:** `core/src/knowledge/sync_calendar.ts`
#### Fireflies Sync (sync_fireflies.ts)
**Purpose:** Fetch meeting transcripts from Fireflies.ai

**Process:**

- OAuth 2.0 authentication with Fireflies
- GraphQL API to fetch transcripts
- Store as markdown in `~/.rowboat/fireflies_transcripts/`
- Include speaker labels and timestamps

**Location:** `core/src/knowledge/sync_fireflies.ts`
#### Voice Memos
**Purpose:** User-created voice notes transcribed with Whisper

**Process:**

- User records audio in the UI
- Audio is transcribed locally using Whisper
- Stored directly in `~/.rowboat/knowledge/Voice Memos/<date>/`
- Ready for entity extraction

**Location:** Created by the UI, processed by the graph builder
### 2. Graph Builder - Entity Extraction Pipeline
The Graph Builder (`build_graph.ts`) is the core processing engine.
The graph builder runs every 30 seconds, checking all source folders for new/changed content. It uses mtime + content hash to detect changes efficiently.
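A minimal sketch of that loop, with illustrative names (this is not the actual `build_graph.ts` code):

```typescript
// Illustrative sketch of the 30-second polling loop; names and structure
// are assumptions, not the actual build_graph.ts implementation.
const POLL_INTERVAL_MS = 30_000;
const SOURCE_FOLDERS = [
  "gmail_sync",
  "fireflies_transcripts",
  "granola_notes",
  "knowledge/Voice Memos",
];

type ChangeDetector = (folders: string[]) => string[];    // returns changed file paths
type FileProcessor = (files: string[]) => Promise<void>;  // batching + agent runs

// One tick: detect changes across all source folders, hand them off.
async function tick(detect: ChangeDetector, handle: FileProcessor): Promise<number> {
  const changed = detect(SOURCE_FOLDERS);
  if (changed.length > 0) await handle(changed);
  return changed.length;
}

// In the real service this would be scheduled, e.g.:
// setInterval(() => tick(detectChanges, processFiles), POLL_INTERVAL_MS);
```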
#### Processing Pipeline
1. **Change Detection**
   - Scan source folders: `gmail_sync/`, `fireflies_transcripts/`, `granola_notes/`, `knowledge/Voice Memos/`
   - Load state from `knowledge_graph_state.json`
   - Check mtime + hash to identify changed files
2. **Batch Processing**
   - Group files into batches of 10
   - Build knowledge index for each batch
   - Create agent run for entity extraction
3. **Agent Execution**
   - Send batch + index to the `note_creation` agent
   - Agent extracts entities (people, orgs, projects, topics)
   - Agent creates/updates notes in `~/.rowboat/knowledge/`
4. **State Management**
   - Mark processed files in state
   - Commit changes to version history
   - Save state after each batch
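The batching and per-batch state saves in steps 2–4 can be sketched as follows (function names are illustrative, not the actual `build_graph.ts` API):

```typescript
// Split changed files into batches of 10.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// For each batch: build a fresh index, run the agent, save state.
// Illustrative sketch; helper signatures are assumptions.
async function processChanged(
  files: string[],
  buildIndex: () => string,
  runAgent: (batch: string[], index: string) => Promise<void>,
  saveState: (processed: string[]) => void
): Promise<void> {
  for (const batch of chunk(files, 10)) {
    const index = buildIndex(); // fresh per batch, so it includes earlier batches' notes
    try {
      await runAgent(batch, index);
      saveState(batch);         // partial save: a later failure won't redo this batch
    } catch (err) {
      // Failed batches don't block subsequent batches.
      console.error("batch failed:", err);
    }
  }
}
```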
#### State Tracking
##### graph_state.ts - Change Detection

**Purpose:** Track which files have been processed to avoid reprocessing

**State Schema:**

**Change Detection Logic:**
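The schema and detection logic themselves are not reproduced in this chunk; a plausible sketch, with assumed field names:

```typescript
// Assumed shape of knowledge_graph_state.json; field names are guesses
// based on this document, not the actual graph_state.ts schema.
interface FileState {
  mtimeMs: number; // modification time at last processing
  sha256: string;  // content hash, verified only when mtime differs
}
interface GraphState {
  files: Record<string, FileState>; // keyed by file path
}

// mtime-first check: cheap stat comparison, hash only when mtime changed.
function hasChanged(
  prev: FileState | undefined,
  mtimeMs: number,
  hashContent: () => string
): boolean {
  if (!prev) return true;                     // never processed before
  if (prev.mtimeMs === mtimeMs) return false; // fast path: no file read
  return hashContent() !== prev.sha256;       // filters touch-only changes
}
```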
**Location:** `core/src/knowledge/graph_state.ts`

#### Batch Processing Logic
### 3. Knowledge Index - Entity Resolution
The Knowledge Index (`knowledge_index.ts`) provides entity resolution.
#### Index Structure
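The index body is not shown in this chunk; one plausible shape, with assumed field names, plus the alias-based resolution it enables:

```typescript
// Plausible in-memory shape of the knowledge index (assumed; the actual
// structure lives in core/src/knowledge/knowledge_index.ts).
interface IndexedEntity {
  name: string;
  aliases: string[];
  notePath: string; // e.g. knowledge/People/Jane Doe.md
}
interface KnowledgeIndex {
  people: IndexedEntity[];
  organizations: IndexedEntity[];
  projects: IndexedEntity[];
  topics: IndexedEntity[];
  other: IndexedEntity[]; // notes outside the four entity folders
}

// Entity resolution: match a mention against names and aliases, case-insensitively.
function resolve(entities: IndexedEntity[], mention: string): IndexedEntity | undefined {
  const m = mention.trim().toLowerCase();
  return entities.find(
    (e) => e.name.toLowerCase() === m || e.aliases.some((a) => a.toLowerCase() === m)
  );
}
```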
#### Index Building
**Process:**

1. Recursively scan `~/.rowboat/knowledge/`
2. Parse each markdown file
3. Extract metadata based on folder:
   - `People/` → Parse as person note
   - `Organizations/` → Parse as org note
   - `Projects/` → Parse as project note
   - `Topics/` → Parse as topic note
   - Other folders → Parse as generic note
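Since only frontmatter fields are parsed (see Performance Optimizations), the per-file parse step might look like this minimal frontmatter-only parser (illustrative, not the actual `knowledge_index.ts` code):

```typescript
// Minimal frontmatter-only parse: read `key: value` pairs between the
// leading `---` fences and ignore the note body entirely.
function parseFrontmatter(markdown: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const lines = markdown.split("\n");
  if (lines[0]?.trim() !== "---") return fields; // no frontmatter block
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "---") break;            // end of frontmatter; skip body
    const sep = line.indexOf(":");
    if (sep > 0) fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return fields;
}
```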
#### Index Formatting for Agents
**Prompt Format:**

### 4. Note Creation - AI Agent Workflow
The note creation agent receives batches of content and:

1. **Entity Extraction**
   - Identify people, organizations, projects, topics
   - Resolve entities using the index
   - Merge information if the entity already exists
2. **Note Creation/Updates**
   - Use `workspace-writeFile` for new notes
   - Use `workspace-edit` for existing notes
   - Follow the folder structure: `knowledge/People/`, `knowledge/Organizations/`, etc.
3. **Note Templates**
   - People: Name, Email, Organization, Role, Aliases
   - Organizations: Name, Domain, Aliases
   - Projects: Name, Status, Aliases
   - Topics: Name, Keywords, Aliases
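Based on the template fields above, a person note might look like this (the values and the exact frontmatter layout are illustrative assumptions, not the actual template):

```markdown
---
Name: Jane Doe
Email: jane@example.com
Organization: Acme Corp
Role: Engineering Manager
Aliases: Jane, J. Doe
---

Notes about Jane Doe, merged from synced emails, meetings, and memos.
```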
## Data Flow Diagram
## File System Layout
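The layout itself is not reproduced in this chunk, but the paths referenced throughout this document imply roughly the following tree (the state file's location is an assumption):

```
~/.rowboat/
├── gmail_sync/                  # raw emails as markdown
├── calendar_sync/               # calendar events as markdown
├── fireflies_transcripts/       # meeting transcripts
├── granola_notes/               # scanned by the graph builder
├── knowledge/
│   ├── People/
│   ├── Organizations/
│   ├── Projects/
│   ├── Topics/
│   └── Voice Memos/<date>/
└── knowledge_graph_state.json   # processed-file state (location assumed)
```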
## Performance Optimizations
### Incremental Processing
- State tracking: Only process new/changed files
- Hash-based detection: mtime + SHA-256 hash avoids false positives
- Batch processing: Process 10 files per agent run (faster than 25)
- Partial saves: State saved after each batch (no full reprocessing on error)
### Index Caching
- Fresh index per batch: Includes notes from previous batches
- Fast scanning: Recursive directory traversal
- Metadata extraction: Parse only frontmatter fields, not full content
### Background Services
- Async processing: All services run independently
- 30-second polling: Balance between responsiveness and CPU usage
- Error recovery: Failed batches don’t block subsequent batches
- Service logging: Structured logs for debugging
## Key Design Decisions
**Why batch processing?** Processing files in batches (10 at a time) allows the agent to see patterns across multiple sources and merge information about the same entity.

**Why index-based resolution?** Providing a pre-built index prevents agents from making expensive grep/search calls and ensures consistent entity resolution.

**Why mtime + hash?** Checking mtime first is fast (no file read). Hash verification only happens when mtime changes, catching false positives (e.g., `touch` commands).
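The `touch` false positive can be demonstrated directly: bump a file's mtime without changing its content, and the hash comparison filters it out. A self-contained sketch (not project code):

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";
import * as crypto from "crypto";

const sha256 = (p: string) =>
  crypto.createHash("sha256").update(fs.readFileSync(p)).digest("hex");

// Create a throwaway note file.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "kg-")), "note.md");
fs.writeFileSync(file, "# Jane Doe\n");

const before = { mtimeMs: fs.statSync(file).mtimeMs, hash: sha256(file) };

// Simulate `touch`: update timestamps only, leave content alone.
const later = new Date(Date.now() + 5_000);
fs.utimesSync(file, later, later);

const after = { mtimeMs: fs.statSync(file).mtimeMs, hash: sha256(file) };

const mtimeChanged = after.mtimeMs !== before.mtimeMs; // true: cheap check fires
const contentChanged = after.hash !== before.hash;     // false: hash filters it out
```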
## Code References
- Graph Builder: `core/src/knowledge/build_graph.ts`
- Knowledge Index: `core/src/knowledge/knowledge_index.ts`
- State Management: `core/src/knowledge/graph_state.ts`
- Gmail Sync: `core/src/knowledge/sync_gmail.ts`
- Calendar Sync: `core/src/knowledge/sync_calendar.ts`
- Fireflies Sync: `core/src/knowledge/sync_fireflies.ts`