Knowledge System Architecture

Rowboat’s knowledge system transforms raw content (emails, meetings, notes) into a structured knowledge graph using AI-powered entity extraction and note creation.

System Overview

Content Sources → Graph Builder → Knowledge Index → Entity Notes
     ↓                 ↓                ↓               ↓
  Gmail            Batch              Index          Obsidian
  Calendar        Processing          Search          Notes
  Fireflies      AI Agents          Metadata        [[Links]]
  Voice Memos    Extraction         Aliases         Relations

Architecture Components

1. Content Source Syncing

Background services continuously fetch content from external sources:

Gmail Sync (sync_gmail.ts)

Purpose: Fetch and index emails from Gmail APIProcess:

OAuth 2.0 authentication with Google
Incremental sync using history API
Store raw emails as markdown in ~/.rowboat/gmail_sync/
Track sync state to avoid reprocessing

File Format:

# Email: Subject Line

**From:** [email protected]
**To:** [email protected]
**Date:** 2026-02-28

Email body content...

Location: core/src/knowledge/sync_gmail.ts

Calendar Sync (sync_calendar.ts)

Purpose: Fetch calendar events from Google Calendar APIProcess:

OAuth 2.0 authentication with Google
Fetch events from primary calendar
Store as markdown in ~/.rowboat/calendar_sync/
Track processed events by ID

File Format:

# Meeting Title

**Date:** 2026-02-28
**Attendees:** [email protected], [email protected]

Meeting description...

Location: core/src/knowledge/sync_calendar.ts

Fireflies Sync (sync_fireflies.ts)

Purpose: Fetch meeting transcripts from Fireflies.aiProcess:

OAuth 2.0 authentication with Fireflies
GraphQL API to fetch transcripts
Store as markdown in ~/.rowboat/fireflies_transcripts/
Include speaker labels and timestamps

File Format:

# Meeting: Title

**Date:** 2026-02-28
**Duration:** 45 minutes

## Transcript

**Speaker 1:** [00:00] Opening remarks...
**Speaker 2:** [01:30] Response...

Location: core/src/knowledge/sync_fireflies.ts

Voice Memos

Purpose: User-created voice notes transcribed with WhisperProcess:

User records audio in UI
Audio transcribed locally using Whisper
Stored directly in ~/.rowboat/knowledge/Voice Memos/<date>/
Ready for entity extraction

File Format:

# Voice Memo - Feb 28, 2026

**Recorded:** 2026-02-28 14:30:00

Transcribed content...

Location: Created by UI, processed by graph builder

2. Graph Builder - Entity Extraction Pipeline

The Graph Builder (build_graph.ts) is the core processing engine:

The graph builder runs every 30 seconds, checking all source folders for new/changed content. It uses mtime + content hash to detect changes efficiently.

Processing Pipeline

// Main processing loop
while (true) {
  await processAllSources();
  await sleep(30_000); // Check every 30 seconds
}

Step-by-Step Flow:

Change Detection
- Scan source folders: gmail_sync/, fireflies_transcripts/, granola_notes/, knowledge/Voice Memos/
- Load state from knowledge_graph_state.json
- Check mtime + hash to identify changed files
Batch Processing
- Group files into batches of 10
- Build knowledge index for each batch
- Create agent run for entity extraction
Agent Execution
- Send batch + index to note_creation agent
- Agent extracts entities (people, orgs, projects, topics)
- Agent creates/updates notes in ~/.rowboat/knowledge/
State Management
- Mark processed files in state
- Commit changes to version history
- Save state after each batch

State Tracking

graph_state.ts - Change Detection

Purpose: Track which files have been processed to avoid reprocessingState Schema:

interface GraphState {
  processedFiles: Record<string, FileState>;
  lastBuildTime: string;
}

interface FileState {
  mtime: string;        // File modification time
  hash: string;         // SHA-256 content hash
  lastProcessed: string; // When it was processed
}

Change Detection Logic:

function hasFileChanged(filePath: string, state: GraphState): boolean {
  const fileState = state.processedFiles[filePath];
  
  // New file - never processed
  if (!fileState) return true;
  
  // Quick check: mtime unchanged = no change
  if (currentMtime === fileState.mtime) return false;
  
  // Mtime changed: verify with content hash
  return currentHash !== fileState.hash;
}

Location: core/src/knowledge/graph_state.ts

Batch Processing Logic

// build_graph.ts (simplified)
async function buildGraphWithFiles(files: string[], state: GraphState) {
  const BATCH_SIZE = 10;
  const batches = chunk(files, BATCH_SIZE);
  
  for (const batch of batches) {
    // Build fresh index for each batch
    const index = buildKnowledgeIndex();
    const indexForPrompt = formatIndexForPrompt(index);
    
    // Create agent run with batch + index
    const result = await createNotesFromBatch(
      batch, 
      batchNumber, 
      indexForPrompt
    );
    
    // Mark files as processed
    for (const file of batch) {
      markFileAsProcessed(file.path, state);
    }
    
    // Save state after each batch (partial progress)
    saveState(state);
  }
}

3. Knowledge Index - Entity Resolution

The Knowledge Index (knowledge_index.ts) provides entity resolution:

Agents must NOT use grep/search to find existing notes. The index is the source of truth for entity resolution.

Index Structure

interface KnowledgeIndex {
  people: PersonEntry[];          // Person notes
  organizations: OrganizationEntry[]; // Company notes
  projects: ProjectEntry[];       // Project notes
  topics: TopicEntry[];           // Topic notes
  other: OtherEntry[];            // Other folders
  buildTime: string;
}

interface PersonEntry {
  file: string;         // Relative path: "People/John Doe.md"
  name: string;         // "John Doe"
  email?: string;       // "[email protected]"
  aliases: string[];    // ["JD", "John"]
  organization?: string; // "Acme Corp"
  role?: string;        // "CEO"
}

Index Building

Process:

Recursively scan ~/.rowboat/knowledge/
Parse each markdown file
Extract metadata based on folder:
- People/ → Parse as person note
- Organizations/ → Parse as org note
- Projects/ → Parse as project note
- Topics/ → Parse as topic note
- Other folders → Parse as generic note

Metadata Extraction:

function parsePersonNote(filePath: string, content: string): PersonEntry {
  return {
    file: relativePath(filePath),
    name: extractTitle(content),      // First H1
    email: extractField(content, 'Email'),
    aliases: extractList(content, 'Aliases'),
    organization: extractField(content, 'Organization'),
    role: extractField(content, 'Role'),
  };
}

Index Formatting for Agents

Prompt Format:

# Existing Knowledge Base Index

Built at: 2026-02-28T14:30:00Z

## People

| File | Name | Email | Organization | Aliases |
|------|------|-------|--------------|--------|
| People/John Doe.md | John Doe | [email protected] | Acme Corp | JD, John |
| People/Jane Smith.md | Jane Smith | [email protected] | - | Jane |

## Organizations

| File | Name | Domain | Aliases |
|------|------|--------|--------|
| Organizations/Acme Corp.md | Acme Corp | acme.com | ACME |

## Projects

| File | Name | Status | Aliases |
|------|------|--------|--------|
| Projects/Rowboat.md | Rowboat | Active | RB |

Usage in Agent Prompt:

const message = `
Process the following source files and create/update notes.

**Instructions:**
- Use the KNOWLEDGE BASE INDEX below to resolve entities
- DO NOT grep/search for existing notes
- Extract entities from ALL files
- Create or update notes in "knowledge" directory

---

${indexForPrompt}

---

# Source Files to Process

## Source File 1: email-2026-02-28.md
...
`;

4. Note Creation - AI Agent Workflow

The note creation agent receives batches of content and:

Entity Extraction
- Identify people, organizations, projects, topics
- Resolve entities using the index
- Merge information if entity already exists
Note Creation/Updates
- Use workspace-writeFile for new notes
- Use workspace-edit for existing notes
- Follow folder structure: knowledge/People/, knowledge/Organizations/, etc.
Note Templates
- People: Name, Email, Organization, Role, Aliases
- Organizations: Name, Domain, Aliases
- Projects: Name, Status, Aliases
- Topics: Name, Keywords, Aliases

Example Person Note:

# John Doe

**Email:** [email protected]
**Organization:** [[Acme Corp]]
**Role:** CEO
**Aliases:** JD, John

## Context

Met with John on 2026-02-28 to discuss [[Rowboat]] project...

## Related
- [[Acme Corp]]
- [[Rowboat]]

Data Flow Diagram

┌─────────────┐
│   Gmail     │───┐
└─────────────┘   │
                  │
┌─────────────┐   │     ┌──────────────┐
│  Calendar   │───┼────→│ Graph Builder│
└─────────────┘   │     └──────┬───────┘
                  │            │
┌─────────────┐   │            │ Batch Processing
│ Fireflies   │───┤            │ + AI Extraction
└─────────────┘   │            ↓
                  │     ┌──────────────┐
┌─────────────┐   │     │   Knowledge  │
│Voice Memos  │───┘     │     Index    │
└─────────────┘         └──────┬───────┘
                               │
                               │ Entity Resolution
                               ↓
                        ┌──────────────┐
                        │   Notes      │
                        │  People/     │
                        │  Orgs/       │
                        │  Projects/   │
                        └──────────────┘

File System Layout

~/.rowboat/
├── gmail_sync/              # Raw Gmail content
│   ├── 2026-02-28-email1.md
│   └── 2026-02-28-email2.md
├── calendar_sync/           # Calendar events
│   └── 2026-02-28-meeting.md
├── fireflies_transcripts/   # Meeting transcripts
│   └── 2026-02-28-standup.md
├── knowledge/               # Structured notes (OUTPUT)
│   ├── People/
│   │   ├── John Doe.md
│   │   └── Jane Smith.md
│   ├── Organizations/
│   │   └── Acme Corp.md
│   ├── Projects/
│   │   └── Rowboat.md
│   ├── Topics/
│   │   └── Product Strategy.md
│   └── Voice Memos/
│       └── 2026-02-28/
│           └── voice-memo-143000.md
├── knowledge_graph_state.json  # Processing state
└── config/
    └── models.json          # LLM configuration

Performance Optimizations

Incremental Processing

State tracking: Only process new/changed files
Hash-based detection: mtime + SHA-256 hash avoids false positives
Batch processing: Process 10 files per agent run (faster than 25)
Partial saves: State saved after each batch (no full reprocessing on error)

Index Caching

Fresh index per batch: Includes notes from previous batches
Fast scanning: Recursive directory traversal
Metadata extraction: Parse only frontmatter fields, not full content

Background Services

Async processing: All services run independently
30-second polling: Balance between responsiveness and CPU usage
Error recovery: Failed batches don’t block subsequent batches
Service logging: Structured logs for debugging

Key Design Decisions

Why batch processing? Processing files in batches (10 at a time) allows the agent to see patterns across multiple sources and merge information about the same entity.

Why index-based resolution? Providing a pre-built index prevents agents from making expensive grep/search calls and ensures consistent entity resolution.

Why mtime + hash? Checking mtime first is fast (no file read). Hash verification only happens when mtime changes, catching false positives (e.g., touch commands).

Code References

Graph Builder: core/src/knowledge/build_graph.ts
Knowledge Index: core/src/knowledge/knowledge_index.ts
State Management: core/src/knowledge/graph_state.ts
Gmail Sync: core/src/knowledge/sync_gmail.ts
Calendar Sync: core/src/knowledge/sync_calendar.ts
Fireflies Sync: core/src/knowledge/sync_fireflies.ts

Contributing

Architecture

Reference

Knowledge System Architecture

System Overview

Architecture Components

1. Content Source Syncing

2. Graph Builder - Entity Extraction Pipeline

Processing Pipeline

State Tracking

Batch Processing Logic

3. Knowledge Index - Entity Resolution

Index Structure

Index Building

Index Formatting for Agents

4. Note Creation - AI Agent Workflow

Data Flow Diagram

File System Layout

Performance Optimizations

Key Design Decisions

Code References

Build docs developers (and LLMs) love

Contributing

Architecture

Reference

​System Overview

​Architecture Components

​1. Content Source Syncing

​2. Graph Builder - Entity Extraction Pipeline

​Processing Pipeline

​State Tracking

​Batch Processing Logic

​3. Knowledge Index - Entity Resolution

​Index Structure

​Index Building

​Index Formatting for Agents

​4. Note Creation - AI Agent Workflow

​Data Flow Diagram

​File System Layout

​Performance Optimizations

​Key Design Decisions

​Code References

Build docs developers (and LLMs) love

System Overview

Architecture Components

1. Content Source Syncing

2. Graph Builder - Entity Extraction Pipeline

Processing Pipeline

State Tracking

Batch Processing Logic

3. Knowledge Index - Entity Resolution

Index Structure

Index Building

Index Formatting for Agents

4. Note Creation - AI Agent Workflow

Data Flow Diagram

File System Layout

Performance Optimizations

Key Design Decisions

Code References