Klaus - Voice-Powered Research Assistant

What is Klaus?

Klaus is a desktop voice assistant that lets you ask questions about what you’re reading hands-free. Place a page under a camera, ask a question out loud, and Klaus reads the page and reasons about your question through Claude’s vision API before answering in natural speech. The experience is tuned for fast study loops: read, ask, clarify, continue. Klaus searches the web when it’s unsure about a claim, remembers context across turns, and can write notes directly to your Obsidian vault on request.

Quickstart

Get Klaus running in under 5 minutes with Homebrew or pipx

Requirements

Hardware setup and API keys you’ll need

Configuration

Customize hotkeys, voice models, camera settings, and more

Usage

Learn how to use voice activation, push-to-talk, and Obsidian notes

How It Works

Speak your question

Use voice-activated recording or push-to-talk (F2 by default). Speech-to-text runs locally via Moonshine Medium—no API cost, ~300ms latency.

Klaus captures the page

Your camera feed is sent to Claude’s vision API along with your transcribed question and conversation history.

Claude reasons and responds

Claude Sonnet 4.6 analyzes the page, searches the web if needed (Tavily), and generates a response.

Hear the answer

OpenAI TTS streams the response sentence-by-sentence. You hear the first sentence within 2-3 seconds.

Key Features

Vision-Grounded Q&A

Place a page under a camera and ask questions. Klaus reads the page before answering, giving you accurate, context-aware responses.

Hands-Free Operation

Voice-activated recording (WebRTC VAD) or push-to-talk. No need to touch your keyboard while studying.

Local Speech-to-Text

Speech-to-text runs entirely locally via Moonshine Medium (245M params, ~300ms latency). No transcription API costs.

Streamed Audio Responses

OpenAI TTS streams sentence-by-sentence so playback starts before the full response is generated. End-to-end latency: 2-4 seconds.

Smart Context Management

A hybrid query router classifies each question and decides what context (image, history, memory, notes) to include, optimizing both cost and relevance.

Web Search

Tavily search triggers automatically when Claude is uncertain about a claim.

Obsidian Notes

Dictate notes hands-free and Klaus writes them directly to your Obsidian vault.

Conversation Memory

SQLite-backed session history with persistent knowledge profile.

Cost & Latency

End-to-end latency from question to first spoken word: 2-4 seconds.

Usage	Approximate Cost
10 questions	~$0.05
50 questions	~$0.25
100 questions/day	~$2.50-3.50/day

The largest cost driver is Claude Sonnet 4.6 (vision + context window). You can change the reasoning model in config.toml for cheaper or more expensive options. STT is free via local Moonshine. TTS costs $0.015/min of generated audio.

Platform Support

Klaus runs on macOS and Windows with platform-specific optimizations:

AVFoundation camera names (macOS)
DWM dark title bar (Windows)
Apple Keychain for secure API key storage (macOS)
Cross-platform global hotkeys via pynput

Open Source

Klaus is MIT-licensed and available on GitHub. Contributions welcome.

Get Started

Setup & Installation

User Guide

Configuration

Architecture

Troubleshooting

Klaus - Voice-Powered Research Assistant

What is Klaus?

Quickstart

Requirements

Configuration

Usage

How It Works

Key Features

Vision-Grounded Q&A

Hands-Free Operation

Local Speech-to-Text

Streamed Audio Responses

Smart Context Management

Web Search

Obsidian Notes

Conversation Memory

Cost & Latency

Platform Support

Open Source

Build docs developers (and LLMs) love

Get Started

Setup & Installation

User Guide

Configuration

Architecture

Troubleshooting

​What is Klaus?

Quickstart

Requirements

Configuration

Usage

​How It Works

​Key Features

​Vision-Grounded Q&A

​Hands-Free Operation

​Local Speech-to-Text

​Streamed Audio Responses

​Smart Context Management

​Web Search

​Obsidian Notes

​Conversation Memory

​Cost & Latency

​Platform Support

​Open Source

Build docs developers (and LLMs) love

What is Klaus?

How It Works

Key Features

Vision-Grounded Q&A

Hands-Free Operation

Local Speech-to-Text

Streamed Audio Responses

Smart Context Management

Web Search

Obsidian Notes

Conversation Memory

Cost & Latency

Platform Support

Open Source