What is Klaus?

Klaus is a desktop voice assistant that answers questions about what you’re reading, hands-free. Place a page under a camera and ask a question out loud: Klaus sends the page image and your question to Claude’s vision API, then answers in natural speech. The experience is tuned for fast study loops: read, ask, clarify, continue. Klaus searches the web when it’s unsure about a claim, remembers context across turns, and can write notes directly to your Obsidian vault on request.

Quickstart

Get Klaus running in under 5 minutes with Homebrew or pipx

Requirements

Hardware setup and API keys you’ll need

Configuration

Customize hotkeys, voice models, camera settings, and more

Usage

Learn how to use voice activation, push-to-talk, and Obsidian notes

How It Works

1. Speak your question

Use voice-activated recording or push-to-talk (F2 by default). Speech-to-text runs locally via Moonshine Medium, with no API cost and ~300ms latency.

2. Klaus captures the page

Your camera feed is sent to Claude’s vision API along with your transcribed question and conversation history.

3. Claude reasons and responds

Claude Sonnet 4.6 analyzes the page, searches the web if needed (Tavily), and generates a response.

4. Hear the answer

OpenAI TTS streams the response sentence-by-sentence. You hear the first sentence within 2-3 seconds.
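
The four steps above can be sketched as one turn of a loop. This is a minimal illustration rather than Klaus's actual code: `Session`, `run_turn`, and the four callables are hypothetical names, and each stage is passed in as a plain function so the sketch does not depend on any particular STT, vision, or TTS library.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class Session:
    """Hypothetical container for (question, answer) history."""
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_turn(
    session: Session,
    audio: bytes,
    transcribe: Callable[[bytes], str],
    capture_page: Callable[[], bytes],
    reason: Callable[[str, bytes, List[Tuple[str, str]]], Iterable[str]],
    speak: Callable[[str], None],
) -> str:
    """Run one question/answer turn and return the full answer text."""
    question = transcribe(audio)   # step 1: local speech-to-text
    image = capture_page()         # step 2: grab the current camera frame
    spoken = []
    # Step 3: the reasoning stage yields the answer one sentence at a time.
    for sentence in reason(question, image, session.history):
        speak(sentence)            # step 4: play each sentence as it arrives
        spoken.append(sentence)
    answer = " ".join(spoken)
    session.history.append((question, answer))  # context for the next turn
    return answer
```

Passing the stages in as callables also makes the loop easy to exercise with stubs before any API key or camera is configured.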

Key Features

Vision-Grounded Q&A

Place a page under a camera and ask questions. Klaus reads the page before answering, giving you accurate, context-aware responses.

Hands-Free Operation

Voice-activated recording (WebRTC VAD) or push-to-talk. No need to touch your keyboard while studying.
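
Voice-activated recording has to turn per-frame speech decisions into utterance boundaries. The sketch below shows one common way to do that, assuming the per-frame flags come from a detector such as WebRTC VAD. The function name `find_utterances` and the default thresholds are assumptions for illustration, not Klaus's settings.

```python
from typing import Iterable, List, Tuple


def find_utterances(
    speech_flags: Iterable[bool],
    start_frames: int = 3,
    end_frames: int = 10,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, with end exclusive.

    An utterance starts after `start_frames` consecutive speech frames and
    ends after `end_frames` consecutive non-speech frames.
    """
    utterances = []
    run_speech = 0
    run_silence = 0
    start = None
    total = 0
    for i, flag in enumerate(speech_flags):
        total = i + 1
        if flag:
            run_speech += 1
            run_silence = 0
            if start is None and run_speech >= start_frames:
                # Back-date the start to the first frame of the speech run.
                start = i - start_frames + 1
        else:
            run_silence += 1
            run_speech = 0
            if start is not None and run_silence >= end_frames:
                # End at the first frame of the closing silence run.
                utterances.append((start, i - end_frames + 1))
                start = None
    if start is not None:
        # The stream ended while an utterance was still open.
        utterances.append((start, total))
    return utterances
```

Requiring several consecutive frames in each direction keeps a single noisy frame from starting or cutting off a recording.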

Local Speech-to-Text

Speech-to-text runs entirely locally via Moonshine Medium (245M params, ~300ms latency). No transcription API costs.

Streamed Audio Responses

OpenAI TTS streams sentence-by-sentence so playback starts before the full response is generated. End-to-end latency: 2-4 seconds.
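
Sentence-by-sentence streaming depends on cutting the model's incoming text into complete sentences as early as possible. The following is a minimal sketch of that buffering step, assuming the response arrives as arbitrary text chunks. `stream_sentences` is a hypothetical helper, and the simple punctuation rule will mis-split abbreviations such as "e.g. ".

```python
import re
from typing import Iterable, Iterator

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they are available."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream
```

Each yielded sentence can be handed to the TTS stage immediately, which is what lets playback begin while later sentences are still being generated.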

Smart Context Management

A hybrid query router classifies each question and decides what context (image, history, memory, notes) to include, optimizing both cost and relevance. Tavily search triggers automatically when Claude is uncertain about a claim.
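
To make the routing idea concrete, here is a deliberately simple sketch that maps a question to the four context sources named above. How Klaus actually classifies questions is not described here, so everything in this block is an assumption: `ContextPlan`, `plan_context`, and the keyword lists are hypothetical stand-ins, and a real router would need much more robust signals than substring matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPlan:
    """Which context sources to attach to the request (hypothetical)."""
    image: bool = False
    history: bool = False
    memory: bool = False
    notes: bool = False


# Hypothetical cue phrases, chosen only for illustration.
_PAGE_CUES = ("this page", "this paragraph", "figure", "diagram")
_HISTORY_CUES = ("you said", "earlier", "last answer")
_MEMORY_CUES = ("remember", "last time")
_NOTE_CUES = ("note", "write down", "obsidian")


def plan_context(question: str) -> ContextPlan:
    """Pick context sources by matching cue phrases in the question."""
    q = question.lower()
    return ContextPlan(
        image=any(cue in q for cue in _PAGE_CUES),
        history=any(cue in q for cue in _HISTORY_CUES),
        memory=any(cue in q for cue in _MEMORY_CUES),
        notes=any(cue in q for cue in _NOTE_CUES),
    )
```

Leaving the image out of a request when the question does not refer to the page is the kind of decision that would reduce cost, since the vision input is the largest cost driver.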

Obsidian Notes

Dictate notes hands-free and Klaus writes them directly to your Obsidian vault.

Conversation Memory

SQLite-backed session history with a persistent knowledge profile.
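
A SQLite-backed history can be quite small. The sketch below uses Python's standard `sqlite3` module with a hypothetical `turns` table. The schema, column names, and helper functions are assumptions for illustration and are not taken from Klaus's source.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS turns (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
            content TEXT NOT NULL
        )
        """
    )
    return conn


def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def recent_turns(conn: sqlite3.Connection, session_id: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Return the last `limit` (role, content) pairs, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return rows[::-1]
```

Capping the number of turns returned keeps the conversation history sent with each request bounded as a session grows.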

Cost & Latency

End-to-end latency from question to first spoken word: 2-4 seconds.

Usage                Approximate Cost
10 questions         ~$0.05
50 questions         ~$0.25
100 questions/day    ~$2.50-3.50/day

The largest cost driver is Claude Sonnet 4.6 (vision + context window). You can change the reasoning model in config.toml for cheaper or more expensive options. STT is free via local Moonshine. TTS costs $0.015/min of generated audio.
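
As a worked example of the TTS rate quoted above, the arithmetic can be written out directly. The constant comes from the text; the function name `tts_cost_usd` and the sample answer lengths are illustrative assumptions.

```python
TTS_USD_PER_MINUTE = 0.015  # rate quoted above


def tts_cost_usd(seconds_of_audio: float) -> float:
    """Cost of generated speech, billed per minute of audio."""
    return TTS_USD_PER_MINUTE * seconds_of_audio / 60.0


# Assumed workload: 100 answers averaging 20 seconds of speech each.
total_seconds = 100 * 20
daily_tts_cost = tts_cost_usd(total_seconds)  # about $0.50
```

Under that assumed workload, speech output would account for roughly fifty cents a day, with the remainder of the daily cost coming from the reasoning model.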

Platform Support

Klaus runs on macOS and Windows with platform-specific optimizations:
  • AVFoundation camera names (macOS)
  • DWM dark title bar (Windows)
  • Apple Keychain for secure API key storage (macOS)
  • Cross-platform global hotkeys via pynput
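
A global push-to-talk hotkey with pynput can be sketched as follows. The `keyboard.Listener` callbacks and `keyboard.Key.f2` are part of pynput's public API. The `PushToTalk` class, `listen_forever`, and the start/stop callbacks are hypothetical names used only for illustration.

```python
from typing import Callable


class PushToTalk:
    """Track whether the push-to-talk key is currently held down."""

    def __init__(self, hotkey, on_start: Callable[[], None], on_stop: Callable[[], None]):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def on_press(self, key) -> None:
        # Key repeat sends many press events while held; react to the first only.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.on_start()

    def on_release(self, key) -> None:
        if key == self.hotkey and self.recording:
            self.recording = False
            self.on_stop()


def listen_forever(ptt: PushToTalk) -> None:
    """Attach the handlers to a global pynput listener and block."""
    from pynput import keyboard  # imported lazily: requires a desktop session

    with keyboard.Listener(on_press=ptt.on_press, on_release=ptt.on_release) as listener:
        listener.join()
```

To bind the default key, construct the handler with `keyboard.Key.f2` from `pynput` and pass it to `listen_forever`. On macOS, global key monitoring generally requires granting the process input-monitoring or accessibility permission.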

Open Source

Klaus is MIT-licensed and available on GitHub. Contributions welcome.
