When to Use Meeting Mode
Meeting mode is designed for:

- Video calls and meetings (Zoom, Teams, Google Meet)
- Lectures and presentations
- Interviews and conversations
- Brainstorming sessions
- Podcasts and recordings
Quick Start
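A minimal end-to-end sketch. The `start` and `stop` subcommand names are assumed from the section headings below; `--title`, `export`, `latest`, and `--format` are documented on this page:

```shell
# Record a meeting, then export the transcript as Markdown
voxtype meeting start --title "Weekly sync"
voxtype meeting stop
voxtype meeting export latest --format md
```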
Commands
Starting a Meeting
The `--title` flag is optional. Without it, meetings are named by date and time (e.g., “Meeting 2026-02-16 14:30”).
Requirements: The daemon must be running and meeting mode must be enabled in config.
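A sketch of starting a named meeting. The `start` subcommand name is assumed from this section's heading; `--title` is documented above:

```shell
voxtype meeting start --title "Q1 Planning"
```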
Stopping a Meeting
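A sketch, assuming the subcommand is named `stop` to match this heading:

```shell
voxtype meeting stop
```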
Pausing and Resuming
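A sketch, assuming the subcommands are named `pause` and `resume` to match this heading:

```shell
voxtype meeting pause
voxtype meeting resume
```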
Checking Status
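A sketch, assuming the subcommand is named `status` to match this heading:

```shell
voxtype meeting status
```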
Listing Past Meetings
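The list subcommand is referenced later on this page:

```shell
voxtype meeting list
```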
Viewing Meeting Details
Use `latest` as shorthand for the most recent meeting’s ID.
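A sketch, assuming the subcommand is named `show`, which is not confirmed on this page; the `latest` shorthand is documented above:

```shell
voxtype meeting show latest
```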
Exporting Transcripts
| Format | Flag | Description |
|---|---|---|
| Markdown | markdown or md | Readable with headers and speaker labels |
| Plain text | text or txt | Just the words, no formatting |
| JSON | json | Structured data with all segment metadata |
| SRT | srt | SubRip subtitle format |
| VTT | vtt | WebVTT subtitle format |
| Flag | Description |
|---|---|
| `--format`, `-f` | Output format (default: markdown) |
| `--output`, `-o` | Write to file instead of stdout |
| `--timestamps` | Include timestamps in output |
| `--speakers` | Include speaker labels |
| `--metadata` | Include metadata header (title, date, duration) |
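Combining the flags above; the `export` subcommand name is assumed from this section's heading:

```shell
voxtype meeting export latest --format srt --output meeting.srt --speakers --timestamps
```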
Labeling Speakers
When diarization detects multiple speakers, they’re assigned IDs like `SPEAKER_00` and `SPEAKER_01`. Replace these with real names using `voxtype meeting label`:
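A sketch using `voxtype meeting label`; the exact argument order (meeting ID, speaker ID, name) is an assumption:

```shell
voxtype meeting label latest SPEAKER_00 "Alice"
voxtype meeting label latest SPEAKER_01 "Bob"
```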
AI Summarization
Generate a summary with key points, action items, and decisions:

- Brief overview of the meeting
- Key discussion points
- Action items (with assignees when mentioned)
- Decisions made
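The summarize subcommand is referenced in the Configuration section below:

```shell
voxtype meeting summarize latest
```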
Deleting Meetings
The `--force` flag is required to confirm deletion.
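A sketch, assuming the subcommand is named `delete` to match this heading; `--force` is documented above:

```shell
voxtype meeting delete <meeting-id> --force
```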
How It Works
Chunked Transcription
Meeting mode splits continuous audio into fixed-duration chunks (default: 30 seconds) and transcribes each chunk as it becomes ready. This approach:

- Reduces memory usage: No need to buffer the entire meeting in RAM
- Provides progress: See partial results as the meeting progresses
- Enables real-time monitoring: Check status during the meeting
- Improves reliability: A chunk failure doesn’t lose the entire recording
Speaker Attribution
Two diarization backends are available:

- Simple (default): Uses the audio source (mic vs. loopback) to distinguish “You” from “Remote” speakers. No ML model required.
- ML (optional): Uses ONNX-based speaker embeddings to identify individual speakers. Requires the `ml-diarization` feature and a downloaded model.
For most users (1:1 calls, single remote participant), simple diarization is sufficient.
Storage
Meetings are stored at `~/.local/share/voxtype/meetings/` (or your configured path).
Configuration
All meeting settings live under `[meeting]` in `~/.config/voxtype/config.toml`.
Basic Settings
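A sketch of the `[meeting]` section. The `enabled` and `chunk_duration` key names are assumptions; the section name, the enable requirement, and the 30-second default are from this page:

```toml
[meeting]
enabled = true          # assumed key name; meeting mode must be enabled in config
chunk_duration = 30     # assumed key name; default chunk length in seconds
```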
Audio Settings
`loopback_device = "auto"` captures system audio (the other side of a call). When active, speaker attribution can distinguish between “You” (mic) and “Remote” (system audio).

Set `loopback_device = "disabled"` if you only want your microphone.
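For example (the `loopback_device` key and its `"auto"`/`"disabled"` values are documented above; placement directly under `[meeting]` follows the statement that all meeting settings live there):

```toml
[meeting]
loopback_device = "auto"   # capture system audio; set to "disabled" for mic only
```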
Diarization Settings
Speaker diarization identifies who said what:

- `simple`: Uses the audio source (mic vs. loopback). No ML model needed.
- `ml`: Uses ONNX embeddings to identify individual speakers. Requires the `ml-diarization` feature.
- `subprocess`: Same as `ml`, but runs in a separate process for memory isolation.
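A sketch; the `diarization` key name is an assumption, while the backend values are documented above:

```toml
[meeting]
diarization = "simple"   # or "ml" / "subprocess"
```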
Summarization Settings
1. Install Ollama: https://ollama.ai
2. Pull a model: `ollama pull llama3.2`
3. Set `backend = "local"` in config
4. Run `voxtype meeting summarize latest`
Use Cases
Recording a Zoom Call
Transcribing a Lecture
Interview Transcription
Tips for Best Results
Choose the Right Model
Meeting transcription processes many chunks, so model choice affects both speed and accuracy:

- Fast hardware: `large-v3-turbo` with GPU for best accuracy
- CPU only: `base.en` or `small.en` for English
- Slower hardware: `tiny.en` keeps up with real-time audio
Use a Good Microphone
Transcription accuracy depends heavily on audio quality. A dedicated microphone or headset works much better than a laptop’s built-in mic.

Set Chunk Duration Appropriately
The default of 30 seconds works well for most cases:

- Shorter chunks (15-20s): Faster partial results, more processing overhead
- Longer chunks (45-60s): Better accuracy on slower hardware, more context per transcription
Label Speakers After the Meeting
Run `voxtype meeting list` to find the meeting ID, then use `voxtype meeting label` to assign names to auto-detected speaker IDs. This makes the exported transcript much more readable.
Export in Multiple Formats
You can export the same meeting in different formats:

- Markdown: For reading and sharing
- JSON: For processing in other tools
- SRT/VTT: For adding subtitles to a video recording
Troubleshooting
Meeting mode not available
Error: “Meeting mode is not enabled”

Solution: Enable meeting mode under `[meeting]` in config and restart the daemon.

No remote audio captured
Problem: Only your voice is transcribed, not the other side of the call.

Solutions:

- Enable the loopback device
- Check PipeWire/PulseAudio monitors
- Manually specify the loopback device
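To inspect available monitor sources on PipeWire/PulseAudio systems:

```shell
pactl list short sources | grep monitor
```

Monitor sources carry the audio your system plays back, which is what the loopback device records.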
Transcription can’t keep up
Problem: Chunks take longer to transcribe than real-time.

Solutions:

- Use a smaller model
- Enable GPU acceleration (see GPU Acceleration)
- Increase the chunk duration
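A hedged config sketch; both key names are assumptions, while the model names and the chunk-duration tradeoff are from this page:

```toml
[meeting]
model = "base.en"       # assumed key name; a smaller model transcribes faster
chunk_duration = 45     # assumed key name; longer chunks ease slower hardware
```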
Speaker attribution not working
Problem: All speech is attributed to the same speaker.

Solutions:

- Ensure diarization is enabled
- For the simple backend, ensure loopback is configured
- For the ML backend, ensure the model is downloaded and the `ml-diarization` feature is enabled
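A hedged config sketch for the simple backend; the `diarization` key name is an assumption, while `loopback_device` is documented above:

```toml
[meeting]
diarization = "simple"     # assumed key name
loopback_device = "auto"   # simple backend needs loopback to tell "You" from "Remote"
```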
Further Reading
- Full Meeting Mode Documentation
- Transcription Engines - Choose the right engine
- Configuration guide - All meeting mode settings