Voice Commands provide quick, context-aware actions you can trigger through voice input, complementing the full Voice Agent experience.

Overview

While the Voice Agent provides conversational AI interaction, Voice Commands offer instant execution of predefined actions through simple voice triggers.

Global Voice Shortcuts

Primary Voice Controls

Voice Agent

Ctrl+Alt+J: Launch the full Voice Agent for conversational AI interaction

Voice Transcription

Ctrl+Alt+T: Toggle voice-to-text transcription mode

Cycle Transcribe Modes

Ctrl+Shift+T: Switch between paste, typewriter, and buffer modes

Stop Auto-typing

Ctrl+Shift+X: Immediately halt any ongoing AI typing
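As a rough sketch of how these global shortcuts map to actions, a keydown handler can resolve a key combination to one of the four controls. The action names below are illustrative, not Tabby's internal identifiers:

```typescript
// Hypothetical shortcut resolver; action names are illustrative.
type VoiceAction = "voiceAgent" | "transcription" | "cycleModes" | "stopTyping" | null;

interface KeyCombo {
  ctrl: boolean;
  alt: boolean;
  shift: boolean;
  key: string; // lowercase key value, e.g. "j"
}

function resolveShortcut(e: KeyCombo): VoiceAction {
  if (e.ctrl && e.alt && !e.shift && e.key === "j") return "voiceAgent";    // Ctrl+Alt+J
  if (e.ctrl && e.alt && !e.shift && e.key === "t") return "transcription"; // Ctrl+Alt+T
  if (e.ctrl && !e.alt && e.shift && e.key === "t") return "cycleModes";    // Ctrl+Shift+T
  if (e.ctrl && !e.alt && e.shift && e.key === "x") return "stopTyping";    // Ctrl+Shift+X
  return null; // not a voice shortcut
}
```

In a browser context this would be wired to a `keydown` listener that reads `event.ctrlKey`, `event.altKey`, `event.shiftKey`, and `event.key`.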

Built-in Voice Actions

These actions are available when the Voice Agent is active:

System Commands

Voice trigger: “Change the theme to [light/dark]”
Switches the Tabby interface between light and dark mode.
{
  name: 'changeBrowserTheme',
  type: 'function',
  description: 'Change the browser theme',
  parameters: {
    type: 'object',
    properties: {
      theme: {
        type: 'string',
        enum: ['light', 'dark']
      }
    },
    required: ['theme']
  }
}
Voice trigger: “End conversation” / “Hang up” / “Goodbye”
Gracefully ends the voice session and closes the agent.
{
  name: 'endConversation',
  type: 'function',
  description: 'End the current voice conversation',
  parameters: {
    type: 'object',
    properties: {},
    required: []
  }
}
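When the model emits one of these tool calls, the client needs to dispatch on the tool name and validate the arguments against the schema. The dispatcher below is a sketch under assumed types; only the tool names and the `theme` enum come from the definitions above:

```typescript
// Hypothetical dispatcher for the two built-in tools defined above.
// The surrounding result type is illustrative, not Tabby's actual API.
type ToolResult = { ok: boolean; message: string };

function handleToolCall(name: string, args: Record<string, unknown>): ToolResult {
  switch (name) {
    case "changeBrowserTheme": {
      const theme = args.theme;
      // Enforce the enum from the tool schema: only "light" or "dark".
      if (theme !== "light" && theme !== "dark") {
        return { ok: false, message: `Unsupported theme: ${String(theme)}` };
      }
      // In the real handler this would toggle the UI theme.
      return { ok: true, message: `Theme changed to ${theme}` };
    }
    case "endConversation":
      // In the real handler this would close the audio session.
      return { ok: true, message: "Conversation ended" };
    default:
      return { ok: false, message: `Unknown tool: ${name}` };
  }
}
```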

Desktop Automation Commands

When Windows MCP is enabled, these voice commands are available:
Open Applications
“Open Chrome” / “Launch VS Code” / “Start Notepad”
Uses Powershell-Tool to launch applications quickly.

Switch Applications
“Switch to Chrome” / “Focus on VS Code”
Uses App-Tool to activate running applications.

Close Applications
“Close Chrome” / “Quit Notepad”
Uses process management to terminate applications.
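As an illustration, a recognized desktop intent could be routed to the MCP tools named above. The tool names (Powershell-Tool, App-Tool) come from this page; the argument shapes are assumptions, not the actual Windows MCP API:

```typescript
// Illustrative routing only; argument shapes are assumed, not the real MCP schema.
type DesktopIntent = { action: "open" | "switch" | "close"; app: string };

function toMcpCall(intent: DesktopIntent): { tool: string; args: Record<string, string> } {
  switch (intent.action) {
    case "open":
      // Launch via PowerShell, per the docs' Powershell-Tool.
      return { tool: "Powershell-Tool", args: { command: `Start-Process "${intent.app}"` } };
    case "switch":
      // Focus a running app, per the docs' App-Tool.
      return { tool: "App-Tool", args: { name: intent.app } };
    case "close":
      // Terminate via process management (sketched here as a PowerShell call).
      return { tool: "Powershell-Tool", args: { command: `Stop-Process -Name "${intent.app}"` } };
  }
}
```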

Workflow Automation

Create custom voice-triggered workflows by training the Voice Agent:
1

Describe Your Workflow

Tell the agent about your routine:
“Every morning I open Slack, Gmail in Chrome, and VS Code with my project folder.”
2

Agent Stores as Memory

The agent saves this as a procedural memory with type PROCEDURAL.
3

Trigger Workflow

Next time, just say:
“Prep my workspace” / “Start my day” / “Set up my environment”
The agent executes the stored workflow automatically.

Example Workflows

User: "When I say 'dev mode', open VS Code in my projects folder, 
       start the terminal, and open Chrome with localhost:3000"

[Agent stores workflow]

User: "Dev mode"
[Agent executes all steps]
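Conceptually, a stored workflow is matched against the spoken trigger before executing its steps. The memory shape below is a sketch; only the PROCEDURAL type name appears on this page:

```typescript
// Illustrative shape for a stored workflow; only "PROCEDURAL" comes from the docs.
interface ProceduralMemory {
  type: "PROCEDURAL";
  triggers: string[]; // phrases that fire the workflow, e.g. ["dev mode"]
  steps: string[];    // ordered actions to execute
}

// Return the first workflow whose trigger phrase occurs in the utterance.
function findWorkflow(utterance: string, memories: ProceduralMemory[]): ProceduralMemory | null {
  const spoken = utterance.trim().toLowerCase();
  return memories.find(m => m.triggers.some(t => spoken.includes(t.toLowerCase()))) ?? null;
}
```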

Memory-Based Commands

The Voice Agent remembers context across sessions:
Example: After telling the agent your name is Sarah:
User: “What’s my name?”
Agent: “Your name is Sarah.”
The agent searches memories first before responding to any query.

Example: After expressing a preference:
User: “I prefer dark mode”
[Agent stores preference]
Later: “Set my theme”
[Agent applies dark mode based on stored preference]

Example: Sharing your tech stack:
User: “I work with React, TypeScript, and Tailwind CSS”
[Agent stores technical preferences]
Later: “Create a new component”
[Agent generates a React + TypeScript + Tailwind component]
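The "search memories first" behavior can be sketched as a lookup that runs before the model is queried; a null result means fall through to the LLM. Record shapes and names here are assumptions:

```typescript
// Illustrative memory-first lookup; record shape is assumed.
interface MemoryRecord {
  key: string;   // topic the fact is about, e.g. "name"
  value: string; // the stored fact, e.g. "Sarah"
}

// Return a stored fact if the query mentions its topic; otherwise null
// (signalling the caller to fall back to the language model).
function answerFromMemory(query: string, memories: MemoryRecord[]): string | null {
  const q = query.toLowerCase();
  const hit = memories.find(m => q.includes(m.key.toLowerCase()));
  return hit ? hit.value : null;
}
```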

Creating Custom Commands

You can extend voice commands by adding custom tools to the MCP server or creating new functions in the Voice Agent API.

Custom Tool Template

{
  type: 'function',
  name: 'customAction',
  description: 'Description of what this action does',
  parameters: {
    type: 'object',
    properties: {
      param1: {
        type: 'string',
        description: 'Description of parameter'
      }
    },
    required: ['param1']
  }
}
Add your tool to the DEFAULT_VOICE_TOOLS array in /lib/ai/voice/index.ts.

Tool Execution Handler

Implement the execution logic in the Voice Agent hook:
case "customAction":
  // Your custom logic here
  console.log(toolArgs.param1);
  // Perform action
  break;
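For context, the case above would typically sit inside a dispatcher along these lines. The function name and argument shape are illustrative, not the actual hook API:

```typescript
// Sketch of the surrounding dispatcher; names are illustrative.
function executeVoiceTool(toolName: string, toolArgs: { param1?: string }): string {
  switch (toolName) {
    case "customAction":
      // Your custom logic here; toolArgs carries the parameters
      // declared in the tool's JSON schema.
      return `customAction ran with ${toolArgs.param1}`;
    default:
      return `Unhandled tool: ${toolName}`;
  }
}
```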

Command Best Practices

For reliable voice command recognition:
  1. Be specific: “Open Chrome” vs “Open browser”
  2. Use natural language: “What’s the weather?” vs “Weather query”
  3. Avoid ambiguity: “Open my main project” (after teaching the agent which project)
  4. Build context: Let the agent learn your preferences over time
  5. Confirm actions: The agent will acknowledge and execute

Keyboard Shortcut Reference

Voice Agent: Ctrl+Alt+J
Transcription: Ctrl+Alt+T
Cycle Modes: Ctrl+Shift+T
Stop Typing: Ctrl+Shift+X
Actions Menu: Ctrl+\
Brain Panel: Ctrl+Shift+B

Troubleshooting

Microphone issues:
  • Check microphone permissions in browser settings
  • Verify the microphone is selected as the default input device
  • Test microphone access in the browser console: navigator.mediaDevices.getUserMedia({ audio: true })
  • Reduce background noise

Commands not executing:
  • Ensure the Windows MCP server is running (for desktop automation)
  • Check that API keys are properly configured
  • Verify the memory backend is accessible at localhost:8000
  • Review the browser console for error messages

Memory not persisting:
  • Confirm the memory backend is running
  • Check the Supabase connection for vector storage
  • Verify the Neo4j connection (if using the knowledge graph)
  • Try explicitly saying “Remember that” after sharing information

Requirements

Voice Commands require:
  • OpenAI API key (for Voice Agent)
  • Groq API key (for Transcription)
  • Microphone permissions
  • (Optional) Windows MCP server for desktop automation
  • (Optional) Memory backend for persistent context

Related Pages

Voice Agent: Full conversational AI voice assistant
Voice Transcription: Speech-to-text for typing in any app
Action Menu: Quick AI actions triggered by keyboard
Brain Panel: Memory dashboard and management
