Voice mode enables you to control vimGPT using spoken objectives instead of typing. This feature uses Whisper, OpenAI’s speech recognition model, to transcribe your voice commands.

Enabling voice mode

To run vimGPT with voice input:
python main.py --voice
The --voice flag enables voice input mode (main.py:44-46).
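A flag like this is typically wired up with argparse. The following is a minimal sketch of how such a boolean flag behaves, not the exact parser code in main.py:

```python
import argparse

# Hypothetical sketch of --voice flag parsing; main.py:44-46 may differ.
parser = argparse.ArgumentParser(description="Run vimGPT")
parser.add_argument(
    "--voice",
    action="store_true",
    help="Capture the objective by voice instead of typing it",
)

# Passing --voice sets the attribute to True; omitting it leaves False.
args = parser.parse_args(["--voice"])
print(args.voice)  # True
```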

How it works

When voice mode is enabled:
  1. Initialization: vimGPT starts the browser and navigates to Google
  2. Voice capture: The system listens for your voice command (main.py:17-25)
  3. Transcription: Whisper converts your speech to text
  4. Execution: vimGPT uses the transcribed text as the objective
  5. Autonomous browsing: The agent performs actions until the task is complete
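The steps above can be sketched as a simple control flow. Here `transcribe` and `run_agent` are hypothetical stand-ins for WhisperMic's capture and vimGPT's browsing loop, injected so the sketch stays self-contained:

```python
def run_voice_session(transcribe, run_agent):
    """Sketch of the voice-mode flow: capture an objective, then hand it
    to the agent loop. Both callables are stand-ins, not vimGPT's API."""
    print("Voice mode enabled. Listening for your command...")
    objective = transcribe()      # in vimGPT: WhisperMic().listen()
    print(f"Objective received: {objective}")
    run_agent(objective)          # agent browses until the task completes
    return objective

# Usage with stubs standing in for Whisper and the browsing agent:
result = run_voice_session(
    transcribe=lambda: "Search for the best restaurants in New York",
    run_agent=lambda obj: None,
)
```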

Voice input workflow

Here’s what you’ll see when running in voice mode:
Initializing the Vimbot driver...
Navigating to Google...
Voice mode enabled. Listening for your command...
At this point, speak your objective clearly into your microphone:
"Search for the best restaurants in New York"
Once captured, you’ll see:
Objective received: Search for the best restaurants in New York
Capturing the screen...
Getting actions for the given objective...

Implementation details

Voice mode uses the whisper-mic library for audio capture and transcription:
from whisper_mic import WhisperMic

if voice_mode:
    print("Voice mode enabled. Listening for your command...")
    mic = WhisperMic()
    try:
        objective = mic.listen()
    except Exception as e:
        print(f"Error in capturing voice input: {e}")
        return  # Exit if voice input fails
    print(f"Objective received: {objective}")
Reference: main.py:17-25

Requirements

Voice mode requires the whisper-mic package, which is included in the requirements (requirements.txt:20):
whisper-mic
Install it with:
pip install -r requirements.txt
The whisper-mic library handles microphone access and Whisper model integration automatically. You don’t need to configure the Whisper model separately.

Error handling

If voice capture fails, vimGPT will display an error message and exit:
Error in capturing voice input: [error details]
The program gracefully exits instead of continuing with an empty objective (main.py:23-24).
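The try/except pattern can be isolated as a small helper for illustration. This helper is hypothetical (main.py inlines the logic); `listen` stands in for WhisperMic's listen method:

```python
def capture_objective(listen):
    """Return the transcribed objective, or None if capture fails.
    `listen` is a stand-in for WhisperMic().listen."""
    try:
        return listen()
    except Exception as e:
        print(f"Error in capturing voice input: {e}")
        return None  # caller exits rather than running an empty objective

def broken_mic():
    raise RuntimeError("no microphone")

ok = capture_objective(lambda: "Find the Python documentation")
failed = capture_objective(broken_mic)
```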

Tips for voice mode

  • Speak clearly and naturally: Whisper performs best with clear articulation at a normal speaking pace.
  • Use a quality microphone: Built-in laptop microphones work, but a dedicated microphone improves accuracy.
  • Minimize background noise: Find a quiet environment for better transcription accuracy.
  • Be specific: Just like text mode, specific objectives work better than vague ones.

Comparison with text mode

Feature         | Text mode                  | Voice mode
Objective input | Type in terminal           | Speak into microphone
Command         | python main.py             | python main.py --voice
Best for        | Quick testing, scripting   | Hands-free operation, accessibility
Requirements    | None (default)             | whisper-mic package

Example voice commands

Here are some example voice commands to try:
  • “Search YouTube for GPT-4 tutorials”
  • “Find news articles about climate change”
  • “Navigate to GitHub and search for AI projects”
  • “Look up Italian restaurants near me”
  • “Find the Python documentation”
Voice mode requires microphone access. Ensure your system permissions allow Python to access your microphone.

Accessibility benefits

Voice mode makes vimGPT more accessible by:
  • Enabling hands-free browsing: Control the browser without typing
  • Supporting voice-first workflows: Integrate with voice-based systems
  • Reducing physical interaction: Helpful for users with mobility limitations

Switching between modes

You can easily switch between text and voice modes:
# Text mode (default)
python main.py

# Voice mode
python main.py --voice
No additional configuration is needed; just add or remove the --voice flag.

Troubleshooting

Microphone not detected

If your microphone isn’t detected:
  1. Check system audio settings
  2. Verify microphone permissions for Python
  3. Test the microphone with other applications
  4. Ensure whisper-mic is properly installed

Poor transcription accuracy

  • Speak more slowly and clearly
  • Reduce background noise
  • Use a better quality microphone
  • Check microphone positioning and distance

Voice capture timeout

If the system doesn’t capture your voice:
  • Check if you need to start speaking immediately
  • Verify the microphone is active and unmuted
  • Look at whisper-mic documentation for timeout settings
