Common issues
API key not found or authentication errors
vimGPT requires an OpenAI API key to function. If the key is missing or invalid, calls to the OpenAI API fail with an authentication error.

Solution:
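- Create a `.env` file in the project root directory.
- Add your OpenAI API key to it as `OPENAI_API_KEY=<your-key>` (the variable name the OpenAI Python SDK reads by default).
- Make sure the `.env` file is in the same directory from which you run `main.py`.
- Verify that the key is valid by checking your OpenAI dashboard.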
Note: vimGPT loads environment variables from the `.env` file automatically using python-dotenv (`vision.py:11`).
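If the error persists, a quick stand-alone check can show whether the key is missing from the process environment, from `.env`, or from both. The sketch below is a hypothetical diagnostic helper and is not part of vimGPT. `check_openai_key` and `read_env_file` are illustrative names, and the parser deliberately handles only plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines from a .env file, skipping blanks and comments.

    Deliberately minimal and for diagnostics only: python-dotenv supports more
    syntax (export prefixes, multiline values, variable expansion).
    """
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as in OPENAI_API_KEY="sk-..."
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_openai_key(project_dir: str = ".") -> str:
    """Describe where OPENAI_API_KEY is set, or why it appears to be missing."""
    if os.environ.get("OPENAI_API_KEY"):
        return "set in environment"
    env_path = Path(project_dir) / ".env"
    if not env_path.is_file():
        return f"missing: no .env file at {env_path.resolve()}"
    if read_env_file(env_path).get("OPENAI_API_KEY"):
        return "set in .env"
    return "missing: .env exists but has no non-empty OPENAI_API_KEY entry"
```

Run `print(check_openai_key())` from the directory where you launch `main.py`.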
Vimium extension not loading
You may see errors about a missing Vimium extension, or the browser may launch without the yellow hint markers.

Solution:
- Run the setup script to download Vimium.
- Verify that the `vimium-master` directory exists in your project root.
- If the directory exists but errors persist, check Playwright permissions.
- Alternatively, install Vimium manually.
Note: the Vimium extension path is set in `vimbot.py:7` and must match the downloaded directory name.
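If the folder exists but Chromium still launches without hints, check the folder layout next. The hypothetical helper below is not part of vimGPT. It checks that `vimium-master` sits in the project root and has `manifest.json` at its top level, which Chromium requires when loading an unpacked extension. The default folder name matches the one this page mentions, so change it if your extension path in `vimbot.py:7` differs.

```python
from pathlib import Path
from typing import List


def check_vimium_dir(project_dir: str = ".", dirname: str = "vimium-master") -> List[str]:
    """Return problems found with the unpacked Vimium folder (empty list = looks fine)."""
    ext_dir = Path(project_dir) / dirname
    if not ext_dir.is_dir():
        return [f"{ext_dir} does not exist (expected the unzipped Vimium source here)"]
    # Chromium loads an unpacked extension from a folder with manifest.json at its root.
    if (ext_dir / "manifest.json").is_file():
        return []
    # A common unzip mistake: an extra level, e.g. vimium-master/vimium-master/.
    nested = [
        p.name for p in ext_dir.iterdir()
        if p.is_dir() and (p / "manifest.json").is_file()
    ]
    hint = f"; found one in {nested[0]}/, so remove the extra folder level" if nested else ""
    return [f"no manifest.json at the top of {ext_dir}{hint}"]
```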
GPT-4V not detecting elements / returning empty actions
The model fails to identify clickable elements and returns `{}` or incorrect actions.

Common causes:
- Image resolution too low for element detection
- Vimium hints not visible in the screenshot
- Page not fully loaded before capture
Solutions:
- Increase the image resolution by editing `vision.py:12`. Note: higher resolution increases token usage and API costs.
- Add a delay before each capture (edit `main.py:31`) so the page has time to finish loading.
- Verify that Vimium is active by checking that yellow hints appear:
  - Set `headless=False` in `vimbot.py:11` to see the browser.
  - Press `f` manually to confirm that Vimium works.
- Check screenshot quality by saving the captures and inspecting them.
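Raising the resolution improves detection but increases cost, so it helps to estimate how the token count grows first. The sketch below implements the image-token formula OpenAI has published for high-detail inputs to its GPT-4-class vision models. The image is scaled to fit within 2048x2048, its shortest side is scaled down to 768 px, and the cost is then 170 tokens per 512 px tile plus 85. Treat the result as an estimate. It assumes high detail and assumes images are only ever scaled down. Per-model rules and prices should be checked in OpenAI's current documentation.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image using OpenAI's published tiling formula.

    Assumptions: high-detail billing, and images are only scaled down, never up.
    """
    if detail == "low":
        return 85  # flat cost for low-detail images
    # Step 1: fit within a 2048 x 2048 square, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: 170 tokens per 512 px tile, plus a fixed 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 170 * tiles + 85
```

Under these rules a 1024x1024 image becomes four tiles, or 765 tokens, which matches the worked example in OpenAI's documentation.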
Invalid JSON response errors
GPT-4V returns text that cannot be parsed as JSON.

How it’s handled: vimGPT includes automatic JSON repair (`vision.py:52`). When the first parse fails:
- The malformed response is sent to GPT-4o for cleaning.
- A second parse attempt is made.
- If both attempts fail, an empty dict `{}` is returned.
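The fallback flow above can be sketched in a few lines. This is an illustrative reconstruction and not the code in `vision.py`. `parse_action` and `repair` are hypothetical names, and the model call is replaced by an injected function so the control flow can be tested offline. Unlike a bare `json.loads`, this sketch also treats valid JSON that is not an object (for example, a list) as a failure.

```python
import json
from typing import Callable, Optional


def _load_dict(text: str) -> Optional[dict]:
    """json.loads, but only accept a JSON object; anything else counts as a failure."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_action(raw: str, repair: Callable[[str], str]) -> dict:
    """First parse, then one repair attempt, then an empty dict."""
    action = _load_dict(raw)
    if action is None:
        # In vimGPT the repair step is a call to another model; here it is any
        # function that maps malformed text to (hopefully) valid JSON.
        action = _load_dict(repair(raw))
    return action if action is not None else {}
```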
Debugging steps:
- Check the JSON response printed in your terminal.
- Look for these common issues:
  - The response is wrapped in markdown code blocks: ```json {...} ```
  - Extra explanatory text appears before or after the JSON.
  - Quotes are missing around keys or values.
- If auto-repair consistently fails, modify the prompt in `vision.py:35` to be more explicit about the required output format.
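A local cleanup pass can often recover the JSON in the first two cases without a second API call. Missing quotes are harder to fix reliably and are better left to the model-based repair. The sketch below is hypothetical, and none of these names exist in vimGPT. `ALLOWED_KEYS` assumes the action vocabulary is navigate, type, click, and done, so check the prompt in `vision.py` for the actual set.

```python
import json
import re
from typing import List, Optional

# Assumption: the action keys vimGPT's prompt asks for; check vision.py for the real set.
ALLOWED_KEYS = {"navigate", "type", "click", "done"}

# Matches ```json ... ``` or ``` ... ``` and captures the inside.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json_object(text: str) -> Optional[dict]:
    """Return the JSON object starting at the first '{', ignoring fences and prose."""
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one value starting at `start` and ignores trailing text.
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def validate_action(action: dict) -> List[str]:
    """Return a list of problems with a parsed action; an empty list means it looks usable."""
    if not action:
        return ["empty action"]
    return [f"unexpected key: {key}" for key in sorted(set(action) - ALLOWED_KEYS)]
```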
Playwright installation issues
Errors occur during `pip install` or when running the script.

Solution:
- Install the Python dependencies.
- Install the Playwright browsers.
- If you encounter permission errors, check the permissions of the Python environment you are installing into.
- For system-specific issues, check the Playwright documentation.
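To confirm which dependencies the current interpreter can see, look up each import name without importing it. The helper below is hypothetical. Its module list is an assumption based on the libraries named on this page, and the project's own dependency list is authoritative. Import names can differ from pip package names, for example python-dotenv is imported as `dotenv`.

```python
import importlib.util
from typing import Dict, List

# Label -> import name. Assumed list, based on the libraries this page mentions.
REQUIRED_MODULES: Dict[str, str] = {
    "openai": "openai",
    "playwright": "playwright",
    "python-dotenv": "dotenv",
    "whisper-mic (only for --voice)": "whisper_mic",
}


def missing_modules(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return labels whose import name the current interpreter cannot find."""
    # find_spec locates a top-level module without importing (executing) it.
    return [label for label, name in required.items()
            if importlib.util.find_spec(name) is None]
```

Run it with the same interpreter you use for `main.py`. It does not check whether the Playwright browsers themselves are installed.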
Voice mode not working
When using the `--voice` flag, errors occur during audio capture.

Solution:
- Install the system audio dependencies for your platform (macOS, Ubuntu/Debian, or Windows). On Windows:
  - Download ffmpeg from ffmpeg.org.
  - Add it to your system PATH.
- Verify microphone permissions in your system settings.
- Test whisper-mic independently.
Note: voice input is captured at `main.py:17`, with only basic error catching.
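Two of the prerequisites above can be checked from Python before testing the microphone. This hypothetical helper is not part of vimGPT. It only reports whether `ffmpeg` is on PATH and whether the `whisper_mic` module is importable. It does not exercise the microphone or OS permissions.

```python
import importlib.util
import shutil
from typing import Dict


def voice_prerequisites() -> Dict[str, bool]:
    """Report whether the pieces voice mode depends on are visible from Python."""
    return {
        # The Windows steps install ffmpeg and add it to PATH; this checks that step.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Import name of the whisper-mic package.
        "whisper_mic_installed": importlib.util.find_spec("whisper_mic") is not None,
    }
```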
Bot clicking wrong elements or getting stuck
The agent repeatedly clicks the same element or selects incorrect targets.

Debugging steps:
- Make sure the objective is clear:
  - Vague: “Find something interesting”
  - Better: “Search for Python tutorials on Google”
- Monitor the JSON responses printed to the console (`main.py:36`).
- Verify that the Vimium hints are visible:
  - Run with a visible browser by setting `headless=False` in `vimbot.py:11`.
  - Check whether hints are obscured by page elements.
- Increase the wait time between actions (`main.py:30`).
Current limitation: there is no cycle detection (see the Architecture page), so the bot may loop if:
- The clicked element doesn't change the page state.
- Navigation leads back to a previous page.

Future versions may implement graph-based retry mechanisms (see `README.md:48`).
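Until such a mechanism exists, a simple guard can catch the most obvious loop, where the same action is repeated on the same page. The sketch below is hypothetical, and vimGPT has no such class. It uses a sliding window of recent (URL, action) pairs rather than the graph-based approach the README mentions. `window` and `max_repeats` are arbitrary defaults.

```python
import json
from collections import deque
from typing import Deque, Tuple


class LoopDetector:
    """Flag when the same (url, action) pair keeps recurring in recent steps.

    Hypothetical helper: a sliding-window repeat check, not graph-based retry.
    """

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # deque(maxlen=...) silently drops the oldest entry once the window is full.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=window)

    def record(self, url: str, action: dict) -> bool:
        """Record one step; return True if this pair has hit max_repeats in the window."""
        # Serialize with sorted keys so equal dicts produce identical strings.
        key = (url, json.dumps(action, sort_keys=True))
        self.history.append(key)
        return self.history.count(key) >= self.max_repeats
```

In a loop like the one in `main.py`, you could call `record` with the current page URL and the parsed action after each model response. You could then stop, or reword the objective, when it returns `True`. Playwright exposes the current URL as `page.url`.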
Timeout errors during navigation
High API costs / token usage
Running vimGPT consumes more tokens than expected.

Optimization strategies:
- Reduce the image resolution (`vision.py:12`). Trade-off: this may reduce element detection accuracy.
- Limit the maximum number of tokens per response (`vision.py:46`).
- Monitor usage in the OpenAI dashboard:
  - Check token consumption per request.
  - Set billing alerts.
- Add task completion limits, such as a maximum number of steps per run.
Future improvement: use a cheaper model than GPT-4o for JSON cleanup (`vision.py:54`).
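Every step sends a fresh screenshot to the model, so capping the number of steps also caps the cost of a run. This minimal sketch assumes the main loop can be split into two parts: a `step` callable (capture plus model call) and a `perform` callable that returns `True` once the task is done. Both names and the default budget of 15 steps are illustrative and are not vimGPT settings.

```python
from typing import Callable, Dict, Tuple


def run_with_step_limit(
    step: Callable[[], Dict],
    perform: Callable[[Dict], bool],
    max_steps: int = 15,
) -> Tuple[int, bool]:
    """Run decide/act steps until `perform` reports done or the budget is used up.

    Returns (steps_taken, finished). Each call to `step` stands for one
    screenshot-plus-model request, so max_steps bounds the number of paid calls.
    """
    for n in range(1, max_steps + 1):
        action = step()
        if perform(action):
            return n, True
    return max_steps, False
```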
Debugging tips
Enable verbose logging
Add debug prints to track execution flow.

Save screenshots for inspection
Capture what the model sees.

Test with headful browser
Watch the automation in real time.

Validate JSON responses
Add schema validation.

Getting help
If issues persist:
- Check existing GitHub issues.
- Search the HackerNews discussion.
- Review the source code for recent updates.
- Open a new issue that includes:
  - The full error message
  - Steps to reproduce
  - Your Python version and OS
  - Sample screenshots (if relevant)