Troubleshooting

API key not found or authentication errors

vimGPT requires an OpenAI API key to function. The error typically appears as:

```
openai.AuthenticationError: No API key provided
```

Solution:

Create a .env file in the project root directory
Add your OpenAI API key:
```
OPENAI_API_KEY=sk-...
```
Ensure the .env file is in the same directory where you run main.py
Verify the key is valid by checking your OpenAI dashboard

The vision.py module uses python-dotenv (vision.py:11) to load environment variables from the .env file automatically.
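If the error persists, the sketch below can confirm that a .env file exists at the path you expect and that it defines OPENAI_API_KEY. It is a hypothetical standard-library helper: `check_env_file` is not part of vimGPT. Its parser only understands simple `KEY=value` lines, while python-dotenv accepts a richer syntax.

```python
from pathlib import Path


def check_env_file(path: str = ".env", key: str = "OPENAI_API_KEY") -> str:
    """Return a one-line diagnosis of `key` in the .env file at `path`.

    Hypothetical helper (not part of vimGPT): it only parses simple KEY=value
    lines, optionally prefixed with "export " and/or quoted, unlike python-dotenv.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return f"missing: no file at {env_path.resolve()}"
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without "="
        name, _, value = line.partition("=")
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name != key:
            continue
        value = value.strip().strip("'\"")
        if not value:
            return f"empty: {key} is present but has no value"
        if not value.startswith("sk-"):
            return f"suspicious: {key} does not start with 'sk-'"
        return f"ok: {key} found"
    return f"not set: {key} does not appear in {env_path}"


if __name__ == "__main__":
    print(check_env_file())
```

Run it from the directory where you launch main.py. It prints a one-line diagnosis such as `ok: OPENAI_API_KEY found`.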

Vimium extension not loading

If you see errors about missing Vimium or the browser launches without yellow hint markers:

```
Error: Extension directory not found: ./vimium-master
```

Solution:

Run the setup script to download Vimium:
```
./setup.sh
```
Verify the vimium-master directory exists in your project root:
```
ls -la vimium-master/
```
If the directory exists but errors persist, make sure the setup script is executable and run it again:
```
chmod +x setup.sh
./setup.sh
```

Manual installation alternative:

```
curl -o vimium-master.zip -L https://github.com/philc/vimium/archive/refs/heads/master.zip
unzip vimium-master.zip
rm vimium-master.zip
```

The extension path is defined in vimbot.py:7 and must match the downloaded directory name.
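Chromium only loads an unpacked extension from a directory that has a manifest.json at its top level. Checking for that file therefore catches two problems: a missing download, and an extra nesting level left behind by unzipping. The helper below is hypothetical (`find_extension_problem` is not part of vimGPT) and assumes the default ./vimium-master path. Adjust the path if vimbot.py:7 points elsewhere.

```python
import json
from pathlib import Path
from typing import Optional


def find_extension_problem(ext_dir: str = "./vimium-master") -> Optional[str]:
    """Describe what is wrong with `ext_dir`, or return None if it looks loadable.

    Hypothetical helper (not part of vimGPT).
    """
    root = Path(ext_dir)
    if not root.is_dir():
        return f"directory not found: {root.resolve()}"
    manifest = root / "manifest.json"
    if not manifest.is_file():
        # An extra nesting level left behind by unzipping is a common cause
        nested = [p.name for p in root.iterdir() if (p / "manifest.json").is_file()]
        hint = f" (found one in subdirectory {nested[0]!r})" if nested else ""
        return f"no manifest.json in {root}{hint}"
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return f"manifest.json is not valid JSON: {exc}"
    if "manifest_version" not in data:
        return "manifest.json has no manifest_version field"
    return None


if __name__ == "__main__":
    print(find_extension_problem() or "vimium-master looks loadable")
```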

GPT-4V not detecting elements / returning empty actions

The model fails to identify clickable elements and returns {} or incorrect actions. Common causes:

Image resolution too low for element detection
Vimium hints not visible in screenshot
Page not fully loaded before capture

Solutions:

Increase image resolution (edit vision.py:12):
```
IMG_RES = 1920  # Default is 1080
```
Note: Higher resolution increases token usage and API costs.

Add delay before capture (edit main.py:31):

```
time.sleep(2)  # Wait longer for page load
screenshot = driver.capture()
```

Verify Vimium activation by checking that yellow hints appear:
- Set headless=False in vimbot.py:11 to see the browser
- Manually press f to confirm Vimium works

Check screenshot quality by saving captures:

```
# Add to main.py after line 32
screenshot.save(f"debug_{time.time()}.png")
```
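A fixed `time.sleep()` before capture either wastes time on fast pages or is too short for slow ones. One alternative is to poll a readiness check until it passes or a deadline expires. The helper below is a hypothetical standard-library sketch, not part of vimGPT. The check you pass in is up to you.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses.

    Hypothetical helper (not part of vimGPT). Returns True if the check passed,
    False if the deadline was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Here is one way to use it. Assume the driver exposes its Playwright page as `driver.page`, as the `self.page.goto` call in vimbot.py suggests. Then `wait_until(lambda: driver.page.evaluate("document.readyState") == "complete")` could replace the fixed sleep before `driver.capture()`. If you only care about load events, Playwright's own `page.wait_for_load_state()` covers the same need.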

Invalid JSON response errors

GPT-4V returns text that cannot be parsed as JSON:

```
Error: Invalid JSON response
```

How it’s handled: vimGPT includes automatic JSON repair (vision.py:52). When the first parse fails:

The malformed response is sent to GPT-4o for cleaning
A second parse attempt is made
If both fail, an empty dict {} is returned
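The flow above can be sketched as follows. It illustrates the logic rather than reproducing vision.py. `parse_action` is a hypothetical name, and the `repair` argument stands in for the GPT-4o cleanup request, so the fallback can be exercised without an API call.

```python
import json
from typing import Callable


def parse_action(raw: str, repair: Callable[[str], str]) -> dict:
    """Parse the model's reply as JSON, asking `repair` to clean it up once if needed.

    Hypothetical sketch: `repair` stands in for the GPT-4o cleanup call.
    Returns {} if neither the original nor the repaired text parses.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # first attempt failed; fall through to the repair step
    try:
        return json.loads(repair(raw))
    except json.JSONDecodeError:
        return {}
```

In a real setup, `repair` would send the malformed text to the model and return its reply.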

Manual debugging:

Check the printed JSON response in your terminal
Common issues:
- Response wrapped in markdown code blocks: ```json {...} ```
- Extra explanatory text before/after JSON
- Missing quotes around keys or values
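The first two issues can often be fixed locally, before spending a second API call on repair. The hypothetical helper below (not part of vimGPT) first keeps only the body of a markdown code fence, if one is present. It then decodes the first `{...}` object it finds and ignores any text around it. It cannot fix missing quotes; those still need the model-based repair.

```python
import json
import re

FENCE = "`" * 3  # built this way so the snippet can sit inside a markdown code block
FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> dict:
    """Best-effort local cleanup of a model reply; returns {} if nothing decodes.

    Hypothetical helper (not part of vimGPT).
    """
    match = FENCED.search(text)
    if match:
        text = match.group(1)  # keep only the fenced body
    decoder = json.JSONDecoder()
    # Try each "{" as a possible start; raw_decode ignores trailing text
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return {}
```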

If auto-repair consistently fails, modify the prompt in vision.py:35 to be more explicit:

```
"text": f"...You must respond ONLY with valid JSON. No markdown, no explanations, just pure JSON..."
```

Playwright installation issues

Errors during pip install or when running the script:

```
playwright._impl._api_types.Error: Executable doesn't exist
```

Solution:

Install Python dependencies:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install chromium
```

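If the browser still fails to launch because of missing system libraries, install Chromium together with its OS dependencies (this may require sudo):

```
playwright install --with-deps chromium
```

For system-specific issues, check the Playwright documentation.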
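Sometimes `playwright install chromium` appears to succeed but the "Executable doesn't exist" error persists. In that case, check where Playwright expects its browsers to be. The sketch below mirrors the default locations described in Playwright's documentation on managing browser binaries:

- `~/.cache/ms-playwright` on Linux
- `~/Library/Caches/ms-playwright` on macOS
- `%LOCALAPPDATA%\ms-playwright` on Windows

PLAYWRIGHT_BROWSERS_PATH overrides all of these. The helper names are hypothetical, and the lookup may differ between Playwright versions.

```python
import os
import sys
from pathlib import Path
from typing import List, Optional


def playwright_browsers_dir() -> Optional[Path]:
    """Return the directory Playwright is expected to use for downloaded browsers.

    Hypothetical helper. Returns None when PLAYWRIGHT_BROWSERS_PATH=0, which tells
    Playwright to keep browsers inside the installed package instead.
    """
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override == "0":
        return None
    if override:
        return Path(override).expanduser()
    home = Path.home()
    if sys.platform.startswith("win"):
        local = os.environ.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(local) / "ms-playwright"
    if sys.platform == "darwin":
        return home / "Library" / "Caches" / "ms-playwright"
    cache = os.environ.get("XDG_CACHE_HOME") or str(home / ".cache")
    return Path(cache) / "ms-playwright"


def installed_browsers() -> List[str]:
    """List browser build directories (for example chromium-<build>) in the cache."""
    root = playwright_browsers_dir()
    if root is None or not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

An empty list from `installed_browsers()` may mean the browsers were installed under a different user, or with a different PLAYWRIGHT_BROWSERS_PATH than the one the script runs with.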

Voice mode not working

When using the --voice flag, errors occur during audio capture:

```
Error in capturing voice input: [Errno 2] No such file or directory: 'ffmpeg'
```

Solution:

Install system audio dependencies.
macOS:
```
brew install portaudio ffmpeg
```
Ubuntu/Debian:
```
sudo apt-get install portaudio19-dev ffmpeg
```
Windows:
- Download ffmpeg from ffmpeg.org
- Add to system PATH
Verify microphone permissions in system settings

Test whisper-mic independently:

```
from whisper_mic import WhisperMic
mic = WhisperMic()
result = mic.listen()
print(result)
```

The voice input handling is in main.py:17 with basic error catching.
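The ffmpeg error means the Python process cannot find an `ffmpeg` executable on its PATH, even if ffmpeg is installed somewhere else. A hypothetical standard-library check (not part of vimGPT) using `shutil.which` can confirm this before you start voice mode. PortAudio is a library rather than an executable, so this check does not cover it.

```python
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the names from `names` that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All required executables found")
```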

Bot clicking wrong elements or getting stuck

The agent repeatedly clicks the same element or selects incorrect targets. Debugging steps:

Check the objective clarity:
- Vague: “Find something interesting”
- Better: “Search for Python tutorials on Google”

Monitor the JSON responses printed to the console (main.py:36):

```
JSON Response: {"click": "ab", "type": "python tutorials"}
```

Verify Vimium hint visibility:
- Run with visible browser: set headless=False in vimbot.py:11
- Check if hints are obscured by page elements

Increase wait time between actions (main.py:30):

```
time.sleep(3)  # Give page more time to update
```

Current limitation: No cycle detection exists (see Architecture page). The bot may loop if:
- The clicked element doesn’t change the page state
- Navigation leads back to a previous page

Future versions may implement graph-based retry mechanisms (see README.md:48).
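Until something like that exists, a simple guard can stop a run that keeps repeating itself. The sketch below is hypothetical and not part of vimGPT. It remembers the last few (URL, action) pairs and reports a loop once the same pair recurs more than `max_repeats` times within the window.

```python
import json
from collections import Counter, deque


class LoopGuard:
    """Flag when the same (url, action) pair keeps recurring in a sliding window.

    Hypothetical helper (not part of vimGPT).
    """

    def __init__(self, window: int = 6, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)  # oldest entries drop off automatically

    def seen_too_often(self, url: str, action: dict) -> bool:
        # sort_keys gives the same string for dicts with the same contents
        key = (url, json.dumps(action, sort_keys=True))
        self.recent.append(key)
        return Counter(self.recent)[key] > self.max_repeats
```

In the main loop, you might call `guard.seen_too_often(driver.page.url, action)` after parsing each response and stop when it returns True. `driver.page` is an assumption based on vimbot.py's use of `self.page`.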

Timeout errors during navigation

Page navigation fails with:

```
playwright._impl._api_types.TimeoutError: Timeout 60000ms exceeded
```

Solution:

The default timeout is 60 seconds (vimbot.py:43)

Increase timeout for slow-loading pages:

```
def navigate(self, url):
    self.page.goto(
        url=url if "://" in url else "https://" + url,
        timeout=120000  # 2 minutes
    )
```

For pages that never finish loading (streaming content):
```
self.page.goto(url, wait_until="domcontentloaded")
```
Check if HTTPS errors are blocking (vimbot.py:22 already sets ignore_https_errors=True)
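For pages that time out only intermittently, retrying a few times with a growing pause can be gentler than raising the timeout for every page. The helper below is a hypothetical, library-agnostic sketch, not part of vimGPT. It takes a zero-argument function and a tuple of exception types to retry on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    first_delay: float = 2.0,
) -> T:
    """Run `action`, retrying on the given exceptions with doubling delays.

    Hypothetical helper. Re-raises the last exception if every attempt fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

With Playwright, the exception to retry on is the `TimeoutError` exported by its sync API. Import it with `from playwright.sync_api import TimeoutError as PlaywrightTimeoutError`, then call `retry(lambda: driver.navigate(url), (PlaywrightTimeoutError,))`.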

High API costs / token usage

Running vimGPT consumes more tokens than expected. Optimization strategies:

Reduce image resolution (vision.py:12):
```
IMG_RES = 720  # Lower resolution = fewer tokens
```
Trade-off: May reduce element detection accuracy.
Limit max tokens per response (vision.py:46):
```
max_tokens=50  # Reduced from 100
```
Monitor usage in OpenAI dashboard:
- Check token consumption per request
- Set billing alerts

Add task completion limits:

```
# In main.py: cap the number of iterations so a stuck run stops spending tokens
max_iterations = 20
for i in range(max_iterations):
    ...  # existing loop code goes here
```

Future improvement: Use cheaper models for JSON cleanup instead of GPT-4o (vision.py:54)
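To see how IMG_RES translates into cost, the sketch below estimates image input tokens with the tile formula OpenAI has documented for high-detail images on GPT-4o-class models. That formula works in three steps:

1. Scale the image to fit within 2048×2048.
2. Shrink it so the shortest side is 768 px.
3. Charge 85 base tokens plus 170 per 512 px tile.

Rates differ between models and may change. The sketch also assumes the screenshot is resized to IMG_RES pixels wide with its aspect ratio preserved. Check vision.py and OpenAI's current pricing documentation before relying on the numbers.

```python
import math


def image_tokens(width: int, height: int) -> int:
    """Estimate input tokens for one high-detail image (OpenAI tile formula, assumed)."""
    # 1. Scale down to fit inside a 2048 x 2048 square
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2. Scale down so the shortest side is at most 768 px
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3. Count 512 px tiles: 170 tokens each plus a fixed 85
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def tokens_for_img_res(img_res: int, aspect_w: int = 16, aspect_h: int = 9) -> int:
    """Tokens for a screenshot resized to `img_res` px wide (hypothetical resize rule)."""
    return image_tokens(img_res, img_res * aspect_h // aspect_w)


print(tokens_for_img_res(1080), tokens_for_img_res(720))  # 1105 425
```

Under this formula, the API scales down any image whose shortest side exceeds 768 px. So, for a fixed aspect ratio, raising IMG_RES beyond that point no longer changes the tile count.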
