System requirements
Before installing vimGPT, ensure your system meets these requirements:- Operating system: Linux, macOS, or Windows
- Python: Version 3.8 or higher
- Browser: Chrome/Chromium (automatically installed by Playwright)
- API access: OpenAI API key with GPT-4 with Vision enabled
- Optional: Microphone access for voice mode
Installation steps
Install Python dependencies
Install all required packages using pip:This installs the following key dependencies:
openai==1.1.2- OpenAI API client for GPT-4Vplaywright==1.39.0- Browser automation frameworkPillow==10.1.0- Image processing for screenshotspython-dotenv==1.0.0- Environment variable managementwhisper-mic- Voice input support (optional)
After installing Playwright, you need to install browser binaries:
Download Vimium extension
vimGPT requires the Vimium Chrome extension to be loaded locally. Run the provided setup script:Or manually execute these commands:This downloads Vimium to
./vimium-master/, which vimGPT loads when launching the browser.Configure OpenAI API key
Create a Add your OpenAI API key:The API key is loaded in vision.py:
.env file in the project root:.env
Detailed dependency breakdown
Core dependencies
Here’s what each major dependency does in vimGPT:- openai: Communicates with GPT-4 with Vision API to analyze screenshots
- playwright: Automates Chromium browser with Vimium extension loaded
- Pillow: Processes and resizes screenshots before sending to GPT-4V
- python-dotenv: Loads OpenAI API key from
.envfile - whisper-mic: Enables voice input mode (optional)
Browser automation (vimbot.py)
The Vimbot class initializes Playwright with Vimium:Vision processing (vision.py)
Screenshots are resized before being sent to the API:Configuration options
Headless mode
By default, vimGPT opens a visible browser window. To run in headless mode, modify the Vimbot initialization:Viewport dimensions
Adjust browser size in vimbot.py:27:Image resolution
Modify the resolution constant in vision.py:12:Model selection
The default model isgpt-4o (vision.py:28). You can change this to other vision-capable models:
Voice mode setup
To use voice input, ensure thewhisper-mic package is installed and your microphone is accessible:
Troubleshooting
playwright install fails
playwright install fails
If Playwright browser installation fails:
Vimium not loading in browser
Vimium not loading in browser
Ensure the extension was downloaded correctly:You should see files like
background.js, manifest.json, etc. If the directory is empty, re-run:OpenAI API authentication fails
OpenAI API authentication fails
Verify your API key is correctly set:If this prints
None, check:.envfile exists in project root- File contains
OPENAI_API_KEY=sk-... - No extra spaces or quotes around the key
Voice mode not working
Voice mode not working
Common voice mode issues:Microphone not detected:Whisper model download fails:
Whisper models are downloaded on first use. Ensure you have:
- Internet connectivity
- Sufficient disk space (~1GB for base model)
- Write permissions in cache directory
Module import errors
Module import errors
If you see For development, use a virtual environment:
ModuleNotFoundError, ensure all dependencies installed:API costs and usage
Typical costs per browsing task:- Simple task (3-5 actions): 0.15
- Medium task (10-15 actions): 0.40
- Complex task (20+ actions): $0.50+
Environment variables reference
All available environment variables:.env
Next steps
Now that vimGPT is installed:- Complete the quickstart guide to run your first task
- Experiment with different objectives and complexity levels
- Monitor your API usage and costs
- Try voice mode for hands-free browsing
For issues or questions, visit the GitHub repository or check existing issues and discussions.