Running vimGPT
To start vimGPT with a text-based objective:
How it works
The vimGPT workflow operates in a continuous loop:
- Navigation: The browser starts at Google (main.py:15)
- Screen capture: Takes a screenshot with Vimium overlays visible (vimbot.py:53-59)
- AI analysis: Sends the screenshot to GPT-4V with your objective (vision.py:25-46)
- Action execution: Performs the AI-suggested action (navigate, type, or click)
- Completion: Repeats until the AI determines the task is complete
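The loop above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FakeBrowser`, `run_loop`, `choose_action`, the exact start URL, and the `max_steps` safety cap are all hypothetical names and values introduced here, and the model call is replaced by a plain function so the sketch runs without a browser or an API key.

```python
from typing import Callable, Dict, List

Action = Dict[str, str]


class FakeBrowser:
    """Stand-in for the real browser wrapper; it only records what it is asked to do."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def navigate(self, url: str) -> None:
        self.log.append(f"navigate {url}")

    def capture(self) -> str:
        # The real agent would return a screenshot with Vimium hints visible.
        self.log.append("capture")
        return f"screenshot-{self.log.count('capture')}"

    def perform(self, action: Action) -> bool:
        """Apply one action; return True when the action signals completion."""
        if "done" in action:  # assumed key name for the completion signal
            return True
        self.log.append(f"perform {sorted(action.items())}")
        return False


def run_loop(
    browser: FakeBrowser,
    choose_action: Callable[[str, str], Action],
    objective: str,
    start_url: str = "https://www.google.com",  # assumed URL for "Google"
    max_steps: int = 20,  # safety cap added for this sketch
) -> int:
    """Navigate, then capture, decide, and act until done. Returns steps used."""
    browser.navigate(start_url)
    for step in range(1, max_steps + 1):
        screenshot = browser.capture()
        action = choose_action(objective, screenshot)
        if browser.perform(action):
            return step
    return max_steps
```

In this sketch, `choose_action` is where the screenshot and objective would be sent to the vision model; any function that returns an action dictionary can stand in for it.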
The agent presses Escape and then f to activate Vimium’s link hint mode, which overlays yellow character sequences on clickable elements (vimbot.py:55-56).
Available actions
vimGPT supports three primary actions:
Navigate
Navigates to a specified URL. If the URL has no scheme, https:// is prepended (vimbot.py:43).
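As a small sketch of the URL handling, assuming the https:// reference means that scheme-less URLs get https:// prepended before navigation. The function name `normalize_url` is hypothetical and is not taken from vimbot.py.

```python
def normalize_url(url: str) -> str:
    """Return the URL unchanged if it already has a scheme, else prepend https://."""
    return url if "://" in url else "https://" + url
```

With this rule, `normalize_url("example.com")` gives `https://example.com`, while a URL that already starts with `http://` is left alone.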
Type
Types text and presses Enter.
Click
Clicks on an element using its Vimium hint sequence.
Combined actions
For input fields, the agent can click and type in sequence.
Task completion
When the objective is satisfied, the AI returns a response marking the task as complete.
Example objectives
Here are some example objectives you can try:
- “Search for Python tutorials on YouTube”
- “Find the latest news about artificial intelligence”
- “Go to GitHub and search for machine learning projects”
- “Find recipes for chocolate chip cookies”
- “Look up the weather forecast for San Francisco”
Understanding the output
As vimGPT runs, you’ll see console output showing its progress:
- The screenshot capture phase (main.py:31-32)
- The AI’s action decision (main.py:35-36)
- The JSON response indicating what action to perform
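To make the JSON response easier to read, here is a rough sketch of how one response could be mapped to the actions described earlier. The key names `navigate`, `type`, `click`, and `done` are assumptions made for this example, as is the helper `plan_steps`; check the actual console output for the exact shape.

```python
import json
from typing import List


def plan_steps(response_text: str) -> List[str]:
    """Translate one JSON action response into an ordered list of steps."""
    action = json.loads(response_text)
    if "done" in action:  # assumed completion key
        return ["stop"]
    if "navigate" in action:
        return [f"navigate to {action['navigate']}"]
    steps: List[str] = []
    # A combined action clicks first (to focus the field), then types.
    if "click" in action:
        steps.append(f"press hint keys {action['click']}")
    if "type" in action:
        steps.append(f"type {action['type']!r} and press Enter")
    return steps
```

For example, a response containing both a click and a type entry yields two steps in that order, which matches the click-then-type behavior for input fields.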
Stopping execution
To stop vimGPT at any time, press Ctrl+C.
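In a Python program, Ctrl+C raises `KeyboardInterrupt` in the main thread. The sketch below shows one way a loop could exit cleanly on that signal. It is illustrative only: `run_until_interrupted` and `step` are hypothetical names, and this is not a claim about how vimGPT itself handles the interrupt.

```python
from typing import Callable


def run_until_interrupted(step: Callable[[], bool]) -> str:
    """Call step() repeatedly; stop when it returns True or on Ctrl+C."""
    try:
        while True:
            if step():
                return "completed"
    except KeyboardInterrupt:
        # Raised when the user presses Ctrl+C during the loop.
        return "interrupted"
```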