Overview

The Vimbot class provides a high-level interface for autonomous web browsing using Playwright with the Vimium extension. It handles browser initialization, navigation, keyboard interactions, and screenshot capture for vision-based automation.

Class: Vimbot

Constructor

Vimbot(headless=False)
Initializes a new Vimbot instance with a Chromium browser context.
headless
bool
default: False
Whether to run the browser in headless mode. Set to True to run without a visible browser window.
Behavior:
  • Launches a persistent Chromium context with the Vimium extension loaded from ./vimium-master
  • Creates a new page with viewport size of 1080x720
  • Ignores HTTPS errors, so pages with invalid or self-signed certificates still load
Example:
from vimbot import Vimbot

# Initialize with visible browser
driver = Vimbot()

# Initialize in headless mode
driver = Vimbot(headless=True)
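
The constructor behavior listed above maps naturally onto the options of Playwright's launch_persistent_context. The sketch below only builds the keyword arguments such a call might take; it does not launch a browser. build_launch_options and VIEWPORT are hypothetical names, and the two command-line switches are the standard Chromium flags for loading an unpacked extension, assumed here rather than confirmed by this page.

```python
# Hypothetical constant mirroring the documented 1080x720 viewport.
VIEWPORT = {"width": 1080, "height": 720}


def build_launch_options(headless: bool = False,
                         vimium_path: str = "./vimium-master") -> dict:
    """Build keyword arguments for a persistent Chromium context (a sketch)."""
    return {
        "headless": headless,
        # Documented behavior: HTTPS errors are ignored.
        "ignore_https_errors": True,
        # Assumption: the extension is loaded through Chromium's own flags.
        "args": [
            f"--disable-extensions-except={vimium_path}",
            f"--load-extension={vimium_path}",
        ],
    }
```

With Playwright installed, these options could be passed as chromium.launch_persistent_context("", **build_launch_options()), where the empty string asks Playwright for a temporary profile directory, followed by page.set_viewport_size(VIEWPORT) on the new page.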

Methods

navigate()

navigate(url: str) -> None
Navigates to the specified URL.
url
str
required
The URL to navigate to. If the URL doesn’t contain ://, https:// is prepended automatically.
Example:
driver = Vimbot()
driver.navigate("https://www.google.com")
driver.navigate("github.com")  # Automatically becomes https://github.com
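
The scheme rule described for the url parameter can be sketched as a small standalone function. normalize_url is a hypothetical name used only for illustration; it is not part of the Vimbot API.

```python
def normalize_url(url: str) -> str:
    """Return the URL unchanged if it contains "://", else prepend https://."""
    return url if "://" in url else "https://" + url
```

Note that the check looks only for the "://" separator, so a URL with an explicit non-HTTPS scheme such as http://localhost:8000 passes through unchanged.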

type()

type(text: str) -> None
Types the specified text and presses Enter.
text
str
required
The text to type into the active input field.
Behavior:
  • Waits 1 second before typing
  • Types the text character by character
  • Automatically presses Enter after typing
Example:
driver = Vimbot()
driver.navigate("https://www.google.com")
driver.capture()   # Activate hint mode so the hint labels are on screen
driver.click("a")  # Click on search box; "a" is an illustrative hint
driver.type("autonomous web browsing")
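
The three steps listed under Behavior can be sketched against any keyboard-like object. type_and_submit and its delay parameter are hypothetical; the only assumption about the keyboard argument is that it exposes type(text) and press(key), as Playwright's page.keyboard does.

```python
import time


def type_and_submit(keyboard, text: str, delay: float = 1.0) -> None:
    """Pause, type the text, then press Enter (a sketch of the documented steps)."""
    time.sleep(delay)        # documented wait is 1 second
    keyboard.type(text)      # sends the text one character at a time
    keyboard.press("Enter")  # always submits after typing
```

Because Enter is always pressed, this pattern fits search boxes and single-field forms; filling several fields before submitting would need a variant without the final key press.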

click()

click(text: str) -> None
Simulates clicking on an element using Vimium keyboard shortcuts.
text
str
required
The Vimium hint characters to type: the 1-2 letter sequence shown in the yellow box over the target element.
Example:
driver = Vimbot()
driver.navigate("https://www.google.com")
driver.capture()    # Activate hint mode so the hint labels are on screen
driver.click("ab")  # Clicks element with Vimium hint "ab"

capture()

capture() -> PIL.Image.Image
Captures a screenshot with Vimium hints visible on the screen.
screenshot
PIL.Image.Image
A PIL Image object in RGB format containing the screenshot with Vimium hints displayed.
Behavior:
  • Presses Escape to ensure clean state
  • Types “f” to activate Vimium’s hint mode (shows yellow boxes with letter sequences)
  • Takes and returns a screenshot as a PIL Image
Example:
driver = Vimbot()
driver.navigate("https://www.google.com")
screenshot = driver.capture()
screenshot.save("page_with_hints.png")
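
The capture sequence described under Behavior can be sketched as follows. show_hints_and_screenshot is a hypothetical helper that returns raw image bytes instead of a PIL Image, so it runs without Pillow; the page argument is assumed to behave like a Playwright page, with keyboard.press, keyboard.type, and screenshot().

```python
def show_hints_and_screenshot(page) -> bytes:
    """Reset Vimium, enter hint mode, and return the screenshot bytes."""
    page.keyboard.press("Escape")  # leave any active mode or focused input
    page.keyboard.type("f")        # Vimium's hint mode: label clickable elements
    return page.screenshot()       # Playwright returns the image as bytes
```

Assuming Pillow is available, the bytes could then be turned into the documented RGB image with Image.open(BytesIO(data)).convert("RGB").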

perform_action()

perform_action(action: dict) -> bool
Executes an action based on the provided action dictionary.
action
dict
required
A dictionary containing action keys and values. See Actions for the detailed format.
done
bool
Returns True if the action contains a “done” key, indicating task completion. Otherwise returns None, which is falsy, so the result can be used directly as a loop condition.
Supported action combinations:
  • {"done": True} - Signals completion
  • {"navigate": "url"} - Navigates to URL
  • {"type": "text"} - Types text
  • {"click": "ab"} - Clicks element
  • {"click": "ab", "type": "text"} - Clicks element then types text
Example:
import vision
from vimbot import Vimbot

driver = Vimbot()
driver.navigate("https://www.google.com")

while True:
    screenshot = driver.capture()
    action = vision.get_actions(screenshot, "search for Python tutorials")
    if driver.perform_action(action):
        break  # Task completed
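
The supported action combinations listed above suggest a simple dispatch on dictionary keys. The sketch below is one way such a dispatcher might look, not the library's actual implementation; dispatch, run, next_action, and max_steps are hypothetical names. The run helper also bounds the loop, since the while True form above never exits if the model never returns a "done" action.

```python
def dispatch(driver, action: dict):
    """Apply one action dict to a driver; return True only for a "done" action."""
    if "done" in action:
        return True
    if "click" in action and "type" in action:
        driver.click(action["click"])   # focus the element first
        driver.type(action["type"])     # then type into it
    elif "navigate" in action:
        driver.navigate(action["navigate"])
    elif "type" in action:
        driver.type(action["type"])
    elif "click" in action:
        driver.click(action["click"])
    return None


def run(driver, next_action, max_steps: int = 20) -> bool:
    """Capture, decide, act; stop on "done" or after max_steps iterations."""
    for _ in range(max_steps):
        if dispatch(driver, next_action(driver.capture())):
            return True
    return False
```

Here next_action stands in for a call such as vision.get_actions with the objective already bound, for example lambda shot: vision.get_actions(shot, "search for Python tutorials").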

Configuration

Vimium path

The Vimbot class expects the Vimium extension to be located at ./vimium-master relative to the working directory. Ensure you have downloaded and extracted the Vimium extension to this location.
vimium_path = "./vimium-master"
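
Since the path is resolved relative to the working directory, a quick check before constructing Vimbot can catch a missing or misplaced extension early. vimium_is_installed is a hypothetical helper; it relies only on the fact that an unpacked Chrome extension has a manifest.json at its root.

```python
from pathlib import Path


def vimium_is_installed(path: str = "./vimium-master") -> bool:
    """Return True if `path` is a directory containing an extension manifest."""
    root = Path(path)
    return root.is_dir() and (root / "manifest.json").is_file()
```

A script could call this at startup and print a hint to download and extract Vimium when it returns False.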

Browser settings

  • Viewport size: 1080x720 pixels
  • Browser: Chromium (via Playwright)
  • Extensions: Vimium for keyboard navigation
  • HTTPS errors: Ignored, so pages with invalid or self-signed certificates still load
  • Timeout: 60 seconds for navigation
