Learn how to scrape web content with Zendriver using text search and CSS selectors.
Zendriver provides powerful methods for extracting data from web pages. This guide covers the fundamentals of finding and interacting with page elements.
When multiple elements contain your search text, pass best_match=True to return the element whose text length is closest to that of the search string. This helps avoid matching script content or metadata.
# Find the login button, not script tags containing "login"
login_button = await tab.find("login", best_match=True)
await login_button.click()
Text search includes script contents and metadata. Using best_match=True is recommended for better accuracy.
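To make the length heuristic concrete, here is a minimal pure-Python sketch of picking the candidate whose text length is closest to the search string. It only illustrates the idea described above and is not Zendriver's actual implementation; `pick_best_match` and the sample strings are hypothetical.

```python
from typing import Optional, Sequence


def pick_best_match(search_text: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate containing search_text whose length is closest to it."""
    needle = search_text.lower()
    # Keep only candidates that actually contain the search text.
    matching = [c for c in candidates if needle in c.lower()]
    if not matching:
        return None
    # The smallest length difference wins; ties keep the earliest candidate.
    return min(matching, key=lambda c: abs(len(c) - len(search_text)))


# Hypothetical texts that all contain "login".
candidates = [
    'window.config = {"loginUrl": "/auth/login", "retries": 3};',  # script content
    "Login",  # button label
    "Forgot your login details? Reset them here.",  # help text
]
best = pick_best_match("login", candidates)
```

The short button label wins because its length differs least from the search text, while the long script string and help text are ranked lower.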
Use find_all() and select_all() to retrieve multiple matching elements.
# Find all links on the page
links = await tab.select_all("a[href]")
for link in links:
    url = link.get("href")
    text = link.text
    print(f"{text}: {url}")

# Find all elements containing specific text
price_elements = await tab.find_all("$")
for elem in price_elements:
    print(elem.text_all)
select_all() returns an empty list if no elements are found, rather than raising an exception.
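Because an empty result is a normal return value, a scraping helper can branch on the list instead of wrapping the call in try/except. The sketch below assumes only the select_all(), get(), and text members used above. `collect_links`, `_StubTab`, and `_StubLink` are hypothetical names, and the stubs exist only so the example runs without a browser.

```python
import asyncio
from urllib.parse import urljoin


async def collect_links(tab, base_url: str) -> list:
    """Return (text, absolute_url) pairs for every a[href] on the page."""
    links = await tab.select_all("a[href]")
    if not links:
        # No matches is not an error: an empty list comes back.
        return []
    results = []
    for link in links:
        href = link.get("href")
        if href:
            # Resolve relative URLs against the page's base URL.
            results.append(((link.text or "").strip(), urljoin(base_url, href)))
    return results


class _StubLink:
    """Hypothetical stand-in for an element, for offline demonstration only."""

    def __init__(self, text, href):
        self.text = text
        self._attrs = {"href": href}

    def get(self, name):
        return self._attrs.get(name)


class _StubTab:
    """Hypothetical stand-in for a tab; returns canned results."""

    def __init__(self, links):
        self._links = links

    async def select_all(self, selector):
        return list(self._links)


found = asyncio.run(
    collect_links(_StubTab([_StubLink(" Docs ", "/docs")]), "https://example.com")
)
empty = asyncio.run(collect_links(_StubTab([]), "https://example.com"))
```

With a real tab, the same helper would be awaited inside your own coroutine rather than run through the stubs.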
Once you have an element, extract its data using properties and methods:
element = await tab.select("a.product-link")

# Get attribute values
href = element.get("href")
class_name = element.get("class")
data_id = element.get("data-id")

# Get text content
text = element.text          # Direct text only
all_text = element.text_all  # Text including children

# Get HTML
html = await element.get_html()

# Get tag name
tag = element.tag  # Returns "a"
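When scraping many elements, it can help to gather these values into one plain record per element. This is a minimal sketch that relies only on the get(), text, text_all, and tag members shown above. `element_to_record` and `_StubElement` are hypothetical, and the stub stands in for a real element so the example runs offline.

```python
def element_to_record(element, attributes=("href", "class", "data-id")) -> dict:
    """Collect an element's tag, text, and selected attributes into a dict."""
    record = {
        "tag": element.tag,
        "text": element.text,          # direct text only
        "all_text": element.text_all,  # text including children
    }
    for name in attributes:
        record[name] = element.get(name)
    return record


class _StubElement:
    """Hypothetical stand-in for an element, for offline demonstration only."""

    tag = "a"
    text = "Blue Widget"
    text_all = "Blue Widget $19.99"

    def get(self, name):
        # Assumption: a missing attribute yields None.
        return {"href": "/products/42", "class": "product-link"}.get(name)


record = element_to_record(_StubElement())
```

A list of such records can then be written to CSV or JSON without further reference to the browser.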
# Get full page source
html_content = await tab.get_content()

# Parse with BeautifulSoup or lxml if needed
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
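If adding a dependency is not an option, the returned HTML string can also be processed with Python's standard-library html.parser module. This is a swapped-in alternative to BeautifulSoup, shown only as a sketch. `LinkCollector` and the sample markup are hypothetical; in practice html_content would come from tab.get_content().

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page source standing in for the result of tab.get_content().
html_content = (
    '<html><body><a href="/a">A</a>'
    '<a name="x">no href</a>'
    '<a href="/b">B</a></body></html>'
)
collector = LinkCollector()
collector.feed(html_content)
collector.close()
page_links = collector.links
```

The standard-library parser is event-based and has no selector support, so BeautifulSoup or lxml remains the more convenient choice for anything beyond simple extraction.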