Overview
The scraping module provides web scraping functionality for extracting job posting data from LinkedIn. It implements the Strategy pattern so that scraper implementations can be swapped without changing the calling code.
ScraperStrategy
Abstract base class that defines the interface for all scraper implementations.
Class Definition
Methods
extraer_datos()
Abstract method that must be implemented by all scraper strategies.
Parameter: the search term to query for job postings.
Dictionary containing extraction results with the following structure:
- exito (bool): Whether the extraction was successful
- titulo_oferta (str): Job posting title
- url (str): URL of the job posting
- habilidades_brutas (List[str]): Raw extracted skills/requirements
- mensaje (str): Error message if extraction failed
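As a rough illustration of the interface described above, the sketch below shows how an abstract strategy and a result dictionary of this shape could fit together. Only ScraperStrategy, extraer_datos(), and the dictionary keys come from this page; the parameter name termino, the ResultadoExtraccion type, the EstrategiaFija stand-in class, and the ejecutar helper are hypothetical and may not match the module's actual code.

```python
from abc import ABC, abstractmethod
from typing import List, TypedDict


class ResultadoExtraccion(TypedDict, total=False):
    """Hypothetical type for the result dictionary; keys are taken from this page."""
    exito: bool
    titulo_oferta: str
    url: str
    habilidades_brutas: List[str]
    mensaje: str


class ScraperStrategy(ABC):
    """Interface every scraper strategy must implement."""

    @abstractmethod
    def extraer_datos(self, termino: str) -> ResultadoExtraccion:
        """Return the extraction result for the given search term."""


class EstrategiaFija(ScraperStrategy):
    """Hypothetical stand-in that returns canned data instead of scraping."""

    def extraer_datos(self, termino: str) -> ResultadoExtraccion:
        if not termino.strip():
            return {"exito": False, "mensaje": "Empty search term"}
        return {
            "exito": True,
            "titulo_oferta": f"Example posting for {termino}",
            "url": "https://example.com/job/1",
            "habilidades_brutas": ["Python", "SQL"],
        }


def ejecutar(estrategia: ScraperStrategy, termino: str) -> ResultadoExtraccion:
    """Calling code depends only on the abstract interface."""
    return estrategia.extraer_datos(termino)
```

Because ejecutar accepts any ScraperStrategy, a stand-in like this can replace a browser-based scraper in tests without touching the caller.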
LinkedInScraper
Concrete implementation of ScraperStrategy for extracting job data from LinkedIn using Selenium WebDriver.
Class Definition
Constructor
Methods
extraer_datos()
Navigates to LinkedIn, handles modals, and extracts job posting data.
Parameter: the search term for job postings. The word “linkedin” is removed automatically if present.
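The search-term handling could look roughly like the sketch below. It is an assumption-laden illustration: the endpoint, the keywords and location parameter names, and the helper names limpiar_termino and construir_url are hypothetical, and whether the real module removes “linkedin” case-insensitively or only as a whole word is not stated on this page.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint and parameter names; the module's actual URL is not shown here.
BASE_URL = "https://www.linkedin.com/jobs/search"


def limpiar_termino(termino: str) -> str:
    """Drop the whole word "linkedin" (any case) and collapse leftover whitespace."""
    sin_linkedin = re.sub(r"\blinkedin\b", "", termino, flags=re.IGNORECASE)
    return " ".join(sin_linkedin.split())


def construir_url(termino: str, ubicacion: str = "México") -> str:
    """Build a search URL with the cleaned term and location percent-encoded."""
    parametros = {"keywords": limpiar_termino(termino), "location": ubicacion}
    return f"{BASE_URL}?{urlencode(parametros)}"
```

urlencode takes care of spaces and non-ASCII characters such as the accent in México, so the term never needs to be escaped by hand.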
Dictionary containing, on success:
- exito (bool): True if extraction succeeded
- titulo_oferta (str): Extracted job title
- url (str): Final URL of the job posting
- habilidades_brutas (List[str]): Raw list of skills and requirements extracted from the job description
On failure:
- exito (bool): False
- mensaje (str): Error description
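A caller usually needs to branch on exito before touching the other keys, since the success and failure dictionaries do not share them. The helper below is one hypothetical way to do that; habilidades_o_error is not part of the module, and raising RuntimeError on failure is a design choice of this sketch.

```python
from typing import Any, Dict, List


def habilidades_o_error(resultado: Dict[str, Any]) -> List[str]:
    """Return the raw skills on success; raise with the reported message on failure."""
    if resultado.get("exito"):
        return list(resultado.get("habilidades_brutas", []))
    raise RuntimeError(resultado.get("mensaje", "Unknown extraction error"))
```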
- Constructs LinkedIn search URL with encoded search term
- Initializes Chrome browser with anti-detection configuration
- Handles cookie consent and modal dismissals
- Navigates to the first job posting in search results
- Expands the full job description (“Show more” button)
- Extracts job title and description using BeautifulSoup
- Parses requirements from <li> elements, paragraphs, or raw text
- Returns raw extracted data without cleaning
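The requirement-parsing step in the list above can be sketched as a three-level fallback. This is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free, the fallback order (list items first, then paragraphs, then raw text lines) is an assumption, and the names _ColectorTexto and extraer_requisitos are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _ColectorTexto(HTMLParser):
    """Collects text inside <li> and <p> elements, plus all text as a fallback."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[str] = []      # text of each <li>
        self.parrafos: List[str] = []   # text of each <p> outside a <li>
        self.todo: List[str] = []       # every text node, in document order
        self._abierta: Optional[str] = None  # element currently being collected
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # A <p> nested inside a <li> is treated as part of that list item.
        if tag == "li" or (tag == "p" and self._abierta != "li"):
            self._cerrar()
            self._abierta = tag

    def handle_endtag(self, tag):
        if tag == self._abierta:
            self._cerrar()

    def handle_data(self, data):
        self.todo.append(data)
        if self._abierta:
            self._buffer.append(data)

    def _cerrar(self) -> None:
        """Store the buffered text under the open element and reset state."""
        texto = " ".join("".join(self._buffer).split())
        if texto and self._abierta == "li":
            self.items.append(texto)
        elif texto and self._abierta == "p":
            self.parrafos.append(texto)
        self._abierta, self._buffer = None, []


def extraer_requisitos(html: str) -> List[str]:
    """Return list items if any, else paragraphs, else non-empty text lines."""
    colector = _ColectorTexto()
    colector.feed(html)
    colector.close()
    colector._cerrar()  # flush an element left unclosed at end of input
    if colector.items:
        return colector.items
    if colector.parrafos:
        return colector.parrafos
    lineas = "".join(colector.todo).splitlines()
    return [" ".join(linea.split()) for linea in lineas if linea.strip()]
```

The sketch only normalizes whitespace and does no further cleaning, in keeping with the last step above, which returns the raw extracted data.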
_iniciar_navegador()
Private method that initializes the Selenium Chrome WebDriver with anti-detection settings.
- Sets custom user agent to mimic real browser
- Disables automation detection features
- Maximizes browser window
- Uses ChromeDriverManager for automatic driver management
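The settings listed above might translate into Chrome options along the following lines. This is a sketch under assumptions: the user-agent string, the exact flags, and the helper names argumentos_chrome and configurar_opciones are illustrative, and the real method may maximize the window or disable automation flags differently. The helper accepts any object exposing add_argument and add_experimental_option, which Selenium's ChromeOptions does.

```python
from typing import List

# Assumed user-agent string, for illustration only.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def argumentos_chrome(user_agent: str = USER_AGENT) -> List[str]:
    """Command-line switches covering user agent, automation flag, and window size."""
    return [
        f"--user-agent={user_agent}",
        "--disable-blink-features=AutomationControlled",
        "--start-maximized",
    ]


def configurar_opciones(opciones, user_agent: str = USER_AGENT):
    """Apply the settings to an options object and return it."""
    for argumento in argumentos_chrome(user_agent):
        opciones.add_argument(argumento)
    # Hide the "controlled by automated software" switch and extension.
    opciones.add_experimental_option("excludeSwitches", ["enable-automation"])
    opciones.add_experimental_option("useAutomationExtension", False)
    return opciones
```

With Selenium 4 and webdriver-manager installed, the result could be passed as webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=configurar_opciones(webdriver.ChromeOptions())).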
_cerrar_navegador()
Private method that safely closes the WebDriver instance.
Called in the finally block of extraer_datos() to ensure cleanup.
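The cleanup flow can be sketched as follows. The class ScraperConLimpieza, the crear_driver factory, and the placeholder navigation are hypothetical; only the method names and the use of a finally block come from this page. Swallowing errors raised by quit() is one interpretation of "safely" and is an assumption here.

```python
from typing import Any, Callable, Dict, Optional


class ScraperConLimpieza:
    """Hypothetical skeleton showing only the cleanup flow."""

    def __init__(self, crear_driver: Callable[[], Any]) -> None:
        self._crear_driver = crear_driver
        self.driver: Optional[Any] = None

    def _cerrar_navegador(self) -> None:
        """Quit the driver if one exists; never raise from cleanup."""
        if self.driver is None:
            return
        try:
            self.driver.quit()
        except Exception:
            pass  # a failed quit should not mask the extraction result
        finally:
            self.driver = None

    def extraer_datos(self, termino: str) -> Dict[str, Any]:
        try:
            self.driver = self._crear_driver()
            # Placeholder for the real navigation and extraction steps.
            self.driver.get("https://example.com")
            return {"exito": True, "url": self.driver.current_url}
        except Exception as error:
            return {"exito": False, "mensaje": str(error)}
        finally:
            # Runs on success and on failure, so the browser is always closed.
            self._cerrar_navegador()
```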
Important Notes
The search URL is hardcoded to filter for jobs in México. Modify line 46 in linkedin_scraper.py to change the location parameter.