This guide covers everything you need to install and configure LinkedIn Job Analyzer on your system.

Prerequisites

Required Software

Python

The application is built with Python 3.13 but supports Python 3.10+. Check your Python version:
python --version
Installation:
  • macOS: brew install python
  • Ubuntu/Debian: sudo apt install python3 python3-pip
  • Windows: Download from python.org
Google Chrome

Required for Selenium WebDriver to scrape LinkedIn.

Why Chrome? The project uses webdriver-manager, which automatically handles ChromeDriver installation and updates.

Installation: Download from google.com/chrome
The WebDriver Manager will automatically download the matching ChromeDriver version.
OpenAI API Key

Required for AI-powered job analysis features. Get your API key:
  1. Sign up at platform.openai.com
  2. Navigate to API Keys
  3. Click “Create new secret key”
  4. Copy and save your key securely
API usage incurs costs. The app uses GPT-3.5/4 only when you click “Generate Strategic Summary”.
Git

Recommended for cloning the repository. Check if installed:
git --version
Installation:
  • macOS: brew install git
  • Ubuntu/Debian: sudo apt install git
  • Windows: Download from git-scm.com

Installation Steps

Step 1: Get the Source Code

Clone the repository or download the source:
git clone https://github.com/JuseAR27/Web_Scraping.git
cd Web_Scraping
Step 2: Create a Virtual Environment (Recommended)

Isolate project dependencies from your system Python:
python3 -m venv venv
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows
Your terminal prompt should change to show (venv) when the environment is activated.
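If you are unsure whether the environment is active, Python itself can tell you. A small stdlib-only check (not part of the project):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points into the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix

print("Virtual environment active:", in_virtualenv())
```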
Step 3: Install Python Dependencies

Install all required packages from requirements.txt:
pip install -r requirements.txt
This installs the following dependencies:
Package           | Version | Purpose
------------------|---------|----------------------------------
flask             | Latest  | Web framework and routing
selenium          | Latest  | Browser automation for scraping
webdriver-manager | Latest  | Automatic ChromeDriver management
beautifulsoup4    | Latest  | HTML parsing and data extraction
pandas            | Latest  | Data manipulation
openpyxl          | Latest  | Excel file generation
openai            | Latest  | GPT API integration
python-dotenv     | Latest  | Environment variable management
Installation typically takes 1-2 minutes depending on your internet connection.
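To confirm what actually got installed (and at which version), you can query package metadata from the standard library. A sketch; the package list mirrors requirements.txt above:

```python
from importlib.metadata import version, PackageNotFoundError

REQUIRED = [
    "flask", "selenium", "webdriver-manager", "beautifulsoup4",
    "pandas", "openpyxl", "openai", "python-dotenv",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```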
Step 4: Configure Environment Variables

Create a .env file in the project root directory:
touch .env
Add your OpenAI API key:
.env
OPENAI_API_KEY=sk-proj-...
Security Best Practices:
  • Never commit .env to version control
  • Keep your API key confidential
  • Rotate keys if accidentally exposed
  • The .env file should already be in .gitignore
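Under the hood, python-dotenv just reads KEY=VALUE pairs from that file into the process environment. A simplified stdlib-only sketch of the idea (the real library also handles quoting, multiline values, and other edge cases):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: read KEY=VALUE lines into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env_file()
print("OPENAI_API_KEY present:", bool(os.getenv("OPENAI_API_KEY")))
```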
Step 5: Verify Directory Structure

Ensure your project has the following structure:
Web_Scraping/
├── flask_app.py              # Main Flask application
├── requirements.txt          # Python dependencies
├── .env                      # Your API keys (not in git)
├── .env.example             # Template for .env
├── scraping/                # Scraping logic (Strategy pattern)
│   ├── base_scraper.py
│   └── linkedin_scraper.py
├── logica_negocio/          # Business logic (Facade pattern)
│   └── servicio_vacantes.py
├── inteligencia_artificial/ # AI integration
│   └── gpt_analyzer.py
├── exportadores/            # Data export (Factory pattern)
│   ├── base_exporter.py
│   ├── excel_exporter.py
│   ├── json_exporter.py
│   └── exporter_factory.py
├── utilidades/              # Utility functions
│   └── text_cleaner.py
├── static/                  # CSS and JavaScript
└── templates/               # HTML templates
    └── index.html
Step 6: Create Output Directory

The application will auto-create this, but you can create it manually:
mkdir -p datos_extraidos
This directory will store all exported JSON and Excel files.
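The auto-create behavior amounts to a single stdlib call, roughly like this (the actual code lives in the service layer; the directory name comes from this guide):

```python
import os

OUTPUT_DIR = "datos_extraidos"  # default export directory from this guide

def ensure_output_dir(path: str = OUTPUT_DIR) -> str:
    """Create the export directory if it does not exist; no-op otherwise."""
    os.makedirs(path, exist_ok=True)
    return path

print("Export directory ready:", ensure_output_dir())
```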

Verification

Verify your installation is complete:
Step 1: Test Python Imports

python -c "import flask, selenium, bs4, pandas, openai; print('All imports successful!')"
Expected output: All imports successful!
Step 2: Check Environment Variables

python -c "from dotenv import load_dotenv; import os; load_dotenv(); print('API Key loaded:', 'Yes' if os.getenv('OPENAI_API_KEY') else 'No')"
Expected output: API Key loaded: Yes
Step 3: Start the Application

python flask_app.py
Expected output:
* Serving Flask app 'flask_app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
Step 4: Access the Web Interface

Open your browser and navigate to:
http://127.0.0.1:5000
You should see the LinkedIn Job Analyzer interface.

Troubleshooting

Import errors when starting the app

Cause: Dependencies not installed or the virtual environment not activated.

Solution:
# Activate virtual environment first
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows

# Then reinstall dependencies
pip install -r requirements.txt
Browser automation fails to start

Cause: Chrome or ChromeDriver compatibility issues.

Solutions:
  1. Update Google Chrome to the latest version
  2. Clear webdriver cache:
    rm -rf ~/.wdm/  # macOS/Linux
    rmdir /s %USERPROFILE%\.wdm  # Windows
    
  3. Reinstall webdriver-manager:
    pip install --upgrade webdriver-manager
    
AI summary generation fails

Cause: Invalid or missing API key.

Solution:
  1. Verify your .env file exists in the project root
  2. Check the key format starts with sk-
  3. Ensure no extra spaces or quotes:
    # Correct
    OPENAI_API_KEY=sk-proj-abc123
    
    # Incorrect
    OPENAI_API_KEY="sk-proj-abc123"  # No quotes!
    OPENAI_API_KEY = sk-proj-abc123  # No spaces!
    
  4. Test your key:
    python -c "from openai import OpenAI; import os; from dotenv import load_dotenv; load_dotenv(); client = OpenAI(api_key=os.getenv('OPENAI_API_KEY')); print('API key valid!')"
    
Port 5000 already in use

Cause: Another application is using port 5000.

Solution 1: Use a different port. Edit flask_app.py (line 42):
if __name__ == '__main__':
    app.run(debug=True, port=5001)  # Change port
Solution 2 - Kill the process:
# Find the process
lsof -ti:5000  # macOS/Linux

# Kill it
kill -9 $(lsof -ti:5000)
Cannot write export files

Cause: Insufficient file system permissions.

Solution:
# Create directory with proper permissions
mkdir -p datos_extraidos
chmod 755 datos_extraidos
LinkedIn blocks or rate-limits requests

Cause: Too many requests or bot detection.

Solutions:
  1. Wait a few minutes between scraping attempts
  2. Ensure you’re scraping public job listings only
  3. Consider using LinkedIn’s official API for production use
This tool is for educational purposes. Always respect LinkedIn’s Terms of Service.
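One way to space out requests is a courtesy delay with random jitter, so scrape runs do not hit the site in a fixed rhythm. A general-purpose sketch, not code from this project:

```python
import random
import time

def polite_sleep(base: float = 5.0, jitter: float = 3.0) -> float:
    """Sleep for `base` seconds plus random jitter; return the actual delay."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# Hypothetical usage between consecutive scrape runs:
# for url in job_urls:
#     scrape(url)
#     polite_sleep()
```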

Configuration Options

Flask Application Settings

Edit flask_app.py to customize:
flask_app.py
if __name__ == '__main__':
    app.run(
        debug=True,        # Set to False in production
        host='127.0.0.1',  # Change to '0.0.0.0' for network access
        port=5000          # Change if port conflicts
    )

Output Directory

The default output directory is datos_extraidos/. To change it, modify logica_negocio/servicio_vacantes.py:17:
self.directorio_salida = "datos_extraidos"  # Change this path

Export Format

In flask_app.py:24, the format parameter controls export type:
rutas = servicio.guardar_datos(resultado['datos_completos'], '3')
# '1' = JSON only
# '2' = Excel only  
# '3' = Both formats
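These format codes map naturally onto the Factory pattern the exportadores/ package is described as using. A hypothetical stdlib-only sketch of the idea (JSON branch only; the class and function names here are illustrative, not the project's actual API):

```python
import json

class JsonExporter:
    """Illustrative exporter: writes records to a JSON file."""
    def export(self, data, path):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
        return [path]

def get_exporters(format_code: str):
    """Map a format code ('1'=JSON, '2'=Excel, '3'=both) to exporter instances."""
    mapping = {"1": ["json"], "2": ["excel"], "3": ["json", "excel"]}
    # Excel branch omitted in this sketch; the real project uses openpyxl.
    registry = {"json": JsonExporter}
    return [registry[name]() for name in mapping[format_code] if name in registry]

exporters = get_exporters("3")
print("Exporters selected:", [type(e).__name__ for e in exporters])
```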

Updating

To update the application:
Step 1: Pull Latest Changes

git pull origin main
Step 2: Update Dependencies

pip install --upgrade -r requirements.txt
Step 3: Restart the Server

Press CTRL+C to stop the current server, then:
python flask_app.py

Next Steps

Quick Start Guide: Run your first job analysis

Architecture Overview: Learn about design patterns and code structure
