Project Structure

This page documents the directory structure, module responsibilities, file organization, and naming conventions used in the LinkedIn Job Analyzer project.

Directory Layout

linkedin-job-analyzer/
├── exportadores/              # Export subsystem (Factory Pattern)
│   ├── base_exporter.py       # Abstract base class for exporters
│   ├── json_exporter.py       # JSON export implementation
│   ├── excel_exporter.py      # Excel export implementation
│   └── exporter_factory.py    # Factory for creating exporters
├── inteligencia_artificial/   # AI integration subsystem
│   └── gpt_analyzer.py        # OpenAI GPT integration
├── logica_negocio/            # Business logic layer
│   └── servicio_vacantes.py   # JobService facade
├── scraping/                  # Web scraping subsystem (Strategy Pattern)
│   ├── base_scraper.py        # Abstract scraper interface
│   └── linkedin_scraper.py    # LinkedIn-specific implementation
├── static/                    # Frontend assets
│   ├── script.js              # Client-side JavaScript
│   └── style.css              # Styles
├── templates/                 # HTML templates
│   └── index.html             # Main search interface
├── utilidades/                # Utility functions
│   └── text_cleaner.py        # Text processing utilities
├── datos_extraidos/           # Output directory (generated at runtime)
│   ├── linkedin_YYYYMMDD_HHMMSS.json
│   └── linkedin_YYYYMMDD_HHMMSS.xlsx
├── flask_app.py               # Application entry point
├── requirements.txt           # Python dependencies
├── .env.example               # Environment variable template
├── .gitignore                 # Git ignore rules
├── LICENSE                    # Project license
└── README.md                  # Project documentation

Module Responsibilities

1. exportadores/

Purpose: Handle data export to various file formats
Pattern: Factory Pattern + Strategy Pattern
Files:

base_exporter.py

Lines: 7 | Responsibility: Define exporter contract
from abc import ABC, abstractmethod
from typing import Dict

class DataExporter(ABC):
    @abstractmethod
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        pass

json_exporter.py

Lines: 21 | Responsibility: Export data to JSON format
Key Features:
  • UTF-8 encoding for international characters
  • Pretty printing with 4-space indentation
  • Returns file path for download links
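The features above can be sketched as a minimal exporter; this is an illustrative reconstruction based on the described behavior, not the project's verbatim code (the base class is stubbed in for self-containment):

```python
import json
from typing import Dict

# Hypothetical stand-in mirroring the DataExporter contract shown above.
class DataExporter:
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        raise NotImplementedError

class JSONExporter(DataExporter):
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        # ensure_ascii=False keeps international characters readable;
        # indent=4 produces the pretty-printed output described above.
        with open(ruta_salida, 'w', encoding='utf-8') as f:
            json.dump(datos, f, ensure_ascii=False, indent=4)
        return ruta_salida  # the returned path feeds the download link
```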

excel_exporter.py

Lines: 45 | Responsibility: Export data to Excel format
Key Features:
  • Multiple sheets (“Información General”, “Habilidades”)
  • Pandas DataFrame for structured data
  • openpyxl engine for .xlsx format

exporter_factory.py

Lines: 14 | Responsibility: Create appropriate exporter instances
Supported Formats:
  • 'json' → JSONExporter
  • 'excel' → ExcelExporter
  • Extensible for CSV, PDF, Markdown, etc.
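A sketch of the factory's dispatch, shown here table-driven so new formats need only one entry; the exporter classes are empty stand-ins and the real file may use plain if/elif instead:

```python
# Hypothetical stand-ins for the real exporter classes.
class JSONExporter: ...
class ExcelExporter: ...

class ExporterFactory:
    @staticmethod
    def obtener_exportador(formato: str):
        # Map format strings to concrete exporter classes; extending the
        # system (CSV, PDF, Markdown, ...) means adding one entry here.
        exportadores = {
            'json': JSONExporter,
            'excel': ExcelExporter,
        }
        if formato not in exportadores:
            raise ValueError(f"Unsupported format: {formato}")
        return exportadores[formato]()
```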

2. inteligencia_artificial/

Purpose: AI-powered job analysis using Large Language Models
Files:

gpt_analyzer.py

Lines: 77 | Responsibility: OpenAI API integration
Key Features:
  • Environment-based API key configuration
  • Structured prompt engineering
  • Error handling for API failures
  • Token optimization (limits to 30 skills)
Output Format:
1. **Objetivo del Rol**: ...
2. **Stack Tecnológico Principal**: ...
3. **Skills Blandas**: ...
4. **Nivel de Experiencia**: ...
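The token-optimization step can be sketched as a prompt builder that truncates the skill list before the API call; the function name and prompt wording below are illustrative, not the project's actual code:

```python
from typing import List

MAX_HABILIDADES = 30  # cap keeps the prompt within a predictable token budget

def construir_prompt(titulo: str, habilidades: List[str]) -> str:
    # Keep only the first 30 skills to limit token usage.
    recortadas = habilidades[:MAX_HABILIDADES]
    return (
        f"Analyze the following job posting: {titulo}\n"
        f"Detected skills: {', '.join(recortadas)}\n"
        "Answer with: 1. Objetivo del Rol, 2. Stack Tecnológico Principal, "
        "3. Skills Blandas, 4. Nivel de Experiencia."
    )
```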

3. logica_negocio/

Purpose: Business logic and workflow orchestration
Pattern: Facade Pattern
Files:

servicio_vacantes.py

Lines: 65 | Responsibility: Coordinate all subsystems
Key Methods:
procesar_busqueda(termino_busqueda: str) -> Dict
    # Orchestrates: scraping → cleaning → packaging

generar_resumen_ia(titulo: str, habilidades: List[str]) -> str
    # Delegates to AIAnalyzer

guardar_datos(datos: Dict, formato: str) -> List[str]
    # Uses ExporterFactory to save files
Dependencies:
  • scraping.base_scraper.ScraperStrategy
  • utilidades.text_cleaner.TextCleaner
  • inteligencia_artificial.gpt_analyzer.AIAnalyzer
  • exportadores.exporter_factory.ExporterFactory
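The facade's constructor-injection wiring can be sketched as follows; the stub base class and the error-handling details are assumptions for illustration, with cleaning and export steps elided:

```python
from typing import Dict

# Hypothetical stand-in for scraping.base_scraper.ScraperStrategy.
class ScraperStrategy:
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        raise NotImplementedError

class JobService:
    def __init__(self, scraper: ScraperStrategy):
        # The concrete scraper is injected, so swapping platforms
        # never requires changes inside the facade.
        self.scraper = scraper

    def procesar_busqueda(self, termino_busqueda: str) -> Dict:
        # Orchestrates: scraping → cleaning → packaging.
        datos = self.scraper.extraer_datos(termino_busqueda)
        if not datos.get('exito'):
            return {'exito': False, 'error': 'scraping failed'}
        # The real method would clean habilidades_brutas via TextCleaner here.
        return datos
```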

4. scraping/

Purpose: Extract job posting data from websites
Pattern: Strategy Pattern
Files:

base_scraper.py

Lines: 10 | Responsibility: Define scraper contract
from abc import ABC, abstractmethod
from typing import Dict

class ScraperStrategy(ABC):
    @abstractmethod
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        pass

linkedin_scraper.py

Lines: 148 | Responsibility: LinkedIn-specific web scraping
Key Features:
  • Selenium WebDriver automation
  • User-agent spoofing for bot detection
  • Cookie banner handling
  • Modal dismissal (login prompts)
  • “Ver más” button expansion
  • BeautifulSoup HTML parsing
  • Graceful error handling
Technologies:
  • Selenium WebDriver
  • ChromeDriver (webdriver-manager)
  • BeautifulSoup4
  • Regular expressions
Return Format:
{
    'exito': True,
    'titulo_oferta': str,
    'url': str,
    'habilidades_brutas': List[str]  # Uncleaned text
}

5. static/

Purpose: Frontend assets served to the browser
Files:

script.js

Responsibility: Client-side interaction logic
Key Features:
  • Form submission via AJAX
  • Display extracted skills
  • Trigger AI analysis
  • Render markdown summaries
  • Download links for exports

style.css

Responsibility: Visual styling

6. templates/

Purpose: Jinja2 HTML templates for Flask
Files:

index.html

Responsibility: Main user interface
Components:
  • Search form for job title input
  • Results display area
  • AI analysis button
  • File download section

7. utilidades/

Purpose: Reusable utility functions
Pattern: Static Utility
Files:

text_cleaner.py

Lines: 45 | Responsibility: Text processing and cleaning
Key Features:
  • Whitespace normalization
  • Special character filtering (preserves C#, C++)
  • Length validation (3-250 characters)
  • Noise word filtering
  • Case-insensitive deduplication
Excluded Words:
['click', 'show', 'more', 'less', 'see', 'view', 
 'apply', 'job', 'description', 'ver', 'más']
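The cleaning pipeline implied by the features above can be sketched as a single pass; the rules are simplified for illustration (in particular, the special-character handling that preserves C# and C++ is more involved in the real file):

```python
import re
from typing import List

PALABRAS_EXCLUIDAS = {'click', 'show', 'more', 'less', 'see', 'view',
                      'apply', 'job', 'description', 'ver', 'más'}

def limpiar_habilidades(habilidades: List[str]) -> List[str]:
    limpias, vistas = [], set()
    for texto in habilidades:
        texto = re.sub(r'\s+', ' ', texto).strip()   # whitespace normalization
        if not (3 <= len(texto) <= 250):             # length validation
            continue
        if texto.lower() in PALABRAS_EXCLUIDAS:      # noise word filtering
            continue
        if texto.lower() in vistas:                  # case-insensitive dedup
            continue
        vistas.add(texto.lower())
        limpias.append(texto)
    return limpias
```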

8. Root Files

flask_app.py

Lines: 42 | Responsibility: Web server and routing
Routes:
  • GET / - Render search form
  • POST /buscar - Execute scraping
  • POST /analizar_ia - Request AI analysis
  • GET /descargar/<filename> - Download exports
Setup:
scraper = LinkedInScraper()
servicio = JobService(scraper=scraper)
app = Flask(__name__)

requirements.txt

Responsibility: Python package dependencies
Key Packages:
flask
selenium
beautifulsoup4
webdriver-manager
pandas
openpyxl
openai
python-dotenv

.env.example

Responsibility: Environment variable template
OPENAI_API_KEY=sk-your-key-here
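Reading the key at startup typically looks like the sketch below; the helper name and error message are illustrative (in the real app, python-dotenv's load_dotenv() would populate os.environ from the .env file first):

```python
import os

def obtener_api_key() -> str:
    # Fails fast with a clear message if the template was never copied to .env.
    clave = os.getenv("OPENAI_API_KEY")
    if not clave:
        raise RuntimeError("OPENAI_API_KEY is missing; copy .env.example to .env")
    return clave
```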

Naming Conventions

Python Files and Modules

Convention: snake_case
Examples:
  • servicio_vacantes.py (business logic)
  • text_cleaner.py (utility)
  • linkedin_scraper.py (specific implementation)
Directory Names: Also snake_case
  • logica_negocio/
  • inteligencia_artificial/

Classes

Convention: PascalCase
Examples:
  • JobService (facade)
  • ScraperStrategy (interface)
  • LinkedInScraper (implementation)
  • ExporterFactory (factory)
  • AIAnalyzer (service)
Pattern Suffixes:
  • *Strategy - Strategy pattern interfaces
  • *Factory - Factory pattern classes
  • *Exporter - Exporter implementations
  • *Service - Business logic facades

Methods

Convention: snake_case
Examples:
  • procesar_busqueda() (business logic)
  • extraer_datos() (scraping)
  • limpiar_habilidades() (utility)
  • generar_resumen() (AI)
Private Methods: Prefix with _
  • _iniciar_navegador() (linkedin_scraper.py:21)
  • _cerrar_navegador() (linkedin_scraper.py:34)

Variables

Convention: snake_case
Examples:
  • termino_busqueda (parameters)
  • habilidades_limpias (results)
  • datos_completos (data structures)
  • directorio_salida (configuration)

Constants

Convention: UPPER_SNAKE_CASE (implicit; this project has few constants)
Example:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

File Naming Patterns

Base Classes: base_*.py
  • base_scraper.py
  • base_exporter.py
Implementations: <platform>_<type>.py
  • linkedin_scraper.py
  • json_exporter.py
  • excel_exporter.py
Factories: *_factory.py
  • exporter_factory.py
Services: servicio_*.py
  • servicio_vacantes.py

File Organization Principles

1. Separation of Concerns

Each directory represents a distinct concern:
  • Scraping: Web automation and data extraction
  • Business Logic: Workflow orchestration
  • Export: Data persistence
  • AI: External API integration
  • Utilities: Stateless helpers

2. Layered Architecture

Presentation (flask_app.py, templates/, static/)
        ↓
Business Logic (logica_negocio/)
        ↓
Services (scraping/, inteligencia_artificial/, exportadores/)
        ↓
Utilities (utilidades/)

3. Dependency Direction

Dependencies flow downward:
  • flask_app.py → logica_negocio/
  • logica_negocio/ → scraping/, utilidades/, exportadores/, inteligencia_artificial/
  • Lower layers have NO dependencies on upper layers

4. Abstract Before Concrete

In each module:
  1. base_*.py (abstract interface) first
  2. Concrete implementations second
  3. Factory (if applicable) last

How to Extend the System

Adding a New Scraper

1. Create concrete implementation:
# scraping/indeed_scraper.py
from typing import Dict

from .base_scraper import ScraperStrategy

class IndeedScraper(ScraperStrategy):
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        # Implementation
        pass
2. Update application setup:
# flask_app.py
from scraping.indeed_scraper import IndeedScraper

scraper = IndeedScraper()  # Change this line
servicio = JobService(scraper=scraper)

Adding a New Export Format

1. Create exporter class:
# exportadores/csv_exporter.py
import csv
from typing import Dict

from .base_exporter import DataExporter

class CSVExporter(DataExporter):
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        # Implementation
        return ruta_salida
2. Update factory:
# exportadores/exporter_factory.py
from .json_exporter import JSONExporter
from .excel_exporter import ExcelExporter
from .csv_exporter import CSVExporter  # Add this import

class ExporterFactory:
    @staticmethod
    def obtener_exportador(formato: str):
        if formato == 'json':
            return JSONExporter()
        elif formato == 'excel':
            return ExcelExporter()
        elif formato == 'csv':  # Add this branch
            return CSVExporter()

Adding a New Utility Function

1. Add to existing utility class:
# utilidades/text_cleaner.py
import re
from typing import List

class TextCleaner:
    @staticmethod
    def limpiar_habilidades(habilidades: List[str]) -> List[str]:
        # Existing method
        pass

    @staticmethod
    def extraer_emails(texto: str) -> List[str]:  # New method
        return re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', texto)
2. Or create new utility module:
# utilidades/date_formatter.py
from datetime import datetime

class DateFormatter:
    @staticmethod
    def formato_latino(fecha: datetime) -> str:
        return fecha.strftime("%d/%m/%Y %H:%M")

Adding a New Route

# flask_app.py
from flask import jsonify  # add to the existing Flask imports

@app.route('/estadisticas', methods=['GET'])
def estadisticas():
    # Get statistics from stored files
    return jsonify({"total_busquedas": 42})

Code Organization Best Practices

1. One Class Per File

Exceptions:
  • exporter_factory.py imports multiple exporters but defines only the factory

2. Import Organization

# Standard library imports
import os
from datetime import datetime
from typing import Dict, List

# Third-party imports
from flask import Flask, request
import pandas as pd

# Local imports
from logica_negocio.servicio_vacantes import JobService
from scraping.linkedin_scraper import LinkedInScraper

3. File Size Guidelines

  • Small (under 50 lines): Interfaces, utilities, factories
  • Medium (50-150 lines): Services, exporters, scrapers
  • Large (over 150 lines): Only when necessary (linkedin_scraper.py handles complex workflow)

4. Comments in Spanish

Since the target audience is Spanish-speaking:
# Patrón Fachada: Orquesta el flujo entre componentes
class JobService:
    pass

Testing Structure (Future)

Recommended test organization:
tests/
├── test_scraping/
│   ├── test_linkedin_scraper.py
│   └── test_scraper_strategy.py
├── test_exportadores/
│   ├── test_json_exporter.py
│   ├── test_excel_exporter.py
│   └── test_exporter_factory.py
├── test_logica_negocio/
│   └── test_servicio_vacantes.py
├── test_utilidades/
│   └── test_text_cleaner.py
└── fixtures/
    ├── sample_html.html
    └── sample_job_data.json

Conclusion

The project structure follows these principles:
  1. Modularity: Clear boundaries between subsystems
  2. Naming Consistency: Spanish terms for domain concepts, English for technical patterns
  3. Pattern Implementation: Directory structure reflects design patterns
  4. Extensibility: Easy to add scrapers, exporters, and utilities
  5. Maintainability: Small, focused files with single responsibilities
This structure supports the current monolithic architecture while allowing for future microservices extraction if needed.
