Project Structure

This page documents the directory structure, module responsibilities, file organization, and naming conventions used in the LinkedIn Job Analyzer project.

Directory Layout

linkedin-job-analyzer/
├── exportadores/              # Export subsystem (Factory Pattern)
│   ├── base_exporter.py       # Abstract base class for exporters
│   ├── json_exporter.py       # JSON export implementation
│   ├── excel_exporter.py      # Excel export implementation
│   └── exporter_factory.py    # Factory for creating exporters
├── inteligencia_artificial/   # AI integration subsystem
│   └── gpt_analyzer.py        # OpenAI GPT integration
├── logica_negocio/            # Business logic layer
│   └── servicio_vacantes.py   # JobService facade
├── scraping/                  # Web scraping subsystem (Strategy Pattern)
│   ├── base_scraper.py        # Abstract scraper interface
│   └── linkedin_scraper.py    # LinkedIn-specific implementation
├── static/                    # Frontend assets
│   ├── script.js              # Client-side JavaScript
│   └── style.css              # Styles
├── templates/                 # HTML templates
│   └── index.html             # Main search interface
├── utilidades/                # Utility functions
│   └── text_cleaner.py        # Text processing utilities
├── datos_extraidos/           # Output directory (generated at runtime)
│   ├── linkedin_YYYYMMDD_HHMMSS.json
│   └── linkedin_YYYYMMDD_HHMMSS.xlsx
├── flask_app.py               # Application entry point
├── requirements.txt           # Python dependencies
├── .env.example               # Environment variable template
├── .gitignore                 # Git ignore rules
├── LICENSE                    # Project license
└── README.md                  # Project documentation

Module Responsibilities

1. exportadores/

Purpose: Handle data export to various file formats
Pattern: Factory Pattern + Strategy Pattern
Files:

base_exporter.py

Lines: 7 | Responsibility: Define exporter contract
from abc import ABC, abstractmethod
from typing import Dict

class DataExporter(ABC):
    @abstractmethod
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        pass

json_exporter.py

Lines: 21 | Responsibility: Export data to JSON format
Key Features:
  • UTF-8 encoding for international characters
  • Pretty printing with 4-space indentation
  • Returns file path for download links
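The features above can be sketched as a minimal exporter; this is an illustrative reconstruction based on the described behavior, not the project's verbatim code (the base class is stubbed in for self-containment):

```python
import json
from typing import Dict

# Hypothetical stand-in mirroring the DataExporter contract shown above.
class DataExporter:
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        raise NotImplementedError

class JSONExporter(DataExporter):
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        # ensure_ascii=False keeps international characters readable;
        # indent=4 produces the pretty-printed output described above.
        with open(ruta_salida, 'w', encoding='utf-8') as f:
            json.dump(datos, f, ensure_ascii=False, indent=4)
        return ruta_salida  # the returned path feeds the download link
```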

excel_exporter.py

Lines: 45 | Responsibility: Export data to Excel format
Key Features:
  • Multiple sheets (“Información General”, “Habilidades”)
  • Pandas DataFrame for structured data
  • openpyxl engine for .xlsx format

exporter_factory.py

Lines: 14 | Responsibility: Create appropriate exporter instances
Supported Formats:
  • 'json' → JSONExporter
  • 'excel' → ExcelExporter
  • Extensible for CSV, PDF, Markdown, etc.
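A sketch of the factory's dispatch, shown here table-driven so new formats need only one entry; the exporter classes are empty stand-ins and the real file may use plain if/elif instead:

```python
# Hypothetical stand-ins for the real exporter classes.
class JSONExporter: ...
class ExcelExporter: ...

class ExporterFactory:
    @staticmethod
    def obtener_exportador(formato: str):
        # Map format strings to concrete exporter classes; extending the
        # system (CSV, PDF, Markdown, ...) means adding one entry here.
        exportadores = {
            'json': JSONExporter,
            'excel': ExcelExporter,
        }
        if formato not in exportadores:
            raise ValueError(f"Unsupported format: {formato}")
        return exportadores[formato]()
```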

2. inteligencia_artificial/

Purpose: AI-powered job analysis using Large Language Models
Files:

gpt_analyzer.py

Lines: 77 | Responsibility: OpenAI API integration
Key Features:
  • Environment-based API key configuration
  • Structured prompt engineering
  • Error handling for API failures
  • Token optimization (limits to 30 skills)
Output Format:
1. **Objetivo del Rol**: ...
2. **Stack Tecnológico Principal**: ...
3. **Skills Blandas**: ...
4. **Nivel de Experiencia**: ...
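The token-optimization step can be sketched as a prompt builder that truncates the skill list before the API call; the function name and prompt wording below are illustrative, not the project's actual code:

```python
from typing import List

MAX_HABILIDADES = 30  # cap keeps the prompt within a predictable token budget

def construir_prompt(titulo: str, habilidades: List[str]) -> str:
    # Keep only the first 30 skills to limit token usage.
    recortadas = habilidades[:MAX_HABILIDADES]
    return (
        f"Analyze the following job posting: {titulo}\n"
        f"Detected skills: {', '.join(recortadas)}\n"
        "Answer with: 1. Objetivo del Rol, 2. Stack Tecnológico Principal, "
        "3. Skills Blandas, 4. Nivel de Experiencia."
    )
```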

3. logica_negocio/

Purpose: Business logic and workflow orchestration
Pattern: Facade Pattern
Files:

servicio_vacantes.py

Lines: 65 | Responsibility: Coordinate all subsystems
Key Methods:
procesar_busqueda(termino_busqueda: str) -> Dict
    # Orchestrates: scraping → cleaning → packaging

generar_resumen_ia(titulo: str, habilidades: List[str]) -> str
    # Delegates to AIAnalyzer

guardar_datos(datos: Dict, formato: str) -> List[str]
    # Uses ExporterFactory to save files
Dependencies:
  • scraping.base_scraper.ScraperStrategy
  • utilidades.text_cleaner.TextCleaner
  • inteligencia_artificial.gpt_analyzer.AIAnalyzer
  • exportadores.exporter_factory.ExporterFactory
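The facade's constructor-injection wiring can be sketched as follows; the stub base class and the error-handling details are assumptions for illustration, with cleaning and export steps elided:

```python
from typing import Dict

# Hypothetical stand-in for scraping.base_scraper.ScraperStrategy.
class ScraperStrategy:
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        raise NotImplementedError

class JobService:
    def __init__(self, scraper: ScraperStrategy):
        # The concrete scraper is injected, so swapping platforms
        # never requires changes inside the facade.
        self.scraper = scraper

    def procesar_busqueda(self, termino_busqueda: str) -> Dict:
        # Orchestrates: scraping → cleaning → packaging.
        datos = self.scraper.extraer_datos(termino_busqueda)
        if not datos.get('exito'):
            return {'exito': False, 'error': 'scraping failed'}
        # The real method would clean habilidades_brutas via TextCleaner here.
        return datos
```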

4. scraping/

Purpose: Extract job posting data from websites
Pattern: Strategy Pattern
Files:

base_scraper.py

Lines: 10 | Responsibility: Define scraper contract
from abc import ABC, abstractmethod
from typing import Dict

class ScraperStrategy(ABC):
    @abstractmethod
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        pass

linkedin_scraper.py

Lines: 148 | Responsibility: LinkedIn-specific web scraping
Key Features:
  • Selenium WebDriver automation
  • User-agent spoofing for bot detection
  • Cookie banner handling
  • Modal dismissal (login prompts)
  • “Ver más” button expansion
  • BeautifulSoup HTML parsing
  • Graceful error handling
Technologies:
  • Selenium WebDriver
  • ChromeDriver (webdriver-manager)
  • BeautifulSoup4
  • Regular expressions
Return Format:
{
    'exito': True,
    'titulo_oferta': str,
    'url': str,
    'habilidades_brutas': List[str]  # Uncleaned text
}

5. static/

Purpose: Frontend assets served to the browser
Files:

script.js

Responsibility: Client-side interaction logic
Key Features:
  • Form submission via AJAX
  • Display extracted skills
  • Trigger AI analysis
  • Render markdown summaries
  • Download links for exports

style.css

Responsibility: Visual styling

6. templates/

Purpose: Jinja2 HTML templates for Flask
Files:

index.html

Responsibility: Main user interface
Components:
  • Search form for job title input
  • Results display area
  • AI analysis button
  • File download section

7. utilidades/

Purpose: Reusable utility functions
Pattern: Static Utility
Files:

text_cleaner.py

Lines: 45 | Responsibility: Text processing and cleaning
Key Features:
  • Whitespace normalization
  • Special character filtering (preserves C#, C++)
  • Length validation (3-250 characters)
  • Noise word filtering
  • Case-insensitive deduplication
Excluded Words:
['click', 'show', 'more', 'less', 'see', 'view', 
 'apply', 'job', 'description', 'ver', 'más']
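The cleaning pipeline implied by the features above can be sketched as a single pass; the rules are simplified for illustration (in particular, the special-character handling that preserves C# and C++ is more involved in the real file):

```python
import re
from typing import List

PALABRAS_EXCLUIDAS = {'click', 'show', 'more', 'less', 'see', 'view',
                      'apply', 'job', 'description', 'ver', 'más'}

def limpiar_habilidades(habilidades: List[str]) -> List[str]:
    limpias, vistas = [], set()
    for texto in habilidades:
        texto = re.sub(r'\s+', ' ', texto).strip()   # whitespace normalization
        if not (3 <= len(texto) <= 250):             # length validation
            continue
        if texto.lower() in PALABRAS_EXCLUIDAS:      # noise word filtering
            continue
        if texto.lower() in vistas:                  # case-insensitive dedup
            continue
        vistas.add(texto.lower())
        limpias.append(texto)
    return limpias
```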

8. Root Files

flask_app.py

Lines: 42 | Responsibility: Web server and routing
Routes:
  • GET / - Render search form
  • POST /buscar - Execute scraping
  • POST /analizar_ia - Request AI analysis
  • GET /descargar/<filename> - Download exports
Setup:
scraper = LinkedInScraper()
servicio = JobService(scraper=scraper)
app = Flask(__name__)

requirements.txt

Responsibility: Python package dependencies
Key Packages:
flask
selenium
beautifulsoup4
webdriver-manager
pandas
openpyxl
openai
python-dotenv

.env.example

Responsibility: Environment variable template
OPENAI_API_KEY=sk-your-key-here
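Reading the key at startup typically looks like the sketch below; the helper name and error message are illustrative (in the real app, python-dotenv's load_dotenv() would populate os.environ from the .env file first):

```python
import os

def obtener_api_key() -> str:
    # Fails fast with a clear message if the template was never copied to .env.
    clave = os.getenv("OPENAI_API_KEY")
    if not clave:
        raise RuntimeError("OPENAI_API_KEY is missing; copy .env.example to .env")
    return clave
```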

Naming Conventions

Python Files and Modules

Convention: snake_case
Examples:
  • servicio_vacantes.py (business logic)
  • text_cleaner.py (utility)
  • linkedin_scraper.py (specific implementation)
Directory Names: Also snake_case
  • logica_negocio/
  • inteligencia_artificial/

Classes

Convention: PascalCase
Examples:
  • JobService (facade)
  • ScraperStrategy (interface)
  • LinkedInScraper (implementation)
  • ExporterFactory (factory)
  • AIAnalyzer (service)
Pattern Suffixes:
  • *Strategy - Strategy pattern interfaces
  • *Factory - Factory pattern classes
  • *Exporter - Exporter implementations
  • *Service - Business logic facades

Methods

Convention: snake_case
Examples:
  • procesar_busqueda() (business logic)
  • extraer_datos() (scraping)
  • limpiar_habilidades() (utility)
  • generar_resumen() (AI)
Private Methods: Prefix with _
  • _iniciar_navegador() (linkedin_scraper.py:21)
  • _cerrar_navegador() (linkedin_scraper.py:34)

Variables

Convention: snake_case
Examples:
  • termino_busqueda (parameters)
  • habilidades_limpias (results)
  • datos_completos (data structures)
  • directorio_salida (configuration)

Constants

Convention: UPPER_SNAKE_CASE (implicit; this project has few constants)
Example:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

File Naming Patterns

Base Classes: base_*.py
  • base_scraper.py
  • base_exporter.py
Implementations: <platform>_<type>.py
  • linkedin_scraper.py
  • json_exporter.py
  • excel_exporter.py
Factories: *_factory.py
  • exporter_factory.py
Services: servicio_*.py
  • servicio_vacantes.py

File Organization Principles

1. Separation of Concerns

Each directory represents a distinct concern:
  • Scraping: Web automation and data extraction
  • Business Logic: Workflow orchestration
  • Export: Data persistence
  • AI: External API integration
  • Utilities: Stateless helpers

2. Layered Architecture

Presentation (flask_app.py, templates/, static/)
        ↓
Business Logic (logica_negocio/)
        ↓
Services (scraping/, inteligencia_artificial/, exportadores/)
        ↓
Utilities (utilidades/)

3. Dependency Direction

Dependencies flow downward:
  • flask_app.py → logica_negocio/
  • logica_negocio/ → scraping/, utilidades/, exportadores/, inteligencia_artificial/
  • Lower layers have NO dependencies on upper layers

4. Abstract Before Concrete

In each module:
  1. base_*.py (abstract interface) first
  2. Concrete implementations second
  3. Factory (if applicable) last

How to Extend the System

Adding a New Scraper

1. Create concrete implementation:
# scraping/indeed_scraper.py
from typing import Dict

from .base_scraper import ScraperStrategy

class IndeedScraper(ScraperStrategy):
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        # Implementation
        pass
2. Update application setup:
# flask_app.py
from scraping.indeed_scraper import IndeedScraper

scraper = IndeedScraper()  # Change this line
servicio = JobService(scraper=scraper)

Adding a New Export Format

1. Create exporter class:
# exportadores/csv_exporter.py
import csv
from typing import Dict

from .base_exporter import DataExporter

class CSVExporter(DataExporter):
    def exportar(self, datos: Dict, ruta_salida: str) -> str:
        # Implementation
        return ruta_salida
2. Update factory:
# exportadores/exporter_factory.py
from .json_exporter import JSONExporter
from .excel_exporter import ExcelExporter
from .csv_exporter import CSVExporter  # Add this import

class ExporterFactory:
    @staticmethod
    def obtener_exportador(formato: str):
        if formato == 'json':
            return JSONExporter()
        elif formato == 'excel':
            return ExcelExporter()
        elif formato == 'csv':  # Add this branch
            return CSVExporter()

Adding a New Utility Function

1. Add to existing utility class:
# utilidades/text_cleaner.py
import re
from typing import List

class TextCleaner:
    @staticmethod
    def limpiar_habilidades(habilidades: List[str]) -> List[str]:
        # Existing method
        pass

    @staticmethod
    def extraer_emails(texto: str) -> List[str]:  # New method
        return re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', texto)
2. Or create new utility module:
# utilidades/date_formatter.py
from datetime import datetime

class DateFormatter:
    @staticmethod
    def formato_latino(fecha: datetime) -> str:
        return fecha.strftime("%d/%m/%Y %H:%M")

Adding a New Route

# flask_app.py
from flask import jsonify  # add to the existing Flask imports

@app.route('/estadisticas', methods=['GET'])
def estadisticas():
    # Get statistics from stored files
    return jsonify({"total_busquedas": 42})

Code Organization Best Practices

1. One Class Per File

Exceptions:
  • exporter_factory.py imports multiple exporters but defines only the factory

2. Import Organization

# Standard library imports
import os
from datetime import datetime
from typing import Dict, List

# Third-party imports
from flask import Flask, request
import pandas as pd

# Local imports
from logica_negocio.servicio_vacantes import JobService
from scraping.linkedin_scraper import LinkedInScraper

3. File Size Guidelines

  • Small (under 50 lines): Interfaces, utilities, factories
  • Medium (50-150 lines): Services, exporters, scrapers
  • Large (over 150 lines): Only when necessary (linkedin_scraper.py handles complex workflow)

4. Comments in Spanish

Since the target audience is Spanish-speaking:
# Patrón Fachada: Orquesta el flujo entre componentes
class JobService:
    pass

Testing Structure (Future)

Recommended test organization:
tests/
├── test_scraping/
│   ├── test_linkedin_scraper.py
│   └── test_scraper_strategy.py
├── test_exportadores/
│   ├── test_json_exporter.py
│   ├── test_excel_exporter.py
│   └── test_exporter_factory.py
├── test_logica_negocio/
│   └── test_servicio_vacantes.py
├── test_utilidades/
│   └── test_text_cleaner.py
└── fixtures/
    ├── sample_html.html
    └── sample_job_data.json

Conclusion

The project structure follows these principles:
  1. Modularity: Clear boundaries between subsystems
  2. Naming Consistency: Spanish terms for domain concepts, English for technical patterns
  3. Pattern Implementation: Directory structure reflects design patterns
  4. Extensibility: Easy to add scrapers, exporters, and utilities
  5. Maintainability: Small, focused files with single responsibilities
This structure supports the current monolithic architecture while allowing for future microservices extraction if needed.
