
System Architecture Overview

The LinkedIn Job Analyzer is designed with a clean, modular architecture that follows SOLID principles and implements several design patterns to ensure maintainability, extensibility, and separation of concerns.

High-Level Architecture

The system follows a layered architecture with clear boundaries between components:
┌─────────────────────────────────────────────────────────────┐
│                     Presentation Layer                       │
│  (Flask Web Application + REST API + Static Frontend)       │
└─────────────────┬───────────────────────────────────────────┘

┌─────────────────▼───────────────────────────────────────────┐
│                     Business Logic Layer                     │
│              (JobService - Facade Pattern)                   │
└──┬─────────────┬────────────┬──────────────┬────────────────┘
   │             │            │              │
┌──▼─────────┐ ┌─▼────────┐ ┌─▼──────────┐ ┌─▼────────────┐
│  Scraping  │ │Utilities │ │    AI      │ │  Exporters   │
│   Layer    │ │  Layer   │ │   Layer    │ │    Layer     │
│ (Strategy) │ │ (Static) │ │(OpenAI API)│ │  (Factory)   │
└────────────┘ └──────────┘ └────────────┘ └──────────────┘

Core Components

1. Presentation Layer

Location: flask_app.py, templates/, static/
Responsibility: Handle HTTP requests/responses and user interaction
Key Routes:
  • GET / - Render the search form
  • POST /buscar - Execute job scraping and data extraction
  • POST /analizar_ia - Request AI-powered job analysis
  • GET /descargar/<filename> - Download generated export files
Example (flask_app.py:17-27):
@app.route('/buscar', methods=['POST'])
def buscar():
    termino = request.form.get('puesto')
    resultado = servicio.procesar_busqueda(termino)
    
    if resultado['exito']:
        rutas = servicio.guardar_datos(resultado['datos_completos'], '3')
        resultado['rutas_archivos'] = rutas
        return jsonify(resultado)
    return jsonify({"error": resultado['mensaje']}), 400

2. Business Logic Layer

Location: logica_negocio/servicio_vacantes.py
Class: JobService (Facade Pattern)
Responsibility: Orchestrate the workflow between scraping, cleaning, analysis, and export
Key Methods:
  • procesar_busqueda(termino_busqueda) - Main workflow coordinator
  • generar_resumen_ia(titulo, habilidades) - Delegate AI analysis
  • guardar_datos(datos, formato) - Persist results to files
Data Flow (servicio_vacantes.py:22-45):
def procesar_busqueda(self, termino_busqueda: str) -> Dict:
    # 1. Scrape raw data
    resultado = self.scraper.extraer_datos(termino_busqueda)
    if not resultado['exito']:
        return resultado
    
    # 2. Clean extracted text
    habilidades_limpias = TextCleaner.limpiar_habilidades(
        resultado['habilidades_brutas']
    )
    
    # 3. Package complete data
    datos_completos = {
        'termino_busqueda': termino_busqueda,
        'titulo_oferta': resultado['titulo_oferta'],
        'url': resultado['url'],
        'habilidades': habilidades_limpias,
        'fecha_extraccion': datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }
    
    return {'exito': True, 'datos_completos': datos_completos}

3. Scraping Layer

Location: scraping/
Pattern: Strategy Pattern
Components:
  • ScraperStrategy (Abstract Base Class) - Defines contract
  • LinkedInScraper - Concrete implementation using Selenium
Responsibility: Extract job posting data from LinkedIn
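The Strategy contract can be sketched as follows. The class name ScraperStrategy and the method extraer_datos come from the codebase itself (it is the call JobService makes); the stub implementation below is purely illustrative, since the real LinkedInScraper drives Selenium:

```python
from abc import ABC, abstractmethod
from typing import Dict


class ScraperStrategy(ABC):
    """Contract every scraper implementation must fulfil."""

    @abstractmethod
    def extraer_datos(self, termino_busqueda: str) -> Dict:
        """Return a result dict carrying at least an 'exito' flag."""


class StubScraper(ScraperStrategy):
    """Hypothetical stand-in; LinkedInScraper would drive Selenium here."""

    def extraer_datos(self, termino_busqueda: str) -> Dict:
        return {'exito': False, 'mensaje': 'Scraper no implementado'}
```

Because JobService depends only on the abstract contract, a new scraper (Indeed, Glassdoor, etc.) only has to subclass ScraperStrategy; no other layer changes.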

4. Utilities Layer

Location: utilidades/
Class: TextCleaner (Static Utility)
Responsibility: Clean, filter, and deduplicate extracted text
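A minimal sketch of the cleaning step. The class and method names match the codebase; the noise-word list and the minimum-length check are assumptions, as the real limpiar_habilidades may apply different rules:

```python
class TextCleaner:
    """Static helpers for cleaning scraped text (illustrative sketch)."""

    RUIDO = {'y', 'o', 'de', 'la', 'el', 'con'}  # assumed noise-word list

    @staticmethod
    def limpiar_habilidades(habilidades_brutas):
        vistas, limpias = set(), []
        for h in habilidades_brutas:
            h = h.strip()
            clave = h.lower()
            # Skip fragments that are too short, known noise, or duplicates
            if len(h) < 2 or clave in TextCleaner.RUIDO or clave in vistas:
                continue
            vistas.add(clave)
            limpias.append(h)
        return limpias
```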

5. AI Analysis Layer

Location: inteligencia_artificial/
Class: AIAnalyzer
Responsibility: Generate structured summaries using OpenAI GPT models
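A sketch of the analysis layer. AIAnalyzer and generar_resumen are the names used in the repository; the construir_prompt helper and the prompt wording are assumptions. The API call uses the current OpenAI Python SDK (chat.completions), imported lazily so the sketch runs without the package installed:

```python
class AIAnalyzer:
    """Sketch of the AI layer; prompt wording is assumed."""

    def __init__(self, modelo: str = "gpt-3.5-turbo"):
        self.modelo = modelo

    def construir_prompt(self, titulo: str, habilidades: list) -> str:
        # Hypothetical prompt; the real one lives in gpt_analyzer.py
        return ("Resume la siguiente oferta de empleo en markdown.\n"
                f"Título: {titulo}\n"
                f"Habilidades: {', '.join(habilidades)}")

    def generar_resumen(self, titulo: str, habilidades: list) -> str:
        from openai import OpenAI  # deferred so the sketch imports without the SDK
        cliente = OpenAI()  # reads OPENAI_API_KEY from the environment
        respuesta = cliente.chat.completions.create(
            model=self.modelo,
            messages=[{"role": "user",
                       "content": self.construir_prompt(titulo, habilidades)}],
        )
        return respuesta.choices[0].message.content
```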

6. Export Layer

Location: exportadores/
Pattern: Factory Pattern + Strategy Pattern
Components:
  • DataExporter - Abstract base class
  • JSONExporter, ExcelExporter - Concrete implementations
  • ExporterFactory - Factory for creating exporters

Request/Response Cycle

Here’s the complete flow when a user searches for a job:
1. User submits search term via web form

2. Flask route /buscar receives POST request

3. JobService.procesar_busqueda() is called

4. LinkedInScraper.extraer_datos() scrapes LinkedIn
   ├── Opens Chrome with Selenium
   ├── Navigates to LinkedIn job search
   ├── Handles cookie banners and modals
   ├── Clicks first job listing
   ├── Expands full job description
   └── Parses HTML with BeautifulSoup

5. TextCleaner.limpiar_habilidades() processes raw text
   ├── Removes duplicates
   ├── Filters noise words
   └── Validates quality

6. JobService packages complete data

7. JobService.guardar_datos() persists results
   ├── ExporterFactory creates JSONExporter
   ├── JSONExporter.exportar() writes .json file
   ├── ExporterFactory creates ExcelExporter
   └── ExcelExporter.exportar() writes .xlsx file

8. Flask returns JSON response with file paths

9. Frontend displays results and download links

Optional AI Analysis Flow

1. User clicks "Analyze with AI" button

2. Frontend sends POST to /analizar_ia with job data

3. JobService.generar_resumen_ia() is called

4. AIAnalyzer.generar_resumen() sends prompt to OpenAI
   ├── Constructs structured prompt
   ├── Calls GPT-3.5-turbo API
   └── Returns formatted markdown summary

5. Flask returns AI-generated summary

6. Frontend renders markdown summary

Separation of Concerns

The architecture maintains clear boundaries:
Layer            Concerns                      Dependencies
Presentation     HTTP, routing, templates      Business Logic
Business Logic   Workflow orchestration        All layers
Scraping         Web automation, HTML parsing  None (external: Selenium, BeautifulSoup)
Utilities        Text processing               None
AI               OpenAI integration            None (external: OpenAI API)
Export           File I/O, format conversion   None (external: json, pandas)
Benefits:
  • Each component has a single responsibility
  • Easy to test components in isolation
  • Can swap implementations without affecting other layers
  • New features can be added without modifying existing code

Dependency Injection

The system uses constructor injection to provide flexibility.
Example (flask_app.py:9-10):
scraper = LinkedInScraper()
servicio = JobService(scraper=scraper)
This allows:
  • Easy mocking for tests
  • Runtime strategy selection
  • Future support for additional scrapers (Indeed, Glassdoor, etc.)
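Constructor injection is what makes the test-double point concrete: any object exposing extraer_datos can stand in for the real scraper. The FakeScraper below and the trimmed-down JobService body are illustrative, not the real classes:

```python
class FakeScraper:
    """Test double honouring the ScraperStrategy contract."""

    def extraer_datos(self, termino_busqueda):
        return {'exito': True,
                'titulo_oferta': 'Desarrollador Python',
                'habilidades_brutas': ['Python', 'Flask']}


class JobService:
    """Simplified stand-in for logica_negocio/servicio_vacantes.py."""

    def __init__(self, scraper):
        self.scraper = scraper  # any object with extraer_datos()

    def procesar_busqueda(self, termino):
        return self.scraper.extraer_datos(termino)


# No browser, no network: the fake is injected exactly like LinkedInScraper
servicio = JobService(scraper=FakeScraper())
```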

Error Handling Strategy

The system uses result objects instead of exceptions for business logic:
# Success case
{
    'exito': True,
    'titulo_oferta': '...',
    'habilidades': [...],
    'datos_completos': {...}
}

# Failure case
{
    'exito': False,
    'mensaje': 'Error al extraer: ...'
}
This approach:
  • Makes error handling explicit
  • Provides user-friendly error messages
  • Allows graceful degradation
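Two small constructors can keep these result objects uniform across layers; the helper names exito and fallo are assumptions, but the dict shapes match the examples above:

```python
def exito(**datos):
    """Success result: payload fields plus the 'exito' flag."""
    return {'exito': True, **datos}


def fallo(mensaje: str):
    """Failure result: flag plus a user-facing message."""
    return {'exito': False, 'mensaje': mensaje}


# Callers branch on the flag instead of wrapping calls in try/except
resultado = fallo('Error al extraer: tiempo de espera agotado')
if not resultado['exito']:
    respuesta = {'error': resultado['mensaje']}
```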

Scalability Considerations

While the current architecture is monolithic, it’s designed for future growth.
Current State: Single-process Flask application
Future Enhancements:
  • Add task queue (Celery/RQ) for background scraping
  • Implement caching layer (Redis) for frequently searched jobs
  • Extract scrapers into separate microservices
  • Add database layer for persistent storage
  • Implement rate limiting and request throttling

Technology Stack

Component            Technologies
Web Framework        Flask
Scraping             Selenium WebDriver, BeautifulSoup4
AI Integration       OpenAI Python SDK
Data Export          Pandas, openpyxl, json
Frontend             HTML, CSS, JavaScript
Browser Automation   Chrome/ChromeDriver

Configuration Management

The system uses environment variables for sensitive configuration:
# .env file
OPENAI_API_KEY=sk-...
Loaded via: dotenv library (inteligencia_artificial/gpt_analyzer.py:7)
This ensures:
  • Secrets are not committed to version control
  • Easy configuration across environments
  • Security best practices
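A defensive loading sketch, assuming the python-dotenv package; the import guard lets the module still run where only real environment variables are set, and the obtener_api_key helper is hypothetical:

```python
import os

try:
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()  # reads .env from the working directory
except ImportError:
    pass  # fall back to variables already set in the environment


def obtener_api_key() -> str:
    clave = os.getenv('OPENAI_API_KEY')
    if not clave:
        raise RuntimeError('OPENAI_API_KEY no configurada')
    return clave
```

Failing fast with a clear message here is friendlier than letting the OpenAI client raise deep inside the AI layer.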
