System Architecture Overview
The LinkedIn Job Analyzer is designed with a clean, modular architecture that follows SOLID principles and implements several design patterns to ensure maintainability, extensibility, and separation of concerns.

High-Level Architecture

The system follows a layered architecture with clear boundaries between components.

Core Components
1. Presentation Layer
Location: flask_app.py, templates/, static/
Responsibility: Handle HTTP requests/responses and user interaction
Key Routes:
- GET / - Render the search form
- POST /buscar - Execute job scraping and data extraction
- POST /analizar_ia - Request AI-powered job analysis
- GET /descargar/<filename> - Download generated export files
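The routes above can be sketched as a minimal Flask app. This is an illustrative outline, not the project's actual handlers: the real implementation renders templates and delegates to JobService, which is stubbed out here.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # In the real app this renders the search form template
    return "search form"

@app.route("/buscar", methods=["POST"])
def buscar():
    # In the real app this delegates to JobService.procesar_busqueda
    termino = request.form.get("termino_busqueda", "")
    return f"results for {termino}"
```

The presentation layer stays thin: it reads the request, hands the search term to the business logic layer, and renders whatever comes back.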
2. Business Logic Layer
Location: logica_negocio/servicio_vacantes.py
Class: JobService (Facade Pattern)
Responsibility: Orchestrate the workflow between scraping, cleaning, analysis, and export
Key Methods:
- procesar_busqueda(termino_busqueda) - Main workflow coordinator
- generar_resumen_ia(titulo, habilidades) - Delegate AI analysis
- guardar_datos(datos, formato) - Persist results to files
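A hedged sketch of the facade: the method names come from this document, but the collaborator interfaces (scraper, cleaner, analyzer, exporter factory) and return shapes are assumptions for illustration.

```python
class JobService:
    """Facade: orchestrates scraping, cleaning, analysis, and export."""

    def __init__(self, scraper, cleaner, analyzer, exporter_factory):
        # Collaborators are injected, not constructed here
        self.scraper = scraper
        self.cleaner = cleaner
        self.analyzer = analyzer
        self.exporter_factory = exporter_factory

    def procesar_busqueda(self, termino_busqueda):
        # Coordinate the main workflow: scrape, then clean each item
        raw = self.scraper.scrape(termino_busqueda)
        datos = [self.cleaner.clean(item) for item in raw]
        return {"ok": True, "datos": datos}

    def generar_resumen_ia(self, titulo, habilidades):
        # Delegate AI analysis to the analyzer component
        return self.analyzer.summarize(titulo, habilidades)

    def guardar_datos(self, datos, formato):
        # Ask the factory for the right exporter, then export
        exporter = self.exporter_factory.create(formato)
        return exporter.export(datos)
```

Callers only ever talk to JobService; the scraping, cleaning, AI, and export layers never reference each other directly.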
3. Scraping Layer
Location: scraping/
Pattern: Strategy Pattern
Components:
- ScraperStrategy (Abstract Base Class) - Defines the contract
- LinkedInScraper - Concrete implementation using Selenium
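The Strategy pattern here can be outlined as an abstract base class plus a concrete implementation. The scraping body below is a stub; the real LinkedInScraper drives Selenium and parses pages with BeautifulSoup.

```python
from abc import ABC, abstractmethod

class ScraperStrategy(ABC):
    """Contract every concrete scraper must satisfy."""

    @abstractmethod
    def scrape(self, termino_busqueda: str) -> list[dict]:
        ...

class LinkedInScraper(ScraperStrategy):
    def scrape(self, termino_busqueda):
        # Real implementation: launch Selenium, navigate LinkedIn,
        # parse results. Stubbed here for illustration.
        return [{"titulo": f"{termino_busqueda} developer"}]
```

Because callers depend only on ScraperStrategy, adding an IndeedScraper later means writing one new subclass, with no changes elsewhere.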
4. Utilities Layer
Location: utilidades/
Class: TextCleaner (Static Utility)
Responsibility: Clean, filter, and deduplicate extracted text
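A static utility of this shape might look like the following sketch; the exact cleaning rules (whitespace collapsing, order-preserving deduplication) are assumptions, not the project's actual logic.

```python
import re

class TextCleaner:
    """Static utility: clean, filter, and deduplicate extracted text."""

    @staticmethod
    def clean(text: str) -> str:
        # Collapse runs of whitespace and strip the ends
        return re.sub(r"\s+", " ", text).strip()

    @staticmethod
    def deduplicate(items: list[str]) -> list[str]:
        # Drop exact duplicates while preserving first-seen order
        seen = set()
        result = []
        for item in items:
            if item not in seen:
                seen.add(item)
                result.append(item)
        return result
```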
5. AI Analysis Layer
Location: inteligencia_artificial/
Class: AIAnalyzer
Responsibility: Generate structured summaries using OpenAI GPT models
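An illustrative shape for this layer, assuming the OpenAI Python SDK's Chat Completions API. The model name and prompt wording are placeholders, not the project's actual values.

```python
class AIAnalyzer:
    """Generates job summaries via an injected OpenAI client."""

    def __init__(self, client, model="gpt-4o-mini"):
        # client: an OpenAI SDK client instance (injected for testability)
        # model: assumed default, not necessarily what the project uses
        self.client = client
        self.model = model

    def build_prompt(self, titulo, habilidades):
        # Prompt wording is illustrative only
        return (
            f"Summarize the job '{titulo}' and its required skills: "
            + ", ".join(habilidades)
        )

    def summarize(self, titulo, habilidades):
        # Standard Chat Completions call from the OpenAI Python SDK
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "user",
                 "content": self.build_prompt(titulo, habilidades)},
            ],
        )
        return response.choices[0].message.content
```

Injecting the client keeps the layer testable: a unit test can pass a fake client and never touch the network.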
6. Export Layer
Location: exportadores/
Pattern: Factory Pattern + Strategy Pattern
Components:
- DataExporter - Abstract base class
- JSONExporter, ExcelExporter - Concrete implementations
- ExporterFactory - Factory for creating exporters
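The Factory + Strategy combination can be sketched as below. Export details are simplified: the real ExcelExporter uses pandas/openpyxl, which is elided here.

```python
import json
from abc import ABC, abstractmethod

class DataExporter(ABC):
    @abstractmethod
    def export(self, datos) -> str:
        ...

class JSONExporter(DataExporter):
    def export(self, datos):
        return json.dumps(datos, ensure_ascii=False)

class ExcelExporter(DataExporter):
    def export(self, datos):
        # Real implementation writes an .xlsx via pandas/openpyxl
        raise NotImplementedError

class ExporterFactory:
    # Registry maps format names to exporter classes
    _registry = {"json": JSONExporter, "excel": ExcelExporter}

    @classmethod
    def create(cls, formato: str) -> DataExporter:
        try:
            return cls._registry[formato]()
        except KeyError:
            raise ValueError(f"Unknown format: {formato}")
```

Adding a CSV exporter would mean one new subclass and one registry entry; no calling code changes.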
Request/Response Cycle
Here's the complete flow when a user searches for a job:

1. The browser sends POST /buscar with the search term.
2. The presentation layer delegates to JobService.procesar_busqueda.
3. JobService invokes the LinkedInScraper to collect raw job data.
4. TextCleaner cleans, filters, and deduplicates the extracted text.
5. The results are returned to the presentation layer and rendered.

Optional AI Analysis Flow

When the user requests a summary, POST /analizar_ia delegates to JobService.generar_resumen_ia, which calls the AIAnalyzer to produce a structured summary via the OpenAI API.
Separation of Concerns
The architecture maintains clear boundaries:

| Layer | Concerns | Dependencies |
|---|---|---|
| Presentation | HTTP, routing, templates | Business Logic |
| Business Logic | Workflow orchestration | All layers |
| Scraping | Web automation, HTML parsing | None (external: Selenium, BeautifulSoup) |
| Utilities | Text processing | None |
| AI | OpenAI integration | None (external: OpenAI API) |
| Export | File I/O, format conversion | None (external: json, pandas) |
This separation means:

- Each component has a single responsibility
- Easy to test components in isolation
- Can swap implementations without affecting other layers
- New features can be added without modifying existing code
Dependency Injection
The system uses constructor injection to provide flexibility (see flask_app.py:9-10). Benefits:

- Easy mocking for tests
- Runtime strategy selection
- Future support for additional scrapers (Indeed, Glassdoor, etc.)
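The testing benefit can be shown with a minimal test double. The Service class below is a stand-in for JobService, and RecordingScraper is a hypothetical fake; neither is the project's actual code.

```python
class RecordingScraper:
    """Test double: records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def scrape(self, termino):
        self.calls.append(termino)
        return [{"titulo": "stub"}]

class Service:  # stand-in for JobService
    def __init__(self, scraper):
        self.scraper = scraper  # injected, not hard-coded

    def buscar(self, termino):
        return self.scraper.scrape(termino)

# A test injects the fake; production code injects LinkedInScraper instead
fake = RecordingScraper()
service = Service(fake)
service.buscar("python")
```

Because the scraper arrives through the constructor, swapping in a future IndeedScraper requires no changes to the service itself.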
Error Handling Strategy
The system uses result objects instead of exceptions for business logic. This approach:

- Makes error handling explicit
- Provides user-friendly error messages
- Allows graceful degradation
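One way to realize the result-object idea is a small dataclass carrying a success flag plus either data or a user-facing message. The field names and validation rule below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Resultado:
    """Hypothetical result object: success flag, data, and a message."""
    ok: bool
    datos: list = field(default_factory=list)
    mensaje: str = ""

def procesar(termino: str) -> Resultado:
    if not termino.strip():
        # No exception thrown: failure is an explicit, friendly result
        return Resultado(ok=False, mensaje="Please enter a search term.")
    return Resultado(ok=True, datos=[{"titulo": f"{termino} developer"}])
```

The caller branches on `ok` instead of wrapping every call in try/except, which keeps error paths visible and lets the UI degrade gracefully.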
Scalability Considerations
While the current architecture is monolithic, it is designed for future growth.

Current State: Single-process Flask application

Future Enhancements:

- Add task queue (Celery/RQ) for background scraping
- Implement caching layer (Redis) for frequently searched jobs
- Extract scrapers into separate microservices
- Add database layer for persistent storage
- Implement rate limiting and request throttling
Technology Stack
| Component | Technologies |
|---|---|
| Web Framework | Flask |
| Scraping | Selenium WebDriver, BeautifulSoup4 |
| AI Integration | OpenAI Python SDK |
| Data Export | Pandas, openpyxl, json |
| Frontend | HTML, CSS, JavaScript |
| Browser Automation | Chrome/ChromeDriver |
Configuration Management
The system uses environment variables for sensitive configuration, loaded with the dotenv library (inteligencia_artificial/gpt_analyzer.py:7).
This ensures:
- Secrets are not committed to version control
- Easy configuration across environments
- Security best practices
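The pattern can be sketched as follows; the helper name `get_api_key` is illustrative, and the guarded import lets the snippet run even where python-dotenv is not installed.

```python
import os

try:
    # python-dotenv loads variables from a .env file that is
    # listed in .gitignore and never committed
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to plain environment variables

def get_api_key() -> str:
    # Read the secret from the environment; fail loudly if absent
    key = os.getenv("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Each environment (development, CI, production) supplies its own `.env` file or process environment, so no code change is needed to reconfigure.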