Profile Analysis

Overview

Profile Analysis allows you to interrogate individual CVs using a RAG (Retrieval-Augmented Generation) chain. The system reads a PDF, creates vector embeddings, and answers questions about the candidate’s profile.

How It Works

Load PDF Document

Use LangChain’s PyPDFLoader to extract the text of the CV. It returns one Document per page:

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("path/to/cv.pdf")
docs = loader.load()

Create Vector Store

Convert the loaded pages into embeddings and store them in a FAISS index:

from langchain_community.vectorstores import FAISS

# `embeddings` is the embedding model configured elsewhere in the project
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()
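
To make the retrieval step concrete, here is a dependency-free sketch of what a similarity retriever does: embed the question, score every stored chunk, and return the top k. It is an illustration under stated assumptions, not how FAISS works internally. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and it ranks by cosine similarity, whereas `FAISS.from_documents` uses Euclidean (L2) distance by default.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Skills: Python, PowerBI, Java, Spring Boot",
    "Education: Software Engineering student",
    "Hackathon winner: recycling app",
]
print(top_k("Does the student know Python or Java?", chunks, k=1))
```

The skills chunk wins because it shares two words with the question; a real embedding model captures meaning rather than exact word overlap.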

Define Prompt Template

Create a specialized prompt for career mentoring. The `{context}` placeholder receives the retrieved CV sections and `{question}` receives the user's question:

template = """
You are a Technology Career Mentor and an expert in youth employability.
Your mission is to analyze this student's profile based ONLY on the following context (their CV):
{context}

Question: {question}
"""

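When the chain runs, the placeholders are filled by plain string substitution (ChatPromptTemplate uses f-string syntax by default). The stdlib sketch below shows that step using an English rendering of the mentor template. It assumes the retrieved chunks are plain strings; in LangChain they are `Document` objects whose text lives in `page_content`. The `format_docs` and `render_prompt` helpers are hypothetical names for illustration.

```python
TEMPLATE = """
You are a Technology Career Mentor and an expert in youth employability.
Your mission is to analyze this student's profile based ONLY on the following context (their CV):
{context}

Question: {question}
"""


def format_docs(chunks: list[str]) -> str:
    """Join retrieved chunks into one context string separated by blank lines."""
    return "\n\n".join(chunk.strip() for chunk in chunks)


def render_prompt(chunks: list[str], question: str) -> str:
    """Fill the template placeholders the way the prompt step does at run time."""
    return TEMPLATE.format(context=format_docs(chunks), question=question)


print(render_prompt(["Skills: Python, Java", "Hackathon winner"], "What are the main skills?"))
```

Joining the chunks explicitly keeps the context readable for the model; passing a raw list of documents would insert their Python representation instead.
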
Build RAG Chain

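Combine the retriever, prompt, and LLM into a single chain:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(template)

# `llm` is the chat model configured elsewhere in the project
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)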
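
The dictionary at the head of the chain sends the same input (the question) to every value: the retriever turns it into context, and RunnablePassthrough forwards it unchanged. The plain-Python analogue below spells out that data flow. The `make_chain` helper and the three lambdas are hypothetical stand-ins for the retriever, prompt, and LLM; LangChain runs the dictionary branches concurrently, while this sketch runs them in sequence.

```python
from typing import Callable


def make_chain(
    retrieve: Callable[[str], list[str]],
    render: Callable[[dict], str],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Plain-Python analogue of
    {"context": retriever, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()
    """

    def invoke(question: str) -> str:
        # Step 1: the dict fans the same input out to every branch.
        inputs = {"context": retrieve(question), "question": question}
        # Step 2: the prompt turns the dict into model input.
        prompt_text = render(inputs)
        # Step 3: the model generates text (StrOutputParser would extract it as a str).
        return generate(prompt_text)

    return invoke


# Hypothetical stand-ins for the retriever, prompt, and LLM.
toy_chain = make_chain(
    retrieve=lambda q: ["Skills: Python, Java"],
    render=lambda d: f"Context: {', '.join(d['context'])}\nQuestion: {d['question']}",
    generate=lambda p: p.upper(),
)
print(toy_chain("main skills?"))
```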

Query the CV

Ask questions about the candidate:

respuesta = chain.invoke("What are this student's main skills?")

Complete Implementation

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
import os
import random

# Assumes `embeddings` (embedding model) and `llm` (chat model) are already configured

# 1. SELECTION
carpeta_fuente = "cvs_estudiantes_final"

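# Verify the folder exists and contains PDF files
if not os.path.isdir(carpeta_fuente):
    raise FileNotFoundError(f"Directory '{carpeta_fuente}' not found. Run CV generation first.")
archivos_disponibles = [f for f in os.listdir(carpeta_fuente) if f.lower().endswith(".pdf")]
if not archivos_disponibles:
    raise FileNotFoundError("No CVs found. Make sure to run CV generation first.")

# Choose one randomly
archivo_elegido = random.choice(archivos_disponibles)
ruta_archivo = os.path.join(carpeta_fuente, archivo_elegido)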

print(f"📂 Selected student profile: '{archivo_elegido}'")
print("⏳ Reading PDF and creating vectors...")

# 2. LOAD AND VECTORIZATION
loader = PyPDFLoader(ruta_archivo)
docs = loader.load()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()

# 3. PROMPT
template = """
You are a Technology Career Mentor and an expert in youth employability.
Your mission is to analyze this student's profile based ONLY on the following context (their CV):
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 4. EXECUTION
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 5. QUESTION (focused on potential and skills, not years of experience)
pregunta = "What notable projects or academic experience does this student have, and what is their main technology stack?"
respuesta = chain.invoke(pregunta)

print(f"\nMENTOR QUESTION: {pregunta}")
print("-" * 50)
print(f"🤖 PROFILE ANALYSIS:\n{respuesta}")

RAG Chain Components

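Retriever

Searches the FAISS vector store for the CV sections most relevant to the question.

Prompt

Inserts the retrieved sections and the question into the template and instructs the LLM how to analyze them.

LLM

Generates a human-readable analysis from the filled-in prompt; StrOutputParser then returns the reply as a plain string.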

Prompt Engineering

Effective Prompts for Student Analysis

The system uses a Career Mentor persona that focuses on potential rather than years of experience. Two alternative templates, a technical recruiter and a skills assessor, can be used in its place.

Mentor Prompt

template = """
You are a Technology Career Mentor and an expert in youth employability.
Your mission is to analyze this student's profile based ONLY on the following context (their CV):
{context}

Question: {question}
"""

This prompt:

Sets the expertise context (tech career mentoring)
Focuses on the employability of young talent
Constrains responses to the CV content only

Technical Prompt

template = """
You are a Technical Recruiter specializing in junior developers.
Analyze this student's CV and focus on:
- Academic projects and practical experience
- Technical skills and tools
- Learning potential and growth mindset

CV Content:
{context}

Question: {question}
"""

Skills Assessment

template = """
As a Skills Assessment Expert, evaluate this student's profile.
Rate their proficiency in:
- Programming languages
- Frameworks and tools
- Project complexity
- Team collaboration

Student CV:
{context}

Question: {question}
"""

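Because all three templates share the `{context}` and `{question}` placeholders, one way to switch personas without touching the chain is to keep them in a dictionary keyed by name. The stdlib sketch below is a hypothetical arrangement, not part of the original project: `PROMPTS` and `build_prompt` are illustrative names, and the template texts are abridged. In the real pipeline the selected string would be passed to `ChatPromptTemplate.from_template`.

```python
PROMPTS = {
    "mentor": (
        "You are a Technology Career Mentor and an expert in youth employability.\n"
        "Analyze this student's profile based ONLY on the following context (their CV):\n"
        "{context}\n\nQuestion: {question}"
    ),
    "technical": (
        "You are a Technical Recruiter specializing in junior developers.\n"
        "CV Content:\n{context}\n\nQuestion: {question}"
    ),
    "skills": (
        "As a Skills Assessment Expert, evaluate this student's profile.\n"
        "Student CV:\n{context}\n\nQuestion: {question}"
    ),
}


def build_prompt(persona: str, context: str, question: str) -> str:
    """Fill the chosen persona template; unknown personas raise KeyError."""
    try:
        template = PROMPTS[persona]
    except KeyError:
        raise KeyError(f"Unknown persona {persona!r}; choose one of {sorted(PROMPTS)}") from None
    return template.format(context=context, question=question)


print(build_prompt("technical", "Skills: Java, Spring Boot", "What is their strongest area?"))
```
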
Example Questions

Here are effective questions to ask about student profiles:

Projects & Experience

questions = [
    "What notable projects does this student have?",
    "What academic projects has this candidate completed?",
    "Describe the most impressive achievement in this CV",
    "What practical experience does this student have?"
]

Technical Skills

questions = [
    "What is their main technology stack?",
    "What programming languages does this candidate know?",
    "List all frameworks and tools mentioned",
    "What is their strongest technical area?"
]

Potential Assessment

questions = [
    "Why should we hire this student as an intern?",
    "What makes this candidate stand out?",
    "Assess this student's learning potential",
    "What role would be the best fit for this profile?"
]

Cultural Fit

questions = [
    "What type of team would best fit this profile?",
    "Does this student show leadership potential?",
    "What are their collaboration skills based on the CV?",
    "Describe this candidate's work style"
]

Retriever Configuration

Customize how the system retrieves relevant information:

# Basic retriever
retriever = vectorstore.as_retriever()

# With custom parameters
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most relevant chunks
)

# MMR (Maximal Marginal Relevance) for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,
        "fetch_k": 10,  # Fetch 10 candidates, then pick 4 that balance relevance and diversity
        "lambda_mult": 0.5  # 1.0 = pure relevance, 0.0 = maximum diversity
    }
)

# Similarity with score threshold
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.8,  # Drop chunks whose relevance score is below 0.8 (may return nothing)
        "k": 5
    }
)
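
MMR selects results one at a time. Each candidate is scored as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is its highest similarity to any chunk already selected; in LangChain this runs over the `fetch_k` nearest candidates. The pure-Python sketch below illustrates that selection rule on small hand-made vectors using cosine similarity. The vectors are illustrative assumptions, not real embeddings.

```python
import math


def cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: list[float], candidates: list[list[float]], k: int = 4, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lambda_mult=0.0 maximizes diversity.
    """
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cos(query, candidates[i])
            redundancy = max((cos(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.1], [0.7, 0.7]]  # index 1 nearly duplicates index 0
print(mmr(query, candidates, k=2, lambda_mult=1.0))   # pure relevance keeps the near-duplicate
print(mmr(query, candidates, k=2, lambda_mult=0.25))  # diversity swaps in index 2
```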

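Best Practice: For student CVs, use k=3 with similarity search. Student CVs are typically short (1-2 pages), so a few highly relevant chunks work better than diverse results. Keep in mind that PyPDFLoader returns one Document per page and the pipeline above does not split pages further, so a 1-2 page CV yields only one or two chunks and k=3 already returns the entire CV; k only becomes selective once pages are split into smaller chunks.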
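
Splitting pages into smaller, overlapping chunks is what gives k something to choose from. LangChain's text splitters (for example `RecursiveCharacterTextSplitter`, with `chunk_size` and `chunk_overlap` parameters) handle this and prefer natural boundaries such as paragraphs. The stdlib sketch below shows only the simpler underlying idea of fixed-size character windows with overlap; `split_text` is a hypothetical helper and the sizes are illustrative assumptions.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, so it can still be retrieved intact.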

Output Examples

Sample Analysis Output

Based on the CV, here's the profile analysis:

### Outstanding Projects and Academic Experience

Fernanda Paredes is a 9th-semester Software Engineering student (UTP) 
seeking her first professional opportunity as a Data Analyst Trainee.

**Experience and projects:**

1. **Academic Project as Data Analyst Trainee (Jun 2025 - Feb 2026):** 
   Focus on data analysis
2. **Outstanding Achievement (Hackathon):** 
   First place in university Hackathon for developing a recycling app. 
   Demonstrates ability to work under pressure, innovation, and practical 
   application of software development knowledge.

### Main Technology Stack

| Area | Technologies |
| :--- | :--- |
| **Data Analysis / BI** | Python, PowerBI |
| **Software Development** | Java, Spring Boot |

**Mentor's Conclusion:**

Fernanda has a solid foundation in development tools (Java, Spring Boot) 
and has shown proactivity in the data area (Python, PowerBI), which aligns 
with her goal of becoming a Data Analyst Trainee. Winning a Hackathon is 
a strong indicator of high potential and execution capability.

Error Handling

import os

carpeta_fuente = "cvs_estudiantes_final"

# Check that the directory exists
if not os.path.isdir(carpeta_fuente):
    raise FileNotFoundError(f"Directory '{carpeta_fuente}' not found. Run CV generation first.")

# Keep only PDF files (case-insensitive extension match)
archivos_disponibles = [f for f in os.listdir(carpeta_fuente) if f.lower().endswith(".pdf")]

if not archivos_disponibles:
    raise FileNotFoundError("No PDF files found. Generate CVs before running analysis.")

print(f"Found {len(archivos_disponibles)} CVs")
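
The checks above can be wrapped into one reusable function that also performs the random selection. The sketch below is a hypothetical helper, not part of the original project: `pick_cv` and its optional `seed` parameter are illustrative names, and the seed only exists to make a run reproducible when debugging.

```python
import os
import random
from typing import Optional


def pick_cv(folder: str, seed: Optional[int] = None) -> str:
    """Return the path of a randomly chosen PDF in `folder`.

    Raises FileNotFoundError if the folder is missing or contains no PDFs.
    Passing `seed` makes the choice reproducible (hypothetical parameter).
    """
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Directory '{folder}' not found. Run CV generation first.")
    # Sort so a given seed always picks the same file regardless of listdir order.
    pdfs = sorted(f for f in os.listdir(folder) if f.lower().endswith(".pdf"))
    if not pdfs:
        raise FileNotFoundError(f"No PDF files found in '{folder}'. Generate CVs before running analysis.")
    return os.path.join(folder, random.Random(seed).choice(pdfs))
```

In the complete implementation, `ruta_archivo = pick_cv(carpeta_fuente)` would replace the selection block.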

Performance Tips

Cache Embeddings

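Build the vector store once per CV and reuse it for every question, so the CV is embedded only once and each call only embeds the question:

# Create once (embeds the CV pages)
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()

# Query multiple times with a chain built on this retriever
for question in questions:
    answer = chain.invoke(question)
    print(answer)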
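
If the same CVs are analyzed across several runs, the embeddings themselves can also be memoized by content. LangChain provides `CacheBackedEmbeddings` for this purpose; the stdlib sketch below only illustrates the idea of keying cached vectors by a hash of the text. `CachedEmbedder` and the lambda passed as `embed_fn` are hypothetical stand-ins for a real embedding model.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Memoize an embedding function by the SHA-256 hash of each text."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self.embed_fn = embed_fn
        self.cache: dict[str, list[float]] = {}
        self.misses = 0  # number of times the underlying model was actually called

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]


# Hypothetical embedding function: character count and lowercase-vowel count.
embedder = CachedEmbedder(lambda t: [float(len(t)), float(sum(c in "aeiou" for c in t))])
embedder.embed("Python, Java")
embedder.embed("Python, Java")
print(embedder.misses)  # 1
```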

Batch Processing

Process multiple questions in parallel:

questions = ["Q1", "Q2", "Q3"]
answers = chain.batch(questions)
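
`chain.batch` runs the chain over every input concurrently (on a thread pool by default) and returns the outputs in the same order as the inputs; the degree of parallelism can be capped with `chain.batch(questions, config={"max_concurrency": 2})`. The stdlib sketch below mimics that behaviour with `ThreadPoolExecutor.map`. The `batch` and `fake_invoke` functions are hypothetical stand-ins for illustration, not LangChain code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def batch(invoke: Callable[[str], str], inputs: list[str], max_concurrency: int = 4) -> list[str]:
    """Run `invoke` on every input concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(invoke, inputs))


def fake_invoke(question: str) -> str:
    # Hypothetical stand-in for chain.invoke
    return f"answer to {question}"


print(batch(fake_invoke, ["Q1", "Q2", "Q3"]))
```

Since each call mostly waits on the LLM provider, threads give a real speed-up here even though Python code itself runs one thread at a time.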

Next Steps

Talent Mining

Configuration
