
Overview

The PerfilEstudiante (Student Profile) model is a Pydantic BaseModel that defines the schema for extracting and validating structured data from student resumes. This model is designed specifically for student and early-career talent, emphasizing academic achievements, projects, and technical potential over traditional work experience.
This model is optimized for student recruitment scenarios where academic projects, hackathon achievements, and technical skills are more valuable indicators than years of experience.

Model Definition

The model is defined in the notebook at source/notebook/Talent_Scout_3000x.ipynb:902-919.
from pydantic import BaseModel, Field

class PerfilEstudiante(BaseModel):
    # Datos Personales
    nombre: str = Field(description="Nombre completo del estudiante")
    email: str = Field(description="Email universitario o personal")
    ubicacion: str = Field(description="Ciudad/País")

    # Perfil Académico
    universidad: str = Field(description="Nombre de la universidad o instituto")
    carrera: str = Field(description="Carrera que está estudiando (ej. Ing. Software)")
    ciclo_actual: str = Field(description="Ciclo o semestre actual (ej. 7mo Ciclo, Egresado)")

    # Talento Tech
    stack_principal: list = Field(description="Lista de top 5 lenguajes/tecnologías que domina")
    proyectos_destacados: list = Field(description="Nombres de proyectos académicos, tesis o freelance mencionados")

    # Evaluación de Perfil
    tipo_perfil: str = Field(description="Clasificar en: Backend, Frontend, Data, Fullstack o Gestión")
    potencial_contratacion: str = Field(description="Breve justificación de por qué contratarlo como practicante")

Field Reference

Personal Information

nombre
string
required
Full name of the student as it appears on the CV. Example: "Fernanda Paredes"
email
string
required
University or personal email address. Student addresses typically use the university domain. Example: "[email protected]"
ubicacion
string
required
City and country of the candidate. Example: "Lima, Perú"

Academic Profile

universidad
string
required
Name of the university or technical institute where the student is enrolled. Typical values (the field is a free-form string, so the model does not restrict it): "UTP", "UPC", "UNI", "San Marcos", "U. Lima", "Senati", "Cibertec". Example: "UTP"
carrera
string
required
Academic major or degree program the student is pursuing. Example: "Ingeniería de Software", "Ciencias de la Computación"
ciclo_actual
string
required
Current semester or cycle in the degree program. Can also be "Egresado" (graduated). Format: "7mo Ciclo", "VI Ciclo", "Egresado". Example: "9no Ciclo"
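The ciclo_actual formats listed above can be checked with a regular expression. The following is a minimal sketch, not part of the notebook: the pattern, the set of ordinal suffixes, and the names PATRON_CICLO and ciclo_valido are assumptions chosen to cover the documented examples.

```python
import re

# Assumed pattern (not from the notebook): a number with a Spanish ordinal
# suffix or a Roman numeral, followed by "Ciclo"; or the literal "Egresado".
PATRON_CICLO = re.compile(
    r"^(?:\d{1,2}(?:er|ro|do|to|mo|vo|no)|[IVX]+)\s+Ciclo$|^Egresado$"
)


def ciclo_valido(valor: str) -> bool:
    """Return True if the value looks like one of the documented formats."""
    return PATRON_CICLO.match(valor.strip()) is not None


for ejemplo in ["7mo Ciclo", "VI Ciclo", "Egresado", "Semestre 7"]:
    print(ejemplo, ciclo_valido(ejemplo))
```

LLM output may use other spellings (for example "Séptimo ciclo"), so a check like this is better suited to flagging rows for review than to rejecting them.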

Technical Talent

stack_principal
list[string]
required
List of up to five programming languages, frameworks, or technologies the student is proficient in, extracted from the projects and experience sections of the CV. Example:
["Python", "PowerBI", "Java", "Spring Boot", "React"]
proyectos_destacados
list[string]
required
Names of notable academic projects, thesis work, freelance projects, or hackathon achievements. Focus on what was built, not company names. Example:
[
  "Sistema de Biblioteca Virtual con roles de usuario",
  "Primer puesto en Hackathon desarrollando app de reciclaje",
  "API RESTful para gestión financiera usando FastAPI"
]

Profile Evaluation

tipo_perfil
string
required
Classification of the student’s technical specialization based on their skill stack. Valid values:
  • "Backend" — Java + Spring, Python APIs, database focus
  • "Frontend" — React, Vue, Angular, UI/UX tools
  • "Data" — Python + Pandas, PowerBI, SQL analytics
  • "Fullstack" — Both frontend and backend technologies
  • "Gestión" — Administrative, business, or non-technical roles
Classification logic:
  • Python + Pandas/PowerBI → "Data"
  • React + Node.js → "Fullstack"
  • Java + Spring Boot → "Backend"
Example: "Data"
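In the documented pipeline the LLM assigns tipo_perfil from the field description. The classification logic above can also be written as a rule-based function, for instance to sanity-check the LLM's answer. This is a hypothetical sketch: the function name clasificar_perfil, the order in which rules are tried, the Frontend rule, and the "Gestión" fallback are assumptions, not behavior taken from the notebook.

```python
def clasificar_perfil(stack):
    """Map a list of technologies to one of the five tipo_perfil values."""
    techs = {t.strip().lower() for t in stack}

    # Rules from the field reference, tried in an assumed order.
    if "python" in techs and techs & {"pandas", "powerbi"}:
        return "Data"
    if "react" in techs and "node.js" in techs:
        return "Fullstack"
    if "java" in techs and "spring boot" in techs:
        return "Backend"

    # Assumed extra rule: frontend frameworks without a backend match.
    if techs & {"react", "vue", "angular"}:
        return "Frontend"

    # Assumed fallback when no technical rule matches.
    return "Gestión"


print(clasificar_perfil(["Python", "PowerBI", "Java", "Spring Boot"]))  # Data
```

Because the Data rule is tried first, a stack that also contains Java and Spring Boot is still classified as "Data", which matches the Fernanda Paredes example.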
potencial_contratacion
string
required
A brief justification (1-2 sentences) explaining why this student is a strong candidate for an internship or junior role. Should highlight potential over experience. Example:
"Fernanda es una candidata fuerte para Data Analyst Trainee. 
Su victoria en Hackathon demuestra capacidad de ejecución bajo presión, 
y su stack (Python, PowerBI) es coherente con análisis de datos."

Validation Rules

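All fields are required: instantiating the model without one of them raises a validation error. Beyond that, the model as defined annotates its fields with plain str and list and sets no constraints, so the following are conventions conveyed through the field descriptions rather than rules Pydantic enforces:
  • String fields should not be empty
  • List fields should contain at least one element
  • tipo_perfil should match one of the five predefined categories
  • ciclo_actual should follow Spanish ordinal format (e.g., “7mo Ciclo”) or be “Egresado”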
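A minimal sketch of how some of these rules could be enforced in the model itself, assuming Pydantic is installed. PerfilEstricto and es_valido are hypothetical names, and the class carries only three of the ten fields to keep the example short; it is not the notebook's model.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class PerfilEstricto(BaseModel):
    """Hypothetical stricter variant with a reduced set of fields."""

    # min_length=1 rejects empty strings.
    nombre: str = Field(min_length=1, description="Nombre completo del estudiante")
    # List[str] checks element types, unlike a bare `list` annotation.
    stack_principal: List[str] = Field(description="Lista de lenguajes/tecnologías")
    # Literal restricts the value to the five documented categories.
    tipo_perfil: Literal["Backend", "Frontend", "Data", "Fullstack", "Gestión"]


def es_valido(datos: dict) -> bool:
    """Return True if the dict passes validation against PerfilEstricto."""
    try:
        PerfilEstricto(**datos)
        return True
    except ValidationError:
        return False


print(es_valido({"nombre": "Fernanda Paredes",
                 "stack_principal": ["Python", "PowerBI"],
                 "tipo_perfil": "Data"}))
print(es_valido({"nombre": "Fernanda Paredes",
                 "stack_principal": ["Python", "PowerBI"],
                 "tipo_perfil": "DevOps"}))
```

The trade-off is that stricter types make the model reject LLM output that deviates slightly from the expected values, so a pipeline that adopts them needs a way to handle rejected records.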

Usage Example

Basic Instantiation

# Assumes PerfilEstudiante has been defined as in the Model Definition section above

# Create a student profile instance
perfil = PerfilEstudiante(
    nombre="Fernanda Paredes",
    email="[email protected]",
    ubicacion="Lima, Perú",
    universidad="UTP",
    carrera="Ingeniería de Software",
    ciclo_actual="9no Ciclo",
    stack_principal=["Python", "PowerBI", "Java", "Spring Boot"],
    proyectos_destacados=[
        "Primer puesto en Hackathon universitaria - App de Reciclaje",
        "Sistema de análisis de datos con Python y Pandas"
    ],
    tipo_perfil="Data",
    potencial_contratacion="Estudiante avanzado con experiencia práctica en análisis de datos y victoria en Hackathon."
)

print(perfil.nombre)  # "Fernanda Paredes"
print(perfil.tipo_perfil)  # "Data"

With JSON Output Parser

The model is typically used with LangChain’s JsonOutputParser for structured extraction from CV text:
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=PerfilEstudiante)

# Use in a LangChain chain
# (prompt_extract, llm, and cv_text are assumed to be defined elsewhere in the notebook)
chain_extract = prompt_extract | llm | parser

data = chain_extract.invoke({
    "context": cv_text,
    "format_instructions": parser.get_format_instructions()
})

# data is a plain dict keyed by the PerfilEstudiante field names, not a model instance
print(data['nombre'])  # Extracted name
print(data['stack_principal'])  # Extracted tech stack
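Because the parser returns a plain dict rather than a PerfilEstudiante instance, a missing key only surfaces when it is accessed. A minimal sketch of a completeness check, using only the standard library; CAMPOS_ESPERADOS and campos_faltantes are hypothetical names, and treating empty strings and empty lists as missing is an assumption.

```python
# Field names copied from the PerfilEstudiante definition.
CAMPOS_ESPERADOS = (
    "nombre", "email", "ubicacion",
    "universidad", "carrera", "ciclo_actual",
    "stack_principal", "proyectos_destacados",
    "tipo_perfil", "potencial_contratacion",
)


def campos_faltantes(data: dict) -> list:
    """Return the expected fields that are absent or empty in the dict."""
    return [c for c in CAMPOS_ESPERADOS if data.get(c) in (None, "", [])]


# Example with an incomplete extraction result.
parcial = {"nombre": "Fernanda Paredes", "stack_principal": []}
print(campos_faltantes(parcial))
```

An empty result means every expected field is present and non-empty; the dict can then be passed to PerfilEstudiante(**data) if a model instance is needed.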

Batch Processing

Process multiple CVs and convert to DataFrame:
import pandas as pd
import glob
from langchain_community.document_loaders import PyPDFLoader

resultados = []
archivos = glob.glob("cvs_estudiantes_final/*.pdf")

for pdf in archivos:
    loader = PyPDFLoader(pdf)
    pages = loader.load()
    texto_completo = "\n".join([p.page_content for p in pages])
    
    # Extract structured data
    data = chain_extract.invoke({
        "context": texto_completo,
        "format_instructions": parser.get_format_instructions()
    })
    
    resultados.append(data)

# Convert to DataFrame for analysis
df_talent = pd.DataFrame(resultados)
print(df_talent[['nombre', 'universidad', 'tipo_perfil', 'stack_principal']])
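In the loop above, a single CV that fails to load or parse stops the whole batch. A minimal sketch of a more tolerant loop, assuming the extraction step is wrapped in a callable: procesar_lote, extraer, the added archivo key, and the stub extractor_de_prueba are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple


def procesar_lote(
    textos: Dict[str, str],
    extraer: Callable[[str], dict],
) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Run the extractor on each CV text, collecting failures separately."""
    resultados, errores = [], []
    for archivo, texto in textos.items():
        try:
            registro = dict(extraer(texto))
        except Exception as exc:  # keep going when one CV fails
            errores.append((archivo, str(exc)))
            continue
        registro["archivo"] = archivo  # remember which file the row came from
        resultados.append(registro)
    return resultados, errores


def extractor_de_prueba(texto: str) -> dict:
    """Stub standing in for the LLM chain, so the sketch runs offline."""
    if not texto.strip():
        raise ValueError("CV sin texto extraíble")
    return {"nombre": texto.splitlines()[0]}


resultados, errores = procesar_lote(
    {"cv_a.pdf": "Fernanda Paredes\nUTP", "cv_b.pdf": ""},
    extractor_de_prueba,
)
print(resultados)
print(errores)
```

In the notebook setting, extraer could be a small function that calls chain_extract.invoke with the context and format_instructions keys shown above, and resultados can be passed to pd.DataFrame as before.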

Real-World Example

Here’s an actual extracted profile from the notebook execution:
{
  "nombre": "FERNANDA PAREDES",
  "email": "[email protected]",
  "ubicacion": "Lima, Perú",
  "universidad": "UTP",
  "carrera": "Ingeniería de Software",
  "ciclo_actual": "9no Ciclo",
  "stack_principal": ["Python", "PowerBI", "Java", "Spring Boot"],
  "proyectos_destacados": [
    "Primer puesto en Hackathon universitaria desarrollando app de reciclaje"
  ],
  "tipo_perfil": "Data",
  "potencial_contratacion": "Fernanda es una candidata fuerte para Data Analyst Trainee. Su victoria en Hackathon demuestra capacidad de ejecución bajo presión, y su stack (Python, PowerBI) es coherente con análisis de datos."
}

Extraction Schema

Learn how to configure the JsonOutputParser for CV extraction

Talent Mining Guide

Step-by-step guide for batch CV processing

Design Philosophy

For student recruitment, academic projects and hackathon achievements are stronger indicators of technical capability and learning agility than months of work experience. A student who won first place in a university hackathon demonstrates problem-solving, teamwork, and execution skills.
The potencial_contratacion field forces the LLM to synthesize its understanding of the candidate into a hiring justification, which makes recruitment decisions explainable. It moves the task from data extraction to reasoning.
The original project targets university students in Peru, so the field names are kept in Spanish to preserve cultural context and reduce translation errors when parsing Spanish-language CVs.
