What is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture pattern that combines the power of information retrieval with large language model (LLM) generation. Instead of relying solely on the LLM’s training data, RAG retrieves relevant context from external documents and uses it to generate more accurate, grounded responses.

Think of RAG as giving an AI assistant a filing cabinet of documents it can search through before answering questions, rather than relying purely on memory.
The Three Pillars of RAG
The RAG Recruitment Assistant is built on three core operations:
1. Indexing: Converting Documents to Vectors
The system transforms PDF CVs into searchable vector representations:
- Loaded from PDF format
- Chunked into manageable text segments
- Embedded into high-dimensional vectors (384 dimensions using HuggingFace)
- Indexed in FAISS for fast similarity search
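The four indexing steps can be sketched in plain Python. This is a toy stand-in, not the project’s code: a fixed-size character splitter replaces LangChain’s text splitter, and a deterministic pseudo-random vector replaces the 384-dimension HuggingFace embedding:

```python
import hashlib
import random

EMBED_DIM = 384  # matches the sentence-transformers dimensionality used above

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (toy stand-in for
    LangChain's text splitters)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

def embed(text):
    """Fake embedding: a deterministic pseudo-random 384-dim vector.
    A real pipeline would call a sentence-transformers model here."""
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]

# "Index": a list of (chunk, vector) pairs; FAISS would store the vectors
cv_text = "Fernanda Paredes. Data analyst trainee. Python, PowerBI, Java, Spring Boot. " * 5
index = [(chunk, embed(chunk)) for chunk in chunk_text(cv_text)]
print(len(index), len(index[0][1]))
```

The overlap between chunks keeps sentences that straddle a boundary retrievable from at least one segment.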
2. Retrieval: Finding Relevant Candidates
When a recruiter asks a question, the system:
- Converts the query into a vector
- Performs semantic similarity search in FAISS
- Returns the most relevant CV sections
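Retrieval then reduces to nearest-neighbor search over those vectors. A minimal hand-rolled cosine-similarity version (FAISS does this at scale; the three-dimensional vectors here are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy 3-dim "embeddings" for three CV sections (real ones are 384-dim)
index = [
    ("Led a Python data pipeline project", [0.9, 0.1, 0.0]),
    ("Volunteered at a local animal shelter", [0.0, 0.2, 0.9]),
    ("Built dashboards in PowerBI", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of "Python experience?"
print(search(query, index, k=2))
```

Because similarity is measured between embeddings rather than keywords, a query about "Python experience" ranks the pipeline and dashboard sections ahead of the unrelated one.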
3. Generation: LLM-Powered Analysis
The retrieved context is fed to Gemini 1.5 Flash for intelligent analysis.
Technology Stack
The RAG Recruitment Assistant leverages a modern, production-ready stack:
LangChain
Orchestration framework connecting all components
FAISS
Facebook AI Similarity Search for vector operations
Gemini 1.5 Flash
Google’s LLM for generation and analysis
HuggingFace
Embeddings using sentence-transformers
Why This Stack?
| Component | Purpose | Key Benefit |
|---|---|---|
| LangChain | RAG orchestration | Pre-built abstractions for document loaders, vector stores, and chains |
| FAISS | Vector search engine | Extremely fast similarity search (handles millions of vectors) |
| Gemini 1.5 Flash | LLM generation | Fast, cost-effective, with strong reasoning capabilities |
| HuggingFace Embeddings | Text vectorization | Open-source, multilingual support, runs locally |
Architecture Flow: CV Analysis Pipeline
Here’s how a complete candidate evaluation flows through the system.
LLM Analysis
Gemini analyzes retrieved context and generates structured insights:
- Academic projects
- Tech stack
- Hiring potential
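One way to elicit exactly those three insight areas is a prompt template filled with the retrieved chunks. The wording and helper below are illustrative assumptions, not the project’s actual template:

```python
# Hypothetical prompt template asking the LLM for the three insight areas above
PROMPT = """You are a recruitment analyst. Using only the CV excerpts below,
summarise the candidate under three headings:
1. Academic projects
2. Tech stack
3. Hiring potential

CV excerpts:
{context}

Question: {question}
"""

def build_prompt(context_chunks, question):
    """Fill the template with retrieved CV sections and the recruiter's question."""
    return PROMPT.format(context="\n---\n".join(context_chunks), question=question)

filled = build_prompt(
    ["First place in a university Hackathon (recycling app)",
     "Skills: Python, PowerBI, Java, Spring Boot"],
    "Is this candidate a good fit for a data analyst trainee role?",
)
print(filled)
```

Constraining the LLM to "only the CV excerpts below" is what keeps the generated insights grounded in retrieved evidence.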
RAG Chain: The Complete Pipeline
LangChain’s RunnablePassthrough creates an elegant, composable pipeline:
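The shape of that composition can be mimicked in plain Python. The Step class below overloads `|` the way LangChain runnables do; the retriever, prompt, LLM, and parser are stubs standing in for the real components:

```python
class Step:
    """Minimal runnable: wraps a function and supports `|` chaining,
    loosely mimicking LangChain's LCEL composition."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        return Step(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

# Stub components (the real chain uses FAISS, a prompt template, and Gemini)
retriever = lambda q: ["Skills: Python, PowerBI", "Hackathon winner"]
prepare = Step(lambda q: {"context": retriever(q), "question": q})        # step 1
prompt  = Step(lambda d: f"Context: {d['context']}\nQ: {d['question']}")  # step 2
llm     = Step(lambda p: f"  [LLM answer based on -> {p!r}]  ")           # step 3
parser  = Step(lambda s: s.strip())                                       # step 4

chain = prepare | prompt | llm | parser
print(chain.invoke("Does the candidate know Python?"))
```

Each stage only needs to accept the previous stage’s output, which is why swapping the retriever or the LLM never disturbs the rest of the chain.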
Chain Breakdown
Step 1: Input Preparation
- retriever: Automatically fetches relevant CV sections
- RunnablePassthrough(): Forwards the question unchanged
Step 2: Prompt Formatting
Step 3: LLM Generation
Step 4: Output Parsing
From CV Input to Candidate Recommendation
The complete architectural flow in production.
Key Advantages of RAG Architecture
No Retraining Required
Update the knowledge base by simply adding new CVs to the vector store. No need to retrain the LLM.
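In vector-store terms, that update is just an incremental add. A toy illustration, with a plain list standing in for the FAISS index:

```python
# Toy vector store: a list of (cv_name, embedding) pairs standing in for FAISS
store = [("alice.pdf", [0.2, 0.8]), ("bob.pdf", [0.9, 0.1])]

def add_cv(name, vector):
    """Make a new CV searchable immediately; the LLM is never retrained."""
    store.append((name, vector))

def nearest(query):
    """Return the CV whose vector has the largest dot product with the query."""
    return max(store, key=lambda p: sum(q * v for q, v in zip(query, p[1])))[0]

add_cv("carla.pdf", [0.95, 0.05])
print(nearest([1.0, 0.0]))  # the newly added CV is retrievable at once
```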
Transparent Decision-Making
Every recommendation is grounded in specific CV sections that can be traced back and audited.
Scalable to Thousands of CVs
FAISS can efficiently handle millions of vectors with sub-second query times.
Domain-Specific Context
The LLM focuses on recruitment-specific analysis, not general knowledge.
Real Implementation Example
The “Interrogating a CV” feature puts this pipeline into practice.
Real Output: The system successfully analyzed Fernanda Paredes’ CV, identifying:
- First place in a university Hackathon (a recycling app)
- Tech stack: Python, PowerBI, Java, Spring Boot
- Profile type: Data Analyst Trainee with strong fullstack foundation
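A minimal end-to-end sketch of such an interrogation, with a keyword-overlap retriever and a canned answer string standing in for FAISS and Gemini:

```python
import re

# Toy CV "index": section texts keyed by heading
cv_sections = {
    "awards": "First place in a university Hackathon (recycling app).",
    "skills": "Python, PowerBI, Java, Spring Boot.",
    "profile": "Data Analyst Trainee with a strong fullstack foundation.",
}

def words(text):
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, sections, k=2):
    """Rank sections by word overlap with the question (FAISS stand-in)."""
    q = words(question)
    ranked = sorted(sections.values(), key=lambda t: len(q & words(t)), reverse=True)
    return ranked[:k]

def interrogate(question):
    """Assemble retrieved context into an answer (a real system calls Gemini here)."""
    context = " ".join(retrieve(question, cv_sections))
    return f"Based on the CV: {context}"

answer = interrogate("What Python skills does the candidate have?")
print(answer)
```

The real feature replaces word overlap with 384-dimension embedding similarity and the canned string with a Gemini completion, but the retrieve-then-generate flow is the same.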
Next Steps
Reverse Matching
Learn how this system prioritizes potential over experience
Vector Search Deep Dive
Explore FAISS and semantic similarity in detail