Overview
The RAG Recruitment Assistant requires configuring three components: Google Generative AI (Gemini) as the LLM, HuggingFace embeddings, and a FAISS vector store.
Environment Variables
Google API Key
The system uses Google’s Gemini model, which requires an API key:
Windows:

```shell
setx GOOGLE_API_KEY "your_api_key_here"
```

macOS/Linux:

```shell
export GOOGLE_API_KEY="your_api_key_here"
```

Python (.env file):

```
GOOGLE_API_KEY=your_api_key_here
```
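The .env route typically relies on python-dotenv's load_dotenv(), which in essence parses KEY=VALUE lines into os.environ. A minimal stdlib sketch of that behaviour (the simplified parser here is illustrative only; real projects should use python-dotenv itself):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    parses KEY=VALUE lines and copies them into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, malformed lines
            key, _, value = line.partition("=")
            # setdefault: do not override variables already set,
            # matching load_dotenv's default behaviour
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```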
Validation
The setup cell validates the API key before proceeding:

```python
import os

if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")
print("✓ API Key configured")
```
LLM Configuration
Gemini 1.5 Flash Setup
The project uses Gemini 1.5 Flash for fast, cost-effective inference:

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
)
print("LLM configured successfully.")
```
Model Parameters
model (string, default: "gemini-1.5-flash", required)
The Gemini model to use. Options:
- gemini-1.5-flash: Fast, cost-effective (recommended)
- gemini-1.5-pro: More capable but slower
- gemini-pro: Previous generation

temperature
Controls randomness in responses.
- 0: Deterministic, consistent outputs (recommended for data extraction)
- 0.7: Balanced creativity
- 1.0: Maximum creativity (use for text generation)

max_output_tokens
Maximum length of the generated response.

top_p
Nucleus sampling parameter. Lower values make output more focused.

top_k
Top-k sampling parameter. Limits token selection to the top k options.
Advanced Configuration
```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_output_tokens=2048,
    top_p=0.95,
    top_k=40,
    # Safety settings (optional)
    safety_settings={
        "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
        "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE",
        "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE",
    },
)
```
For production use, keep safety settings at their default (BLOCK_MEDIUM_AND_ABOVE). Only disable for testing with controlled content.
Embeddings Configuration
HuggingFace Embeddings
The system uses sentence-transformers/all-MiniLM-L6-v2 for creating vector embeddings:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
print("Embeddings model loaded.")
```
Model Selection
Recommended for most use cases:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

- Dimensions: 384
- Speed: Very fast
- Quality: Good for general text
- Size: ~80MB

Higher quality, slower:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
```

- Dimensions: 768
- Speed: Medium
- Quality: Better semantic understanding
- Size: ~420MB

For Spanish/multilingual content:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-base"
)
```

- Dimensions: 768
- Speed: Medium
- Quality: Excellent for Spanish
- Size: ~1.1GB
Advanced Embeddings Configuration
```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        'device': 'cpu'  # Use 'cuda' for GPU acceleration
    },
    encode_kwargs={
        'normalize_embeddings': True,  # L2 normalization
        'batch_size': 32               # Batch size for encoding
    }
)
```
GPU Acceleration: If you have a CUDA-compatible GPU, set device='cuda' for 10-50x faster embedding generation.
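Setting normalize_embeddings to True scales every vector to unit length, which makes the inner product of two embeddings equal to their cosine similarity. A small stdlib sketch of that identity, using toy 2-D vectors:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (what normalize_embeddings=True does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
na, nb = l2_normalize(a), l2_normalize(b)
# After normalization, the plain inner product equals cosine similarity:
assert abs(dot(na, nb) - cosine(a, b)) < 1e-12
```

This is why normalized embeddings let a vector store rank by dot product alone, which is cheaper than computing full cosine similarity per query.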
FAISS Configuration
Basic Setup
FAISS (Facebook AI Similarity Search) is used for vector storage and retrieval:

```python
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader

# Load documents
loader = PyPDFLoader("cv.pdf")
docs = loader.load()

# Create vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# Save for later use
vectorstore.save_local("faiss_index")

# Load an existing index
vectorstore = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True
)
```
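save_local stores the docstore in pickle format alongside the raw index, and unpickling untrusted data can execute arbitrary code; that is the risk the explicit allow_dangerous_deserialization=True opt-in guards against. A benign stdlib demonstration of the underlying mechanism:

```python
import pickle

class Payload:
    """Toy object whose unpickling triggers a function call."""
    def __reduce__(self):
        # On unpickling, pickle calls print(...) -- a benign stand-in
        # for arbitrary code execution.
        return (print, ("code ran during unpickling",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # prints "code ran during unpickling"
```

Only load FAISS indexes you created yourself or obtained from a trusted source.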
Retriever Parameters
```python
# Basic retriever
retriever = vectorstore.as_retriever()

# With custom parameters
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return the top 3 results
)
```
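With similarity search, the retriever embeds the query and returns the k stored vectors closest to it; IndexFlatL2 does this as an exact brute-force scan over L2 distances. A toy sketch of that logic (the 2-D vectors and document labels are made up for illustration):

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, documents, k=3):
    """Brute-force nearest-neighbour search, as IndexFlatL2 does exactly."""
    scored = sorted(documents, key=lambda item: l2_distance(query, item[1]))
    return [doc for doc, _ in scored[:k]]

docs = [
    ("python developer", [0.9, 0.1]),
    ("data analyst",     [0.7, 0.3]),
    ("chef",             [0.0, 1.0]),
    ("ml engineer",      [0.8, 0.2]),
]
print(top_k([1.0, 0.0], docs, k=3))
# → ['python developer', 'ml engineer', 'data analyst']
```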
Search Types
Similarity (Default)

Returns the k most similar documents:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)
```

MMR (Diversity)

Maximum Marginal Relevance balances relevance and diversity:

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,
        "fetch_k": 10,      # Fetch 10, return the 4 most diverse
        "lambda_mult": 0.5  # 0 = max diversity, 1 = max relevance
    }
)
```

Score Threshold

Only return results above a similarity threshold:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.8,  # Only high-confidence matches
        "k": 5
    }
)
```
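The mmr search type implements the standard greedy Maximal Marginal Relevance criterion: each step picks the candidate that maximizes lambda_mult * sim(query, doc) minus (1 - lambda_mult) * the candidate's highest similarity to anything already selected. A toy sketch over precomputed similarity scores (the numbers are made up):

```python
def mmr(query_sim, pairwise_sim, k, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance over precomputed similarities.

    query_sim[i]       -- similarity of candidate i to the query
    pairwise_sim[i][j] -- similarity between candidates i and j
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize candidates similar to what is already selected
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; MMR skips the duplicate.
query_sim = [0.9, 0.88, 0.7]
pairwise = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sim, pairwise, k=2, lambda_mult=0.5))  # → [0, 2]
```

With lambda_mult=1.0 the redundancy penalty vanishes and the result degenerates to plain similarity ranking, which is why lower values yield more diverse hits.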
Advanced FAISS Configuration

```python
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

# Create a FAISS index with custom parameters
index = faiss.IndexFlatL2(384)  # 384 = embedding dimension

# For larger datasets, use IVF (Inverted File Index)
nlist = 100  # Number of clusters
quantizer = faiss.IndexFlatL2(384)
index = faiss.IndexIVFFlat(quantizer, 384, nlist)

# Train the index (required for IVF)
# index.train(training_vectors)

# Use with LangChain, starting from an empty docstore and id mapping
vectorstore = FAISS(
    embedding_function=embeddings.embed_query,
    index=index,
    docstore=InMemoryDocstore({}),
    index_to_docstore_id={}
)
```
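The IVF speed-up can be illustrated without faiss: vectors are bucketed into nlist clusters when the index is trained, and a query then scans only the nprobe closest clusters rather than the whole collection, which is what makes IVF fast but approximate. A toy 1-D sketch (cluster centres are hard-coded here instead of trained):

```python
def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the 'inverted lists')."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        lists[nearest].append((vid, v))
    return lists

def ivf_search(query, centroids, lists, nprobe=1):
    """Scan only the nprobe closest clusters, then return the nearest vector id."""
    order = sorted(range(len(centroids)), key=lambda i: abs(query - centroids[i]))
    candidates = [item for i in order[:nprobe] for item in lists[i]]
    return min(candidates, key=lambda pair: abs(query - pair[1]))[0]

vectors = [0.1, 0.2, 0.9, 1.1, 5.0, 5.2]
centroids = [0.5, 5.0]  # stand-in for trained cluster centres
lists = build_ivf(vectors, centroids)
print(ivf_search(5.15, centroids, lists, nprobe=1))  # → 5 (the vector 5.2)
```

Raising nprobe scans more clusters, trading speed back for recall; faiss exposes the same knob on IndexIVFFlat.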
Complete Setup Code
Full Configuration

```python
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings

print("Installing libraries and configuring...")

# 1. API key validation
if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")
print("✓ API Key configured")

# 2. LLM configuration
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
)
print("✓ LLM configured (gemini-1.5-flash)")

# 3. Embeddings configuration
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
print("✓ Embeddings model loaded")

print("Configuration complete: All ready.")
```
Dependencies
Install all required packages:

```shell
pip install -r requirements.txt
```

requirements.txt:

```
langchain
langchain-community
langchain-google-genai
langchain-huggingface
sentence-transformers
faiss-cpu
pypdf
reportlab
pandas==2.2.2
matplotlib
plotly
```
For GPU support, replace faiss-cpu with faiss-gpu in requirements.txt
Google Colab

The project was developed and tested in Google Colab:

```python
# Install packages (run once per session)
!pip install -q langchain langchain-google-genai langchain-huggingface \
    langchain-community sentence-transformers faiss-cpu \
    pypdf reportlab pandas matplotlib plotly

# Set API key
import os
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
```

Local (Python 3.10/3.11)

For local execution:

```shell
# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export GOOGLE_API_KEY="your_key_here"

# Run Jupyter
jupyter notebook
```

Docker

Run in a Docker container (jupyter is installed explicitly since it is not in requirements.txt):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt jupyter
COPY . .
ENV GOOGLE_API_KEY=""
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]
```

```shell
docker build -t rag-recruitment .
docker run -p 8888:8888 -e GOOGLE_API_KEY="your_key" rag-recruitment
```
Troubleshooting
API Key Not Found

Error: ValueError: You must configure the GOOGLE_API_KEY environment variable

Solution:

```shell
# Check if the variable is set
echo $GOOGLE_API_KEY

# Set it
export GOOGLE_API_KEY="your_key_here"
```

```python
# Verify in Python
import os
print(os.getenv("GOOGLE_API_KEY"))
```
FAISS Not Installed

Error: ModuleNotFoundError: No module named 'faiss'

Solution:

```shell
# Install FAISS
pip install faiss-cpu
# Or for GPU:
pip install faiss-gpu
```
Embeddings Model Download Fails
Error: OSError: Can't load tokenizer for 'sentence-transformers/...'

Solution:

```python
# Pre-download the model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Then use with LangChain
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
Out of Memory

Error: RuntimeError: CUDA out of memory, or the system freezes

Solution:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Use CPU instead of GPU
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

# Reduce batch size
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={'batch_size': 8}  # Default is 32
)
```
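batch_size simply controls how many texts are encoded per forward pass, so smaller batches lower peak memory at the cost of throughput. A sketch of the grouping logic behind encode_kwargs={'batch_size': ...}:

```python
def batched(texts, batch_size=8):
    """Yield successive slices of at most batch_size items,
    mirroring how a batch_size setting groups inputs for encoding."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

chunks = list(batched([f"cv_{i}" for i in range(20)], batch_size=8))
print([len(c) for c in chunks])  # → [8, 8, 4]
```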
Best Practices
Temperature Setting

Use temperature=0 for:
- Structured data extraction
- Consistent outputs
- Classification tasks

Use temperature=0.7-1.0 for:
- Creative writing
- Varied responses
- Brainstorming

Model Selection

Use gemini-1.5-flash for:
- High-volume processing
- Cost-sensitive applications
- Fast response times

Use gemini-1.5-pro for:
- Complex reasoning
- Long context (100k+ tokens)
- Higher accuracy requirements
Embeddings Cache

Save the FAISS index to disk and reuse it instead of recreating it:

```python
vectorstore.save_local("index")

vectorstore = FAISS.load_local(
    "index", embeddings,
    allow_dangerous_deserialization=True
)
```

Environment Variables

Use .env files for development:

```python
from dotenv import load_dotenv
load_dotenv()
```

Use system variables for production:

```shell
export GOOGLE_API_KEY="..."
```
Next Steps
- CV Generation: generate synthetic student CVs
- Profile Analysis: query individual CVs with RAG
- Talent Mining: batch-process multiple CVs