
Overview

The RAG Recruitment Assistant requires configuration of several components: Google Generative AI (Gemini), HuggingFace embeddings, and FAISS vector store.

Environment Variables

Google API Key

The system uses Google’s Gemini model, which requires an API key:
setx GOOGLE_API_KEY "your_api_key_here"      # Windows
export GOOGLE_API_KEY="your_api_key_here"    # macOS/Linux
Get your free Gemini API key at Google AI Studio

Validation

The setup cell validates the API key before proceeding:
import os

if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")

print("✓ API Key configured")

LLM Configuration

Gemini 1.5 Flash Setup

The project uses Gemini 1.5 Flash for fast, cost-effective inference:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0
)

print("LLM configured successfully.")

Model Parameters

model (string, required, default: "gemini-1.5-flash")
The Gemini model to use. Options:
  • gemini-1.5-flash: Fast, cost-effective (recommended)
  • gemini-1.5-pro: More capable but slower
  • gemini-pro: Previous generation

temperature (float, required, default: 0)
Controls randomness in responses.
  • 0: Deterministic, consistent outputs (recommended for data extraction)
  • 0.7: Balanced creativity
  • 1.0: Maximum creativity (use for text generation)

max_output_tokens (int, default: 2048)
Maximum length of the generated response.

top_p (float, default: 0.95)
Nucleus sampling parameter. Lower values make output more focused.

top_k (int, default: 40)
Top-k sampling parameter. Limits token selection to the top k options.

Advanced Configuration

from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_output_tokens=2048,
    top_p=0.95,
    top_k=40,
    # Safety settings (optional), keyed by the HarmCategory enums
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    }
)
For production use, keep safety settings at their default (BLOCK_MEDIUM_AND_ABOVE). Only disable for testing with controlled content.

Embeddings Configuration

HuggingFace Embeddings

The system uses sentence-transformers/all-MiniLM-L6-v2 for creating vector embeddings:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("Embeddings model loaded.")

Model Selection

Recommended for most use cases
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
  • Dimensions: 384
  • Speed: Very fast
  • Quality: Good for general text
  • Size: ~80MB

Advanced Embeddings Configuration

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        'device': 'cpu'  # Use 'cuda' for GPU acceleration
    },
    encode_kwargs={
        'normalize_embeddings': True,  # L2 normalization
        'batch_size': 32  # Batch size for encoding
    }
)
GPU Acceleration: If you have a CUDA-compatible GPU, set device='cuda' for 10-50x faster embedding generation.

FAISS Configuration

Basic Setup

FAISS (Facebook AI Similarity Search) is used for vector storage and retrieval:
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader

# Load documents
loader = PyPDFLoader("cv.pdf")
docs = loader.load()

# Create vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# Save for later use
vectorstore.save_local("faiss_index")

# Load existing index
vectorstore = FAISS.load_local(
    "faiss_index", 
    embeddings,
    allow_dangerous_deserialization=True
)

Retriever Parameters

# Basic retriever
retriever = vectorstore.as_retriever()

# With custom parameters
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 results
)

Search Types

Returns the k most similar documents:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)

Performance Tuning

from langchain_community.vectorstores import FAISS
import faiss

# Create FAISS index with custom parameters
index = faiss.IndexFlatL2(384)  # 384 = embedding dimension

# For larger datasets, use IVF (Inverted File Index)
nlist = 100  # Number of clusters
quantizer = faiss.IndexFlatL2(384)
index = faiss.IndexIVFFlat(quantizer, 384, nlist)

# Train the index (required for IVF)
# index.train(training_vectors)

# Use with LangChain (pass the embeddings object itself, plus a docstore
# and an index-to-id map; an empty InMemoryDocstore() and {} work for a new store)
vectorstore = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=docstore,
    index_to_docstore_id=index_to_docstore_id,
)

Complete Setup Code

import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings

print("Installing libraries and configuring...")

# 1. API Key Validation
if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")

print("✓ API Key configured")

# 2. LLM Configuration
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0
)

print("✓ LLM configured (gemini-1.5-flash)")

# 3. Embeddings Configuration
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("✓ Embeddings model loaded")
print("Configuration complete: All ready.")

Dependencies

Install all required packages:
pip install -r requirements.txt

requirements.txt

langchain
langchain-community
langchain-google-genai
langchain-huggingface
sentence-transformers
faiss-cpu
pypdf
reportlab
pandas==2.2.2
matplotlib
plotly
For GPU support, replace faiss-cpu with faiss-gpu in requirements.txt

Platform-Specific Notes

The project was developed and tested in Google Colab:
# Install packages (run once per session)
!pip install -q langchain langchain-google-genai langchain-huggingface \
                 langchain-community sentence-transformers faiss-cpu \
                 pypdf reportlab pandas matplotlib plotly

# Set API key
import os
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

Troubleshooting

Error: ValueError: You must configure the GOOGLE_API_KEY environment variable
Solution:
# Check if variable is set
echo $GOOGLE_API_KEY

# Set it
export GOOGLE_API_KEY="your_key_here"

# Verify in Python
import os
print(os.getenv("GOOGLE_API_KEY"))
Error: ModuleNotFoundError: No module named 'faiss'
Solution:
# Install FAISS
pip install faiss-cpu

# Or for GPU:
pip install faiss-gpu
Error: OSError: Can't load tokenizer for 'sentence-transformers/...'
Solution:
# Pre-download the model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Then use with LangChain
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
Error: RuntimeError: CUDA out of memory or system freezes
Solution:
# Use CPU instead of GPU
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

# Reduce batch size
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={'batch_size': 8}  # Default is 32
)

Best Practices

Temperature Setting

Use temperature=0 for:
  • Structured data extraction
  • Consistent outputs
  • Classification tasks
Use temperature=0.7-1.0 for:
  • Creative writing
  • Varied responses
  • Brainstorming

Model Selection

Use gemini-1.5-flash for:
  • High-volume processing
  • Cost-sensitive applications
  • Fast response times
Use gemini-1.5-pro for:
  • Complex reasoning
  • Long context (100k+ tokens)
  • Higher accuracy requirements

Embeddings Cache

Save FAISS index to disk:
vectorstore.save_local("index")
Reuse instead of recreating:
vectorstore = FAISS.load_local(
    "index", embeddings,
    allow_dangerous_deserialization=True
)

Environment Variables

Use .env files for development:
from dotenv import load_dotenv
load_dotenv()
Use system variables for production:
export GOOGLE_API_KEY="..."

Next Steps

CV Generation

Generate synthetic student CVs

Profile Analysis

Query individual CVs with RAG

Talent Mining

Batch process multiple CVs
