Overview
The RAG Recruitment Assistant requires configuring three components: Google Generative AI (Gemini) as the LLM, HuggingFace embeddings, and a FAISS vector store.
Environment Variables
Google API Key
The system uses Google’s Gemini model, which requires an API key:
Windows:

```shell
setx GOOGLE_API_KEY "your_api_key_here"
```

macOS/Linux:

```shell
export GOOGLE_API_KEY="your_api_key_here"
```

Python (.env file):

```
GOOGLE_API_KEY=your_api_key_here
```
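The .env route typically relies on python-dotenv's load_dotenv(), which in essence parses KEY=VALUE lines into os.environ. A minimal stdlib sketch of that behaviour (the simplified parser here is illustrative only; real projects should use python-dotenv itself):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    parses KEY=VALUE lines and copies them into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, malformed lines
            key, _, value = line.partition("=")
            # setdefault: do not override variables already set,
            # matching load_dotenv's default behaviour
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```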
Validation
The setup cell validates the API key before proceeding:

```python
import os

if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")
print("✓ API Key configured")
```
LLM Configuration
Gemini 1.5 Flash Setup
The project uses Gemini 1.5 Flash for fast, cost-effective inference:

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
)
print("LLM configured successfully.")
```
Model Parameters
model (string, default: "gemini-1.5-flash", required)
The Gemini model to use. Options:
- gemini-1.5-flash: Fast, cost-effective (recommended)
- gemini-1.5-pro: More capable but slower
- gemini-pro: Previous generation

temperature
Controls randomness in responses.
- 0: Deterministic, consistent outputs (recommended for data extraction)
- 0.7: Balanced creativity
- 1.0: Maximum creativity (use for text generation)

max_output_tokens
Maximum length of the generated response.

top_p
Nucleus sampling parameter. Lower values make output more focused.

top_k
Top-k sampling parameter. Limits token selection to the top k options.
Advanced Configuration
```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_output_tokens=2048,
    top_p=0.95,
    top_k=40,
    # Safety settings (optional)
    safety_settings={
        "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
        "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE",
        "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE",
    },
)
```
For production use, keep safety settings at their default (BLOCK_MEDIUM_AND_ABOVE). Only disable for testing with controlled content.
Embeddings Configuration
HuggingFace Embeddings
The system uses sentence-transformers/all-MiniLM-L6-v2 for creating vector embeddings:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
print("Embeddings model loaded.")
```
Model Selection
Recommended for most use cases:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

- Dimensions: 384
- Speed: Very fast
- Quality: Good for general text
- Size: ~80MB

Higher quality, slower:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
```

- Dimensions: 768
- Speed: Medium
- Quality: Better semantic understanding
- Size: ~420MB

For Spanish/multilingual content:

```python
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-base"
)
```

- Dimensions: 768
- Speed: Medium
- Quality: Excellent for Spanish
- Size: ~1.1GB
Advanced Embeddings Configuration
```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        'device': 'cpu'  # Use 'cuda' for GPU acceleration
    },
    encode_kwargs={
        'normalize_embeddings': True,  # L2 normalization
        'batch_size': 32               # Batch size for encoding
    }
)
```
GPU Acceleration: If you have a CUDA-compatible GPU, set device='cuda' for 10-50x faster embedding generation.
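Setting normalize_embeddings to True scales every vector to unit length, which makes the inner product of two embeddings equal to their cosine similarity. A small stdlib sketch of that identity, using toy 2-D vectors:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (what normalize_embeddings=True does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
na, nb = l2_normalize(a), l2_normalize(b)
# After normalization, the plain inner product equals cosine similarity:
assert abs(dot(na, nb) - cosine(a, b)) < 1e-12
```

This is why normalized embeddings let a vector store rank by dot product alone, which is cheaper than computing full cosine similarity per query.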
FAISS Configuration
Basic Setup
FAISS (Facebook AI Similarity Search) is used for vector storage and retrieval:

```python
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader

# Load documents
loader = PyPDFLoader("cv.pdf")
docs = loader.load()

# Create vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# Save for later use
vectorstore.save_local("faiss_index")

# Load an existing index
vectorstore = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True
)
```
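save_local stores the docstore in pickle format alongside the raw index, and unpickling untrusted data can execute arbitrary code; that is the risk the explicit allow_dangerous_deserialization=True opt-in guards against. A benign stdlib demonstration of the underlying mechanism:

```python
import pickle

class Payload:
    """Toy object whose unpickling triggers a function call."""
    def __reduce__(self):
        # On unpickling, pickle calls print(...) -- a benign stand-in
        # for arbitrary code execution.
        return (print, ("code ran during unpickling",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # prints "code ran during unpickling"
```

Only load FAISS indexes you created yourself or obtained from a trusted source.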
Retriever Parameters
```python
# Basic retriever
retriever = vectorstore.as_retriever()

# With custom parameters
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return the top 3 results
)
```
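With similarity search, the retriever embeds the query and returns the k stored vectors closest to it; IndexFlatL2 does this as an exact brute-force scan over L2 distances. A toy sketch of that logic (the 2-D vectors and document labels are made up for illustration):

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, documents, k=3):
    """Brute-force nearest-neighbour search, as IndexFlatL2 does exactly."""
    scored = sorted(documents, key=lambda item: l2_distance(query, item[1]))
    return [doc for doc, _ in scored[:k]]

docs = [
    ("python developer", [0.9, 0.1]),
    ("data analyst",     [0.7, 0.3]),
    ("chef",             [0.0, 1.0]),
    ("ml engineer",      [0.8, 0.2]),
]
print(top_k([1.0, 0.0], docs, k=3))
# → ['python developer', 'ml engineer', 'data analyst']
```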
Search Types
Similarity (Default)

Returns the k most similar documents:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)
```

MMR (Diversity)

Maximum Marginal Relevance balances relevance and diversity:

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,
        "fetch_k": 10,      # Fetch 10, return the 4 most diverse
        "lambda_mult": 0.5  # 0 = max diversity, 1 = max relevance
    }
)
```

Score Threshold

Only return results above a similarity threshold:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.8,  # Only high-confidence matches
        "k": 5
    }
)
```
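The mmr search type implements the standard greedy Maximal Marginal Relevance criterion: each step picks the candidate that maximizes lambda_mult * sim(query, doc) minus (1 - lambda_mult) * the candidate's highest similarity to anything already selected. A toy sketch over precomputed similarity scores (the numbers are made up):

```python
def mmr(query_sim, pairwise_sim, k, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance over precomputed similarities.

    query_sim[i]       -- similarity of candidate i to the query
    pairwise_sim[i][j] -- similarity between candidates i and j
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize candidates similar to what is already selected
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; MMR skips the duplicate.
query_sim = [0.9, 0.88, 0.7]
pairwise = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sim, pairwise, k=2, lambda_mult=0.5))  # → [0, 2]
```

With lambda_mult=1.0 the redundancy penalty vanishes and the result degenerates to plain similarity ranking, which is why lower values yield more diverse hits.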
Advanced FAISS Configuration

```python
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

# Create a FAISS index with custom parameters
index = faiss.IndexFlatL2(384)  # 384 = embedding dimension

# For larger datasets, use IVF (Inverted File Index)
nlist = 100  # Number of clusters
quantizer = faiss.IndexFlatL2(384)
index = faiss.IndexIVFFlat(quantizer, 384, nlist)

# Train the index (required for IVF)
# index.train(training_vectors)

# Use with LangChain, starting from an empty docstore and id mapping
vectorstore = FAISS(
    embedding_function=embeddings.embed_query,
    index=index,
    docstore=InMemoryDocstore({}),
    index_to_docstore_id={}
)
```
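The IVF speed-up can be illustrated without faiss: vectors are bucketed into nlist clusters when the index is trained, and a query then scans only the nprobe closest clusters rather than the whole collection, which is what makes IVF fast but approximate. A toy 1-D sketch (cluster centres are hard-coded here instead of trained):

```python
def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the 'inverted lists')."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        lists[nearest].append((vid, v))
    return lists

def ivf_search(query, centroids, lists, nprobe=1):
    """Scan only the nprobe closest clusters, then return the nearest vector id."""
    order = sorted(range(len(centroids)), key=lambda i: abs(query - centroids[i]))
    candidates = [item for i in order[:nprobe] for item in lists[i]]
    return min(candidates, key=lambda pair: abs(query - pair[1]))[0]

vectors = [0.1, 0.2, 0.9, 1.1, 5.0, 5.2]
centroids = [0.5, 5.0]  # stand-in for trained cluster centres
lists = build_ivf(vectors, centroids)
print(ivf_search(5.15, centroids, lists, nprobe=1))  # → 5 (the vector 5.2)
```

Raising nprobe scans more clusters, trading speed back for recall; faiss exposes the same knob on IndexIVFFlat.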
Complete Setup Code
Full Configuration

```python
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings

print("Installing libraries and configuring...")

# 1. API key validation
if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("You must configure the GOOGLE_API_KEY environment variable")
print("✓ API Key configured")

# 2. LLM configuration
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
)
print("✓ LLM configured (gemini-1.5-flash)")

# 3. Embeddings configuration
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
print("✓ Embeddings model loaded")

print("Configuration complete: All ready.")
```
Dependencies
Install all required packages:

```shell
pip install -r requirements.txt
```

requirements.txt:

```
langchain
langchain-community
langchain-google-genai
langchain-huggingface
sentence-transformers
faiss-cpu
pypdf
reportlab
pandas==2.2.2
matplotlib
plotly
```
For GPU support, replace faiss-cpu with faiss-gpu in requirements.txt
Google Colab

The project was developed and tested in Google Colab:

```python
# Install packages (run once per session)
!pip install -q langchain langchain-google-genai langchain-huggingface \
    langchain-community sentence-transformers faiss-cpu \
    pypdf reportlab pandas matplotlib plotly

# Set API key
import os
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
```

Local (Python 3.10/3.11)

For local execution:

```shell
# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export GOOGLE_API_KEY="your_key_here"

# Run Jupyter
jupyter notebook
```

Docker

Run in a Docker container (jupyter is installed explicitly since it is not in requirements.txt):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt jupyter
COPY . .
ENV GOOGLE_API_KEY=""
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]
```

```shell
docker build -t rag-recruitment .
docker run -p 8888:8888 -e GOOGLE_API_KEY="your_key" rag-recruitment
```
Troubleshooting
API Key Not Found

Error: ValueError: You must configure the GOOGLE_API_KEY environment variable

Solution:

```shell
# Check if the variable is set
echo $GOOGLE_API_KEY

# Set it
export GOOGLE_API_KEY="your_key_here"
```

```python
# Verify in Python
import os
print(os.getenv("GOOGLE_API_KEY"))
```
FAISS Not Installed

Error: ModuleNotFoundError: No module named 'faiss'

Solution:

```shell
# Install FAISS
pip install faiss-cpu
# Or for GPU:
pip install faiss-gpu
```
Embeddings Model Download Fails
Error: OSError: Can't load tokenizer for 'sentence-transformers/...'

Solution:

```python
# Pre-download the model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Then use with LangChain
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
Out of Memory

Error: RuntimeError: CUDA out of memory, or the system freezes

Solution:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Use CPU instead of GPU
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

# Reduce batch size
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={'batch_size': 8}  # Default is 32
)
```
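batch_size simply controls how many texts are encoded per forward pass, so smaller batches lower peak memory at the cost of throughput. A sketch of the grouping logic behind encode_kwargs={'batch_size': ...}:

```python
def batched(texts, batch_size=8):
    """Yield successive slices of at most batch_size items,
    mirroring how a batch_size setting groups inputs for encoding."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

chunks = list(batched([f"cv_{i}" for i in range(20)], batch_size=8))
print([len(c) for c in chunks])  # → [8, 8, 4]
```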
Best Practices
Temperature Setting

Use temperature=0 for:
- Structured data extraction
- Consistent outputs
- Classification tasks

Use temperature=0.7-1.0 for:
- Creative writing
- Varied responses
- Brainstorming

Model Selection

Use gemini-1.5-flash for:
- High-volume processing
- Cost-sensitive applications
- Fast response times

Use gemini-1.5-pro for:
- Complex reasoning
- Long context (100k+ tokens)
- Higher accuracy requirements
Embeddings Cache

Save the FAISS index to disk and reuse it instead of recreating it:

```python
vectorstore.save_local("index")

vectorstore = FAISS.load_local(
    "index", embeddings,
    allow_dangerous_deserialization=True
)
```

Environment Variables

Use .env files for development:

```python
from dotenv import load_dotenv
load_dotenv()
```

Use system variables for production:

```shell
export GOOGLE_API_KEY="..."
```
Next Steps
- CV Generation: generate synthetic student CVs
- Profile Analysis: query individual CVs with RAG
- Talent Mining: batch-process multiple CVs