## Overview

The `rag_for_qa()` method combines retrieval and answer generation into a single pipeline. It retrieves relevant passages and uses them as context for an LLM to generate accurate answers.
## Basic Usage

### Set Up ReMem

Initialize and index your documents:

```python
from remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
    retrieval_top_k=200,  # Passages to retrieve
    qa_top_k=5            # Passages to feed to the LLM
)

remem = ReMem(global_config=config, working_dir="./remem_data")
remem.index(docs)
```
### Prepare Questions

Create a list of questions:

```python
queries = [
    "What is the capital of France?",
    "Who invented the telephone?"
]
```
### Run RAG for QA

Call `rag_for_qa()` to get answers:

```python
query_solutions, answers, metadata, retrieval_results, qa_results = remem.rag_for_qa(
    queries=queries
)

# Access answers
for qs in query_solutions:
    print(f"Q: {qs.question}")
    print(f"A: {qs.answer}")
    print(f"Retrieved: {len(qs.docs)} passages\n")
```
## Return Values

The method returns a tuple with five elements:

```python
query_solutions, answers, metadata, retrieval_results, qa_results = remem.rag_for_qa(queries)

# 1. List[QuerySolution] - Complete solution objects
for qs in query_solutions:
    qs.question      # str: Original question
    qs.answer        # str: Generated answer
    qs.docs          # List[str]: Retrieved passages
    qs.doc_scores    # List[float]: Relevance scores
    qs.qa_rationale  # str: Full LLM response with reasoning
    qs.metrics       # Dict: Evaluation metrics (if enabled)

# 2. List[str]  - Raw LLM response messages
# 3. List[Dict] - Metadata for each query
# 4. Dict       - Retrieval evaluation metrics
# 5. Dict       - QA evaluation metrics
```
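For example, you can pair each answer with its highest-scoring supporting passages using only the fields documented above:

```python
# Pair each answer with its top-scoring retrieved passages
for qs in query_solutions:
    print(f"Q: {qs.question} -> A: {qs.answer}")
    for doc, score in zip(qs.docs[:3], qs.doc_scores[:3]):
        print(f"  [{score:.3f}] {doc[:80]}")
```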
## Complete Example from main.py

```python
import json

from remem import ReMem
from remem.utils.config_utils import BaseConfig


def get_gold_answers(samples):
    gold_answers = []
    for sample in samples:
        gold_ans = sample.get("answer", sample.get("reference"))
        if isinstance(gold_ans, str):
            gold_ans = [gold_ans]
        gold_answers.append(set(gold_ans))
    return gold_answers


def get_gold_docs(samples, dataset_name):
    gold_docs = []
    for sample in samples:
        if "supporting_facts" in sample:
            gold_title = set([item[0] for item in sample["supporting_facts"]])
            gold_title_and_content_list = [item for item in sample["context"] if item[0] in gold_title]
            gold_doc = [item[0] + "\n" + "".join(item[1]) for item in gold_title_and_content_list]
            gold_docs.append(list(set(gold_doc)))
    return gold_docs


# Load data
corpus = json.load(open("reproduce/dataset/musique_corpus.json", "r"))
samples = json.load(open("reproduce/dataset/musique.json", "r"))
docs = [f"{doc['title']}\n{doc['text']}" for doc in corpus]
queries = [s["question"] for s in samples]
gold_answers = get_gold_answers(samples)
gold_docs = get_gold_docs(samples, "musique")

# Configure
config = BaseConfig(
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
    dataset="musique",
    retrieval_top_k=200,
    qa_top_k=5,
    do_eval_retrieval=True,
    do_eval_qa=True
)

remem = ReMem(global_config=config, working_dir="./outputs/musique")
remem.index(docs)

# RAG for QA with evaluation
query_solutions, answers, metadata, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=gold_answers,
    metrics=("qa_em", "qa_f1", "retrieval_recall")
)

print(f"Retrieval Recall@5: {retrieval_eval['Recall@5']:.4f}")
print(f"QA Exact Match: {qa_eval['ExactMatch']:.4f}")
print(f"QA F1 Score: {qa_eval['F1']:.4f}")
```
## Configuration Options

### Retrieval Settings

Control how many passages are retrieved and used:

```python
config = BaseConfig(
    retrieval_top_k=200,  # Total passages to retrieve
    qa_top_k=5,           # Top passages to feed to the LLM for reading
    linking_top_k=5       # Entities to link during graph traversal
)
```
### QA Prompt Template

Customize the prompt used for answer generation:

```python
config = BaseConfig(
    qa_prompt_template="rag_qa_musique",   # Use a dataset-specific template
    qa_passage_prefix="Wikipedia Title: "  # Prefix for each passage
)
```
### Reader Selection

Choose the QA reader implementation:

```python
config = BaseConfig(
    qa_reader="remem",   # Default: REMem reader
    # qa_reader="tiser"  # Alternative: TISER reader
)
```
## Evaluation Metrics

Enable evaluation with gold-standard data:

```python
config = BaseConfig(
    do_eval_retrieval=True,  # Evaluate retrieval quality
    do_eval_qa=True          # Evaluate answer quality
)

query_solutions, _, _, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_docs=gold_docs,        # Required for retrieval eval
    gold_answers=gold_answers,  # Required for QA eval
    metrics=("qa_em", "qa_f1", "retrieval_recall")  # Metrics to compute
)

print(f"Retrieval Results: {retrieval_eval}")
print(f"QA Results: {qa_eval}")
```
### Available Metrics

**Retrieval metrics:**

- `retrieval_recall` - Recall@k for various k values
- `retrieval_recall_all` - All gold docs must be retrieved
- `retrieval_ndcg_any` - NDCG@k scores
- `retrieval_recall_locomo` - LoCoMo benchmark recall

**QA metrics:**

- `qa_em` - Exact Match score
- `qa_f1` - Token-level F1 score
- `qa_bleu1` / `qa_bleu4` - BLEU scores
- `qa_longmemeval` - LongMemEval LLM judge
- `qa_mem0_llm_judge` - Mem0 LLM judge
- `qa_evalsuit_llm_judge` - EvalSuit LLM judge
- `qa_f1_score_locomo` - LoCoMo F1 score
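Several of these metric names can be combined in a single `metrics` tuple. For example, reusing the gold data prepared earlier:

```python
# Request exact match, F1, and BLEU-1 together with retrieval recall
_, _, _, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=gold_answers,
    metrics=("qa_em", "qa_f1", "qa_bleu1", "retrieval_recall")
)
print(f"Retrieval: {retrieval_eval}")
print(f"QA: {qa_eval}")
```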
## Using Pre-Retrieved Results

Pass `QuerySolution` objects to skip retrieval:

```python
# First, retrieve passages
retrieval_results = remem.retrieve(queries)

# Then, generate answers using the retrieved passages
query_solutions, answers, metadata, _, qa_eval = remem.rag_for_qa(
    queries=retrieval_results,  # Pass QuerySolution objects
    gold_answers=gold_answers
)
```
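Since `retrieve()` returns the `QuerySolution` objects directly, you can sanity-check retrieval before running the more expensive QA step. A minimal sketch that relies only on the fields documented under Return Values:

```python
# Inspect retrieval quality before generating answers
retrieval_results = remem.retrieve(queries)
for qs in retrieval_results:
    top_score = qs.doc_scores[0] if qs.doc_scores else float("nan")
    print(f"{qs.question}: {len(qs.docs)} passages, top score {top_score:.3f}")
```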
## LLM Configuration

### Separate LLMs

Use different LLMs for extraction and QA:

```python
from remem.llm import _get_llm_class

# Configure different LLMs
config = BaseConfig(
    llm_name="gpt-4o-mini",           # Default LLM
    extract_llm_label="gpt-4o-mini",  # For extraction
    qa_llm_label="gpt-4o"             # For QA (more powerful)
)

# Or pass custom LLM instances
extract_llm = _get_llm_class(config)
qa_llm = _get_llm_class(config)  # Different instance

remem = ReMem(
    global_config=config,
    working_dir="./remem_data",
    extract_llm=extract_llm,
    qa_llm=qa_llm
)
```
### Generation Parameters

```python
config = BaseConfig(
    max_new_tokens=2048,  # Max tokens to generate
    temperature=0,        # Sampling temperature (0 = deterministic)
    num_gen_choices=1,    # Number of completions per query
    seed=42               # Random seed for reproducibility
)
```
## QA Context Formatting

The QA context format depends on your document metadata. REMem automatically formats passages with date and role information if available.

From `remem.py:801-841`, the context is formatted as:

```python
# With metadata (conversational data)
"""Question: {question} (question date: {date})
Retrieved contexts:
Wikipedia Title: [2024-01-15] user: First message content
Wikipedia Title: [2024-01-15] assistant: Response content
...
Thought: """

# Without metadata (standard documents)
"""Question: {question}
Retrieved contexts:
Wikipedia Title: Passage 1 content
Wikipedia Title: Passage 2 content
...
Thought: """
```
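For illustration, the standard-document format above can be reproduced with a small helper. This is a hypothetical sketch, not the library's internal code (which lives in `remem.py:801-841`); the default prefix mirrors the `qa_passage_prefix` setting shown earlier:

```python
def build_qa_context(question, passages, prefix="Wikipedia Title: "):
    # Illustrative sketch of the standard-document format above;
    # REMem assembles this internally.
    lines = [f"Question: {question}", "Retrieved contexts:"]
    lines.extend(f"{prefix}{passage}" for passage in passages)
    lines.append("Thought: ")
    return "\n".join(lines)

print(build_qa_context(
    "What is the capital of France?",
    ["Paris\nParis is the capital city of France."]
))
```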
## Saving Results

Results are automatically saved when evaluation is enabled:

```python
config = BaseConfig(
    do_eval_qa=True,
    dataset="musique"
)

query_solutions, _, _, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    to_save=True  # Default: saves results to working_dir
)

# Results saved to: {working_dir}/rag_results_{inference_type}.json
```
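After a run, you can load the saved file for inspection. A sketch, assuming the JSON at the path pattern above contains one record per query (the exact structure may differ across versions):

```python
import glob
import json

# The {inference_type} suffix depends on your configuration,
# so glob for any results file in the working directory
for path in glob.glob("./outputs/musique/rag_results_*.json"):
    with open(path) as f:
        results = json.load(f)
    print(f"{path}: {len(results)} entries")
```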
## Agent-Based QA (Episodic/Temporal Methods)

For episodic and temporal extraction methods, use agent-based reasoning:

```python
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=False,  # Enable full tool selection
    agent_max_steps=5,        # Maximum reasoning steps
    agent_fixed_retrieval_tool="semantic_retrieve"  # or "lexical_retrieve"
)

remem = ReMem(global_config=config)
remem.index(docs)

query_solutions, _, _, _, _ = remem.rag_for_qa(queries)
```
## Next Steps

- **Evaluation**: Deep dive into evaluation metrics
- **Configuration**: Explore all configuration options