Overview

The rag_for_qa() method combines retrieval and answer generation into a single pipeline. It retrieves relevant passages and uses them as context for an LLM to generate accurate answers.

Basic Usage

1. Set Up ReMem

Initialize and index your documents:
from remem import ReMem
from remem.utils.config_utils import BaseConfig

config = BaseConfig(
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
    retrieval_top_k=200,  # Passages to retrieve
    qa_top_k=5  # Passages to feed to LLM
)

remem = ReMem(global_config=config, working_dir="./remem_data")

# A couple of toy documents to index (plain strings)
docs = [
    "Paris is the capital and most populous city of France.",
    "Alexander Graham Bell is credited with inventing the telephone."
]
remem.index(docs)

2. Prepare Questions

Create a list of questions:
queries = [
    "What is the capital of France?",
    "Who invented the telephone?"
]

3. Run RAG for QA

Call rag_for_qa() to get answers:
query_solutions, answers, metadata, retrieval_results, qa_results = remem.rag_for_qa(
    queries=queries
)

# Access answers
for qs in query_solutions:
    print(f"Q: {qs.question}")
    print(f"A: {qs.answer}")
    print(f"Retrieved: {len(qs.docs)} passages\n")

Return Values

The method returns a tuple with 5 elements:
query_solutions, answers, metadata, retrieval_results, qa_results = remem.rag_for_qa(queries)

# 1. List[QuerySolution] - Complete solution objects
for qs in query_solutions:
    qs.question        # str: Original question
    qs.answer          # str: Generated answer
    qs.docs            # List[str]: Retrieved passages
    qs.doc_scores      # List[float]: Relevance scores
    qs.qa_rationale    # str: Full LLM response with reasoning
    qs.metrics         # Dict: Evaluation metrics (if enabled)

# 2. List[str] - Raw LLM response messages
# 3. List[Dict] - Metadata for each query
# 4. Dict - Retrieval evaluation metrics
# 5. Dict - QA evaluation metrics

Complete Example from main.py

main.py
import json
from remem import ReMem
from remem.utils.config_utils import BaseConfig

def get_gold_answers(samples):
    gold_answers = []
    for sample in samples:
        gold_ans = sample.get("answer", sample.get("reference"))
        if isinstance(gold_ans, str):
            gold_ans = [gold_ans]
        gold_answers.append(set(gold_ans))
    return gold_answers

def get_gold_docs(samples, dataset_name):
    gold_docs = []
    for sample in samples:
        # MuSiQue-style samples carry supporting_facts and context fields
        if "supporting_facts" in sample:
            gold_title = set([item[0] for item in sample["supporting_facts"]])
            gold_title_and_content_list = [item for item in sample["context"] if item[0] in gold_title]
            gold_doc = [item[0] + "\n" + "".join(item[1]) for item in gold_title_and_content_list]
            gold_docs.append(list(set(gold_doc)))
    return gold_docs

# Load data
corpus = json.load(open("reproduce/dataset/musique_corpus.json", "r"))
samples = json.load(open("reproduce/dataset/musique.json", "r"))

docs = [f"{doc['title']}\n{doc['text']}" for doc in corpus]
queries = [s["question"] for s in samples]
gold_answers = get_gold_answers(samples)
gold_docs = get_gold_docs(samples, "musique")

# Configure
config = BaseConfig(
    llm_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
    dataset="musique",
    retrieval_top_k=200,
    qa_top_k=5,
    do_eval_retrieval=True,
    do_eval_qa=True
)

remem = ReMem(global_config=config, working_dir="./outputs/musique")
remem.index(docs)

# RAG for QA with evaluation
query_solutions, answers, metadata, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=gold_answers,
    metrics=("qa_em", "qa_f1", "retrieval_recall")
)

print(f"Retrieval Recall@5: {retrieval_eval['Recall@5']:.4f}")
print(f"QA Exact Match: {qa_eval['ExactMatch']:.4f}")
print(f"QA F1 Score: {qa_eval['F1']:.4f}")

Configuration Options

Retrieval Settings

Control how many passages are retrieved and used:
config = BaseConfig(
    retrieval_top_k=200,  # Total passages to retrieve
    qa_top_k=5,  # Top passages to feed to LLM for reading
    linking_top_k=5  # Entities to link during graph traversal
)

QA Prompt Template

Customize the prompt used for answer generation:
config = BaseConfig(
    qa_prompt_template="rag_qa_musique",  # Use dataset-specific template
    qa_passage_prefix="Wikipedia Title: "  # Prefix for each passage
)

Reader Selection

Choose the QA reader implementation:
config = BaseConfig(
    qa_reader="remem",  # Default: REMem reader
    # qa_reader="tiser"  # Alternative: TISER reader
)

Evaluation Metrics

Enable evaluation with gold standard data:
config = BaseConfig(
    do_eval_retrieval=True,  # Evaluate retrieval quality
    do_eval_qa=True  # Evaluate answer quality
)

query_solutions, _, _, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_docs=gold_docs,  # Required for retrieval eval
    gold_answers=gold_answers,  # Required for QA eval
    metrics=("qa_em", "qa_f1", "retrieval_recall")  # Metrics to compute
)

print(f"Retrieval Results: {retrieval_eval}")
print(f"QA Results: {qa_eval}")

Available Metrics

Retrieval Metrics:
  • retrieval_recall - Recall@k for various k values
  • retrieval_recall_all - All gold docs must be retrieved
  • retrieval_ndcg_any - NDCG@k scores
  • retrieval_recall_locomo - LoCoMo benchmark recall
QA Metrics:
  • qa_em - Exact Match score
  • qa_f1 - Token-level F1 score
  • qa_bleu1 / qa_bleu4 - BLEU scores
  • qa_longmemeval - LongMemEval LLM judge
  • qa_mem0_llm_judge - Mem0 LLM judge
  • qa_evalsuit_llm_judge - EvalSuit LLM judge
  • qa_f1_score_locomo - LoCoMo F1 score
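
For example, BLEU scores can be requested alongside exact match and F1 by listing the corresponding metric names from the list above in the metrics tuple (assuming do_eval_qa=True and gold answers are provided, as in the earlier examples):
query_solutions, _, _, _, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    metrics=("qa_em", "qa_f1", "qa_bleu1", "qa_bleu4")
)
print(qa_eval)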

Using Pre-Retrieved Results

Pass QuerySolution objects to skip retrieval:
# First, retrieve passages
retrieval_results = remem.retrieve(queries)

# Then, generate answers using retrieved passages
query_solutions, answers, metadata, _, qa_eval = remem.rag_for_qa(
    queries=retrieval_results,  # Pass QuerySolution objects
    gold_answers=gold_answers
)
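
Because retrieve() returns QuerySolution objects, you can inspect or filter the retrieved passages before generating answers (attribute names as described under Return Values above):
# Peek at the top-ranked passage for the first query before running QA
first = retrieval_results[0]
print(f"Q: {first.question}")
print(f"Top passage (score {first.doc_scores[0]:.3f}): {first.docs[0][:200]}")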

LLM Configuration

Separate LLMs

Use different LLMs for extraction and QA:
from remem.llm import _get_llm_class

# Configure different LLMs
config = BaseConfig(
    llm_name="gpt-4o-mini",  # Default LLM
    extract_llm_label="gpt-4o-mini",  # For extraction
    qa_llm_label="gpt-4o"  # For QA (more powerful)
)

# Or pass custom LLM instances
extract_llm = _get_llm_class(config)
qa_llm = _get_llm_class(config)  # Different instance

remem = ReMem(
    global_config=config,
    working_dir="./remem_data",
    extract_llm=extract_llm,
    qa_llm=qa_llm
)

Generation Parameters

config = BaseConfig(
    max_new_tokens=2048,  # Max tokens to generate
    temperature=0,  # Sampling temperature (0 = deterministic)
    num_gen_choices=1,  # Number of completions per query
    seed=42  # Random seed for reproducibility
)

QA Context Formatting

The QA context format depends on your document metadata. REMem automatically formats passages with date and role information if available.
In remem.py (lines 801-841), the context is formatted as:
# With metadata (conversational data)
"""Question: {question} (question date: {date})

Retrieved contexts:
Wikipedia Title: [2024-01-15] user: First message content
Wikipedia Title: [2024-01-15] assistant: Response content
...

Thought: """

# Without metadata (standard documents)
"""Question: {question}

Retrieved contexts:
Wikipedia Title: Passage 1 content
Wikipedia Title: Passage 2 content
...

Thought: """

Saving Results

Results are automatically saved when evaluation is enabled:
config = BaseConfig(
    do_eval_qa=True,
    dataset="musique"
)

query_solutions, _, _, retrieval_eval, qa_eval = remem.rag_for_qa(
    queries=queries,
    gold_answers=gold_answers,
    to_save=True  # Default: saves results to working_dir
)

# Results saved to: {working_dir}/rag_results_{inference_type}.json
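
To inspect a saved run afterwards, load the JSON file from the working directory. The exact filename depends on the inference type, so globbing for it is the simplest approach (a sketch, assuming the ./outputs/musique working directory from the example above):
import glob
import json

# Find whichever rag_results_*.json file the run produced
for path in glob.glob("./outputs/musique/rag_results_*.json"):
    with open(path) as f:
        results = json.load(f)
    print(f"{path}: {type(results).__name__} with {len(results)} entries")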

Agent-Based QA (Episodic/Temporal Methods)

For episodic and temporal extraction methods, use agent-based reasoning:
config = BaseConfig(
    extract_method="episodic_gist",
    agent_fixed_tools=False,  # Enable full tool selection
    agent_max_steps=5,  # Maximum reasoning steps
    agent_fixed_retrieval_tool="semantic_retrieve"  # or "lexical_retrieve"
)

remem = ReMem(global_config=config)
remem.index(docs)

query_solutions, _, _, _, _ = remem.rag_for_qa(queries)

Next Steps

  • Evaluation: deep dive into evaluation metrics
  • Configuration: explore all configuration options
