Quest’s memory buffer system, implemented through the ConversationHistory class, maintains context across multiple queries. This enables follow-up questions, clarifications, and contextual responses without repeating information.
    from typing import Dict, List


    class ConversationHistory:
        def __init__(self, max_history: int = 5):
            """
            Initialize the conversation history with a maximum limit.

            :param max_history: Maximum number of queries to retain in history.
            """
            self.max_history = max_history
            self.history: List[Dict[str, str]] = []
Key Features:
Bounded memory: Automatically limits history to prevent context overflow
Query-response pairs: Stores both user queries and system responses
FIFO eviction: Removes oldest entries when limit is reached
Each entry is a dictionary holding one query and its response:
    {
        "query": "How do I solve the two sum problem?",
        "response": "Use a hash map to store complements..."
    }
The full history is a list:
    self.history = [
        {"query": "What is binary search?", "response": "..."},
        {"query": "How do I implement it?", "response": "..."},
        {"query": "What's the time complexity?", "response": "..."}
    ]
    def add_query(self, query: str, response: str):
        """
        Add a new query and response to the history.

        :param query: The user's query.
        :param response: The system's response.
        """
        self.history.append({"query": query, "response": response})
        if len(self.history) > self.max_history:
            # Remove the oldest query if history exceeds the limit
            self.history.pop(0)
Behavior:
Appends new query-response pair to history
Automatically removes oldest entry when max_history is exceeded
Uses FIFO (First In, First Out) eviction policy
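The buffer always holds the max_history most recent query-response pairs (fewer until it fills), so the history added to each prompt stays bounded. Because entry length varies, this limits the number of entries rather than the exact token count.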
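The eviction behavior can be checked with a short, self-contained sketch. The class below is a reduced copy of the one described above (docstrings omitted), and the short strings "q1", "r1", and so on are placeholder data rather than real queries.

```python
from typing import Dict, List


class ConversationHistory:
    """Reduced copy of the class described above, for demonstration only."""

    def __init__(self, max_history: int = 5):
        self.max_history = max_history
        self.history: List[Dict[str, str]] = []

    def add_query(self, query: str, response: str) -> None:
        self.history.append({"query": query, "response": response})
        if len(self.history) > self.max_history:
            self.history.pop(0)  # evict the oldest entry (FIFO)


history = ConversationHistory(max_history=2)
history.add_query("q1", "r1")
history.add_query("q2", "r2")
history.add_query("q3", "r3")  # exceeds the limit, so "q1" is evicted

remaining = [entry["query"] for entry in history.history]
print(remaining)  # ['q2', 'q3']
```

Note that list.pop(0) shifts every remaining element, which is negligible at these sizes. For much larger limits, collections.deque with a maxlen would be one alternative, though that is not what the class above uses.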
    def get_context(self) -> str:
        """
        Generate a context string from the conversation history.

        :return: A formatted context string.
        """
        context = ""
        for entry in self.history:
            context += f"User: {entry['query']}\nSystem: {entry['response']}\n"
        return context.strip()
Output Format:
    User: What is the two sum problem?
    System: The two sum problem asks you to find two numbers in an array that add up to a target value...
    User: How do I optimize it?
    System: Use a hash map to achieve O(n) time complexity...
    User: What about space complexity?
    System: The hash map approach uses O(n) space in the worst case...
This context string is injected into prompts to provide conversation continuity.
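How the context string reaches the model is not shown in this section, so the following is only a sketch of one plausible approach. The build_prompt helper and its template are hypothetical and are not part of RAGEngine; the point is simply that the history is placed ahead of the new question using the same User/System labels.

```python
def build_prompt(context: str, question: str) -> str:
    """Hypothetical helper: prepend conversation context to a new question."""
    if not context:
        # No prior turns: send the question on its own.
        return f"User: {question}\nSystem:"
    return f"{context}\nUser: {question}\nSystem:"


# Context in the same format that get_context() produces.
context = (
    "User: What is the two sum problem?\n"
    "System: The two sum problem asks you to find two numbers..."
)
prompt = build_prompt(context, "How do I optimize it?")
print(prompt)
```

A real prompt would presumably also carry the system prompt and retrieved solutions, which are left out here to keep the sketch focused on the history.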
Query: ~20-50 tokens
Response: ~100-300 tokens
Total per entry: ~120-350 tokens
For max_history=3:
Minimum: 360 tokens (3 × 120)
Maximum: 1050 tokens (3 × 350)
Average: ~600 tokens
If using models with small context windows (e.g., 2048 tokens), limit max_history to avoid exceeding the context limit when combined with retrieved solutions and system prompts.
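The arithmetic above can be turned into a rough sizing helper. This is a sketch under stated assumptions: the per-entry figures are the estimates given above, while the function names and the 1200-token reservation for system prompt, retrieved solutions, and the model's answer are illustrative values, not measurements.

```python
from typing import Tuple

# Per-entry estimates taken from the figures above.
MIN_TOKENS_PER_ENTRY = 120
MAX_TOKENS_PER_ENTRY = 350


def history_token_range(max_history: int) -> Tuple[int, int]:
    """Estimated (minimum, maximum) tokens used by a full history."""
    return (max_history * MIN_TOKENS_PER_ENTRY,
            max_history * MAX_TOKENS_PER_ENTRY)


def largest_safe_max_history(context_window: int, reserved_tokens: int) -> int:
    """Largest max_history whose worst-case estimate fits the remaining budget."""
    budget = context_window - reserved_tokens
    return max(0, budget // MAX_TOKENS_PER_ENTRY)


print(history_token_range(3))                # (360, 1050)
print(largest_safe_max_history(2048, 1200))  # 2
```

Sizing against the worst-case estimate is deliberately conservative; actual usage depends on the tokenizer and on how long the responses turn out to be.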
    from rag_engine import RAGEngine
    from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

    # Initialize with max_history=3
    retriever = LeetCodeRetriever()
    rag_engine = RAGEngine(retriever, max_history=3)
    rag_engine.set_mode("general")

    # Query 1
    answer1 = rag_engine.answer_question("What is the two sum problem?")
    print(answer1)

    # Query 2 (references previous context)
    answer2 = rag_engine.answer_question("How do I optimize it?")
    print(answer2)

    # Query 3 (builds on previous two)
    answer3 = rag_engine.answer_question("What about space complexity?")
    print(answer3)

    # View conversation history
    print("\nConversation History:")
    print(rag_engine.conversation_history.get_context())
    # Get current history length
    history_len = len(rag_engine.conversation_history.history)
    print(f"Current history length: {history_len}")

    # Max history setting
    max_history = rag_engine.conversation_history.max_history
    print(f"Max history: {max_history}")

    # Check if history is at capacity
    if history_len >= max_history:
        print("History is full - oldest entries will be evicted")
    # Have a conversation
    rag_engine.answer_question("What is binary search?")
    rag_engine.answer_question("How do I implement it?")

    # Clear history to start fresh
    rag_engine.conversation_history.clear()

    # Next query has no context
    rag_engine.answer_question("What is merge sort?")
    # This will be treated as a new conversation
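The ConversationHistory listing earlier in this section does not include a clear() method, so its real implementation is not shown here. A minimal sketch, assuming it only empties the stored list and leaves max_history untouched, might look like this:

```python
from typing import Dict, List


class ConversationHistory:
    """Reduced stand-in for the class above; only what clear() needs."""

    def __init__(self, max_history: int = 5):
        self.max_history = max_history
        self.history: List[Dict[str, str]] = []

    def clear(self) -> None:
        # Assumption: clearing empties the list in place and keeps max_history.
        self.history.clear()


history = ConversationHistory(max_history=3)
history.history.append({"query": "What is binary search?", "response": "..."})
history.clear()
print(len(history.history))  # 0
```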
Query 1:
    User: Explain the two sum problem
    System: The two sum problem involves finding two numbers in an array that add up to a target...
Query 2 (uses context):
    User: Why is a hash map useful for this?
    System: [Sees previous query about two sum] A hash map allows O(1) lookups, making the solution efficient...
Query 3 (uses both previous):
    User: What if the array has duplicates?
    System: [Knows we're discussing two sum with hash map] With duplicates, the hash map approach still works because...
    # Set max_history=0 for stateless operation
    rag_engine = RAGEngine(retriever, max_history=0)

    # Every query is independent
    rag_engine.answer_question("Query 1")
    rag_engine.answer_question("Query 2")  # No context from Query 1
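Tracing the add_query logic shows why max_history=0 gives stateless behavior: each entry is appended and then evicted straight away. The function below is a standalone mirror of that method, written only for this trace, and the query and answer strings are placeholders.

```python
from typing import Dict, List


def add_query(history: List[Dict[str, str]], max_history: int,
              query: str, response: str) -> None:
    """Standalone mirror of ConversationHistory.add_query, for tracing only."""
    history.append({"query": query, "response": response})
    if len(history) > max_history:
        history.pop(0)  # with max_history=0 this removes the entry just added


history: List[Dict[str, str]] = []
add_query(history, 0, "Query 1", "Answer 1")
print(history)  # []
```

With an empty history, get_context() as defined above returns an empty string, so no earlier turns reach the next prompt.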