EduMate generates multiple-choice question (MCQ) assessments using a Retrieval-Augmented Generation (RAG) pipeline. The system retrieves relevant context from processed documents, then uses an LLM to generate questions aligned with Bloom’s Taxonomy cognitive levels.
Each retrieved chunk is passed to the LLM in two clearly separated parts:

Admin Metadata: the source file and page number (for LLM verification only)
Educational Content: the actual text content from the PDF
The metadata is explicitly marked as “DO NOT MENTION IN OUTPUT” to ensure questions don’t reference page numbers or PDF files, making them suitable for standalone exams.
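For illustration, a single retrieved chunk reaches the model in a layout like the one below; the file name, page number, and content shown here are placeholder values borrowed from the prompt's own examples.

```
--- ADMIN METADATA (DO NOT MENTION IN OUTPUT) ---
Source: nodejs.pdf
Page: 10
--- EDUCATIONAL CONTENT ---
Promises are used to handle asynchronous operations more cleanly than callbacks.
```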
The system prompt is carefully designed to produce professional, exam-ready questions:
```python
def prompt_modelling(context, blooms_requirements: str):
    SYSTEM_PROMPT = f"""
    You are a Subject Matter Expert designing a professional, standalone exam.
    You have been provided with "Educational Content" and "Admin Metadata" for verification.

    ### THE RULES FOR YOUR OUTPUT:

    1. **STRICT BLIND EXAM MODE**: Write the questions as if the student has NO access to any documents.
       - DO NOT mention "Page Numbers," "Lessons," "Sections," or "the PDF."
       - BAD: "According to the provided text on page 4, what is..."
       - GOOD: "What is the primary characteristic of..."

    2. **INTERNAL VERIFICATION ONLY**: Use the "Admin Metadata" only to ensure your answer is grounded
       in the correct chapter. DO NOT repeat this metadata in the question, the options, or the explanation.

    3. **EXPLANATION FORMAT**: Write the explanation as a factual teaching note.
       - BAD: "This is found on page 10 of nodejs.pdf."
       - GOOD: "Promises are used to handle asynchronous operations more cleanly than callbacks."

    4. **BLOOM'S TAXONOMY**: Generate questions according to these counts: {blooms_requirements}.
       For each question, set the `bloom_level` field to exactly one of:
       remember, understand, apply, analyze, evaluate, create — matching the cognitive level of that question.

    ### PROVIDED DATA (FOR YOUR EYES ONLY):
    {context}
    """
    return SYSTEM_PROMPT
```
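For reference, this is how the prompt builder is invoked with the default Bloom's distribution; the `context` variable stands for whatever string the retrieval step assembled.

```python
# Illustrative call; `context` is the string assembled from the retrieved chunks.
prompt = prompt_modelling(
    context=context,
    blooms_requirements="5 remember, 3 understand, 4 apply, 3 analyze, 2 evaluate, 3 create",
)
```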
Blind Exam Mode

Questions are written as if students have no access to source materials. This ensures they can be used in standalone exams without referencing “the document” or page numbers.
Metadata Separation
Source information is provided to the LLM for verification but explicitly excluded from generated questions, creating clean, professional assessment items.
Factual Explanations
Explanations focus on teaching concepts rather than citing sources, making them valuable learning tools beyond just answer keys.
Bloom's Distribution
The prompt enforces a specific distribution across cognitive levels (default: 5 remember, 3 understand, 4 apply, 3 analyze, 2 evaluate, 3 create) to ensure comprehensive assessment coverage.
```python
def search_and_ask(
    user_query,
    collection_name: str,
    blooms_requirements: str = "5 remember, 3 understand, 4 apply, 3 analyze, 2 evaluate, 3 create",
    top_k=5,
):
    # 1. Retrieve context
    vector_db = _vector_db(collection_name=collection_name)
    search_results = vector_db.similarity_search(query=user_query, k=top_k)
    if not search_results:
        print("No search result from vector DB.")
        return

    # 2. Format context
    context_blocks = []
    for result in search_results:
        block = (
            f"--- ADMIN METADATA (DO NOT MENTION IN OUTPUT) ---\n"
            f"Source: {result.metadata['source']}\n"
            f"Page: {result.metadata['page_label']}\n"
            f"--- EDUCATIONAL CONTENT ---\n"
            f"{result.page_content}\n"
        )
        context_blocks.append(block)
    context = "\n\n".join(context_blocks)

    # 3. Build prompt
    SYSTEM_PROMPT = prompt_modelling(context, blooms_requirements)

    # 4. Generate structured output
    response = open_ai_client.chat.completions.parse(
        model='gemini-2.5-flash-lite',
        response_format=OutputFormat,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )

    # 5. Return validated result
    parsed = response.choices[0].message.parsed
    return parsed.model_dump() if hasattr(parsed, "model_dump") else parsed
```
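The `OutputFormat` model passed to `response_format` is defined elsewhere in the codebase and is not shown in this section. As a minimal sketch, assuming a Pydantic model used with the structured-output `parse` call, it might look like the following; only `bloom_level` and its allowed values come from the prompt, and every other field name is an assumption.

```python
# Hypothetical sketch of the structured-output schema. Only `bloom_level` and its
# allowed values are dictated by the system prompt; the other fields are assumptions.
from typing import List, Literal
from pydantic import BaseModel

class MCQItem(BaseModel):
    question: str        # question stem, written in blind-exam style
    options: List[str]   # answer choices
    correct_answer: str  # the correct choice
    explanation: str     # factual teaching note, no source citations
    bloom_level: Literal[
        "remember", "understand", "apply", "analyze", "evaluate", "create"
    ]

class OutputFormat(BaseModel):
    questions: List[MCQItem]
```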
The generation function is executed asynchronously via Redis Queue in backend/queue/chat.py to handle multiple concurrent assessment requests without blocking.
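As a minimal sketch of what that looks like, assuming the standard rq library, an API handler might enqueue the job roughly as follows; the queue name, Redis connection, and example arguments are illustrative, not taken from the codebase.

```python
# Minimal sketch of enqueueing assessment generation with RQ.
# Queue name, Redis connection, and the example arguments are assumptions.
from redis import Redis
from rq import Queue

redis_conn = Redis()                                 # assumes a local Redis instance
queue = Queue("assessments", connection=redis_conn)  # hypothetical queue name

# A worker process picks up the job and runs search_and_ask without
# blocking the request that triggered it.
job = queue.enqueue(
    search_and_ask,
    user_query="Generate an assessment on asynchronous JavaScript",
    collection_name="nodejs",                        # hypothetical collection name
)
```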