Overview
The knowledge base currently supports three domains: DBMS, OOPs, and OS. This guide shows you how to add new topics such as Networking, Data Structures, Algorithms, or any other computer science domain.
Process Overview
- Update taxonomy structure
- Add keyword matching rules
- Prepare raw data files
- Update topic detection logic
- Run preparation and indexing
- Test and validate
Step 1: Update Taxonomy
Edit source/config/taxonomy.json to add your new topic and its subtopics.
Current Structure
Adding a New Topic: Networking
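A minimal sketch of adding the topic programmatically, assuming taxonomy.json is a JSON object that maps each topic name to a list of subtopic names. That layout, the helpers `add_topic` and `update_taxonomy_file`, and the subtopic names are illustrative assumptions rather than the project's actual schema.

```python
import json
from pathlib import Path

# Hypothetical subtopic names; replace them with ones that suit your data.
NETWORKING_SUBTOPICS = [
    "Networking Fundamentals", "OSI Model", "TCP and UDP", "IP Addressing",
    "DNS", "HTTP and HTTPS", "Routing", "Switching",
    "Network Security", "Wireless Networks",
]


def add_topic(taxonomy: dict, topic: str, subtopics: list) -> dict:
    """Return a copy of the taxonomy with the new topic added."""
    if topic in taxonomy:
        raise ValueError(f"topic already exists: {topic}")
    updated = dict(taxonomy)
    updated[topic] = list(subtopics)
    return updated


def update_taxonomy_file(path: Path, topic: str, subtopics: list) -> None:
    """Read, extend, and rewrite the file (assumed layout: topic -> list)."""
    taxonomy = json.loads(path.read_text(encoding="utf-8"))
    updated = add_topic(taxonomy, topic, subtopics)
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Under those assumptions, the call would be `update_taxonomy_file(Path("source/config/taxonomy.json"), "Networking", NETWORKING_SUBTOPICS)`.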
Subtopic Organization Best Practices
- Start broad, then specific: Begin with foundational concepts, then add specialized topics
- Aim for 8-15 subtopics: Too few limits organization; too many creates confusion
- Use clear, standard terminology: Stick to industry-standard names
- Avoid overlap: Each subtopic should be distinct
- Consider difficulty progression: Organize from beginner to advanced when possible
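Two of the guidelines above (the 8-15 range and the no-overlap rule, at least for exact duplicates) can be checked mechanically. The helper below is a hypothetical sketch, not part of the project; semantic overlap between differently named subtopics still needs manual review.

```python
def check_subtopics(subtopics, min_count=8, max_count=15):
    """Return a list of warnings for a proposed subtopic list."""
    warnings = []
    if not min_count <= len(subtopics) <= max_count:
        warnings.append(
            f"expected {min_count}-{max_count} subtopics, found {len(subtopics)}"
        )
    seen = set()
    for name in subtopics:
        key = name.strip().lower()  # compare names case-insensitively
        if key in seen:
            warnings.append(f"duplicate subtopic: {name}")
        seen.add(key)
    return warnings
```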
Step 2: Add Keyword Rules
Edit source/config/topic_rules.json to define how questions map to subtopics.
Rule Structure
Example: Networking Rules
Add these rules to the existing array in topic_rules.json:
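A minimal sketch of what such rules might look like, built in Python and serialized to JSON. The field names `topic`, `subtopic`, and `keywords`, the helper `append_rules`, and the keyword lists are assumptions; match them to the entries already present in topic_rules.json.

```python
import json

# Hypothetical rule objects; the field names are assumed, not taken from the project.
networking_rules = [
    {"topic": "Networking", "subtopic": "OSI Model",
     "keywords": ["osi", "osi model", "seven layers",
                  "physical layer", "data link layer"]},
    {"topic": "Networking", "subtopic": "TCP and UDP",
     "keywords": ["tcp", "udp", "three-way handshake",
                  "congestion control", "port number"]},
    {"topic": "Networking", "subtopic": "DNS",
     "keywords": ["dns", "domain name system", "name server",
                  "dns resolution", "a record"]},
]


def append_rules(existing, new_rules):
    """Append rules, skipping any (topic, subtopic) pair already present."""
    present = {(r["topic"], r["subtopic"]) for r in existing}
    fresh = [r for r in new_rules if (r["topic"], r["subtopic"]) not in present]
    return existing + fresh


# Print the JSON that would be merged into the existing array.
print(json.dumps(append_rules([], networking_rules), indent=2))
```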
Keyword Selection Tips
- Include variations: Add singular/plural, abbreviations, full names
- Use 5-15 keywords per rule: More keywords = better matching
- Include domain-specific jargon: Technical terms users will search for
- Test with lowercase: All matching is case-insensitive
- Avoid overly generic terms: “network” alone is too broad
- Include common misspellings: If users commonly misspell terms
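Some of these tips can be applied automatically before a rule is saved. The sketch below lowercases and de-duplicates a keyword list, then flags lists outside the 5-15 range and terms from a hypothetical `GENERIC_TERMS` set; both the helper and the set are illustrative assumptions.

```python
# Hypothetical examples of terms that are too broad to use on their own.
GENERIC_TERMS = {"network", "data", "system"}


def lint_keywords(keywords):
    """Return (normalized keywords, warnings) for one rule's keyword list."""
    normalized = []
    for kw in keywords:
        kw = kw.strip().lower()
        if kw and kw not in normalized:  # drop blanks and duplicates
            normalized.append(kw)
    warnings = []
    if not 5 <= len(normalized) <= 15:
        warnings.append(f"expected 5-15 keywords, found {len(normalized)}")
    for kw in normalized:
        if kw in GENERIC_TERMS:
            warnings.append(f"too generic: {kw}")
    return normalized, warnings
```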
Step 3: Prepare Raw Data
Create a JSON file in source/data/raw/ with your questions and answers.
File Naming Convention
Name the file to match your topic for automatic detection:
- networking_qna.json → Topic: Networking
- datastructures_qna.json → Topic: Data Structures
- algorithms_qna.json → Topic: Algorithms
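The naming convention above can be sketched as a lookup on the file stem. This is a stand-in for illustration, not the project's topic_from_filename(); the `FILENAME_TOPICS` table and the `detect_topic` helper are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a filename fragment to a topic name.
FILENAME_TOPICS = {
    "networking": "Networking",
    "datastructures": "Data Structures",
    "algorithms": "Algorithms",
}


def detect_topic(filename: str) -> Optional[str]:
    """Return the topic whose fragment appears in the file stem, if any."""
    stem = Path(filename).stem.lower()
    for fragment, topic in FILENAME_TOPICS.items():
        if fragment in stem:
            return topic
    return None  # unrecognized filename
```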
Data Format
Data Quality Guidelines
- Clear questions: Use natural language questions users might ask
- Comprehensive answers: Provide complete, accurate information
- Consistent formatting: Follow the same structure for all entries
- Unique IDs: Ensure each question has a unique identifier
- Avoid HTML: Any markup will be stripped during normalization
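A pre-flight check along these lines can catch missing fields, duplicate IDs, and stray markup before preparation runs. It assumes each entry is an object with `id`, `question`, and `answer` fields, which is a guess at the raw format; the project's own normalization may strip HTML differently.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and collapse whitespace."""
    return " ".join(html.unescape(TAG_RE.sub("", text)).split())


def validate_entries(entries):
    """Return a list of problems found in the raw entries."""
    problems = []
    seen_ids = set()
    for i, entry in enumerate(entries):
        for field in ("id", "question", "answer"):  # assumed field names
            if not str(entry.get(field, "")).strip():
                problems.append(f"entry {i}: missing {field}")
        entry_id = entry.get("id")
        if entry_id is not None:
            if entry_id in seen_ids:
                problems.append(f"entry {i}: duplicate id {entry_id}")
            seen_ids.add(entry_id)
    return problems
```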
Step 4: Update Topic Detection
Edit source/scripts/prepare_kb.py to recognize your new topic.
Modify topic_from_filename()
Find the function around line 67 and add your topic.
Add Fallback Subtopic
In the assign_subtopic() function (around line 129), add a fallback:
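A minimal sketch of the idea, assuming the fallback is a per-topic default used when no keyword rule matches. The table, the helper name, and the "General" default are hypothetical; adapt them to how assign_subtopic() is actually written.

```python
from typing import Optional

# Hypothetical fallback table: the subtopic used when no rule matches.
FALLBACK_SUBTOPICS = {
    "Networking": "Networking Fundamentals",
}


def choose_subtopic(topic: str, matched: Optional[str]) -> str:
    """Keep the matched subtopic if there is one; otherwise fall back."""
    if matched:
        return matched
    return FALLBACK_SUBTOPICS.get(topic, "General")
```

The fallback should name a subtopic that exists in the taxonomy, so that unmatched questions still land in a valid category.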
Optional: Add Refinement Logic
If your topic has overlapping concepts, add disambiguation in refine_subtopic():
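As an illustration only, a disambiguation rule might move a question from one subtopic to another when the text contains stronger signals for the second. The subtopic names and trigger terms here are hypothetical, and the real refine_subtopic() signature may differ.

```python
SECURITY_TERMS = ("tls", "certificate", "encryption")  # hypothetical triggers


def refine_networking(subtopic: str, text: str) -> str:
    """Hypothetical rule separating two overlapping Networking subtopics."""
    lowered = text.lower()
    # HTTPS questions match HTTP keywords, but those about TLS or
    # certificates arguably belong under security.
    if subtopic == "HTTP and HTTPS" and any(t in lowered for t in SECURITY_TERMS):
        return "Network Security"
    return subtopic
```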
Optional: Update Difficulty Heuristics
Add topic-specific advanced/beginner terms in difficulty_heuristic():
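One way such a heuristic could work is a per-topic term lookup, sketched below. The term sets, the three level names, and the helper are assumptions for illustration; check difficulty_heuristic() for the labels and logic the project actually uses.

```python
# Hypothetical per-topic term sets.
ADVANCED_TERMS = {
    "Networking": {"bgp", "ospf", "congestion control", "nat traversal"},
}
BEGINNER_TERMS = {
    "Networking": {"what is", "define", "full form"},
}


def difficulty_hint(topic: str, question: str) -> str:
    """Return 'advanced', 'beginner', or 'intermediate' for a question."""
    lowered = question.lower()
    # Advanced terms win over beginner phrasing such as "what is".
    if any(term in lowered for term in ADVANCED_TERMS.get(topic, ())):
        return "advanced"
    if any(term in lowered for term in BEGINNER_TERMS.get(topic, ())):
        return "beginner"
    return "intermediate"
```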
Step 5: Run Preparation and Indexing
After making all changes, rebuild the knowledge base.
1. Prepare Data
2. Build FAISS Index
Step 6: Test Topic Classification
Validate Output
Check source/data/processed/kb_clean.json to ensure questions are correctly classified:
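A quick way to inspect the output is to count records per subtopic for the new topic. This sketch assumes kb_clean.json holds a list of objects with `topic` and `subtopic` fields; those field names and both helpers are assumptions.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path="source/data/processed/kb_clean.json"):
    """Load the processed knowledge base (assumed to be a JSON list)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def subtopic_counts(records, topic):
    """Count records per subtopic for one topic."""
    return Counter(
        r.get("subtopic", "<missing>") for r in records if r.get("topic") == topic
    )
```

Calling `subtopic_counts(load_records(), "Networking").most_common()` lists subtopics from most to least populated, which also shows whether one subtopic (often the fallback) is absorbing most questions.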
Test Queries
Query the system with topic-specific questions:
- “What is the OSI model?”
- “Explain TCP vs UDP”
- “How does DNS resolution work?”
Verify that:
- Questions retrieve relevant context from your new topic
- Subtopics are correctly identified
- Difficulty levels make sense
Check Subtopic Distribution
Verify that questions are well distributed across subtopics.
Example: Adding Data Structures
Let’s walk through a complete example of adding Data Structures as a new topic.
1. Update taxonomy.json
2. Add rules to topic_rules.json
3. Create datastructures_qna.json
4. Update prepare_kb.py
5. Run and test
Troubleshooting
Issue: All questions assigned to fallback subtopic
Cause: Keywords don’t match question content
Solution:
- Review your keyword rules. Are they too specific?
- Check actual question text for common terms
- Lower the coverage threshold (default 0.25) if needed
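To see why a question falls through to the fallback, it can help to score it against each rule by hand. The sketch below uses one plausible reading of coverage, the fraction of a rule's keywords found in the question text; the project's actual definition may differ, and the `keywords` and `subtopic` field names are assumptions.

```python
def coverage(text, keywords):
    """Fraction of a rule's keywords that occur in the text (case-insensitive)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


def best_rule(text, rules, threshold=0.25):
    """Return (subtopic or None, score) for the highest-scoring rule."""
    scored = [(coverage(text, r["keywords"]), r["subtopic"]) for r in rules]
    score, subtopic = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    if score >= threshold:
        return subtopic, score
    return None, score  # below threshold: the fallback would apply
```

Under this definition, a long keyword list makes the threshold harder to reach for short questions, which is worth keeping in mind when both adding keywords and tuning the threshold.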
Issue: Questions assigned to wrong topic
Cause: Filename not recognized by topic_from_filename()
Solution:
- Ensure filename contains a recognizable keyword
- Add filename pattern to the detection logic
- Verify the _source_file field is set correctly
Issue: Poor subtopic distribution
Cause: Some subtopics have better keywords than others
Solution:
- Balance keyword rules across all subtopics
- Add more keywords to underrepresented subtopics
- Review questions manually to identify missing keywords
Issue: Overlapping subtopics
Cause: Questions could belong to multiple subtopics
Solution:
- Add refinement logic in refine_subtopic()
- Use more specific keywords
- Restructure subtopics to be more distinct
Best Practices
- Start small: Add 20-50 questions initially, then expand
- Test incrementally: Run preparation after each major change
- Review classifications: Manually inspect output for accuracy
- Iterate on keywords: Refine rules based on misclassifications
- Document domain knowledge: Add comments explaining non-obvious rules
- Maintain consistency: Follow existing naming conventions
- Version control: Commit changes before major updates
Related Documentation
- KB Preparation - Understanding the preparation pipeline
- FAISS Indexing - Building search indexes