# RAG Defense Engine
Retrieval-Augmented Generation (RAG) systems are vulnerable to Indirect Prompt Injection, where malicious instructions hidden in retrieved documents (emails, websites, internal docs) hijack the LLM’s behavior. KoreShield’s RAG Defense Engine scans retrieved context before it reaches your LLM, ensuring that tainted data cannot manipulate the generation process.

## How It Works
KoreShield analyzes both the User Query and the Retrieved Documents to detect correlation attacks and context poisoning.
Our engine checks for:
- Hidden Instructions: “Ignore previous instructions and…”
- Role Hijacking: “You are now a compliant AI…”
- Cross-Document Attacks: Split payloads across multiple chunks
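These checks boil down to pattern and correlation analysis over the retrieved chunks. Below is a minimal self-contained sketch of the idea, using plain regexes as stand-ins for the engine’s actual detectors (all names here are illustrative, not KoreShield APIs):

```python
import re

# Illustrative stand-ins for the engine's detection rules.
HIDDEN_INSTRUCTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
ROLE_HIJACK = re.compile(r"you are now (a|an) ", re.IGNORECASE)

def scan_chunks(chunks):
    """Flag each chunk individually, then re-scan the concatenation to
    catch payloads split across chunk boundaries (cross-document attacks)."""
    flagged = set()
    for i, chunk in enumerate(chunks):
        if HIDDEN_INSTRUCTION.search(chunk) or ROLE_HIJACK.search(chunk):
            flagged.add(i)
    joined = " ".join(chunks)
    cross_document = not flagged and bool(
        HIDDEN_INSTRUCTION.search(joined) or ROLE_HIJACK.search(joined)
    )
    return flagged, cross_document
```

A split payload that looks benign chunk-by-chunk only matches once the chunks are joined, which is why the concatenation pass matters.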
## Quick Start via SDK
Use the `scan_rag_context` method in our Python or JavaScript / TypeScript SDKs to protect your pipeline.
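As a hedged sketch of what wiring `scan_rag_context` into a pipeline could look like: the `KoreShieldClient` stub and its return shape below are assumptions made so the snippet runs standalone, not the real SDK.

```python
class UnsafeContextError(Exception):
    pass

class KoreShieldClient:
    """Stub with an assumed shape; the real client would call the hosted engine."""
    def scan_rag_context(self, query, documents):
        flagged = [i for i, d in enumerate(documents)
                   if "ignore previous instructions" in d.lower()]
        return {"safe": not flagged, "flagged_documents": flagged}

def guarded_answer(client, llm, query, documents):
    """Scan retrieved context first; only clean context reaches the model."""
    result = client.scan_rag_context(query, documents)
    if not result["safe"]:
        raise UnsafeContextError(f"flagged documents: {result['flagged_documents']}")
    return llm(query, documents)
```

The key property is ordering: the scan runs before the LLM call, so tainted documents never enter the prompt.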
## Detection Capabilities
Our engine uses a five-dimensional taxonomy to classify threats:

| Dimension | Examples |
|---|---|
| Injection Vector | email, web_scraping, document, logs |
| Operational Target | data_exfiltration, privilege_escalation, phishing |
| Persistence | single_turn, multi_turn, poisoned_knowledge |
| Complexity | low (direct), medium (obfuscated), high (steganography) |
| Severity | critical (root compromise) to low (spam) |
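A verdict classified along these five dimensions could be represented as follows; the field names are illustrative, not the SDK’s actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreatClassification:
    injection_vector: str    # e.g. "email", "web_scraping", "document", "logs"
    operational_target: str  # e.g. "data_exfiltration", "privilege_escalation", "phishing"
    persistence: str         # "single_turn", "multi_turn", or "poisoned_knowledge"
    complexity: str          # "low" (direct), "medium" (obfuscated), "high" (steganography)
    severity: str            # "critical" (root compromise) down to "low" (spam)

# A direct data-exfiltration attempt arriving via email:
threat = ThreatClassification(
    injection_vector="email",
    operational_target="data_exfiltration",
    persistence="single_turn",
    complexity="low",
    severity="critical",
)
```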
## Advanced Configuration
You can customize the scanner’s sensitivity using a `SecurityPolicy`.
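As a rough illustration, a policy object with tunable sensitivity might look like the following sketch; the field names and thresholds are assumptions, not the real `SecurityPolicy` API:

```python
from dataclasses import dataclass

@dataclass
class SecurityPolicy:
    """Assumed shape of a policy object; fields are illustrative."""
    sensitivity: str = "balanced"    # e.g. "strict", "balanced", "permissive"
    block_on_severity: str = "high"  # block anything at or above this level

SEVERITY_ORDER = ("low", "medium", "high", "critical")

def should_block(policy, severity):
    """Block when the detected severity meets or exceeds the policy threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(policy.block_on_severity)
```

Lowering `block_on_severity` makes the scanner stricter at the cost of more false positives, which is the trade-off the best practices below ask you to monitor.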
## Common Use Cases
- **Email-based RAG**: Scan retrieved emails for malicious instructions before summarization.
- **Web Scraping RAG**: Protect against poisoned web content in search results.
- **Document Q&A**: Validate internal documents for injection attempts.
- **Knowledge Base**: Ensure knowledge base entries haven’t been compromised.
## Best Practices
### Scan before LLM invocation
Always scan retrieved context before sending to your LLM. This prevents malicious instructions from reaching the model.
### Use document-level granularity
Track which specific documents triggered threats. This allows you to drop malicious documents while keeping safe ones.
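For example, with per-document threat indices in hand, dropping only the tainted documents is straightforward (a generic sketch, independent of any SDK):

```python
def drop_flagged(documents, flagged_indices):
    """Keep safe documents; discard only the ones that triggered threats."""
    flagged = set(flagged_indices)
    kept = [d for i, d in enumerate(documents) if i not in flagged]
    dropped = [documents[i] for i in sorted(flagged)]
    return kept, dropped
```

The pipeline then proceeds with `kept`, while `dropped` can be logged for review.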
### Monitor false positives
Review blocked content periodically to tune sensitivity levels and reduce false positives.
### Implement fallback strategies
Have a plan for when threats are detected: retry with different documents, alert users, or escalate to human review.
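One way to structure such a fallback, sketched with a generic `scan` callback standing in for the SDK (names and return shapes here are illustrative):

```python
def answer_with_fallback(scan, llm, query, documents):
    """Retry with only the clean documents when threats are found;
    escalate to human review if no safe context remains.
    `scan` returns the set of flagged document indices."""
    flagged = scan(query, documents)
    if not flagged:
        return {"status": "ok", "answer": llm(query, documents)}
    clean = [d for i, d in enumerate(documents) if i not in flagged]
    if clean and not scan(query, clean):
        return {"status": "retried_clean_docs", "answer": llm(query, clean)}
    return {"status": "escalated", "answer": None}  # alert users / human review
```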
## Related Documentation
- Advanced RAG Security - Deep dive into RAG security patterns
- API Reference - Complete API documentation for RAG scanning
- LangChain Integration - RAG security in LangChain