Overview
The Secure MCP Gateway supports multiple guardrail types for detecting and preventing security threats. Guardrails operate at three key points:
- Server Registration: Validates MCP servers during discovery
- Tool Registration: Validates tools before making them available
- Request/Response Validation: Validates inputs and outputs during execution
Provider-Based System: Guardrails are implemented through pluggable providers. The primary provider is Enkrypt, with support for custom providers like OpenAI Moderation, AWS Comprehend, and keyword-based filters.
Guardrail Categories
Input Guardrails
Input guardrails validate content before it’s sent to the MCP server.
PII Detection & Redaction
Purpose: Detect and redact personally identifiable information
Detected Entities:
- Names (person, organization)
- Email addresses
- Phone numbers
- Social Security Numbers (SSN)
- Credit card numbers
- IP addresses
- Physical addresses
- Date of birth
- Government IDs (passport, driver’s license)
How It Works:
- Input text is scanned for PII patterns
- Detected PII is replaced with tokens (e.g., `[NAME_1]`, `[EMAIL_1]`)
- Original values are stored in a mapping with a unique key
- Redacted text is sent to the MCP server
- On response, PII is restored using the mapping
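The redact/restore round trip above can be sketched as follows. This is an illustrative implementation, not the gateway’s actual code; the pattern set and token format are assumptions based on the examples above.

```python
import re

# Illustrative subset of PII patterns; the gateway's real detector
# covers many more entity types (names, addresses, government IDs, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace detected PII with tokens; return redacted text and mapping."""
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            token = f"[{label}_{i}]"
            mapping[token] = match
            text = text.replace(match, token, 1)
    return text, mapping

def restore(text, mapping):
    """Re-insert original PII values into the server's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Because the mapping stays inside the gateway, the MCP server only ever sees the tokens, while the caller gets the original values back on the return path.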
Injection Attack Detection
Purpose: Detect prompt injection, SQL injection, and command injection attempts
Detection Patterns:
- Prompt Injection:
  - “Ignore previous instructions”
  - “System override”
  - “You are now in admin mode”
  - Hidden instructions in markdown comments
- SQL Injection:
  - `OR 1=1`, `'; DROP TABLE`
  - Union-based injection
  - Blind SQL injection patterns
- Command Injection:
  - Shell metacharacters: `;`, `|`, `&&`, `||`
  - Command substitution: `$(...)`, `` `...` ``
  - Encoded payloads
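A minimal pattern-based detector for the injection classes above might look like the following. The regexes are illustrative samples drawn from the patterns listed, not the gateway’s real rule set, which would be far broader.

```python
import re

# One illustrative pattern per injection class listed above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),  # prompt injection
    re.compile(r"\bOR\s+1\s*=\s*1\b|;\s*DROP\s+TABLE", re.IGNORECASE),  # SQL injection
    re.compile(r"\$\([^)]*\)|`[^`]*`|&&|\|\|"),                         # command injection
]

def detect_injection(text):
    """Return True if any known injection pattern matches the input."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```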
Toxicity Detection
Purpose: Detect toxic, hateful, or abusive content
Categories:
- Hate speech (racial, religious, gender-based)
- Threats and violence
- Harassment and bullying
- Self-harm content
- Profanity and obscenity
Severity Scores:
- 0.0-0.3: Low toxicity
- 0.3-0.7: Medium toxicity
- 0.7-1.0: High toxicity
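The score bands above can be expressed as a small mapping function; treating the 0.3 and 0.7 boundaries as exclusive on the low side is an assumption.

```python
def toxicity_severity(score: float) -> str:
    """Map a toxicity score in [0.0, 1.0] onto the bands listed above."""
    if score < 0.3:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"
```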
NSFW Content Detection
Purpose: Detect not-safe-for-work and explicit content
Categories:
- Sexual content
- Nudity
- Adult themes
- Gore and graphic violence
Keyword Detection
Purpose: Block requests containing specific keywords or patterns
Default Banned Keywords (for tool/server registration):
- System commands: `exec`, `eval`, `shell`, `run_code`
- Destructive operations: `destroy`, `wipe`, `kill`, `terminate`
- Security bypasses: `bypass`, `override`, `escalate`, `sudo`
- Sensitive files: `mcp.json`, `claude_desktop_config.json`, `.env`
- Exploitation: `exploit`, `hack`, `crack`
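A naive check against this list can be sketched as a case-insensitive substring scan. Note this is only a sketch: a plain substring match would also flag `exec` inside `executes`, so a production implementation would likely match on word boundaries.

```python
# The default banned keywords listed above.
BANNED_KEYWORDS = {
    "exec", "eval", "shell", "run_code",
    "destroy", "wipe", "kill", "terminate",
    "bypass", "override", "escalate", "sudo",
    "mcp.json", "claude_desktop_config.json", ".env",
    "exploit", "hack", "crack",
}

def find_banned_keywords(text: str) -> list[str]:
    """Return any banned keywords present in the text (case-insensitive)."""
    lowered = text.lower()
    return sorted(k for k in BANNED_KEYWORDS if k in lowered)
```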
Policy Violation Detection
Purpose: Check against custom organizational policies
How It Works:
- Define natural language policy (e.g., “Airline customer service policy”)
- AI evaluates if request violates the policy
- Returns explanation if violation detected
Bias Detection
Purpose: Detect biased or discriminatory content
Categories:
- Gender bias
- Racial/ethnic bias
- Age discrimination
- Religious bias
- Disability discrimination
Sponge Attack Detection
Purpose: Detect resource exhaustion and DoS attempts
Attack Patterns:
- Extremely long inputs (memory exhaustion)
- Repeated patterns (regex DoS)
- Nested structures (parser bombs)
- Compression bombs
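Cheap heuristics for the first three patterns above might look like the following sketch; the thresholds are illustrative, not the gateway’s defaults, and compression bombs would additionally need size checks after decompression.

```python
def looks_like_sponge(payload: str,
                      max_len: int = 100_000,
                      max_run: int = 1_000,
                      max_depth: int = 50) -> bool:
    """Heuristic checks for resource-exhaustion payloads (thresholds illustrative)."""
    if len(payload) > max_len:  # extremely long input -> memory exhaustion
        return True
    # Long run of one repeated character (cheap proxy for repeated patterns).
    run = longest = 1
    for a, b in zip(payload, payload[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    if longest > max_run:
        return True
    # Deep bracket nesting (parser bombs): track peak nesting depth.
    depth = peak = 0
    for ch in payload:
        if ch in "([{":
            depth += 1
            peak = max(peak, depth)
        elif ch in ")]}":
            depth = max(depth - 1, 0)
    return peak > max_depth
```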
Topic Detection
Purpose: Ensure requests stay on allowed topics
How It Works:
- Define list of allowed topics
- AI classifies request topic
- Block if topic not in allowed list
Output Guardrails
Output guardrails validate content after it’s received from the MCP server. They include all input guardrails plus additional output-specific checks.
Relevancy Check
Purpose: Ensure the response is relevant to the original request
How It Works:
- Compares response content to original request
- Calculates relevancy score (0.0 - 1.0)
- Blocks/warns if below threshold
Adherence Check
Purpose: Verify the response follows the given context/instructions
How It Works:
- Evaluates if response adheres to request constraints
- Checks format, scope, and instruction compliance
- Returns adherence score (0.0 - 1.0)
Hallucination Detection
Purpose: Detect fabricated or ungrounded information
Detection Methods:
- Check if response contains verifiable facts
- Compare against request context
- Identify contradictions or impossible claims
Registration Guardrails
These guardrails run during server/tool discovery to prevent malicious components from being registered.
Server Registration Validation
Validates: Server name, description, metadata
When `enable_tool_guardrails` is true and the `tools` list is empty, the gateway validates:
- Server description for injection attacks
- Server metadata for policy violations
- Server configuration for banned keywords
Tool Registration Validation
Validates: Tool names, descriptions, input schemas, annotations
Validation Modes:
- `filter`: Allow the server but filter out unsafe tools
- `block_all`: Block the entire server if any tool is unsafe
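The two modes can be sketched as follows; `is_tool_safe` is a hypothetical stand-in for the gateway’s per-tool guardrail checks, not an actual API.

```python
def validate_tools(tools, is_tool_safe, mode="filter"):
    """Return the tools to register, or raise if the server must be blocked.

    mode="filter":    register only the tools that pass the guardrail checks.
    mode="block_all": reject the whole server if any tool fails.
    """
    unsafe = [t for t in tools if not is_tool_safe(t)]
    if mode == "block_all" and unsafe:
        raise ValueError(f"Server blocked: unsafe tools {unsafe}")
    if mode == "filter":
        return [t for t in tools if t not in unsafe]
    return tools
```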
Guardrail Providers
Enkrypt Provider
Primary provider with comprehensive detection capabilities.
Supported Detectors:
- All input guardrails (PII, injection, toxicity, NSFW, keywords, policy, bias, sponge)
- All output guardrails (relevancy, adherence, hallucination)
- Server/tool registration validation
API Endpoints:
- Policy Detection: `/guardrails/policy/detect`
- PII Handling: `/guardrails/pii`
- Relevancy: `/guardrails/relevancy`
- Adherence: `/guardrails/adherence`
- Hallucination: `/guardrails/hallucination`
- Batch Detection: `/guardrails/batch/detect`
OpenAI Moderation Provider
Purpose: Use OpenAI’s Moderation API for toxicity/NSFW detection
Supported Detectors:
- Toxicity (hate, violence, self-harm)
- NSFW (sexual content)
Custom Keyword Provider
Purpose: Simple keyword-based blocking
Supported Detectors:
- Keyword violations
Composite Provider
Purpose: Combine multiple providers with AND/OR logic
Complete Configuration Example
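The AND/OR combination logic can be sketched as a small function over provider verdicts; the provider interface here (a callable that returns True when content is flagged) is a hypothetical simplification.

```python
def composite_check(text, providers, logic="OR"):
    """Combine provider verdicts.

    logic="OR":  flag if ANY provider flags the content (strictest blocking).
    logic="AND": flag only if ALL providers agree (fewer false positives).
    """
    results = [provider(text) for provider in providers]
    return all(results) if logic == "AND" else any(results)
```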
Performance Considerations
Optimization Tips:
- Use Batch API: Tool registration uses the batch API for efficiency
- Selective Blocking: Only enable necessary detectors in the block list
- Async Mode: Enable `enkrypt_async_input_guardrails_enabled` for faster responses
- Cache Policies: Guardrail results are cached per session
- Timeout Configuration: Set an appropriate `guardrail_timeout` (default 15s)
Next Steps
- PII Handling: Deep dive into PII detection and redaction
- Security Testing: Test guardrails with attack scenarios
- Configuration: Configure guardrails for your servers
- Custom Providers: Build custom guardrail providers