Overview
Secure MCP Gateway provides comprehensive guardrail protection for both input requests and output responses. Guardrails validate content before it reaches MCP servers and after responses are received, protecting against various security threats and policy violations.
Architecture
The guardrail system follows a plugin-based architecture with four main components:

- GuardrailProvider: Abstract base class for all guardrail implementations
- InputGuardrail: Validates requests before sending to MCP servers
- OutputGuardrail: Validates responses after receiving from MCP servers
- PIIHandler: Specialized interface for PII detection and redaction
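The types these components exchange (GuardrailRequest, GuardrailResponse, GuardrailViolation, GuardrailAction) appear throughout the examples below but are not defined on this page. A minimal sketch of what they might look like, with field names inferred from the examples (the actual definitions in the gateway may differ):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class GuardrailAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    WARN = "warn"
    MODIFY = "modify"


@dataclass
class GuardrailViolation:
    violation_type: str          # e.g. "injection_attack"
    severity: float              # severity/confidence score in [0.0, 1.0]
    message: str
    action: GuardrailAction
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GuardrailRequest:
    content: str                 # text to validate
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GuardrailResponse:
    is_safe: bool
    action: GuardrailAction
    violations: List[GuardrailViolation] = field(default_factory=list)
```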
Violation Types
The gateway detects and blocks various violation types:

Input Violations

- PII Detection: Detects personally identifiable information in requests
- Injection Attacks: Prevents prompt injection, SQL injection, and command injection
- Toxic Content: Filters hate speech, abuse, and harmful language
- NSFW Content: Blocks inappropriate or explicit content
- Policy Violations: Enforces custom organizational policies
- Bias Detection: Identifies biased or discriminatory content
Output Violations
All input violations plus:

- Relevancy: Validates response relevance to the original request
- Adherence: Ensures the response follows instructions and policies
- Hallucination: Detects AI-generated false information
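Taken together, the violation types map naturally onto an enum. The sketch below is illustrative only; the member names and string values mirror the `block` lists and code examples on this page, not the gateway's actual ViolationType definition:

```python
from enum import Enum


class ViolationType(Enum):
    # Input violations
    PII = "pii"
    INJECTION_ATTACK = "injection_attack"
    TOXIC_CONTENT = "toxicity"
    NSFW = "nsfw"
    POLICY_VIOLATION = "policy_violation"
    BIAS = "bias"
    KEYWORD_VIOLATION = "keyword_violation"
    # Output-only violations
    RELEVANCY = "relevancy"
    ADHERENCE = "adherence"
    HALLUCINATION = "hallucination"
```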
GuardrailProvider Interface
All guardrail providers implement the GuardrailProvider abstract base class:
src/secure_mcp_gateway/plugins/guardrails/base.py
```python
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional, List


class GuardrailProvider(ABC):
    @abstractmethod
    def get_name(self) -> str:
        """Get provider name (e.g., 'enkrypt', 'openai')"""
        pass

    @abstractmethod
    def get_version(self) -> str:
        """Get provider version"""
        pass

    @abstractmethod
    def create_input_guardrail(
        self, config: Dict[str, Any]
    ) -> Optional[InputGuardrail]:
        """Create input guardrail instance"""
        pass

    @abstractmethod
    def create_output_guardrail(
        self, config: Dict[str, Any]
    ) -> Optional[OutputGuardrail]:
        """Create output guardrail instance"""
        pass

    def create_pii_handler(
        self, config: Dict[str, Any]
    ) -> Optional[PIIHandler]:
        """Create PII handler (optional)"""
        return None
```
Built-in Providers
Enkrypt Provider
Production-grade guardrails powered by Enkrypt AI’s API:
src/secure_mcp_gateway/plugins/guardrails/enkrypt_provider.py
```python
import aiohttp


class EnkryptInputGuardrail:
    def __init__(self, config: Dict[str, Any], api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        self.policy_name = config.get("policy_name", "")
        self.block_list = config.get("block", [])
        self.guardrail_url = f"{base_url}/guardrails/policy/detect"

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        # Prepare API request
        payload = {"text": request.content}
        headers = {
            "X-Enkrypt-Policy": self.policy_name,
            "apikey": self.api_key,
            "Content-Type": "application/json",
            "X-Enkrypt-Source-Name": "mcp-gateway",
            "X-Enkrypt-Source-Event": "pre-tool",
        }

        # Call Enkrypt API
        async with aiohttp.ClientSession() as session:
            async with session.post(
                self.guardrail_url, json=payload, headers=headers
            ) as response:
                resp_json = await response.json()

        # Parse violations and return result
        violations = self._parse_violations(resp_json)
        return GuardrailResponse(
            is_safe=not violations,
            action=GuardrailAction.BLOCK if violations else GuardrailAction.ALLOW,
            violations=violations,
        )
```
OpenAI Moderation Provider
Using OpenAI’s Moderation API:
src/secure_mcp_gateway/plugins/guardrails/example_providers.py
```python
import httpx


class OpenAIInputGuardrail:
    def __init__(self, config: Dict[str, Any]):
        self.api_key = config.get("api_key", "")
        self.threshold = config.get("threshold", 0.7)
        self.block_categories = config.get(
            "block_categories", ["hate", "violence", "sexual"]
        )

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.openai.com/v1/moderations",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"input": request.content},
            )
            result = response.json()

        moderation_result = result["results"][0]
        violations = []
        for category, flagged in moderation_result["categories"].items():
            if flagged and category in self.block_categories:
                score = moderation_result["category_scores"][category]
                if score >= self.threshold:
                    violations.append(
                        GuardrailViolation(
                            violation_type=ViolationType.TOXIC_CONTENT,
                            severity=score,
                            message=f"Content flagged for {category}",
                            action=GuardrailAction.BLOCK,
                            metadata={"category": category, "score": score},
                        )
                    )

        return GuardrailResponse(
            is_safe=len(violations) == 0,
            action=GuardrailAction.ALLOW if not violations else GuardrailAction.BLOCK,
            violations=violations,
        )
```
Custom Keyword Provider
Simple keyword-based blocking:
src/secure_mcp_gateway/plugins/guardrails/example_providers.py
```python
class CustomKeywordGuardrail:
    def __init__(self, config: Dict[str, Any]):
        self.blocked_keywords = config.get("blocked_keywords", [])
        self.case_sensitive = config.get("case_sensitive", False)

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        content = request.content
        if not self.case_sensitive:
            content = content.lower()
            blocked_keywords = [kw.lower() for kw in self.blocked_keywords]
        else:
            blocked_keywords = self.blocked_keywords

        violations = []
        for keyword in blocked_keywords:
            if keyword in content:
                violations.append(
                    GuardrailViolation(
                        violation_type=ViolationType.KEYWORD_VIOLATION,
                        severity=0.8,
                        message=f"Blocked keyword detected: {keyword}",
                        action=GuardrailAction.BLOCK,
                        metadata={"keyword": keyword},
                    )
                )

        return GuardrailResponse(
            is_safe=len(violations) == 0,
            action=GuardrailAction.ALLOW if not violations else GuardrailAction.BLOCK,
            violations=violations,
        )
```
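The matching logic can be exercised on its own. Here is a self-contained sketch of the same keyword check (the function name is illustrative, not part of the gateway's API):

```python
from typing import List


def find_blocked_keywords(
    content: str, blocked: List[str], case_sensitive: bool = False
) -> List[str]:
    """Return the blocked keywords found in content (mirrors the check above)."""
    if not case_sensitive:
        content = content.lower()
        blocked = [kw.lower() for kw in blocked]
    return [kw for kw in blocked if kw in content]


hits = find_blocked_keywords("Please SHARE the PASSWORD now", ["password", "secret"])
# hits == ["password"]
```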
Configuration
Enable Guardrails for a Server
enkrypt_mcp_config.json
```json
{
  "mcp_configs": {
    "config-id": {
      "mcp_config": [
        {
          "server_name": "github_server",
          "enable_tool_guardrails": true,
          "input_guardrails_policy": {
            "enabled": true,
            "policy_name": "Sample Airline Guardrail",
            "additional_config": {
              "pii_redaction": true
            },
            "block": [
              "policy_violation",
              "injection_attack",
              "toxicity",
              "pii",
              "nsfw"
            ]
          },
          "output_guardrails_policy": {
            "enabled": true,
            "policy_name": "Sample Airline Guardrail",
            "additional_config": {
              "relevancy": true,
              "hallucination": true,
              "adherence": true
            },
            "block": [
              "policy_violation",
              "hallucination"
            ]
          }
        }
      ]
    }
  }
}
```
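A quick way to sanity-check a policy entry is to load it and ask which violation types it blocks. `should_block` below is a hypothetical helper for illustration, not a gateway API:

```python
import json


def should_block(policy: dict, violation_type: str) -> bool:
    """True if the policy is enabled and lists this violation type in 'block'."""
    return policy.get("enabled", False) and violation_type in policy.get("block", [])


# Trimmed version of the input policy from the config above
policy = json.loads("""
{
  "enabled": true,
  "policy_name": "Sample Airline Guardrail",
  "block": ["policy_violation", "injection_attack", "toxicity", "pii", "nsfw"]
}
""")

should_block(policy, "pii")            # True
should_block(policy, "hallucination")  # False (only blocked on output)
```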
Configure the guardrail provider itself in the `plugins` section:

```json
{
  "plugins": {
    "guardrails": {
      "provider": "enkrypt",
      "config": {
        "api_key": "YOUR_ENKRYPT_API_KEY",
        "base_url": "https://api.enkryptai.com"
      }
    }
  }
}
```
PII Handling
The gateway provides automatic PII redaction and de-anonymization:

1. Detect PII: Input guardrails detect PII in requests (emails, phone numbers, SSNs, etc.)
2. Redact PII: PII is replaced with placeholders before sending to the MCP server
3. Store Mapping: Original PII values are stored in a secure mapping with correlation IDs
4. De-anonymize Response: Output guardrails restore original PII values in responses using the mapping
```python
# Original request
"Contact John at john@example.com or call 555-1234"

# Redacted (sent to MCP server)
"Contact John at [EMAIL_1] or call [PHONE_1]"

# Mapping stored
{
    "[EMAIL_1]": "john@example.com",
    "[PHONE_1]": "555-1234"
}

# Response received
"I've sent an email to [EMAIL_1]"

# De-anonymized (returned to client)
"I've sent an email to john@example.com"
```
Guardrail Actions
When a violation is detected, the gateway can take different actions:
| Action | Description |
|--------|-------------|
| ALLOW  | Continue processing (log warning) |
| BLOCK  | Stop processing and return error |
| WARN   | Log warning but continue |
| MODIFY | Modify content and continue (PII redaction) |
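How the gateway might act on each value can be sketched with a small dispatcher (illustrative only; the real handling lives in the gateway core):

```python
from enum import Enum
from typing import Optional


class GuardrailAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    WARN = "warn"
    MODIFY = "modify"


def apply_action(
    action: GuardrailAction, content: str, modified: Optional[str] = None
) -> str:
    """Decide what the gateway returns for each action (sketch)."""
    if action is GuardrailAction.BLOCK:
        raise PermissionError("Request blocked by guardrail")
    if action is GuardrailAction.MODIFY and modified is not None:
        return modified          # e.g. PII-redacted content
    return content               # ALLOW / WARN continue (WARN also logs)
```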
Usage Examples
Test Guardrails with MCP Client
```python
import mcp

# Connect to gateway
client = mcp.Client("http://localhost:8000/mcp/")

# This will be blocked by injection attack detection
result = client.call_tool(
    "github_server",
    "search_repositories",
    {"query": "'; DROP TABLE users; --"}
)
# Returns: GuardrailViolation - injection_attack detected

# This will pass guardrails
result = client.call_tool(
    "github_server",
    "search_repositories",
    {"query": "python web frameworks"}
)
# Returns: Normal response from GitHub MCP server
```
Create Custom Guardrail Provider
```python
from secure_mcp_gateway.plugins.guardrails.base import (
    GuardrailProvider,
    InputGuardrail,
    GuardrailRequest,
    GuardrailResponse,
    GuardrailAction,
)


class MyInputGuardrail(InputGuardrail):
    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        # Implement your custom validation logic
        if "forbidden" in request.content:
            return GuardrailResponse(
                is_safe=False,
                action=GuardrailAction.BLOCK,
                violations=[...],
            )
        return GuardrailResponse(
            is_safe=True, action=GuardrailAction.ALLOW, violations=[]
        )


class MyGuardrailProvider(GuardrailProvider):
    def get_name(self) -> str:
        return "my_custom"

    def get_version(self) -> str:
        return "1.0.0"

    def create_input_guardrail(self, config):
        return MyInputGuardrail()

    def create_output_guardrail(self, config):
        return None  # this provider performs no output validation
```
Best Practices
- Performance Considerations: Guardrails add latency to requests. Use asynchronous guardrails (`enkrypt_async_input_guardrails_enabled: true`) for high-throughput scenarios.
- Policy Management: Create guardrail policies in the Enkrypt Dashboard for easier management and updates.
- Testing: Use the included test servers in `bad_mcps/` to test your guardrail configurations against various attack vectors.
Metrics and Monitoring
Guardrails emit telemetry for monitoring:
- `guardrail.input.validation.count` - Number of input validations
- `guardrail.output.validation.count` - Number of output validations
- `guardrail.violations.count` - Violations detected, by type
- `guardrail.blocks.count` - Requests blocked
- `guardrail.latency.ms` - Validation latency
View metrics in the Grafana dashboard or query them via Prometheus.
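To see how the metric names compose, the counters can be mimicked with a toy in-memory recorder (a stand-in only; the gateway exports real metrics to Prometheus):

```python
from collections import Counter
from typing import Optional

# In-memory stand-in for the gateway's telemetry; names match the list above.
metrics = Counter()


def record_validation(
    direction: str, blocked: bool, violation_type: Optional[str] = None
) -> None:
    """Record one guardrail validation and any resulting violation/block."""
    metrics[f"guardrail.{direction}.validation.count"] += 1
    if violation_type:
        metrics[("guardrail.violations.count", violation_type)] += 1
    if blocked:
        metrics["guardrail.blocks.count"] += 1


record_validation("input", blocked=True, violation_type="injection_attack")
record_validation("input", blocked=False)
# metrics["guardrail.input.validation.count"] == 2
# metrics["guardrail.blocks.count"] == 1
```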