Overview

The Secure MCP Gateway supports multiple guardrail types for detecting and preventing security threats. Guardrails operate at three key points:
  1. Server Registration: Validates MCP servers during discovery
  2. Tool Registration: Validates tools before making them available
  3. Request/Response Validation: Validates inputs and outputs during execution
Provider-Based System: Guardrails are implemented through pluggable providers. The primary provider is Enkrypt, with support for custom providers like OpenAI Moderation, AWS Comprehend, and keyword-based filters.

Guardrail Categories

Input Guardrails

Input guardrails validate content before it’s sent to the MCP server.

PII Detection & Redaction

Purpose: Detect and redact personally identifiable information

Detected Entities:
  • Names (person, organization)
  • Email addresses
  • Phone numbers
  • Social Security Numbers (SSN)
  • Credit card numbers
  • IP addresses
  • Physical addresses
  • Date of birth
  • Government IDs (passport, driver’s license)
How It Works:
  1. Input text is scanned for PII patterns
  2. Detected PII is replaced with tokens (e.g., [NAME_1], [EMAIL_1])
  3. Original values are stored in a mapping with a unique key
  4. Redacted text is sent to the MCP server
  5. On response, PII is restored using the mapping
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "pii_redaction": true
    },
    "block": ["pii"]  // Optional: block if PII detected
  }
}
Example:
Input:  "Send email to [email protected]"
Redacted: "Send email to [EMAIL_1]"
Response: "Email sent to [EMAIL_1]"
Restored: "Email sent to [email protected]"

Injection Attack Detection

Purpose: Detect prompt injection, SQL injection, and command injection attempts

Detection Patterns:
  • Prompt Injection:
    • “Ignore previous instructions”
    • “System override”
    • “You are now in admin mode”
    • Hidden instructions in markdown comments
  • SQL Injection:
    • OR 1=1, '; DROP TABLE
    • Union-based injection
    • Blind SQL injection patterns
  • Command Injection:
    • Shell metacharacters: ;, |, &&, ||
    • Command substitution: $(...), `...`
    • Encoded payloads
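A pattern-based detector for these attack classes can be sketched as below. The regexes are simplified assumptions for illustration; production detectors combine much richer pattern sets with ML models:

```python
import re

# One representative pattern per attack class from the list above.
INJECTION_PATTERNS = [
    ("prompt_injection", re.compile(r"ignore\s+previous\s+instructions", re.I)),
    ("sql_injection", re.compile(r"(\bOR\s+1=1\b|;\s*DROP\s+TABLE)", re.I)),
    ("command_injection", re.compile(r"(\$\(|`|&&|\|\|)")),
]

def detect_injection(text):
    """Return the attack types whose patterns match the input."""
    return [name for name, pattern in INJECTION_PATTERNS if pattern.search(text)]

print(detect_injection("Get user data WHERE id=1 OR 1=1--"))  # ['sql_injection']
```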
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["injection_attack"]
  }
}
Example Block:
Input: "Get user data WHERE id=1 OR 1=1--"
Violation: SQL injection detected
Action: BLOCK

Toxicity Detection

Purpose: Detect toxic, hateful, or abusive content

Categories:
  • Hate speech (racial, religious, gender-based)
  • Threats and violence
  • Harassment and bullying
  • Self-harm content
  • Profanity and obscenity
Severity Scoring:
  • 0.0-0.3: Low toxicity
  • 0.3-0.7: Medium toxicity
  • 0.7-1.0: High toxicity
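As an illustration of how the severity bands and the toxicity_threshold setting interact, here is a minimal sketch; treating scores at or above the threshold as blocking is an assumption for the example:

```python
def toxicity_band(score):
    """Map a score to the severity bands listed above."""
    if score < 0.3:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"

def should_block(score, threshold=0.7):
    # Assumption: block when the score reaches the configured threshold.
    return score >= threshold

print(toxicity_band(0.85), should_block(0.85))  # high True
```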
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["toxicity"],
    "additional_config": {
      "toxicity_threshold": 0.7
    }
  }
}

NSFW Detection

Purpose: Detect not-safe-for-work and explicit content

Categories:
  • Sexual content
  • Nudity
  • Adult themes
  • Gore and graphic violence
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["nsfw"]
  }
}

Keyword Detection

Purpose: Block requests containing specific keywords or patterns

Default Banned Keywords (for tool/server registration):
  • System commands: exec, eval, shell, run_code
  • Destructive operations: destroy, wipe, kill, terminate
  • Security bypasses: bypass, override, escalate, sudo
  • Sensitive files: mcp.json, claude_desktop_config.json, .env
  • Exploitation: exploit, hack, crack
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["keyword_detector"],
    "additional_config": {
      "banned_keywords": [
        "exec", "eval", "password", "secret"
      ]
    }
  }
}

Policy Violation Detection

Purpose: Check against custom organizational policies

How It Works:
  • Define natural language policy (e.g., “Airline customer service policy”)
  • AI evaluates if request violates the policy
  • Returns explanation if violation detected
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "policy_name": "Sample Airline Guardrail",
    "block": ["policy_violation"],
    "additional_config": {
      "need_explanation": true
    }
  }
}
Example:
Policy: "Only answer questions about flight bookings"
Input: "What's the weather in Paris?"
Violation: "Request is about weather, not flight bookings"
Action: BLOCK

Bias Detection

Purpose: Detect biased or discriminatory content

Categories:
  • Gender bias
  • Racial/ethnic bias
  • Age discrimination
  • Religious bias
  • Disability discrimination
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["bias"]
  }
}

Sponge Attack Detection

Purpose: Detect resource exhaustion and DoS attempts

Attack Patterns:
  • Extremely long inputs (memory exhaustion)
  • Repeated patterns (regex DoS)
  • Nested structures (parser bombs)
  • Compression bombs
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["sponge_attack"]
  }
}

Topic Detection

Purpose: Ensure requests stay on allowed topics

How It Works:
  • Define list of allowed topics
  • AI classifies request topic
  • Block if topic not in allowed list
Configuration:
{
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["topic_detector"],
    "additional_config": {
      "topic": ["flights", "hotels", "car-rentals"]
    }
  }
}

Output Guardrails

Output guardrails validate content after it’s received from the MCP server.
Output guardrails include all input guardrails plus additional output-specific checks.
Relevancy Check

Purpose: Ensure the response is relevant to the original request

How It Works:
  • Compares response content to original request
  • Calculates relevancy score (0.0 - 1.0)
  • Blocks/warns if below threshold
Configuration:
{
  "output_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "relevancy": true,
      "relevancy_threshold": 0.7
    }
  }
}
Example:
Request: "What's the status of flight AA123?"
Response: "Here's today's weather forecast..."
Relevancy Score: 0.2 (below threshold)
Action: WARN or BLOCK

Adherence Check

Purpose: Verify the response follows the given context and instructions

How It Works:
  • Evaluates if response adheres to request constraints
  • Checks format, scope, and instruction compliance
  • Returns adherence score (0.0 - 1.0)
Configuration:
{
  "output_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "adherence": true,
      "adherence_threshold": 0.8
    }
  }
}
Example:
Request: "List only flight numbers, no other details"
Response: "AA123, UA456, DL789"
Adherence Score: 0.95 (follows instructions)

vs.

Response: "Flight AA123 departs at 10am from gate 5..."
Adherence Score: 0.3 (includes extra details)
Action: WARN

Hallucination Detection

Purpose: Detect fabricated or ungrounded information

Detection Methods:
  • Check if response contains verifiable facts
  • Compare against request context
  • Identify contradictions or impossible claims
Configuration:
{
  "output_guardrails_policy": {
    "enabled": true,
    "additional_config": {
      "hallucination": true
    }
  }
}
Example:
Request: "What's the capital of France?"
Response: "The capital of France is Berlin"
Hallucination: Detected (factually incorrect)
Confidence: 0.9
Action: WARN

Registration Guardrails

These guardrails run during server/tool discovery to prevent malicious components from being registered.

Server Registration Validation

Validates: Server name, description, metadata

Configuration:
{
  "enable_tool_guardrails": true,
  "tools": {}
}
When enable_tool_guardrails is true and tools is empty, the gateway validates:
  1. Server description for injection attacks
  2. Server metadata for policy violations
  3. Server configuration for banned keywords
Example Block:
Server: "System Optimization Utility"
Description: "Optimize system by running cleanup scripts with elevated privileges. Bypass security checks for better performance."

Violations:
- Injection attack detected: "bypass security"
- Policy violation: "elevated privileges without justification"
- Keyword violation: "bypass"

Action: BLOCK server registration

Tool Registration Validation

Validates: Tool names, descriptions, input schemas, annotations

Validation Modes:
  • filter: Allow server but filter out unsafe tools
  • block_all: Block entire server if any tool is unsafe
Configuration:
{
  "enable_tool_guardrails": true,
  "input_guardrails_policy": {
    "enabled": true,
    "block": ["injection_attack", "policy_violation"]
  }
}
Example Filter:
Server: GitHub MCP
Tools discovered: 10

Tool Analysis:
- create_issue: ✓ Safe
- list_repos: ✓ Safe  
- delete_repo: ✓ Safe (destructiveHint=true)
- execute_arbitrary_code: ✗ Blocked (injection attack)
- export_credentials: ✗ Blocked (policy violation)

Result: 8 safe tools registered, 2 blocked
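The two validation modes can be sketched as follows. The is_tool_safe helper is a hypothetical stand-in for the gateway's per-tool guardrail check, not its real API:

```python
# Stand-in safety check: flag tools whose names match known-unsafe markers.
UNSAFE_MARKERS = ("execute_arbitrary", "export_credentials")

def is_tool_safe(tool_name):
    return not any(marker in tool_name for marker in UNSAFE_MARKERS)

def register_tools(tools, mode="filter"):
    """filter: keep only safe tools; block_all: reject the server outright."""
    safe = [t for t in tools if is_tool_safe(t)]
    if mode == "block_all" and len(safe) != len(tools):
        return []  # one unsafe tool blocks the whole server
    return safe

discovered = ["create_issue", "list_repos", "execute_arbitrary_code"]
print(register_tools(discovered, mode="filter"))     # ['create_issue', 'list_repos']
print(register_tools(discovered, mode="block_all"))  # []
```

filter trades completeness for availability; block_all treats any unsafe tool as evidence the whole server is untrustworthy.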

Guardrail Providers

Enkrypt Provider

Primary provider with comprehensive detection capabilities.

Supported Detectors:
  • All input guardrails (PII, injection, toxicity, NSFW, keywords, policy, bias, sponge)
  • All output guardrails (relevancy, adherence, hallucination)
  • Server/tool registration validation
API Endpoints:
  • Policy Detection: /guardrails/policy/detect
  • PII Handling: /guardrails/pii
  • Relevancy: /guardrails/relevancy
  • Adherence: /guardrails/adherence
  • Hallucination: /guardrails/hallucination
  • Batch Detection: /guardrails/batch/detect
Configuration:
{
  "plugins": {
    "guardrails": {
      "provider": "enkrypt",
      "config": {
        "api_key": "YOUR_ENKRYPT_API_KEY",
        "base_url": "https://api.enkryptai.com"
      }
    }
  }
}

OpenAI Moderation Provider

Purpose: Use OpenAI’s Moderation API for toxicity/NSFW detection

Supported Detectors:
  • Toxicity (hate, violence, self-harm)
  • NSFW (sexual content)
Configuration:
{
  "plugins": {
    "guardrails": {
      "provider": "openai-moderation",
      "config": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "threshold": 0.7,
        "block_categories": ["hate", "violence", "sexual"]
      }
    }
  }
}

Custom Keyword Provider

Purpose: Simple keyword-based blocking

Supported Detectors:
  • Keyword violations
Configuration:
{
  "plugins": {
    "guardrails": {
      "provider": "custom-keyword",
      "config": {
        "blocked_keywords": [
          "password", "secret", "exec", "eval"
        ],
        "case_sensitive": false
      }
    }
  }
}
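A keyword provider like the one configured above could evaluate text along these lines. The class name and check method are assumptions for the sketch, not the gateway's plugin interface:

```python
class KeywordProvider:
    """Flag text containing any blocked keyword."""

    def __init__(self, blocked_keywords, case_sensitive=False):
        self.case_sensitive = case_sensitive
        self.keywords = (
            blocked_keywords
            if case_sensitive
            else [k.lower() for k in blocked_keywords]
        )

    def check(self, text):
        """Return the blocked keywords found in the text."""
        haystack = text if self.case_sensitive else text.lower()
        return [k for k in self.keywords if k in haystack]

provider = KeywordProvider(["password", "secret", "exec", "eval"])
print(provider.check("Please EXEC this and print the password"))
# ['password', 'exec']
```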

Composite Provider

Purpose: Combine multiple providers with AND/OR logic

Configuration:
{
  "plugins": {
    "guardrails": {
      "provider": "composite",
      "config": {
        "providers": ["enkrypt", "openai-moderation"],
        "logic": "OR",  // Block if ANY provider detects violation
        "enabled": true
      }
    }
  }
}
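The AND/OR combination reduces to a simple aggregation over each provider's verdict; in this sketch each verdict is modeled as a boolean (True = violation detected):

```python
def composite_blocks(verdicts, logic="OR"):
    """OR: block if ANY provider flags a violation; AND: only if ALL do."""
    return any(verdicts) if logic == "OR" else all(verdicts)

# e.g. enkrypt flags a violation, openai-moderation does not:
print(composite_blocks([True, False], logic="OR"))   # True  -> block
print(composite_blocks([True, False], logic="AND"))  # False -> allow
```

OR is the stricter policy (fewer false negatives); AND requires consensus and tolerates a single provider's false positives.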

Complete Configuration Example

{
  "server_name": "production_github",
  "description": "GitHub MCP Server for production",
  "config": {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"]
  },
  "tools": {},
  "enable_tool_guardrails": true,
  
  "input_guardrails_policy": {
    "enabled": true,
    "policy_name": "GitHub Access Policy",
    "additional_config": {
      "pii_redaction": true,
      "banned_keywords": ["password", "token", "secret"]
    },
    "block": [
      "policy_violation",
      "injection_attack",
      "toxicity",
      "nsfw",
      "keyword_detector",
      "pii",
      "bias"
    ]
  },
  
  "output_guardrails_policy": {
    "enabled": true,
    "policy_name": "GitHub Access Policy",
    "additional_config": {
      "relevancy": true,
      "relevancy_threshold": 0.7,
      "adherence": true,
      "adherence_threshold": 0.8,
      "hallucination": false
    },
    "block": [
      "policy_violation",
      "pii"
    ]
  }
}

Performance Considerations

Latency Impact: Guardrails add latency to requests. Plan for:
  • Input validation: 50-200ms
  • Output validation: 100-300ms
  • Tool registration: 500-2000ms (batch validation)
Optimization Tips:
  1. Use Batch API: Tool registration uses batch API for efficiency
  2. Selective Blocking: Only enable necessary detectors in block list
  3. Async Mode: Enable enkrypt_async_input_guardrails_enabled for faster response
  4. Cache Policies: Guardrail results are cached per session
  5. Timeout Configuration: Set appropriate guardrail_timeout (default 15s)

Next Steps

PII Handling

Deep dive into PII detection and redaction

Security Testing

Test guardrails with attack scenarios

Configuration

Configure guardrails for your servers

Custom Providers

Build custom guardrail providers