Overview

Secure MCP Gateway provides comprehensive guardrail protection for both input requests and output responses. Guardrails validate content before it reaches MCP servers and after responses are received, protecting against various security threats and policy violations.

Architecture

The guardrail system follows a plugin-based architecture with four main components:
  • GuardrailProvider: Abstract base class for all guardrail implementations
  • InputGuardrail: Validates requests before sending to MCP servers
  • OutputGuardrail: Validates responses after receiving from MCP servers
  • PIIHandler: Specialized interface for PII detection and redaction

Violation Types

The gateway detects and blocks various violation types:

Input Violations

  • PII Detection: Detects personally identifiable information in requests
  • Injection Attacks: Prevents prompt injection, SQL injection, and command injection
  • Toxic Content: Filters hate speech, abuse, and harmful language
  • NSFW Content: Blocks inappropriate or explicit content
  • Policy Violations: Enforces custom organizational policies
  • Bias Detection: Identifies biased or discriminatory content

Output Violations

All input violations, plus:

  • Relevancy: Validates that the response is relevant to the original request
  • Adherence: Ensures the response follows instructions and policies
  • Hallucination: Detects AI-generated false information
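
The violation types above map to an enum that guardrail implementations attach to each violation. The gateway's actual member names live in base.py; this is an illustrative sketch (member names are assumptions, kept consistent with the `ViolationType.TOXIC_CONTENT` and `ViolationType.KEYWORD_VIOLATION` references in the provider examples below):

```python
from enum import Enum

class ViolationType(str, Enum):
    # Input violation types
    PII = "pii"
    INJECTION_ATTACK = "injection_attack"
    TOXIC_CONTENT = "toxic_content"
    NSFW = "nsfw"
    POLICY_VIOLATION = "policy_violation"
    BIAS = "bias"
    KEYWORD_VIOLATION = "keyword_violation"
    # Output-only violation types
    RELEVANCY = "relevancy"
    ADHERENCE = "adherence"
    HALLUCINATION = "hallucination"
```

Using a `str`-backed enum keeps the values directly comparable to the strings used in the `block` lists of the server configuration.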

GuardrailProvider Interface

All guardrail providers implement the GuardrailProvider abstract base class:
src/secure_mcp_gateway/plugins/guardrails/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional

class GuardrailProvider(ABC):
    @abstractmethod
    def get_name(self) -> str:
        """Get provider name (e.g., 'enkrypt', 'openai')"""
        pass

    @abstractmethod
    def get_version(self) -> str:
        """Get provider version"""
        pass

    @abstractmethod
    def create_input_guardrail(
        self, config: Dict[str, Any]
    ) -> Optional[InputGuardrail]:
        """Create input guardrail instance"""
        pass

    @abstractmethod
    def create_output_guardrail(
        self, config: Dict[str, Any]
    ) -> Optional[OutputGuardrail]:
        """Create output guardrail instance"""
        pass

    def create_pii_handler(
        self, config: Dict[str, Any]
    ) -> Optional[PIIHandler]:
        """Create PII handler (optional)"""
        return None
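
To make the plugin contract concrete, here is a hedged sketch of how a gateway might look up a registered provider and build its guardrails from the plugin config. The registry and wiring below are assumptions for illustration, not the gateway's actual code; the real gateway resolves the provider from the `plugins.guardrails` section of its configuration:

```python
# Hypothetical provider registry (illustrative; names are assumptions)
PROVIDERS = {}

def register_provider(provider):
    """Register a GuardrailProvider instance under its reported name."""
    PROVIDERS[provider.get_name()] = provider

def build_guardrails(plugin_config):
    """Resolve the configured provider and build its guardrail instances."""
    provider = PROVIDERS[plugin_config["provider"]]
    cfg = plugin_config.get("config", {})
    return (
        provider.create_input_guardrail(cfg),
        provider.create_output_guardrail(cfg),
        provider.create_pii_handler(cfg),  # may be None: PII handling is optional
    )
```

Because `create_pii_handler` has a default implementation returning `None`, providers that do not support PII redaction need not override it.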

Built-in Providers

Enkrypt Provider

Production-grade guardrails powered by Enkrypt AI’s API:
src/secure_mcp_gateway/plugins/guardrails/enkrypt_provider.py
import aiohttp

class EnkryptInputGuardrail:
    def __init__(self, config: Dict[str, Any], api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        self.policy_name = config.get("policy_name", "")
        self.block_list = config.get("block", [])
        self.guardrail_url = f"{base_url}/guardrails/policy/detect"

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        # Prepare API request
        payload = {"text": request.content}
        headers = {
            "X-Enkrypt-Policy": self.policy_name,
            "apikey": self.api_key,
            "Content-Type": "application/json",
            "X-Enkrypt-Source-Name": "mcp-gateway",
            "X-Enkrypt-Source-Event": "pre-tool"
        }

        # Call Enkrypt API
        async with aiohttp.ClientSession() as session:
            async with session.post(
                self.guardrail_url, json=payload, headers=headers
            ) as response:
                resp_json = await response.json()

        # Parse violations and return result
        violations = self._parse_violations(resp_json)
        return GuardrailResponse(
            is_safe=not violations,
            action=GuardrailAction.BLOCK if violations else GuardrailAction.ALLOW,
            violations=violations
        )

OpenAI Moderation Provider

Using OpenAI’s Moderation API:
src/secure_mcp_gateway/plugins/guardrails/example_providers.py
import httpx

class OpenAIInputGuardrail:
    def __init__(self, config: Dict[str, Any]):
        self.api_key = config.get("api_key", "")
        self.threshold = config.get("threshold", 0.7)
        self.block_categories = config.get(
            "block_categories", ["hate", "violence", "sexual"]
        )

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.openai.com/v1/moderations",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"input": request.content}
            )

            result = response.json()
            moderation_result = result["results"][0]

            violations = []
            for category, flagged in moderation_result["categories"].items():
                if flagged and category in self.block_categories:
                    score = moderation_result["category_scores"][category]
                    if score >= self.threshold:
                        violations.append(
                            GuardrailViolation(
                                violation_type=ViolationType.TOXIC_CONTENT,
                                severity=score,
                                message=f"Content flagged for {category}",
                                action=GuardrailAction.BLOCK,
                                metadata={"category": category, "score": score}
                            )
                        )

            return GuardrailResponse(
                is_safe=len(violations) == 0,
                action=GuardrailAction.ALLOW if not violations else GuardrailAction.BLOCK,
                violations=violations
            )

Custom Keyword Provider

Simple keyword-based blocking:
src/secure_mcp_gateway/plugins/guardrails/example_providers.py
class CustomKeywordGuardrail:
    def __init__(self, config: Dict[str, Any]):
        self.blocked_keywords = config.get("blocked_keywords", [])
        self.case_sensitive = config.get("case_sensitive", False)

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        content = request.content
        if not self.case_sensitive:
            content = content.lower()
            blocked_keywords = [kw.lower() for kw in self.blocked_keywords]
        else:
            blocked_keywords = self.blocked_keywords

        violations = []
        for keyword in blocked_keywords:
            if keyword in content:
                violations.append(
                    GuardrailViolation(
                        violation_type=ViolationType.KEYWORD_VIOLATION,
                        severity=0.8,
                        message=f"Blocked keyword detected: {keyword}",
                        action=GuardrailAction.BLOCK,
                        metadata={"keyword": keyword}
                    )
                )

        return GuardrailResponse(
            is_safe=len(violations) == 0,
            action=GuardrailAction.ALLOW if not violations else GuardrailAction.BLOCK,
            violations=violations
        )
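
The following is a minimal, self-contained sketch of driving a keyword guardrail's async `validate()` end to end. The dataclasses stand in for the gateway's real request/response types in base.py (their exact definitions are assumptions), and the guardrail is a simplified version of `CustomKeywordGuardrail` above:

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum

# Stand-ins for the gateway's real types (field names assumed from the examples above)
class GuardrailAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"

@dataclass
class GuardrailRequest:
    content: str

@dataclass
class GuardrailResponse:
    is_safe: bool
    action: GuardrailAction
    violations: list = field(default_factory=list)

class KeywordGuardrail:
    """Simplified, case-insensitive version of CustomKeywordGuardrail."""
    def __init__(self, blocked_keywords):
        self.blocked_keywords = [kw.lower() for kw in blocked_keywords]

    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        hits = [kw for kw in self.blocked_keywords if kw in request.content.lower()]
        return GuardrailResponse(
            is_safe=not hits,
            action=GuardrailAction.BLOCK if hits else GuardrailAction.ALLOW,
            violations=hits,
        )

guardrail = KeywordGuardrail(["drop table", "rm -rf"])
result = asyncio.run(guardrail.validate(GuardrailRequest(content="please DROP TABLE users")))
print(result.is_safe, result.action)  # False GuardrailAction.BLOCK
```

In the real gateway these calls happen inside the request pipeline rather than via `asyncio.run`, but the decision shape (safe flag, action, violation list) is the same.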

Configuration

Enable Guardrails for a Server

{
  "mcp_configs": {
    "config-id": {
      "mcp_config": [
        {
          "server_name": "github_server",
          "enable_tool_guardrails": true,
          "input_guardrails_policy": {
            "enabled": true,
            "policy_name": "Sample Airline Guardrail",
            "additional_config": {
              "pii_redaction": true
            },
            "block": [
              "policy_violation",
              "injection_attack",
              "toxicity",
              "pii",
              "nsfw"
            ]
          },
          "output_guardrails_policy": {
            "enabled": true,
            "policy_name": "Sample Airline Guardrail",
            "additional_config": {
              "relevancy": true,
              "hallucination": true,
              "adherence": true
            },
            "block": [
              "policy_violation",
              "hallucination"
            ]
          }
        }
      ]
    }
  }
}

Configure Plugin Provider

enkrypt_mcp_config.json
{
  "plugins": {
    "guardrails": {
      "provider": "enkrypt",
      "config": {
        "api_key": "YOUR_ENKRYPT_API_KEY",
        "base_url": "https://api.enkryptai.com"
      }
    }
  }
}

PII Handling

The gateway provides automatic PII redaction and de-anonymization:
1. Detect PII: Input guardrails detect PII in requests (emails, phone numbers, SSNs, etc.)
2. Redact PII: PII is replaced with placeholders before being sent to the MCP server
3. Store Mapping: Original PII values are stored in a secure mapping with correlation IDs
4. De-anonymize Response: Output guardrails restore the original PII values in responses using the mapping
Example Flow
# Original request
"Contact John at [email protected] or call 555-1234"

# Redacted (sent to MCP server)
"Contact John at [EMAIL_1] or call [PHONE_1]"

# Mapping stored
{
  "[EMAIL_1]": "[email protected]",
  "[PHONE_1]": "555-1234"
}

# Response received
"I've sent an email to [EMAIL_1]"

# De-anonymized (returned to client)
"I've sent an email to [email protected]"
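
The redact, map, and restore cycle can be sketched with plain regexes. This is illustrative only; the gateway's real PII detection is provider-specific (e.g. Enkrypt's detectors), and the sample email and phone values below are assumptions:

```python
import re

# Toy patterns; real PII detection covers many more types (SSNs, cards, etc.)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")

def redact(text: str):
    """Replace detected PII with placeholders and record the mapping."""
    mapping = {}
    counters = {}
    def repl_factory(label):
        def repl(m):
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"[{label}_{counters[label]}]"
            mapping[placeholder] = m.group(0)
            return placeholder
        return repl
    text = EMAIL_RE.sub(repl_factory("EMAIL"), text)
    text = PHONE_RE.sub(repl_factory("PHONE"), text)
    return text, mapping

def restore(text: str, mapping: dict) -> str:
    """De-anonymize a response by substituting original values back in."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

redacted, mapping = redact("Contact John at john@example.com or call 555-1234")
print(redacted)  # Contact John at [EMAIL_1] or call [PHONE_1]
```

The gateway additionally scopes each mapping to a correlation ID so that placeholders from concurrent requests cannot collide.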

Guardrail Actions

When a violation is detected, the gateway can take different actions:
Action | Description
ALLOW | Continue processing (log a warning)
BLOCK | Stop processing and return an error
WARN | Log a warning but continue
MODIFY | Modify content and continue (e.g., PII redaction)

Usage Examples

Test Guardrails with MCP Client

import mcp

# Connect to gateway
client = mcp.Client("http://localhost:8000/mcp/")

# This will be blocked by injection attack detection
result = client.call_tool(
    "github_server",
    "search_repositories",
    {"query": "'; DROP TABLE users; --"}
)
# Returns: GuardrailViolation - injection_attack detected

# This will pass guardrails
result = client.call_tool(
    "github_server",
    "search_repositories",
    {"query": "python web frameworks"}
)
# Returns: Normal response from GitHub MCP server

Create Custom Guardrail Provider

my_custom_provider.py
from secure_mcp_gateway.plugins.guardrails.base import (
    GuardrailProvider,
    InputGuardrail,
    GuardrailAction,
    GuardrailRequest,
    GuardrailResponse
)

class MyInputGuardrail:
    async def validate(self, request: GuardrailRequest) -> GuardrailResponse:
        # Implement your custom validation logic
        if "forbidden" in request.content:
            return GuardrailResponse(
                is_safe=False,
                action=GuardrailAction.BLOCK,
                violations=[...]
            )
        return GuardrailResponse(is_safe=True, action=GuardrailAction.ALLOW, violations=[])

class MyGuardrailProvider(GuardrailProvider):
    def get_name(self) -> str:
        return "my_custom"

    def get_version(self) -> str:
        return "1.0.0"

    def create_input_guardrail(self, config):
        return MyInputGuardrail()

    def create_output_guardrail(self, config):
        # This provider performs no output validation
        return None

Best Practices

Performance Considerations: Guardrails add latency to requests. Use asynchronous guardrails (enkrypt_async_input_guardrails_enabled: true) for high-throughput scenarios.
Policy Management: Create guardrail policies in the Enkrypt Dashboard for easier management and updates.
Testing: Use the included test servers in bad_mcps/ to test your guardrail configurations against various attack vectors.

Metrics and Monitoring

Guardrails emit telemetry for monitoring:
  • guardrail.input.validation.count - Number of input validations
  • guardrail.output.validation.count - Number of output validations
  • guardrail.violations.count - Violations detected by type
  • guardrail.blocks.count - Requests blocked
  • guardrail.latency.ms - Validation latency
View metrics in the Grafana dashboard or query them via Prometheus.
