Security Policies

KoreShield secures LLM traffic by combining sanitization, detection, and policy enforcement in a single proxy layer. Routing requests through the proxy keeps provider API keys server-side and applies the same safeguards to every request.

Core Capabilities

KoreShield provides comprehensive security through:

Input Sanitization

Remove unsafe content patterns before they reach the LLM

Threat Detection

Multi-layered detection with keyword rules and pattern analysis

Policy Enforcement

Configurable actions: allow, warn, or block detected threats

Audit Logging

Comprehensive logging and metrics for monitoring and compliance

How It Works

1. Request enters KoreShield proxy

All LLM requests are routed through the KoreShield security layer.

2. Content sanitization and scanning

Input is sanitized and scanned for security threats using multiple detection layers.

3. Policy decision

Configured policies determine whether to allow, warn, or block based on threat severity.

4. Forward or block

Allowed traffic is forwarded to the configured LLM provider. Blocked requests return an error.

5. Audit logging

All requests and policy decisions are logged for monitoring and compliance.
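
The five steps above can be pictured as a small pipeline. The following is a minimal sketch, not KoreShield's implementation: the function names, the single detection pattern, and the audit record shape are all hypothetical and exist only to show the order of operations.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rule; the real detection layers are not shown in this page.
PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]


@dataclass
class Decision:
    action: str                 # "allow", "warn", or "block"
    forwarded: bool             # True if the request would reach the provider
    threats: list = field(default_factory=list)


def sanitize(text: str) -> str:
    # Step 2a: drop non-printable characters, then collapse runs of whitespace.
    kept = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return re.sub(r"\s+", " ", kept).strip()


def detect(text: str) -> list:
    # Step 2b: return the patterns that matched the sanitized input.
    return [p.pattern for p in PATTERNS if p.search(text)]


def handle(text: str, default_action: str = "warn", audit_log=None) -> Decision:
    cleaned = sanitize(text)                          # step 2
    threats = detect(cleaned)
    action = default_action if threats else "allow"   # step 3
    decision = Decision(action, action != "block", threats)  # step 4
    if audit_log is not None:                         # step 5
        audit_log.append({"input": cleaned, "action": action, "threats": threats})
    return decision
```

Every request produces an audit record in this sketch, whether or not a threat was found, which mirrors the statement that all requests and policy decisions are logged.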

Configure Security Defaults

Set default security policies in your configuration:
security:
  sensitivity: medium
  default_action: block
  features:
    sanitization: true
    detection: true
    policy_enforcement: true
sensitivity (string, default: "medium")
Detection sensitivity level:
  • high - Strict enforcement, best for regulated workloads
  • medium - Balanced defaults for most production use
  • low - Lenient mode for experimentation
default_action (string, default: "warn")
Default action for detected threats:
  • allow - Log the threat but don't block (monitoring only)
  • warn - Log the threat and add a warning header, but allow the request
  • block - Reject the request and return an error
features (object)
Enable or disable specific security features:
  • sanitization - Input cleaning and normalization
  • detection - Threat detection engine
  • policy_enforcement - Apply configured policies
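
To illustrate how the documented defaults combine with a partial configuration, here is a minimal sketch of default resolution and validation. The `resolve_security` helper is hypothetical, and the assumption that every feature defaults to enabled is not stated in this page.

```python
from copy import deepcopy

# "medium" and "warn" are the documented defaults.
# Feature defaults of True are an assumption made for this sketch.
DEFAULTS = {
    "sensitivity": "medium",
    "default_action": "warn",
    "features": {
        "sanitization": True,
        "detection": True,
        "policy_enforcement": True,
    },
}
VALID_SENSITIVITY = {"low", "medium", "high"}
VALID_ACTIONS = {"allow", "warn", "block"}


def resolve_security(user: dict) -> dict:
    """Merge a user-supplied `security` mapping over the defaults and validate it."""
    cfg = deepcopy(DEFAULTS)
    for key, value in user.items():
        if key == "features":
            unknown = set(value) - set(cfg["features"])
            if unknown:
                raise ValueError(f"unknown features: {sorted(unknown)}")
            cfg["features"].update(value)
        elif key in cfg:
            cfg[key] = value
        else:
            raise ValueError(f"unknown security key: {key}")
    if cfg["sensitivity"] not in VALID_SENSITIVITY:
        raise ValueError(f"invalid sensitivity: {cfg['sensitivity']}")
    if cfg["default_action"] not in VALID_ACTIONS:
        raise ValueError(f"invalid default_action: {cfg['default_action']}")
    return cfg
```

With this sketch, `resolve_security({"default_action": "block"})` keeps `sensitivity` at `medium` while overriding only the action.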

Sensitivity Levels

High sensitivity is best for:
  • Healthcare (HIPAA)
  • Financial services (PCI-DSS)
  • Public-facing chatbots
  • Regulated industries
Behavior:
  • Strict detection thresholds
  • Lower confidence scores trigger blocks
  • More false positives
  • Maximum security
security:
  sensitivity: high
  default_action: block
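
One way to picture "lower confidence scores trigger blocks" is a per-level confidence threshold. The sketch below is illustrative only: the threshold values and the `is_threat` helper are hypothetical, and the actual thresholds KoreShield uses are not given in this page.

```python
# Hypothetical thresholds: a stricter level flags detections at lower confidence.
THRESHOLDS = {"high": 0.5, "medium": 0.7, "low": 0.9}


def is_threat(confidence: float, sensitivity: str = "medium") -> bool:
    """Return True if a detection with this confidence counts as a threat."""
    return confidence >= THRESHOLDS[sensitivity]
```

Under these assumed values, a detection with confidence 0.6 is acted on at `high` but ignored at `medium` and `low`, which is why stricter levels produce more false positives.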

Action Types

Allow

Log the threat but allow the request to proceed:
security:
  default_action: allow
Use when:
  • Testing new detection rules
  • Monitoring-only mode
  • Building baseline metrics

Warn

Log the threat, add a warning header, but allow the request:
security:
  default_action: warn
Use when:
  • Gradual rollout of security policies
  • You want visibility without blocking users
  • Collecting data for tuning
Response includes:
X-KoreShield-Warning: threat_detected
X-KoreShield-Threat-Type: prompt_injection
X-KoreShield-Confidence: 0.85
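
A client can inspect these headers to surface warnings without failing the request. The helper below is a minimal sketch: `read_warning` is a hypothetical name, and it accepts any mapping of header names to values, so it is independent of the HTTP library in use.

```python
from typing import Mapping, Optional


def read_warning(headers: Mapping[str, str]) -> Optional[dict]:
    """Extract KoreShield warning details from response headers, if present."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-koreshield-warning" not in lowered:
        return None
    raw_confidence = lowered.get("x-koreshield-confidence")
    return {
        "warning": lowered["x-koreshield-warning"],
        "threat_type": lowered.get("x-koreshield-threat-type"),
        "confidence": float(raw_confidence) if raw_confidence is not None else None,
    }
```

Returning `None` when the warning header is absent lets callers treat clean responses and warned responses with a single check.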

Block

Reject the request and return an error:
security:
  default_action: block
Use when:
  • Production security enforcement
  • High-risk or regulated environments
  • Protecting sensitive data
Response:
{
  "error": {
    "message": "Request blocked: prompt injection detected",
    "type": "security_violation",
    "code": "prompt_injection",
    "confidence": 0.92
  }
}
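
On the client side, a blocked request can be turned into a typed exception so callers handle it separately from provider errors. This is a sketch that assumes only the body shape shown above; the `SecurityViolation` class and `raise_if_blocked` helper are hypothetical, and the HTTP status code that accompanies the body is not specified in this page.

```python
import json


class SecurityViolation(Exception):
    """Raised when the proxy rejects a request (hypothetical client-side type)."""

    def __init__(self, message, code, confidence):
        super().__init__(message)
        self.code = code
        self.confidence = confidence


def raise_if_blocked(body: str) -> None:
    """Raise SecurityViolation if the JSON body is a KoreShield block response."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and error.get("type") == "security_violation":
        raise SecurityViolation(
            error.get("message", ""),
            error.get("code"),
            error.get("confidence"),
        )
```

Bodies without a `security_violation` error pass through untouched, so the helper can be called on every response.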

Per-Environment Configuration

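Adjust the security settings to suit each environment. For example, a lenient, monitoring-only profile for experimentation:
security: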
  sensitivity: low
  default_action: allow
  features:
    sanitization: true
    detection: true
    policy_enforcement: false

Operational Tips

Enable json_logs: true in production for structured logs that integrate with your monitoring stack.
logging:
  json_logs: true
  level: info
Use Redis for distributed rate limiting and centralized statistics:
redis:
  url: redis://localhost:6379/0
Require authentication to access the KoreShield proxy by setting an API key in the environment:
export KORESHIELD_API_KEY=ks_prod_your_secure_key
Track key metrics:
  • Request volume
  • Threat detection rate
  • False positive rate
  • Response latency
These metrics are available at the /metrics endpoint in Prometheus format.
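
As an illustration of working with the Prometheus text format, the sketch below derives a threat detection rate from scraped text. The metric names `koreshield_requests_total` and `koreshield_threats_detected_total` are hypothetical; check the actual /metrics output for the real names. The parser is deliberately simple and assumes samples carry no trailing timestamp.

```python
def parse_samples(text: str) -> dict:
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop the {label="..."} section
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def threat_detection_rate(
    text: str,
    requests_metric: str = "koreshield_requests_total",          # hypothetical name
    threats_metric: str = "koreshield_threats_detected_total",   # hypothetical name
) -> float:
    """Return detected threats divided by total requests (0.0 if no requests)."""
    totals = parse_samples(text)
    requests = totals.get(requests_metric, 0.0)
    if requests == 0:
        return 0.0
    return totals.get(threats_metric, 0.0) / requests
```

In practice a Prometheus server would scrape the endpoint and compute this ratio in a query; the sketch only shows what the ratio means.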

Custom Policies

Define custom policies for specific threat types:
policies:
  - name: "block_data_exfiltration"
    pattern: "(send|upload|email).*to"
    severity: critical
    action: block
    
  - name: "warn_role_manipulation"
    pattern: "you are now"
    severity: high
    action: warn
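
To get a feel for how such patterns behave before deploying them, the two policies above can be tested locally. This sketch assumes the patterns are regular expressions applied as a case-insensitive search and that the highest-severity match wins; neither assumption is confirmed by this page, and the `evaluate` helper and severity ranking are hypothetical.

```python
import re

POLICIES = [
    {"name": "block_data_exfiltration", "pattern": r"(send|upload|email).*to",
     "severity": "critical", "action": "block"},
    {"name": "warn_role_manipulation", "pattern": r"you are now",
     "severity": "high", "action": "warn"},
]

# Assumed ordering of severity labels, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def evaluate(text: str, policies=POLICIES):
    """Return the highest-severity policy whose pattern matches, or None."""
    matches = [p for p in policies if re.search(p["pattern"], text, re.IGNORECASE)]
    if not matches:
        return None
    return max(matches, key=lambda p: SEVERITY_RANK[p["severity"]])
```

Testing this way also exposes overly broad patterns. Under the assumptions above, `(send|upload|email).*to` matches "Email me the photo" because "photo" contains "to", so adding word boundaries such as `\bto\b` may reduce false positives.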

Allowlists and Blocklists

Override detection for known patterns:
allowlist:
  - "system prompt for debugging"
  - "show configuration"

blocklist:
  - "DAN mode"
  - "developer override"
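
The override behavior can be sketched as a check that runs before normal detection. This example assumes case-insensitive substring matching and that the blocklist takes precedence when both lists match; this page does not specify either, and `list_override` is a hypothetical helper.

```python
ALLOWLIST = ["system prompt for debugging", "show configuration"]
BLOCKLIST = ["DAN mode", "developer override"]


def list_override(text: str):
    """Return "block", "allow", or None if neither list applies."""
    lowered = text.lower()
    # Assumption: a blocklist hit wins over an allowlist hit.
    if any(entry.lower() in lowered for entry in BLOCKLIST):
        return "block"
    if any(entry.lower() in lowered for entry in ALLOWLIST):
        return "allow"
    return None  # fall through to normal detection
```

Checking the blocklist first is the conservative choice in this sketch: an allowlisted phrase cannot be used to smuggle a blocklisted one through.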

Next Steps

Attack Detection

Learn about detection layers and tuning

Configuration Guide

Complete policy configuration reference

Compliance

HIPAA, GDPR, and SOC 2 compliance

Monitoring

Set up monitoring and alerting
