LibreChat provides content moderation capabilities to help maintain a safe and appropriate environment. You can use OpenAI’s Moderation API or implement custom moderation logic.

Overview

Content moderation helps you:
  • Filter inappropriate or harmful content
  • Comply with content policies
  • Protect users from offensive material
  • Maintain platform guidelines

This guide covers:
  • OpenAI Moderation: use OpenAI’s built-in moderation API
  • Custom Moderation: implement your own moderation rules
  • User Reports: handle user-reported content
  • Ban Management: manage banned users and content

OpenAI Moderation

LibreChat can use OpenAI’s Moderation API to automatically flag problematic content.

Configuration

Enable moderation in your .env file:
  • OPENAI_MODERATION (boolean, default: "false"): enable OpenAI content moderation
  • OPENAI_MODERATION_API_KEY (string): API key for moderation (falls back to OPENAI_API_KEY if not set)
  • OPENAI_MODERATION_REVERSE_PROXY (string): optional reverse proxy URL for the moderation API

Example Configuration

.env
OPENAI_MODERATION=true
OPENAI_MODERATION_API_KEY=sk-your-moderation-key

How It Works

  1. User Sends Message: the user submits a message in a conversation.
  2. Moderation Check: the message is sent to OpenAI’s Moderation API before processing.
  3. Flag Detection: the API checks for:
    • Sexual content
    • Hate speech
    • Harassment
    • Self-harm
    • Violence
    • Illegal activities
  4. Action Taken: if the message is flagged:
    • The message is blocked
    • The user sees a moderation warning
    • The incident is logged
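The pre-processing check described above can be sketched as a direct call to OpenAI’s `/v1/moderations` endpoint. This is an illustrative sketch, not LibreChat’s actual internals; the function names are made up, and it assumes Node 18+ for the global `fetch` API.

```javascript
// Sketch of a pre-processing moderation check (illustrative, not
// LibreChat's actual middleware). Assumes Node 18+ for global fetch.
async function checkModeration(text, apiKey) {
  const res = await fetch('https://api.openai.com/v1/moderations', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ input: text }),
  });
  const data = await res.json();
  // Each result has the shape { flagged, categories, category_scores }
  return data.results[0];
}

// Pure helper: decide whether to block, given one moderation result.
function shouldBlock(result) {
  return result.flagged === true;
}
```

A caller would block the message and log the incident whenever `shouldBlock` returns `true`.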

Moderation Categories

OpenAI’s moderation API checks for these categories:
  • Sexual: content meant to arouse sexual excitement, including:
    • Explicit sexual descriptions
    • Sexual acts
    • Adult content
  • Hate: content that expresses, incites, or promotes hate based on:
    • Race
    • Gender
    • Ethnicity
    • Religion
    • Nationality
    • Sexual orientation
    • Disability
  • Harassment: content that promotes harassment or bullying of individuals or groups.
  • Self-harm: content that promotes, encourages, or depicts acts of self-harm, including:
    • Suicide
    • Cutting
    • Eating disorders
  • Violence: content that depicts or glorifies violence or celebrates suffering or humiliation.
  • Graphic violence: violent content in graphic detail, including gore and death.
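The API pairs each category with a confidence score, so a small helper can list which categories crossed a chosen threshold. The sample scores below are invented for illustration; only the response shape (`flagged`, `categories`, `category_scores`) mirrors the real API.

```javascript
// Example moderation result in the shape OpenAI's API returns.
// The scores here are made-up sample values.
const sampleResult = {
  flagged: true,
  categories: { hate: true, violence: false },
  category_scores: { hate: 0.95, violence: 0.12 },
};

// List every category whose confidence score meets the threshold.
function flaggedCategories(result, threshold = 0.5) {
  return Object.entries(result.category_scores)
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);
}

// flaggedCategories(sampleResult) → ['hate']
```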

Custom Moderation

Implement your own moderation logic by extending LibreChat’s moderation middleware.

Custom Moderation Rules

Create custom rules in api/server/middleware/moderateContent.js:
api/server/middleware/moderateContent.js
const customModerationRules = [
  {
    name: 'Profanity Filter',
    // Note: no `g` flag — a global regex keeps lastIndex state between
    // .test() calls, which makes repeated checks unreliable.
    pattern: /\b(word1|word2|word3)\b/i,
    message: 'Your message contains prohibited language',
  },
  {
    name: 'Spam Detection',
    check: (text) => {
      // Flag any letter repeated six or more times in a row
      return /([A-Za-z])\1{5,}/.test(text);
    },
    message: 'Your message appears to be spam',
  },
  {
    name: 'URL Restriction',
    pattern: /https?:\/\/[^\s]+/i,
    message: 'URLs are not allowed in messages',
  },
];

Implementing Custom Checks

function customModerate(text) {
  for (const rule of customModerationRules) {
    if (rule.pattern && rule.pattern.test(text)) {
      return {
        flagged: true,
        category: rule.name,
        message: rule.message,
      };
    }
    
    if (rule.check && rule.check(text)) {
      return {
        flagged: true,
        category: rule.name,
        message: rule.message,
      };
    }
  }
  
  return { flagged: false };
}
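A check like `customModerate` could be wired into an Express-style middleware as follows. This is a sketch: the stand-in rule set is abbreviated, and the handler only mirrors the Express `(req, res, next)` shape rather than depending on Express itself.

```javascript
// Minimal stand-in for the rule set and check defined above.
const rules = [
  {
    name: 'URL Restriction',
    pattern: /https?:\/\/[^\s]+/i,
    message: 'URLs are not allowed in messages',
  },
];

function customModerate(text) {
  for (const rule of rules) {
    if (rule.pattern && rule.pattern.test(text)) {
      return { flagged: true, category: rule.name, message: rule.message };
    }
  }
  return { flagged: false };
}

// Express-style middleware: block flagged messages with a 403,
// otherwise pass the request through.
function moderateContent(req, res, next) {
  const result = customModerate(req.body?.text ?? '');
  if (result.flagged) {
    return res
      .status(403)
      .json({ error: result.message, category: result.category });
  }
  next();
}
```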

User Reports

Allow users to report inappropriate content for review.

Report Types

Users can report individual messages that violate policies:
  • Click report icon on message
  • Select violation category
  • Add optional description
  • Submit report

Review Queue

Moderators can review reports in the admin dashboard:
  1. View Reports: See all pending reports
  2. Review Content: Examine flagged content and context
  3. Take Action:
    • Dismiss report
    • Delete content
    • Warn user
    • Ban user
  4. Document Decision: Add notes about action taken

Ban Management

Manage users who violate moderation policies.

Temporary Bans

Set a ban with expiration:
npm run ban-user [email protected] --duration=7d
Temporary bans are useful for first-time or minor violations. Users are automatically unbanned after the duration expires.

Permanent Bans

Permanently ban a user:
npm run ban-user [email protected]

View Banned Users

List all currently banned users:
npm run list-users --filter=banned

Unban Users

Remove a ban:
npm run ban-user [email protected] --unban

Moderation Logging

All moderation actions are logged for audit purposes.

Log Location

Moderation logs are stored in:
  • logs/moderation.log - All moderation events
  • logs/moderation-errors.log - Moderation system errors

Log Format

{
  "timestamp": "2024-03-03T10:30:00Z",
  "userId": "user123",
  "action": "message_blocked",
  "category": "hate",
  "confidence": 0.95,
  "content": "[REDACTED]",
  "moderator": "openai_api"
}
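Because each log line is one JSON object in the format above, summarizing the log is straightforward. The helper below is a sketch that tallies blocked messages by category; the file path and function name are illustrative.

```javascript
// Sketch: tally moderation log entries by category. Assumes each
// non-empty line of logs/moderation.log is one JSON object in the
// format shown above.
function tallyByCategory(logText) {
  const counts = {};
  for (const line of logText.split('\n')) {
    if (!line.trim()) continue;
    const entry = JSON.parse(line);
    counts[entry.category] = (counts[entry.category] ?? 0) + 1;
  }
  return counts;
}
```

In practice you would read the file with `fs.readFileSync('logs/moderation.log', 'utf8')` and pass the contents in.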

Best Practices

  1. Layer Your Defenses: use both automated moderation (OpenAI API) and custom rules for comprehensive coverage.
  2. Start Conservative: begin with strict moderation and relax rules based on your community’s needs and maturity.
  3. Human Review: always have human moderators review edge cases and handle appeals.
  4. Clear Policies: publish clear content policies so users understand what’s acceptable.
  5. Consistent Enforcement: apply moderation rules consistently across all users and content.
  6. Regular Review: regularly review moderation logs and adjust rules as needed.

Troubleshooting

Hitting moderation API rate limits?
  • Use a dedicated moderation API key
  • Implement request queuing
  • Cache moderation results for similar content
  • Consider using a reverse proxy

Too many false positives?
  • Adjust moderation thresholds
  • Add whitelisted terms or patterns
  • Review and update custom rules
  • Implement an appeal process for users

Harmful content getting through?
  • Enable stricter moderation levels
  • Add custom rules for specific issues
  • Implement multi-layer moderation
  • Review and update the blocked terms list

Moderation adding too much latency?
  • Use async moderation for non-critical content
  • Cache moderation results
  • Optimize custom rule checking
  • Consider post-moderation for trusted users
