
Overview

The Moderations API allows you to classify text and image inputs to determine whether they contain potentially harmful content across multiple categories, including harassment, hate speech, self-harm, sexual content, and violence.

Create moderation

Classifies whether text and/or image inputs are potentially harmful.
# Assumes a configured client, e.g.
# client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])
response = client.moderations.create(
  input: "I want to hurt someone",
  model: "omni-moderation-latest"
)

Parameters

input
string | array
required
Input (or inputs) to classify. Can be:
  • A single string
  • An array of strings
  • An array of multimodal inputs (text and images)
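The three accepted shapes can be sketched as Ruby literals (the values are illustrative; the `type`/`image_url` keys follow the request format shown in the multimodal example below):

```ruby
# A single string
single = "Text to classify"

# An array of strings
batch = ["First text", "Second text"]

# An array of multimodal parts (text and images)
multimodal = [
  { type: "text", text: "Is this image safe?" },
  { type: "image_url", image_url: { url: "https://example.com/image.jpg" } }
]
```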
model
string
Optional. The content moderation model to use; defaults to omni-moderation-latest. Options:
  • omni-moderation-latest (default) — supports text and image inputs
  • omni-moderation-2024-09-26 — supports text and image inputs
  • text-moderation-latest — legacy, text only
  • text-moderation-stable — legacy, text only

Response

Returns a ModerationCreateResponse object containing:
id
string
Unique identifier for the moderation request
model
string
The model used for moderation
results
array
Array of moderation results, one per input. Each result includes flagged (a boolean indicating whether any category was triggered), categories (per-category boolean flags), and category_scores (per-category confidence scores)
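A minimal sketch of inspecting one element of `results` (the field names follow the moderation result schema; the values and categories shown are illustrative, and real responses include many more categories):

```ruby
# Illustrative result, shaped like one element of `results`
result = {
  flagged: true,
  categories: { harassment: false, violence: true },
  category_scores: { harassment: 0.01, violence: 0.91 }
}

# Collect the names of the categories that were flagged
flagged_categories = result[:categories].select { |_, v| v }.keys
puts flagged_categories.inspect  # => [:violence]
```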

Examples

Text moderation

response = client.moderations.create(
  input: "This is a sample text to check for harmful content."
)

if response.results.first.flagged
  puts "Content flagged for: #{response.results.first.categories.inspect}"
end
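Beyond the boolean `categories`, each result's `category_scores` carries per-category confidence values, which is useful when you want a stricter or looser cutoff than the API's own flagging. A sketch, using an illustrative scores hash and a hypothetical threshold:

```ruby
# Illustrative per-category scores from a moderation result
scores = { harassment: 0.02, hate: 0.01, violence: 0.73 }

# Flag any category whose score exceeds a custom threshold
THRESHOLD = 0.5
over = scores.select { |_, score| score > THRESHOLD }.keys
puts "Custom-flagged: #{over.inspect}" if over.any?
```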

Multimodal moderation

response = client.moderations.create(
  input: [
    { type: "text", text: "Is this image safe?" },
    { type: "image_url", image_url: { url: "https://example.com/image.jpg" } }
  ],
  model: "omni-moderation-latest"
)

Batch moderation

response = client.moderations.create(
  input: [
    "First text to moderate",
    "Second text to moderate",
    "Third text to moderate"
  ]
)

response.results.each_with_index do |result, index|
  puts "Input #{index + 1}: flagged=#{result.flagged}"
end
