
Overview

Studley AI implements comprehensive content moderation to ensure users create appropriate educational content. The system uses AI-powered policy checks to flag inappropriate topics and maintain a safe learning environment.

Content Moderation System

Policy Check Mechanism

Studley AI uses two-tier policy checking:
  1. AI Policy Check (lib/aiPolicyCheck.ts) - Quick AI-based content screening
  2. Advanced Policy Check (lib/policyCheck.ts) - Comprehensive topic analysis
All generation requests are screened before processing to prevent inappropriate content creation.
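The two-tier flow can be sketched as follows. The function names and the keyword stub below are illustrative stand-ins, not the actual implementations in lib/aiPolicyCheck.ts or lib/policyCheck.ts:

```typescript
type PolicyResult = { allowed: boolean; reason?: string };

// Tier 1: quick AI-based screen (stubbed here with a keyword list).
async function aiPolicyCheck(topic: string): Promise<PolicyResult> {
  const blocked = ["violence", "hate speech"];
  const hit = blocked.find((w) => topic.toLowerCase().includes(w));
  return hit
    ? { allowed: false, reason: `Blocked term: ${hit}` }
    : { allowed: true };
}

// Tier 2: comprehensive topic analysis, run only if tier 1 passes.
async function advancedPolicyCheck(topic: string): Promise<PolicyResult> {
  // Placeholder for the deeper analysis in lib/policyCheck.ts.
  return { allowed: true };
}

// Every generation request goes through both tiers before processing.
async function screenTopic(topic: string): Promise<PolicyResult> {
  const quick = await aiPolicyCheck(topic);
  if (!quick.allowed) return quick;
  return advancedPolicyCheck(topic);
}
```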

Flagged Content Detection

The policy checker identifies:
  • Inappropriate or offensive topics
  • Violence or harmful content
  • Adult or explicit material
  • Hate speech or discrimination
  • Dangerous activities or illegal content

Monitoring Interface

Admin Inappropriate Monitor

The admin-inappropriate-monitor.tsx component provides a dashboard for reviewing flagged attempts.

Features:
  • Search by username, topic, or reason
  • View all inappropriate content attempts
  • Filter by date range
  • Review flagging reasons
  • Track user violations

Monitor Interface

interface InappropriateAttempt {
  id: string
  user_id: string
  username: string
  topic: string
  reason: string  // Why it was flagged
  created_at: string
}
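The search feature above can be sketched as a case-insensitive filter across username, topic, and reason. This is an illustrative helper, not the component's actual implementation:

```typescript
interface InappropriateAttempt {
  id: string;
  user_id: string;
  username: string;
  topic: string;
  reason: string;
  created_at: string;
}

// Match the query against username, topic, or flagging reason.
function filterAttempts(
  attempts: InappropriateAttempt[],
  query: string
): InappropriateAttempt[] {
  const q = query.trim().toLowerCase();
  if (!q) return attempts;
  return attempts.filter((a) =>
    [a.username, a.topic, a.reason].some((f) => f.toLowerCase().includes(q))
  );
}
```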

Database Tracking

Currently, inappropriate attempts are logged but not stored permanently in the database. Consider adding a dedicated table for historical tracking. To track moderation events, create a moderation_events table:
CREATE TABLE IF NOT EXISTS moderation_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  content_type TEXT NOT NULL,  -- 'quiz', 'flashcard', 'study_guide'
  topic TEXT NOT NULL,
  flagged_reason TEXT NOT NULL,
  severity TEXT NOT NULL,  -- 'low', 'medium', 'high', 'critical'
  blocked BOOLEAN DEFAULT true,
  reviewed BOOLEAN DEFAULT false,
  reviewer_id TEXT,
  reviewer_notes TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_moderation_user_id ON moderation_events(user_id);
CREATE INDEX idx_moderation_created_at ON moderation_events(created_at DESC);
CREATE INDEX idx_moderation_severity ON moderation_events(severity);

Policy Enforcement Workflow

1. User Submits Generation Request

User enters a topic for quiz/flashcard/study guide generation.

2. AI Policy Check

First-pass screening using AI:
// Simplified policy check flow
const isAppropriate = await checkContentPolicy(topic)

if (!isAppropriate) {
  // Log the attempt
  await logInappropriateAttempt({
    userId: user.id,
    topic: topic,
    reason: 'Failed AI policy check',
    severity: 'high'
  })
  
  // Return error to user
  return { error: 'This topic violates our content policy' }
}

3. Content Generation Blocked

If flagged:
  • User receives a policy violation message
  • Attempt is logged for admin review
  • Credits are NOT deducted
  • Generation does NOT proceed

4. Admin Review

Admins can:
  • View all flagged attempts
  • Review false positives
  • Adjust policy sensitivity
  • Take action on repeat offenders

Feedback System

Users can report issues or provide feedback through the built-in feedback system.

Bug Reports Table

CREATE TABLE IF NOT EXISTS bug_reports (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  preferred_name TEXT NOT NULL,
  severity TEXT NOT NULL CHECK (severity IN ('low', 'medium', 'high', 'critical')),
  error_code TEXT,
  has_error_code BOOLEAN DEFAULT true,
  description TEXT NOT NULL,
  browser_info TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
  status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'in_progress', 'resolved', 'closed')),
  admin_notes TEXT
);

User Feedback Table

CREATE TABLE IF NOT EXISTS user_feedback (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  preferred_name TEXT NOT NULL,
  feedback_type TEXT NOT NULL CHECK (feedback_type IN ('general', 'feature_request', 'improvement', 'other')),
  message TEXT NOT NULL,
  rating INTEGER CHECK (rating >= 1 AND rating <= 5),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
  status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'reviewed', 'implemented', 'closed')),
  admin_notes TEXT
);
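A client-side validation helper can mirror the CHECK constraints on user_feedback, so malformed submissions are rejected before the insert. This helper is illustrative and not part of the codebase:

```typescript
// Allowed values, matching the CHECK constraints in the schema above.
const FEEDBACK_TYPES = ["general", "feature_request", "improvement", "other"];

interface FeedbackInput {
  preferred_name: string;
  feedback_type: string;
  message: string;
  rating?: number; // optional; must be 1-5 when present
}

// Returns a list of validation errors; an empty list means the
// payload satisfies the table's constraints.
function validateFeedback(input: FeedbackInput): string[] {
  const errors: string[] = [];
  if (!input.preferred_name.trim()) errors.push("preferred_name is required");
  if (!FEEDBACK_TYPES.includes(input.feedback_type))
    errors.push("invalid feedback_type");
  if (!input.message.trim()) errors.push("message is required");
  if (input.rating !== undefined && (input.rating < 1 || input.rating > 5))
    errors.push("rating must be between 1 and 5");
  return errors;
}
```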

Viewing Feedback

Query Bug Reports

-- Get all pending bug reports
SELECT 
  id,
  preferred_name,
  severity,
  description,
  created_at
FROM bug_reports
WHERE status = 'pending'
ORDER BY 
  CASE severity
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    WHEN 'low' THEN 4
  END,
  created_at DESC;

Query User Feedback

-- Get feedback by type
SELECT 
  feedback_type,
  COUNT(*) as count,
  AVG(rating) as avg_rating
FROM user_feedback
GROUP BY feedback_type
ORDER BY count DESC;

Update Feedback Status

-- Mark bug report as resolved
UPDATE bug_reports
SET 
  status = 'resolved',
  admin_notes = 'Fixed in version 2.1.0'
WHERE id = 'bug-id-here';

-- Mark feedback as reviewed
UPDATE user_feedback
SET 
  status = 'reviewed',
  admin_notes = 'Great suggestion! Added to roadmap.'
WHERE id = 'feedback-id-here';

Quiz Results Monitoring

Track quiz activity and results for quality assurance:

quiz_results Table

CREATE TABLE quiz_results (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID NOT NULL,
  quiz_item_id UUID REFERENCES study_items(id) ON DELETE SET NULL,
  score NUMERIC NOT NULL,
  total_questions INTEGER NOT NULL,
  correct_count INTEGER NOT NULL DEFAULT 0,
  partial_count INTEGER NOT NULL DEFAULT 0,
  incorrect_count INTEGER NOT NULL DEFAULT 0,
  accuracy NUMERIC NOT NULL DEFAULT 0,
  avg_time_per_question NUMERIC NOT NULL DEFAULT 0,
  best_streak INTEGER NOT NULL DEFAULT 0,
  answers JSONB,
  created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW()
);
Monitor quiz quality:
-- Find quizzes with unusually low scores (potential quality issues)
SELECT 
  quiz_item_id,
  COUNT(*) as attempt_count,
  AVG(accuracy) as avg_accuracy,
  AVG(score) as avg_score
FROM quiz_results
GROUP BY quiz_item_id
HAVING AVG(accuracy) < 0.5  -- Less than 50% average accuracy
ORDER BY attempt_count DESC;
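The accuracy column can be derived from the per-question counts. Counting partial answers as half credit is an assumption here; the app's actual scoring may differ:

```typescript
// Derive accuracy from correct/partial/incorrect counts.
// Assumption: a partial answer counts as half credit.
function computeAccuracy(
  correct: number,
  partial: number,
  incorrect: number
): number {
  const total = correct + partial + incorrect;
  if (total === 0) return 0;
  return (correct + 0.5 * partial) / total;
}
```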

Shared Content Moderation

Monitor publicly shared materials:

shared_materials Table

CREATE TABLE IF NOT EXISTS shared_materials (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  material_id UUID NOT NULL REFERENCES study_items(id) ON DELETE CASCADE,
  owner_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  shared_with_user_id UUID REFERENCES users(id) ON DELETE CASCADE,
  public_token TEXT UNIQUE,
  public_access_enabled BOOLEAN DEFAULT false,
  allow_resharing BOOLEAN DEFAULT false,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Query publicly shared content:
-- Get all publicly accessible shared materials
SELECT 
  sm.id,
  sm.public_token,
  u.email as owner_email,
  sm.created_at,
  sm.allow_resharing
FROM shared_materials sm
JOIN users u ON sm.owner_id = u.id
WHERE sm.public_access_enabled = true
ORDER BY sm.created_at DESC;

Rate Limiting

Prevent abuse through rate limiting:

generation_rate_limits Table

CREATE TABLE IF NOT EXISTS generation_rate_limits (
  id BIGSERIAL PRIMARY KEY,
  identifier TEXT NOT NULL,  -- User ID or IP address
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);
Check rate limit violations:
-- Find users exceeding rate limits (more than 50 requests/hour)
SELECT 
  identifier,
  COUNT(*) as request_count,
  MAX(created_at) as last_request
FROM generation_rate_limits
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY identifier
HAVING COUNT(*) > 50
ORDER BY request_count DESC;
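The same sliding-window check can run application-side before each generation. The in-memory store below stands in for the generation_rate_limits table, and the threshold mirrors the 50 requests/hour used above:

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const MAX_REQUESTS = 50;

// identifier (user ID or IP) -> request timestamps
const requestLog = new Map<string, number[]>();

// Returns true and records the request if the identifier is under
// the limit; returns false when the hourly limit is reached.
function allowGeneration(identifier: string, now = Date.now()): boolean {
  const recent = (requestLog.get(identifier) ?? []).filter(
    (t) => now - t < WINDOW_MS
  );
  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(identifier, recent);
    return false;
  }
  recent.push(now);
  requestLog.set(identifier, recent);
  return true;
}
```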

Moderation Actions

Warning System

Implement a graduated warning system for policy violations:
  • 1st offense: Warning message
  • 2nd offense: Temporary restriction
  • 3rd offense: Account suspension
  • Severe violations: Immediate suspension
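The graduated response above reduces to a pure function; the thresholds mirror the list, and a severe violation short-circuits to suspension regardless of count:

```typescript
type Action = "warning" | "temporary_restriction" | "suspension";

// Map a user's offense count (and severity) to the moderation action.
function actionFor(offenseCount: number, severe = false): Action {
  if (severe) return "suspension"; // severe violations skip the ladder
  if (offenseCount <= 1) return "warning";
  if (offenseCount === 2) return "temporary_restriction";
  return "suspension"; // 3rd offense and beyond
}
```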

Temporary Restrictions

Add a restrictions system to user profiles:
-- Add restrictions column to users table
ALTER TABLE users ADD COLUMN restricted_until TIMESTAMP;

-- Restrict user temporarily
UPDATE users
SET restricted_until = NOW() + INTERVAL '7 days'
WHERE id = 'user-id-here';

-- Check if user is restricted
SELECT 
  id,
  email,
  restricted_until,
  restricted_until > NOW() as is_currently_restricted
FROM users
WHERE id = 'user-id-here';

Account Suspension

-- Add suspension status
ALTER TABLE users ADD COLUMN suspended BOOLEAN DEFAULT false;
ALTER TABLE users ADD COLUMN suspension_reason TEXT;

-- Suspend user account
UPDATE users
SET 
  suspended = true,
  suspension_reason = 'Repeated policy violations'
WHERE id = 'user-id-here';

Best Practices

Content Moderation Guidelines:
  • Review flagged content within 24 hours
  • Document all moderation decisions
  • Communicate clearly with users about policy violations
  • Regularly update policy checks based on new patterns
  • Maintain consistency in enforcement
  • Provide appeals process for false positives

False Positive Handling

When legitimate content is incorrectly flagged:
  1. Review the topic and context
  2. Adjust policy check sensitivity if needed
  3. Notify the user
  4. Manually allow the generation
  5. Log the false positive for policy improvement
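One possible shape for step 4 ("manually allow the generation") is an admin-maintained allowlist consulted before the policy check runs. The store and function names below are illustrative:

```typescript
// Topics an admin has cleared after reviewing a false positive.
const topicAllowlist = new Set<string>();

// Admin action: clear a topic after confirming the flag was wrong.
function allowTopic(topic: string): void {
  topicAllowlist.add(topic.trim().toLowerCase());
}

// Consulted by the generation pipeline before any policy check.
function isAllowlisted(topic: string): boolean {
  return topicAllowlist.has(topic.trim().toLowerCase());
}
```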

Reporting Dashboard Metrics

Key metrics to track:
  • Total flagged attempts per day/week/month
  • Flagging accuracy rate (true vs. false positives)
  • Most common violation categories
  • Repeat offenders count
  • Average response time to reports

Related Documentation

  • Admin Dashboard - Return to the admin dashboard overview
  • User Management - Manage user accounts and actions
  • Database Schema - Complete database reference
  • RLS Policies - Security policy documentation
