
What is Heretic?

Heretic is a tool that removes censorship (also known as “safety alignment”) from transformer-based language models without expensive post-training. Unlike manual abliteration approaches, which require human expertise to tune parameters, Heretic runs fully automatically: it combines an advanced implementation of directional ablation (“abliteration”) with a TPE-based parameter optimizer powered by Optuna.
Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model, yielding a decensored model that retains as much of the original model’s intelligence as possible.

Key Benefits

Fully Automatic

No need to understand transformer internals or manually tune parameters. Anyone who can run a command-line program can use Heretic.
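A typical session is just an install and a single command. The package and command names below follow the project’s README, and the model name is only an example; check the current documentation for exact usage:

```shell
# Install Heretic from PyPI (package name per the project README)
pip install heretic-llm

# Decensor a model fully automatically; Heretic handles direction
# extraction, parameter optimization, and saving the result
heretic Qwen/Qwen3-4B-Instruct-2507
```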

Intelligence Preservation

Optimizes for minimal KL divergence from the original model, preserving capabilities while removing censorship.
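KL divergence here measures how far the edited model’s next-token distribution drifts from the original’s on harmless prompts. A minimal illustration of the quantity being minimized, in plain Python (a toy sketch, not Heretic’s actual code):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) between two discrete next-token distributions.

    0.0 means the edited model's distribution is identical to the
    original's; larger values indicate more behavioral drift.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Toy next-token distributions over a 4-token vocabulary
original  = [0.70, 0.20, 0.05, 0.05]
identical = [0.70, 0.20, 0.05, 0.05]
drifted   = [0.40, 0.30, 0.20, 0.10]

print(kl_divergence(original, identical))  # 0.0: no capability drift
print(kl_divergence(original, drifted))    # > 0: the edit changed behavior
```

In practice this is averaged over many harmless prompts, which is why the table below reports a single KL number per model.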

Advanced Abliteration

Implements sophisticated directional ablation with flexible weight kernels and refusal direction interpolation.
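At its core, directional ablation projects a “refusal direction” out of a weight matrix so the layer can no longer write along it. A bare-bones NumPy sketch of that core operation (Heretic layers flexible weight kernels and direction interpolation on top of this idea; names here are illustrative):

```python
import numpy as np

def ablate_direction(W, r, scale=1.0):
    """Remove the component of W's output that lies along direction r.

    W: (d_out, d_in) weight matrix; r: (d_out,) refusal direction.
    scale=1.0 removes the direction entirely; a weight kernel would
    vary this scale per layer instead of using one global value.
    """
    r = r / np.linalg.norm(r)              # normalize to unit length
    return W - scale * np.outer(r, r) @ W  # subtract the projection onto r

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)

W_ablated = ablate_direction(W, r)
# After full ablation, the layer's output has no component along r:
print(np.allclose((r / np.linalg.norm(r)) @ W_ablated, 0.0))  # True
```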

Wide Model Support

Supports most dense models, many multimodal models, and several different MoE architectures.

Performance Comparison

Running unsupervised with the default configuration, Heretic produces decensored models that rival the quality of abliterations created manually by human experts:
| Model | Refusals for “harmful” prompts | KL divergence from original model for “harmless” prompts |
|---|---|---|
| google/gemma-3-12b-it (original) | 97/100 | 0 (by definition) |
| mlabonne/gemma-3-12b-it-abliterated-v2 | 3/100 | 1.04 |
| huihui-ai/gemma-3-12b-it-abliterated | 3/100 | 0.45 |
| p-e-w/gemma-3-12b-it-heretic (ours) | 3/100 | 0.16 |
The Heretic version achieves the same level of refusal suppression as other abliterations, but at a much lower KL divergence, indicating less damage to the original model’s capabilities.

How Heretic Differs from Manual Abliteration

Traditional abliteration requires:
  • Deep understanding of transformer architecture
  • Manual parameter tuning and experimentation
  • Expertise in analyzing model internals
  • Trial-and-error to find optimal settings
Heretic automates all of this:
  • Automatic parameter optimization using state-of-the-art Bayesian optimization
  • Multi-objective optimization balancing refusal suppression and intelligence preservation
  • Flexible ablation kernels that adapt to each model’s characteristics
  • Refusal direction interpolation to find better directions than any single layer

Use Cases

  • Research: Study model behavior without artificial constraints
  • Creative applications: Remove limitations that hinder creative writing or roleplay
  • Comparative analysis: Understand how safety alignment affects model capabilities
  • Custom deployments: Create models aligned with your specific requirements rather than generic corporate policies
KL divergence values above 1.0 usually indicate significant damage to the original model’s capabilities. Heretic’s optimization helps you find the sweet spot between censorship removal and capability preservation.

Community Impact

The community has created and published over 1,000 Heretic models on Hugging Face. Users have reported that Heretic produces models that give properly formatted long responses to sensitive topics while maintaining the intelligence and capabilities of the base model.

Next Steps

Installation

Set up Heretic in your environment

Quick Start

Decensor your first model in minutes
