
What is Heretic?

Heretic is a tool that removes censorship (also known as “safety alignment”) from transformer-based language models without expensive post-training. Unlike manual abliteration approaches, which require human expertise to tune parameters, Heretic runs fully automatically: it combines an advanced implementation of directional ablation (“abliteration”) with a TPE-based parameter optimizer powered by Optuna.
Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model, yielding a decensored model that retains as much of the original model’s intelligence as possible.

Key Benefits

Fully Automatic

No need to understand transformer internals or manually tune parameters. Anyone who can run a command-line program can use Heretic.
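A typical session is just an install and a single command. The package and command names below follow the project’s README, and the model name is only an example; check the current documentation for exact usage:

```shell
# Install Heretic from PyPI (package name per the project README)
pip install heretic-llm

# Decensor a model fully automatically; Heretic handles direction
# extraction, parameter optimization, and saving the result
heretic Qwen/Qwen3-4B-Instruct-2507
```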

Intelligence Preservation

Optimizes for minimal KL divergence from the original model, preserving capabilities while removing censorship.
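KL divergence here measures how far the edited model’s next-token distribution drifts from the original’s on harmless prompts. A minimal illustration of the quantity being minimized, in plain Python (a toy sketch, not Heretic’s actual code):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) between two discrete next-token distributions.

    0.0 means the edited model's distribution is identical to the
    original's; larger values indicate more behavioral drift.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Toy next-token distributions over a 4-token vocabulary
original  = [0.70, 0.20, 0.05, 0.05]
identical = [0.70, 0.20, 0.05, 0.05]
drifted   = [0.40, 0.30, 0.20, 0.10]

print(kl_divergence(original, identical))  # 0.0: no capability drift
print(kl_divergence(original, drifted))    # > 0: the edit changed behavior
```

In practice this is averaged over many harmless prompts, which is why the table below reports a single KL number per model.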

Advanced Abliteration

Implements sophisticated directional ablation with flexible weight kernels and refusal direction interpolation.
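At its core, directional ablation projects a “refusal direction” out of a weight matrix so the layer can no longer write along it. A bare-bones NumPy sketch of that core operation (Heretic layers flexible weight kernels and direction interpolation on top of this idea; names here are illustrative):

```python
import numpy as np

def ablate_direction(W, r, scale=1.0):
    """Remove the component of W's output that lies along direction r.

    W: (d_out, d_in) weight matrix; r: (d_out,) refusal direction.
    scale=1.0 removes the direction entirely; a weight kernel would
    vary this scale per layer instead of using one global value.
    """
    r = r / np.linalg.norm(r)              # normalize to unit length
    return W - scale * np.outer(r, r) @ W  # subtract the projection onto r

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)

W_ablated = ablate_direction(W, r)
# After full ablation, the layer's output has no component along r:
print(np.allclose((r / np.linalg.norm(r)) @ W_ablated, 0.0))  # True
```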

Wide Model Support

Supports most dense models, many multimodal models, and several different MoE architectures.

Performance Comparison

Running unsupervised with the default configuration, Heretic produces decensored models that rival the quality of abliterations created manually by human experts:
| Model | Refusals for “harmful” prompts | KL divergence from original model for “harmless” prompts |
|---|---|---|
| google/gemma-3-12b-it (original) | 97/100 | 0 (by definition) |
| mlabonne/gemma-3-12b-it-abliterated-v2 | 3/100 | 1.04 |
| huihui-ai/gemma-3-12b-it-abliterated | 3/100 | 0.45 |
| p-e-w/gemma-3-12b-it-heretic (ours) | 3/100 | 0.16 |
The Heretic version achieves the same level of refusal suppression as other abliterations, but at a much lower KL divergence, indicating less damage to the original model’s capabilities.

How Heretic Differs from Manual Abliteration

Traditional abliteration requires:
  • Deep understanding of transformer architecture
  • Manual parameter tuning and experimentation
  • Expertise in analyzing model internals
  • Trial-and-error to find optimal settings
Heretic automates all of this:
  • Automatic parameter optimization using state-of-the-art Bayesian optimization
  • Multi-objective optimization balancing refusal suppression and intelligence preservation
  • Flexible ablation kernels that adapt to each model’s characteristics
  • Refusal direction interpolation to find better directions than any single layer

Use Cases

  • Research: Study model behavior without artificial constraints
  • Creative applications: Remove limitations that hinder creative writing or roleplay
  • Comparative analysis: Understand how safety alignment affects model capabilities
  • Custom deployments: Create models aligned with your specific requirements rather than generic corporate policies
KL divergence values above 1.0 usually indicate significant damage to the original model’s capabilities. Heretic’s optimization helps you find the sweet spot between censorship removal and capability preservation.

Community Impact

The community has created and published over 1,000 Heretic models on Hugging Face. Users have reported that Heretic produces models that give properly formatted long responses to sensitive topics while maintaining the intelligence and capabilities of the base model.

Next Steps

Installation

Set up Heretic in your environment

Quick Start

Decensor your first model in minutes
