This guide will walk you through decensoring your first language model using Heretic. The entire process is automatic and requires just a single command.

Basic Usage

1. Run Heretic

To decensor a model, simply run Heretic with the model name from Hugging Face:
heretic Qwen/Qwen3-4B-Instruct-2507
You can use any model identifier from Hugging Face, or a local path to a model directory.
Heretic will automatically download the model if it’s not already cached locally.
2. System Benchmarking

Heretic first detects your hardware and automatically determines the optimal batch size:
█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.2.0
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

Detected 1 CUDA device(s) (24.00 GB total VRAM):
* GPU 0: NVIDIA GeForce RTX 3090 (24.00 GB)

Loading model Qwen/Qwen3-4B-Instruct-2507...
* Memory usage: 8.2 GB

Determining optimal batch size...
* Trying batch size 1... Ok (142 tokens/s)
* Trying batch size 2... Ok (267 tokens/s)
* Trying batch size 4... Ok (489 tokens/s)
* Trying batch size 8... Ok (612 tokens/s)
* Trying batch size 16... Failed (CUDA out of memory)
* Chosen batch size: 8
This benchmarking picks the largest batch size that fits in memory, so optimization runs as fast as your hardware allows.
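The search visible in the log above doubles the batch size until a run fails, then keeps the largest size that succeeded. A minimal sketch of that idea (the `fake_benchmark` function is a stand-in for a real throughput measurement, not part of Heretic):

```python
def find_batch_size(try_batch, max_batch_size=64):
    """Double the batch size until the benchmark fails (e.g. CUDA out
    of memory), then return the largest size that succeeded."""
    best = None
    size = 1
    while size <= max_batch_size:
        ok, tokens_per_s = try_batch(size)
        if not ok:
            break
        best = size
        size *= 2
    return best

# Toy benchmark: sizes above 8 "run out of memory".
def fake_benchmark(size):
    return (size <= 8, 100 * size)

print(find_batch_size(fake_benchmark))  # prints 8
```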
3. Optimization Process

Heretic now runs parameter optimization trials (default: 200 trials) to find the best abliteration parameters:
Loading good prompts from mlabonne/harmless_alpaca...
* 400 prompts loaded

Loading bad prompts from mlabonne/harmful_behaviors...
* 400 prompts loaded

Calculating per-layer refusal directions...
* Obtaining residuals for good prompts...
* Obtaining residuals for bad prompts...

Running trial 1 of 200...
* Parameters:
  * direction_scope = global
  * direction_index = 15.3
  * attn_out.max_weight = 1.12
  * attn_out.max_weight_position = 22.4
  * mlp_down.max_weight = 0.94
  ...
* Resetting model...
* Abliterating...
* Evaluating...
* Score: 0.2341 (Refusals: 12/100, KL divergence: 0.2341)

Elapsed time: 2m 15s
Estimated remaining time: 7h 28m
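The "per-layer refusal directions" in the log are computed, in standard abliteration fashion, from the difference between mean residual activations on harmful and harmless prompts. A minimal NumPy sketch, assuming residuals are stacked as row vectors (Heretic's internals may differ in detail):

```python
import numpy as np

def refusal_direction(good_residuals, bad_residuals):
    """Difference-of-means direction between residual activations for
    harmful ("bad") and harmless ("good") prompts at one layer,
    normalized to unit length."""
    diff = bad_residuals.mean(axis=0) - good_residuals.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Toy data: "bad" residuals are shifted along the first axis.
rng = np.random.default_rng(0)
good = rng.normal(size=(400, 8))
bad = good + np.array([3.0] + [0.0] * 7)
d = refusal_direction(good, bad)
print(d)  # unit vector pointing along the first axis
```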
You can interrupt the optimization at any time with Ctrl+C. Heretic saves its progress, so you can resume the run later.
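The "Abliterating..." step removes the refusal direction from model weight matrices by projecting it out. The trial parameters above (`max_weight`, `min_weight`, and their positions) control how strongly each layer's matrices are ablated; the sketch below simplifies that to a single scale and is an illustration of the general technique, not Heretic's exact code:

```python
import numpy as np

def ablate(W, d, weight=1.0):
    """Remove the component of W's outputs that lies along the unit
    refusal direction d, scaled by `weight`. Convention: activations
    are computed as x @ W, so d lives in W's column (output) space."""
    d = d / np.linalg.norm(d)
    return W - weight * np.outer(W @ d, d)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
d = np.zeros(8)
d[0] = 1.0
W2 = ablate(W, d)

# With weight=1.0, outputs no longer have any component along d:
print(np.abs(W2 @ d).max())  # ~0
```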
4. Select Best Result

After optimization completes, Heretic presents the Pareto-optimal results:
Optimization finished!

The following trials resulted in Pareto optimal combinations of refusals
and KL divergence. After selecting a trial, you will be able to save the
model, upload it to Hugging Face, or chat with it to test how well it works.

Which trial do you want to use?
> [Trial  87] Refusals:  2/100, KL divergence: 0.1847
  [Trial 142] Refusals:  3/100, KL divergence: 0.0923
  [Trial 178] Refusals:  4/100, KL divergence: 0.0451
  Run additional trials
  Exit program
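A trial is Pareto-optimal when no other trial is at least as good on both metrics (refusals and KL divergence) and strictly better on one. A small sketch of that filter, using the numbers from the menu above plus two dominated trials for contrast:

```python
def pareto_front(trials):
    """Keep (refusals, kl) pairs not dominated by any other trial:
    a trial is dominated if another is <= on both metrics and < on
    at least one."""
    front = []
    for i, (r, kl) in enumerate(trials):
        dominated = any(
            (r2 <= r and kl2 <= kl) and (r2 < r or kl2 < kl)
            for j, (r2, kl2) in enumerate(trials)
            if j != i
        )
        if not dominated:
            front.append((r, kl))
    return front

trials = [(2, 0.1847), (3, 0.0923), (4, 0.0451), (5, 0.30), (3, 0.20)]
print(pareto_front(trials))  # only the first three survive
```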
Select a trial that balances refusal suppression with capability preservation. Lower KL divergence means less damage to the original model.
KL divergence values above 1.0 usually indicate significant damage to the model’s capabilities.
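The KL divergence reported here measures how far the decensored model's next-token distribution has drifted from the original's: 0 means identical behavior, and larger values mean more change. A minimal sketch of KL divergence between two softmax distributions over logits (an illustrative formulation; Heretic's exact token sampling and averaging are not shown here):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between next-token distributions given raw logits."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions diverge by exactly zero:
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```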
5. Export or Test Model

After selecting a trial, choose what to do with the decensored model:
What do you want to do with the decensored model?
> Save the model to a local folder
  Upload the model to Hugging Face
  Chat with the model
  Return to the trial selection menu
Options:
  • Save locally: Export the model to a directory for later use
  • Upload to HF: Publish your decensored model on Hugging Face
  • Chat: Interactively test the model’s responses
  • Return: Try a different trial

Expected Output

Successful Decensoring

For a typical 8B model on an RTX 3090, you can expect:
  • Processing time: ~45 minutes for 200 trials
  • Refusal reduction: From 95-100% to 2-5%
  • KL divergence: 0.1-0.3 (very good), 0.3-0.5 (good), 0.5-1.0 (acceptable)
  • Output size: Same as original model (~16GB for 8B BF16 model)

Output Examples

Heretic displays detailed progress throughout:
Restoring model from trial 142...
* Parameters:
  * direction_scope = global
  * direction_index = 17.8
  * attn_out.max_weight = 1.03
  * attn_out.max_weight_position = 24.1
  * attn_out.min_weight = 0.52
  * attn_out.min_weight_distance = 8.3
  * mlp_down.max_weight = 0.89
  * mlp_down.max_weight_position = 26.7
  * mlp_down.min_weight = 0.31
  * mlp_down.min_weight_distance = 12.1
* Resetting model...
* Abliterating...

Saving merged model...
Model saved to ./qwen3-4b-heretic.

Post-Processing Options

Saving to Local Folder

# When prompted, enter your desired path
Path to the folder: ./my-decensored-model

Saving merged model...
Model saved to ./my-decensored-model.
The saved model includes:
  • Model weights (safetensors format)
  • Tokenizer files
  • Configuration files
  • Generation config
You can load it with transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./my-decensored-model")
tokenizer = AutoTokenizer.from_pretrained("./my-decensored-model")

Uploading to Hugging Face

# Provide your HF token when prompted
Hugging Face access token: [enter token]
Logged in as John Doe ([email protected])

Name of repository: username/qwen3-4b-heretic

Should the repository be public or private?
> Public
  Private

Uploading merged model...
Model uploaded to username/qwen3-4b-heretic.
Heretic automatically adds appropriate tags (heretic, uncensored, abliterated) and prepends performance metrics to the model card.

Chatting with the Model

Test your decensored model interactively:
Press Ctrl+C at any time to return to the menu.

> User: Tell me about artificial intelligence
Assistant: Artificial intelligence (AI) refers to computer systems that can perform
tasks that typically require human intelligence, such as visual perception, speech
recognition, decision-making, and language translation...

> User: [Press Ctrl+C to exit]
The chat feature uses the same system prompt configured in Heretic (default: “You are a helpful assistant.”).

Command-Line Options

Common Options

# Enable 4-bit quantization for lower VRAM usage
heretic --quantization bnb_4bit meta-llama/Llama-3.1-8B-Instruct

# Run fewer trials for faster testing
heretic --n-trials 50 mistralai/Mistral-7B-Instruct-v0.3

# Use custom configuration file
heretic --config my-config.toml google/gemma-3-12b-it

# Evaluate an existing decensored model
heretic --model meta-llama/Llama-3.1-8B-Instruct --evaluate-model username/llama-3.1-8b-heretic

Performance Tuning

# Manual batch size (skip auto-detection)
heretic --batch-size 4 Qwen/Qwen3-4B-Instruct-2507

# Limit maximum batch size during auto-detection
heretic --max-batch-size 32 bigscience/bloom-7b1

# Shorter responses for faster optimization
heretic --max-response-length 50 teknium/OpenHermes-2.5-Mistral-7B

Research Features

# Generate residual vector plots (requires research dependencies)
heretic --plot-residuals google/gemma-3-270m-it

# Print geometric analysis of refusal directions
heretic --print-residual-geometry meta-llama/Llama-3.1-8B-Instruct
Run heretic --help to see all available options, or check config.default.toml for configuration file options.

Resuming Interrupted Runs

Heretic automatically saves optimization progress to the checkpoints/ directory. If a run is interrupted, Heretic will detect the checkpoint and ask if you want to continue:
You have already processed this model, but the run was interrupted.
You can continue the previous run from where it stopped.

How would you like to proceed?
> Continue the previous run
  Ignore the previous run and start from scratch
  Exit program
Select “Continue the previous run” to resume optimization from where it stopped.

What’s Next?

  • CLI Reference: Complete guide to all command-line options
  • Configuration: Learn about advanced configuration options
  • How It Works: Understand the abliteration algorithm
  • FAQ: Common questions and troubleshooting
