Heretic
Fully automatic censorship removal for language models using directional ablation and parameter optimization. Remove safety alignment from transformer models without expensive post-training.
Get Started
Remove censorship from any language model in minutes.

Installation
Install Heretic and set up your environment
Quickstart
Decensor your first model in under 5 minutes
How It Works
Learn about directional ablation and optimization
CLI Reference
Complete command-line interface documentation
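The "directional ablation" mentioned above works by removing the component of the model's hidden states that points along a learned "refusal direction". A minimal sketch of that projection step, using plain Python on toy vectors (the names and the single-direction setup are illustrative, not Heretic's actual internals):

```python
# Minimal sketch of directional ablation: project out a single
# "refusal direction" r from a hidden-state vector h.
# In practice r would be extracted from model activations (e.g. the
# difference of mean activations on harmful vs. harmless prompts);
# here it is a hand-picked toy vector.
import math

def ablate(hidden, direction):
    """Remove the component of `hidden` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    proj = sum(h * u for h, u in zip(hidden, unit))
    return [h - proj * u for h, u in zip(hidden, unit)]

h = [1.0, 2.0, 3.0]
r = [0.0, 1.0, 0.0]       # toy refusal direction
h_ablated = ablate(h, r)  # → [1.0, 0.0, 3.0]
```

After ablation the result has zero dot product with the refusal direction, so the model can no longer "read" that direction from its own activations.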
Key Features
Fully Automatic
No manual tuning required. Heretic automatically finds optimal abliteration parameters using TPE-based optimization.
Quality Preserved
Achieves best-in-class (i.e., lowest) KL divergence from the original model, preserving its intelligence while removing refusals.
Quantization Support
Run on consumer hardware with bitsandbytes 4-bit quantization support.
Built-in Evaluation
Evaluate models with refusal counting and KL divergence metrics out of the box.
Research Tools
Advanced residual vector analysis and PaCMAP projections for interpretability research.
Hugging Face Integration
Seamlessly upload and share your models on Hugging Face Hub.
Proven Results
Heretic produces decensored models that rival manually-created abliterations while preserving more of the original model’s capabilities.

Example: Gemma-3-12B-IT
- Refusals: 3/100 (same as manual abliterations)
- KL Divergence: 0.16 (vs. 1.04 for manual methods)
- Result: The same refusal suppression with roughly 85% lower KL divergence, i.e. far less damage to the model’s capabilities
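The 85% figure follows directly from the two KL numbers above:

```python
# Relative reduction in KL divergence from the original model:
# Heretic's 0.16 vs. 1.04 for manual abliteration of Gemma-3-12B-IT.
kl_heretic = 0.16
kl_manual = 1.04
reduction = 1 - kl_heretic / kl_manual
print(f"{reduction:.0%}")  # → 85%
```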
Quick Example
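A minimal end-to-end run might look like the following. The package and CLI names are taken to match the Installation page; verify them against the current release before copying:

```shell
# Install Heretic (package name assumed; see the Installation page)
pip install heretic-llm

# Point the CLI at any Hugging Face model ID; Heretic runs the
# parameter optimization and decensoring fully automatically.
heretic Qwen/Qwen3-4B-Instruct-2507
```

When the run finishes, Heretic can save the decensored model locally or upload it to the Hugging Face Hub.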
What You Can Do
Remove Safety Alignment
Strip censorship from instruction-tuned models without retraining
Optimize Parameters
Automatically find the best ablation weights for your model
Evaluate Models
Measure refusal rates and KL divergence from baseline
Analyze Internals
Visualize residual vectors and refusal directions
Community
GitHub Repository
View source code, report issues, and contribute
Hugging Face Models
Browse 1,000+ community models created with Heretic
