Citing Heretic
If you use Heretic for your research, please cite it using the following BibTeX entry:
Plain Text Citation
For non-academic contexts:
Weidmann, P. E. (2025). Heretic: Fully automatic censorship removal for language models. GitHub repository. https://github.com/p-e-w/heretic
Academic References
Heretic builds upon significant prior research in interpretability and safety alignment removal:
Foundational Papers
Arditi et al. (2024) - Original Abliteration Paper
Refusal in Language Models Is Mediated by a Single Direction
This paper introduced the technique that later became known as “abliteration”: removing refusal behaviors by orthogonalizing model weights with respect to a computed “refusal direction.”
Paper: arxiv.org/abs/2406.11717
Key contributions:
- Identified that refusal behaviors in LLMs are mediated by a consistent direction in activation space
- Showed that orthogonalizing weights with respect to this direction removes refusal
- Demonstrated the approach preserves model capabilities on harmless prompts
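A minimal NumPy sketch of the two steps described above: computing a refusal direction as a difference of means, then orthogonalizing a weight matrix against it. It assumes residual-stream activations have already been collected at one layer for a set of harmful and a set of harmless prompts, and that weight matrices have shape `(d_model, d_in)` so that `weight @ x` is what the layer writes into the residual stream. The function names are hypothetical; this illustrates the idea and is not Heretic's code.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean activations.

    Both inputs have shape (num_prompts, d_model): one residual-stream
    activation per prompt, taken at the same layer and token position.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(weight: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Return W - r r^T W, so the layer can no longer write along r_hat.

    Assumes weight has shape (d_model, d_in) and r_hat is a unit vector
    of length d_model.
    """
    return weight - np.outer(r_hat, r_hat @ weight)


# Usage with synthetic data (hypothetical sizes: d_model = 8, d_in = 5).
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 8)) + 2.0  # shifted mean stands in for "refusal"
harmless = rng.normal(size=(32, 8))
r_hat = refusal_direction(harmful, harmless)

W = rng.normal(size=(8, 5))
W_abl = orthogonalize(W, r_hat)
print(np.abs(r_hat @ W_abl).max())  # close to zero: no output component along r_hat
```

In the paper, this orthogonalization is applied to every matrix that writes into the residual stream, which bakes the removal of the refusal direction into the weights instead of editing activations at inference time.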
Lai (2025) - Projected Abliteration
Jim Lai’s Extensions to Abliteration
Jim Lai (grimjim) developed two important extensions to the original abliteration technique:
Projected Abliteration
Adjusts the refusal direction by removing its component along the “good” (harmless) direction, so that only the part orthogonal to the good direction is subtracted, preserving more of the model’s intended behavior.
Article: Projected Abliteration
Norm-Preserving Biprojected Abliteration
A further refinement that preserves the norms of the weights being modified during abliteration.
Article: Norm-Preserving Biprojected Abliteration
Key contributions:
- Better preservation of model capabilities
- Reduced side effects from abliteration
- Improved balance between refusal removal and intelligence retention
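A rough NumPy sketch of both extensions, under stated assumptions: `harmless_mean` is the mean harmless activation at the measurement layer, weight matrices have shape `(d_model, d_in)`, and norm preservation is illustrated by restoring the Euclidean norm of each column (the vector each input unit writes into the residual stream). The function names are hypothetical, and this simplification does not attempt to reproduce every detail of the published method or Heretic's implementation.

```python
import numpy as np


def project_out_good_direction(refusal_dir: np.ndarray, harmless_mean: np.ndarray) -> np.ndarray:
    """Projected abliteration: keep only the part of the refusal direction
    that is orthogonal to the "good" (harmless) direction, then renormalize."""
    g_hat = harmless_mean / np.linalg.norm(harmless_mean)
    r = refusal_dir - (refusal_dir @ g_hat) * g_hat
    return r / np.linalg.norm(r)


def norm_preserving_orthogonalize(weight: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Remove the r_hat component from each column of weight, then rescale
    every column back to its original Euclidean norm."""
    original_norms = np.linalg.norm(weight, axis=0, keepdims=True)
    ablated = weight - np.outer(r_hat, r_hat @ weight)
    new_norms = np.linalg.norm(ablated, axis=0, keepdims=True)
    # Guard against division by zero if a column was entirely along r_hat.
    return ablated * (original_norms / np.maximum(new_norms, 1e-12))


# Usage with synthetic data (hypothetical sizes: d_model = 8, d_in = 5).
rng = np.random.default_rng(1)
refusal = rng.normal(size=8)
harmless_mean = rng.normal(size=8)
r_hat = project_out_good_direction(refusal, harmless_mean)

W = rng.normal(size=(8, 5))
W_abl = norm_preserving_orthogonalize(W, r_hat)
```

Rescaling changes only the direction of each weight vector, never its magnitude; the intent is to disturb the model less than a plain orthogonalization would.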
Labonne - AutoAbliteration & Practical Implementations
Maxime Labonne’s Contributions
Maxime Labonne has been a pioneer in practical abliteration implementations and has shared multiple high-quality abliterated models.
AutoAbliteration
An implementation exploring automation of the abliteration process.
Article: AutoAbliteration
Variable Ablation Weights
Labonne’s work on gemma-3-12b-it-abliterated-v2 explored using non-constant ablation weights across layers, which inspired Heretic’s flexible weight kernel approach.
Educational Content
Article: Abliteration: Making LLMs Say Anything
This comprehensive guide helped popularize abliteration techniques and provided practical insights that informed Heretic’s development.
Acknowledgments
The development of Heretic was informed by the research and implementations listed above, as well as:
Prior Implementations
Several publicly available implementations of abliteration techniques provided inspiration and insights:
- AutoAbliteration by Maxime Labonne
- abliterator.py by FailSpy
- wassname’s Abliterator by wassname
- ErisForge by Tsadoq
- Removing refusals with HF Transformers by Sumandora
- deccp by AUGMXNT
Note: Heretic was written from scratch and does not reuse code from any of these projects. However, examining these implementations provided valuable insights into practical considerations and edge cases.
Key Technologies
Heretic leverages several excellent open-source projects:
- Optuna - Hyperparameter optimization framework with TPE sampler
- PyTorch - Deep learning framework
- Hugging Face Transformers - Model loading and inference
- bitsandbytes - Quantization support
- Rich - Terminal formatting and progress display
License
Heretic is free and open-source software released under the GNU Affero General Public License v3.0 (AGPL-3.0).
What This Means
You are free to:
- Use Heretic for any purpose (personal, commercial, research)
- Study how Heretic works
- Modify Heretic to suit your needs
- Distribute Heretic and your modifications
Under the following conditions:
- You must license your modifications under AGPL-3.0
- You must provide source code for any modifications you distribute
- If you run a modified version as a network service, you must provide the source code to users
- You must preserve copyright and license notices
Models Generated by Heretic
Full License Text
Contributing
By contributing to this project, you agree to release your contributions under the same license (AGPL-3.0). Contributions are welcome! See the GitHub repository for:
- Issue tracking
- Pull request guidelines
- Development setup instructions
Contact
Author: Philipp Emanuel Weidmann
Project: github.com/p-e-w/heretic
Community: Discord
Recognition
Heretic has been recognized by the community:
- #1 Repository of the Day on TrendShift
- Over 1,000 community-created models on Hugging Face
- Active community on Discord with ongoing development and support
