If you use Heretic in your research or projects, please cite it appropriately and acknowledge the foundational work that made it possible.

Citing Heretic

If you use Heretic for your research, please cite it using the following BibTeX entry:
@misc{heretic,
  author = {Weidmann, Philipp Emanuel},
  title = {Heretic: Fully automatic censorship removal for language models},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p-e-w/heretic}}
}

Plain Text Citation

For non-academic contexts:
Weidmann, P. E. (2025). Heretic: Fully automatic censorship removal for language models. GitHub repository. https://github.com/p-e-w/heretic

Academic References

Heretic builds upon significant prior research in interpretability and safety alignment removal:

Foundational Papers

Refusal in Language Models Is Mediated by a Single Direction

This paper introduced the technique that later became known as “abliteration”: removing refusal behaviors by orthogonalizing model weights with respect to a computed “refusal direction.”

Citation:
@article{arditi2024refusal,
  title={Refusal in Language Models Is Mediated by a Single Direction},
  author={Arditi, Andy and Obeso, Oscar and Syed, Aaquib and Paleka, Daniel and
          Panickssery, Nina and Gurnee, Wes and Nanda, Neel},
  journal={arXiv preprint arXiv:2406.11717},
  year={2024}
}
Paper: arxiv.org/abs/2406.11717

Key contributions:
  • Identified that refusal behaviors in LLMs are mediated by a consistent direction in activation space
  • Showed that orthogonalizing weights with respect to this direction removes refusal
  • Demonstrated the approach preserves model capabilities on harmless prompts
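
The first two contributions above (finding a direction, then orthogonalizing weights against it) can be sketched in a few lines. This is a minimal illustration in dependency-free Python on toy data, not code from the paper or from Heretic: the function names, the toy activations, and the toy weight matrix are all assumptions made for the example, and real implementations work on tensors of residual-stream activations.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit difference-of-means direction between two sets of activations."""
    bad = mean_vector(harmful_acts)
    good = mean_vector(harmless_acts)
    return normalize([b - g for b, g in zip(bad, good)])

def orthogonalize(weight, direction):
    """Return W - r (r^T W) for a matrix W whose outputs live in activation space.

    `weight` is a list of rows with shape (d_model, d_in);
    `direction` is a unit vector of length d_model.
    """
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction
    proj = [sum(direction[i] * weight[i][j] for i in range(d_model))
            for j in range(d_in)]
    return [[weight[i][j] - direction[i] * proj[j] for j in range(d_in)]
            for i in range(d_model)]

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy activations (hypothetical): two "harmful" and two "harmless" prompts
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
r = refusal_direction(harmful, harmless)

# Toy weight matrix (hypothetical) that writes into the 3-dimensional space
W = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
W_abl = orthogonalize(W, r)

# Whatever the input, the ablated matrix no longer writes along r
out = matvec(W_abl, [0.5, -2.0])
leak = sum(o * ri for o, ri in zip(out, r))
print(abs(leak) < 1e-9)
```

Because the projection is removed from the weights themselves, no runtime hook is needed: the modified matrix cannot produce output along the chosen direction for any input.
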
Jim Lai’s Extensions to Abliteration

Jim Lai (grimjim) developed two important extensions to the original abliteration technique:

Projected Abliteration

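Removes from each refusal direction its component along the “good” direction (the normalized good_means vector in the code below), so that ablation subtracts only the part of the refusal direction that is orthogonal to the “good” direction. This preserves more of the model’s intended behavior.

Article: Projected Abliteration

Implementation in Heretic: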
# From main.py:448-457
if settings.orthogonalize_direction:
    # Implements projected abliteration
    good_directions = F.normalize(good_means, p=2, dim=1)
    projection_vector = torch.sum(
        refusal_directions * good_directions, dim=1
    )
    refusal_directions = (
        refusal_directions - projection_vector.unsqueeze(1) * good_directions
    )
    refusal_directions = F.normalize(refusal_directions, p=2, dim=1)
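
To see what the snippet above does to a single layer, here is a small worked example. It is a sketch in plain Python rather than PyTorch, so it runs without dependencies; the helper names and the toy vectors are assumptions made for illustration and do not appear in Heretic.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def project_out(refusal, good_mean):
    """Remove the component of `refusal` along the good direction, then renormalize."""
    good = normalize(good_mean)
    overlap = dot(refusal, good)  # scalar projection onto the good direction
    residual = [r - overlap * g for r, g in zip(refusal, good)]
    return normalize(residual)

# Toy vectors (hypothetical): the refusal direction overlaps with the good direction
refusal = normalize([3.0, 4.0, 0.0])
good_mean = [0.0, 5.0, 0.0]

projected = project_out(refusal, good_mean)
print(projected)  # close to [1.0, 0.0, 0.0]: only the part orthogonal to good_mean remains
```

The per-layer version in Heretic applies the same three steps (normalize, subtract the projection, renormalize) to every row of a tensor at once.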

Norm-Preserving Biprojected Abliteration

A further refinement that preserves the norm of activations during abliteration.

Article: Norm-Preserving Biprojected Abliteration

Key contributions:
  • Better preservation of model capabilities
  • Reduced side effects from abliteration
  • Improved balance between refusal removal and intelligence retention
Maxime Labonne’s Contributions

Maxime Labonne has been a pioneer in practical abliteration implementations and has shared multiple high-quality abliterated models.

AutoAbliteration

An implementation exploring automation of the abliteration process.

Article: AutoAbliteration

Variable Ablation Weights

Labonne’s work on gemma-3-12b-it-abliterated-v2 explored using non-constant ablation weights across layers, which inspired Heretic’s flexible weight kernel approach.
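
One way to picture non-constant ablation weights is a kernel that peaks at one layer and falls off with distance from it. The sketch below is purely illustrative: the function, its parameter names, and the linear fall-off are assumptions chosen for the example, not a description of Heretic's actual kernel or of Labonne's model.

```python
def layer_weight(layer, peak_layer, max_weight, min_weight, falloff):
    """Hypothetical linear kernel: full strength at `peak_layer`,
    decreasing to `min_weight` at `falloff` layers away, zero beyond that."""
    distance = abs(layer - peak_layer)
    if distance > falloff:
        return 0.0  # layers far from the peak are left untouched
    return max_weight + (distance / falloff) * (min_weight - max_weight)

# Ablation weights for a toy 8-layer model (all values hypothetical)
weights = [
    layer_weight(i, peak_layer=4, max_weight=1.0, min_weight=0.2, falloff=2)
    for i in range(8)
]
print([round(w, 2) for w in weights])
# [0.0, 0.0, 0.2, 0.6, 1.0, 0.6, 0.2, 0.0]
```

Each layer's weight would then scale how much of the refusal-direction projection is subtracted from that layer's matrices, so a weight of 0 leaves the layer unchanged and a weight of 1 removes the projection entirely.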

Educational Content

Article: Abliteration: Making LLMs Say Anything

This comprehensive guide helped popularize abliteration techniques and provided practical insights that informed Heretic’s development.

Acknowledgments

The development of Heretic was informed by the research and implementations listed above, as well as:

Prior Implementations

Several publicly available implementations of abliteration techniques provided inspiration and insights:
Note: Heretic was written from scratch and does not reuse code from any of these projects. However, examining these implementations provided valuable insights into practical considerations and edge cases.

Key Technologies

Heretic leverages several excellent open-source projects:

License

Heretic is free and open-source software released under the GNU Affero General Public License v3.0 (AGPL-3.0).

What This Means

You are free to:
  • Use Heretic for any purpose (personal, commercial, research)
  • Study how Heretic works
  • Modify Heretic to suit your needs
  • Distribute Heretic and your modifications
Under these conditions:
  • You must license your modifications under AGPL-3.0
  • You must provide source code for any modifications you distribute
  • If you run a modified version as a network service, you must provide the source code to users
  • You must preserve copyright and license notices

Models Generated by Heretic

Important: The AGPL-3.0 license applies to Heretic itself (the software), not to models you process with Heretic.

Models you create using Heretic:
  • Inherit the license of their base model
  • Are not covered by AGPL-3.0
  • Can be shared under the base model’s license terms
Always respect the original model’s license when sharing processed models.

Full License Text

Copyright © 2025-2026  Philipp Emanuel Weidmann <[email protected]> + contributors

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
Full license text: GNU AGPL-3.0

Contributing

Contributions are welcome! By contributing to this project, you agree to release your contributions under the same license (AGPL-3.0). See the GitHub repository for:
  • Issue tracking
  • Pull request guidelines
  • Development setup instructions

Contact

Author: Philipp Emanuel Weidmann
Email: [email protected]
Project: github.com/p-e-w/heretic
Community: Discord

Recognition

Heretic has been recognized by the community:
  • #1 Repository of the Day on TrendShift
  • Over 1,000 community-created models on Hugging Face
  • Active community on Discord with ongoing development and support

Star on GitHub

Show your support by starring the repository

Join Discord

Connect with the community and get involved
