System Requirements
Before installing Heretic, ensure your system meets these requirements.
Software Requirements
- Python: 3.10 or higher
- PyTorch: 2.2 or higher
- Operating System: Linux, macOS, or Windows with WSL
Hardware Requirements
Heretic supports various accelerators including CUDA GPUs, Apple Metal (MPS), XPU, MLU, SDAA, MUSA, and NPU.
Recommended:
- GPU with at least 24GB VRAM for 8B models
- 32GB+ system RAM
- Multi-GPU setup for larger models
Minimum (with 4-bit quantization):
- GPU with 12GB VRAM for 8B models
- 16GB system RAM
Installation Steps
Prepare Python Environment
Ensure you have Python 3.10 or higher installed. Create a virtual environment (recommended):
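For example, on Linux or macOS (the environment name `heretic-env` is just an example):

```shell
# Verify the interpreter version (Heretic needs 3.10+)
python3 --version

# Create a virtual environment in the current directory
python3 -m venv heretic-env

# Activate it (Linux/macOS; on Windows use heretic-env\Scripts\activate)
source heretic-env/bin/activate
```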
Install PyTorch
Install PyTorch 2.2 or higher appropriate for your hardware; visit pytorch.org for platform-specific instructions covering CUDA builds (e.g. CUDA 12.1), Apple Silicon (MPS), and CPU-only installs (slow, not recommended).
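As a sketch, the platform-specific installs might look like the following (index URLs follow the pytorch.org install selector; verify the exact command for your CUDA version):

```shell
# Example for CUDA 12.1:
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Example for Apple Silicon (MPS) -- the default PyPI wheel includes MPS support:
pip install torch

# Example for CPU only (slow, not recommended):
pip install torch --index-url https://download.pytorch.org/whl/cpu
```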
Install Heretic
Install Heretic from PyPI:
This installs all required dependencies, including:
- `transformers` - Model loading and inference
- `accelerate` - Multi-GPU support and device management
- `bitsandbytes` - Quantization support
- `optuna` - Parameter optimization
- `peft` - LoRA adapter support
- `datasets` - Prompt dataset loading
- And other essential libraries
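The PyPI install itself might look like this (the package name `heretic-llm` is assumed here; verify it on PyPI):

```shell
# Install Heretic and all required dependencies from PyPI
# (package name assumed; check PyPI for the actual name)
pip install heretic-llm
```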
Optional: Research Dependencies
If you want to use Heretic’s research features for visualizing and analyzing model internals, install the optional `research` extra:
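Using pip's extras syntax, this might look like the following (the package name `heretic-llm` is assumed; verify it on PyPI):

```shell
# Install Heretic with the optional research extra
# (package name assumed; the [research] extra name is from this guide)
pip install "heretic-llm[research]"
```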
This enables the following command-line options:
- `--plot-residuals` - Generate PaCMAP projections of residual vectors
- `--print-residual-geometry` - Print detailed geometric analysis of refusal directions
The `research` extra includes:
- `pacmap` - Dimensionality reduction for visualization
- `matplotlib` - Plotting library
- `geom-median` - Geometric median computation
- `scikit-learn` - Clustering metrics
- `numpy` - Numerical operations
Research features are primarily useful for interpretability research and understanding how abliteration works. They are not required for basic model decensoring.
Hardware Optimization
Using Quantization
For systems with limited VRAM, enable 4-bit quantization to reduce memory requirements. In config.toml:
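A minimal sketch, assuming the config key mirrors the `--quantization` CLI flag used elsewhere in this guide:

```toml
# config.toml -- key name assumed to mirror the --quantization CLI flag
quantization = "bnb_4bit"
```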
Multi-GPU Configuration
Heretic automatically uses all available GPUs via Accelerate’s `device_map="auto"`. For manual control, create a config.toml:
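A hypothetical sketch (Heretic's actual config key names are not confirmed here; check the project documentation). A portable alternative is restricting visible devices with the `CUDA_VISIBLE_DEVICES` environment variable:

```toml
# config.toml -- hypothetical key, mirroring Accelerate's device_map argument
device_map = "auto"
```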
Performance Tuning
Heretic automatically benchmarks your system to determine the optimal batch size. On an RTX 3090, decensoring Llama-3.1-8B-Instruct takes about 45 minutes with default settings.
Troubleshooting
Out of Memory Errors
- Enable quantization with `--quantization bnb_4bit`
- Reduce batch size with `--batch-size 1`
- Limit maximum batch size with `--max-batch-size 16`
- Use a smaller model or add more GPUs
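As an illustrative combination, the flags above might be passed like this (the flags are from this guide, but the overall invocation form and model argument are assumptions; check `heretic --help`):

```shell
# Hypothetical invocation combining memory-saving flags from the list above
heretic --quantization bnb_4bit --max-batch-size 16 meta-llama/Llama-3.1-8B-Instruct
```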
Import Errors
Ensure PyTorch is installed before installing Heretic. Some dependencies require PyTorch to be present during installation.
GPU Not Detected
Verify that your PyTorch installation supports your GPU, e.g. with `python -c "import torch; print(torch.cuda.is_available())"` (or `torch.backends.mps.is_available()` on Apple Silicon).
Next Steps
Quick Start Guide
Learn how to decensor your first model
