Common Issues
Out of Memory (OOM) Errors
Symptoms:
- CUDA out of memory errors
- RuntimeError: [enforce fail at alloc_cpu.cpp]
- System freezes or crashes during processing
Enable Quantization
Use 4-bit quantization to drastically reduce VRAM requirements. This can reduce VRAM usage by approximately 4x compared to full-precision loading, and can be set in config.toml.
Reduce Batch Size
By default (batch_size = 0), Heretic automatically determines the optimal batch size, so only set this manually if you’re experiencing issues. If you’ve set a batch size yourself, reduce it.
Close Other Applications
Free up VRAM by closing other GPU-intensive applications:
- Web browsers with hardware acceleration
- Other AI/ML applications
- Games or 3D applications
Use a Smaller Model
If memory is severely constrained, try a smaller variant of your target model:
- 7B/8B models instead of 13B
- 13B models instead of 30B/70B
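The quantization and batch-size settings above might look like this in config.toml. The option names here follow the parameter names mentioned in this guide but have not been verified against Heretic’s current schema, so check the built-in help for the exact spelling:

```toml
# Hypothetical config.toml sketch; option names are assumptions
quantization = "bnb_4bit"  # load in 4-bit precision (~4x less VRAM)
batch_size = 0             # 0 = let Heretic pick the batch size automatically
```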
Model Loading Failures
Symptoms:
- ValueError: Unable to load model
- OSError: Can't load tokenizer
- RuntimeError: Error(s) in loading state_dict
The model loading code (main.py:76-101) estimates memory requirements and warns you if insufficient resources are detected.
Check Model Path
Verify the model name or path is correct.
Trust Remote Code
Some models require custom code. Heretic enables trust_remote_code=True by default, but ensure you trust the model source.
Check Internet Connection
When loading from Hugging Face:
- Ensure you have a stable internet connection
- Large models may take time to download
- Consider downloading the model first with huggingface-cli
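For example, to pre-download a model into the local Hugging Face cache (the model name here is just a placeholder):

```shell
# Download the model ahead of time so Heretic loads it from the local cache
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507
```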
Try Different Dtype
If the model fails to load with the default dtype, Heretic has fallback mechanisms. You can also set the dtype explicitly.
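An explicit dtype setting might look like this in config.toml (the option name is an assumption, not taken from Heretic’s documentation):

```toml
# Hypothetical: force a specific load dtype instead of relying on the fallback chain
dtype = "float16"  # or "bfloat16" on GPUs that support it
```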
Merge Warnings and Quantization Issues
Symptoms:
- Warnings about CPU merging and system RAM, such as "WARNING: CPU merging requires dequantizing..."
- System becomes unresponsive during merge
When quantization is enabled (bnb_4bit), the model is loaded in 4-bit precision. To save or upload the final model, it needs to be merged back to full precision, which requires loading the entire base model into system RAM.
Memory Requirements (from main.py:66-72)
Rule of thumb: you need approximately 3x the parameter count in GB of RAM. Examples:
- 7B model: ~21GB RAM
- 13B model: ~39GB RAM
- 27B model: ~80GB RAM
- 70B model: ~200GB RAM
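The rule of thumb above is simple enough to compute directly; a small helper, illustrative only, that encodes the ~3x guideline:

```python
def merge_ram_estimate_gb(params_billions: float) -> float:
    """Estimate system RAM needed to merge, per the ~3x rule of thumb."""
    return 3.0 * params_billions

print(merge_ram_estimate_gb(7))   # ~21 GB for a 7B model
print(merge_ram_estimate_gb(13))  # ~39 GB for a 13B model
```

Note that the 27B and 70B figures in the list are rounded slightly differently (~80GB, ~200GB); treat the 3x rule as a rough guide, not an exact formula.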
Option 1: Skip Merging
When prompted, choose to cancel the save/upload operation:
- Test the model using the built-in chat feature instead
- This doesn’t require merging
Option 2: Ensure Sufficient RAM
Before attempting to merge:
- Close all unnecessary applications
- Check available RAM with free -h (Linux) or Task Manager (Windows)
- If using a cloud instance, upgrade to one with sufficient RAM
Option 3: Use Non-Quantized Loading
If you have sufficient VRAM, load the model without quantization. This way, merging is not required and the model can be saved directly. Heretic will show an estimated RAM requirement before merging (main.py:87-92); pay attention to this estimate.
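In config.toml terms this would mean leaving quantization off entirely (option name assumed, as above):

```toml
# Hypothetical: leave quantization unset (or remove the line) to load at full precision,
# so no merge step is needed before saving
# quantization = "bnb_4bit"
```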
Slow Performance or System Freeze
Symptoms:
- Processing takes much longer than expected
- System becomes unresponsive
- No progress for extended periods
Verify GPU is Being Used
Check that Heretic detected your GPU. If you see "No GPU or other accelerator detected", ensure:
- CUDA drivers are installed correctly
- PyTorch was installed with CUDA support
- Your GPU is visible to PyTorch:
python -c "import torch; print(torch.cuda.is_available())"
CPU Processing Warning
If no GPU is detected (main.py:211-213), Heretic warns that CPU processing is functional but will be significantly slower (potentially hours to days for larger models).
Batch Size Determination
During startup, Heretic benchmarks your system to find the optimal batch size (main.py:332-376). This is normal and should take 1-2 minutes.
System Freeze During Merge
If your system freezes when merging:
- You’ve likely run out of RAM (see “Merge Warnings and Quantization Issues” above)
- Force restart your system
- Next time, either skip merging or ensure sufficient RAM
Evaluation or Optimization Issues
Symptoms:
- All trials show similar results
- KL divergence is unexpectedly high
- Refusal counts don’t decrease
Check Prompt Datasets
Heretic uses default datasets for “good” and “bad” prompts. If results are unexpected, you can specify custom datasets.
Verify Model is Actually Censored
Some models have minimal or no censorship to begin with. Check the baseline refusal rate:
- If the original model already has 0-5 refusals, there’s little to improve
- Try a different model known to have safety alignment
Trial Count
By default, Heretic runs a limited number of trials; increasing the trial count can yield better optimization results.
Study Checkpoints
Heretic saves progress to a checkpoint file. If you want to start fresh, remove the checkpoint file.
Hugging Face Upload Failures
Symptoms:
- HTTPError: 401 Unauthorized
- Repository not found
- Upload stalls or times out
Authentication
Ensure you have a valid Hugging Face token (e.g., via huggingface-cli login). Your token needs write permissions to create repositories.
Repository Names
Valid repository names:
- Must contain only alphanumeric characters, hyphens, and underscores
- Format: username/model-name
- Example: p-e-w/gemma-3-12b-it-heretic
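The naming rules above can be checked locally with a quick regex. This is only a sanity check encoding the rules as stated here; the Hub applies its own validation server-side:

```python
import re

# username/model-name, each part limited to alphanumerics, hyphens, underscores
REPO_NAME_RE = re.compile(r"[A-Za-z0-9_-]+/[A-Za-z0-9_-]+")

def is_valid_repo_name(name: str) -> bool:
    """Return True if the name matches the username/model-name format."""
    return REPO_NAME_RE.fullmatch(name) is not None

print(is_valid_repo_name("p-e-w/gemma-3-12b-it-heretic"))  # True
print(is_valid_repo_name("no-username-part"))              # False
```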
Network Issues
For large models:
- Ensure a stable internet connection
- Upload may take 30+ minutes
- Consider using a wired connection instead of WiFi
Disk Space
Verify you have sufficient disk space:
- The merged model temporarily uses local disk
- You need at least 2x the model size in free space
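The 2x guideline can be checked programmatically before starting a merge or upload; a sketch using only the standard library:

```python
import shutil

def enough_space_for_merge(path: str, model_size_bytes: int) -> bool:
    """Check for at least 2x the model size in free disk space, per the guideline above."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= 2 * model_size_bytes

# Example: a 13B model in fp16 is roughly 26 GB on disk
print(enough_space_for_merge(".", 26 * 1024**3))
```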
Configuration Errors
Symptoms:
- Configuration contains N errors
- ValidationError
- Parameter warnings
View Help
See all available options with heretic --help.
Check Configuration File
If using a config file, verify its syntax is valid.
Common Parameter Issues
Validation errors (main.py:162-172) show which parameter is invalid and what the error is. Common causes:
- Incorrect data types (string instead of number)
- Invalid enum values (e.g., wrong quantization method)
- Missing required parameters
Getting More Help
If you encounter an issue not covered here:
GitHub Issues
Report bugs or request features on the official repository
Discord Community
Get help from the community in real-time
