Installation
ChemLactica requires Python 3.11 and conda for managing dependencies. Follow these steps to set up your environment.
Prerequisites
Before installing ChemLactica, ensure you have:
- Python 3.11
- conda (Anaconda or Miniconda)
- CUDA 11.8 (for GPU acceleration)
While ChemLactica can run on CPU, GPU acceleration is strongly recommended for molecular optimization tasks.
Quick installation
Create conda environment
Create a new conda environment with Python 3.11 using the provided environment file:
This will install all required dependencies including:
- PyTorch 2.1.2 with CUDA 11.8 support
- Transformers 4.39.0
- RDKit 2023.9.5 for molecular manipulation
- Flash Attention 2.5.6 for efficient inference
- TRL 0.8.6 for fine-tuning
- And many other dependencies
Key dependencies
The installation includes these important packages:
Core ML frameworks
- PyTorch 2.1.2: Deep learning framework with CUDA 11.8 support
- Transformers 4.39.0: Hugging Face library for loading pre-trained models
- Accelerate 0.28.0: Distributed training and inference
- TRL 0.8.6: Transformer Reinforcement Learning for fine-tuning
Chemistry libraries
- RDKit 2023.9.5: Comprehensive cheminformatics toolkit
- ChemLactica 0.0.1: Core ChemLactica package (installed via pip)
Performance optimization
- Flash Attention 2.5.6: Efficient attention mechanism implementation
- BitsAndBytes 0.43.0: 8-bit optimizer for memory-efficient training
- Einops 0.7.0: Tensor operations library
Data and utilities
- Datasets 2.18.0: Hugging Face datasets library
- Pandas 2.2.1: Data manipulation
- NumPy 1.26.4: Numerical computing
- Scikit-learn 1.4.1: Machine learning utilities
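The pinned versions listed above can be compared against what is actually installed in the active environment. The following is a minimal sketch, not part of ChemLactica itself: the `PINNED` table and the `check_versions` helper are hypothetical names, and the keys are assumed to be the PyPI distribution names of each package (a conda-installed package may register under a different name, in which case it is reported as missing).

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
# Keys are ASSUMED PyPI distribution names; adjust if your environment differs.
PINNED = {
    "torch": "2.1.2",
    "transformers": "4.39.0",
    "accelerate": "0.28.0",
    "trl": "0.8.6",
    "rdkit": "2023.9.5",
    "flash-attn": "2.5.6",
    "bitsandbytes": "0.43.0",
    "einops": "0.7.0",
    "datasets": "2.18.0",
    "pandas": "2.2.1",
    "numpy": "1.26.4",
    "scikit-learn": "1.4.1",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {name: (installed_or_None, status)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, "missing")
            continue
        # Accept exact matches plus suffixed builds such as "2.1.2+cu118"
        # or "1.4.1.post1"; anything else counts as a mismatch.
        ok = installed == wanted or installed.startswith((wanted + "+", wanted + "."))
        report[name] = (installed, "ok" if ok else "mismatch")
    return report


if __name__ == "__main__":
    for name, (installed, status) in check_versions(PINNED).items():
        print(f"{name:<14} pinned={PINNED[name]:<10} installed={installed} [{status}]")
```

Reading versions from package metadata avoids importing heavy libraries just to inspect them, so the check stays fast even when a package such as Flash Attention fails to import.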
Verify installation
After installation, verify that everything is set up correctly:
Check CUDA availability
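One way to check whether PyTorch can see a GPU is sketched below. It assumes PyTorch was installed by the environment setup. `describe_gpu` is a hypothetical helper name, and the calls it uses (`torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`, `torch.version.cuda`) are standard PyTorch APIs.

```python
def describe_gpu():
    """Summarise what PyTorch can see and return it as a readable string."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not importable in this environment"

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    lines = [f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}"]
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            lines.append(f"GPU {index}: {torch.cuda.get_device_name(index)}")
    else:
        lines.append("No CUDA device visible; computation would run on CPU")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_gpu())
```

If the reported CUDA build is not 11.8, or no device is listed on a machine that has a GPU, the driver or the PyTorch build is the likely place to look first.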
To verify GPU support:
Download pre-trained models
ChemLactica models are hosted on Hugging Face and will be automatically downloaded when you first use them. You can also download them manually:
Troubleshooting
CUDA version mismatch
If you encounter CUDA version issues, ensure your NVIDIA drivers are compatible with CUDA 11.8:
Memory issues
For the larger models (1.3B and 2B parameters), ensure you have sufficient GPU memory:
- Chemlactica-125M: ~1GB GPU memory
- Chemlactica-1.3B: ~5GB GPU memory
- Chemma-2B: ~8GB GPU memory
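The figures above can be turned into a quick pre-flight check. This is a rough sketch under stated assumptions: `MODEL_MEMORY_GB`, `models_that_fit`, and `total_gpu_memory_gb` are hypothetical names, the numbers are the approximate values from the list, and the check compares against the total memory of the device rather than what is currently free.

```python
# Approximate GPU memory needs in GB, taken from the list above.
MODEL_MEMORY_GB = {
    "Chemlactica-125M": 1,
    "Chemlactica-1.3B": 5,
    "Chemma-2B": 8,
}


def models_that_fit(available_gb, requirements=MODEL_MEMORY_GB):
    """Return the model names whose estimated need is within available_gb."""
    return [name for name, need in requirements.items() if need <= available_gb]


def total_gpu_memory_gb(device=0):
    """Total memory of one CUDA device in GB, or 0.0 without a usable GPU."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    gb = total_gpu_memory_gb()
    print(f"GPU memory: {gb:.1f} GB; models that fit: {models_that_fit(gb)}")
```

Because other processes may already occupy part of the GPU, treat a model that only just fits as a likely out-of-memory candidate.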