Configuration File Structure
Here’s the complete chemlactica_125m_hparams.yaml from the repository:
Core Parameters
Model Configuration
Path or HuggingFace model ID for the ChemLactica model
Options:
- yerevann/chemlactica-125m (recommended for most tasks)
- yerevann/chemlactica-1.3b (better performance, more memory)
- yerevann/chemma-2b (best performance)
- Local path to a fine-tuned checkpoint
Path or HuggingFace model ID for the tokenizer (usually same as checkpoint)
Device to run the model on
Options:
- cuda:0, cuda:1, etc. (GPU)
- cpu (not recommended, very slow)
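Put together, the model-configuration block might look like the sketch below. The key names (checkpoint_path, tokenizer_path, device) are illustrative assumptions, not verified against the repository file; check chemlactica_125m_hparams.yaml for the exact keys.

```yaml
# Key names here are assumptions for illustration only.
checkpoint_path: yerevann/chemlactica-125m  # or a local fine-tuned checkpoint
tokenizer_path: yerevann/chemlactica-125m   # usually same as checkpoint
device: cuda:0                              # cpu works but is very slow
```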
Pool Settings
Number of top molecules to maintain in the pool
Trade-offs:
- Smaller (5-10): Faster iterations, more focused search
- Larger (20-50): More diversity, better exploration
Percentage of pool reserved for validation during fine-tuning
Recommendation: 0.2 (20%) works well. Decrease if pool_size < 10.
Number of similar molecules to include in each generation prompt
Trade-offs:
- Fewer (1-3): More exploration, less guidance
- More (5-10): Tighter control, local search
Number of unique molecules to generate per iteration
Trade-offs:
- Smaller (50-100): Faster iterations, more fine-tuning rounds
- Larger (200-500): Fewer iterations, more generation per round
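As a sketch, the pool settings described above could be written as below. pool_size, num_similars, and num_gens_per_iter are referenced elsewhere on this page; the validation key name is an assumption.

```yaml
pool_size: 10           # top molecules kept in the pool
validation_perc: 0.2    # assumed key name: fraction of pool held out for validation
num_similars: 3         # similar molecules per generation prompt
num_gens_per_iter: 200  # unique molecules generated per iteration
```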
Similarity Range
Range of Tanimoto similarities to use in generation prompts
The algorithm samples random similarities in this range when creating prompts.
Trade-offs:
- Narrow (0.6-0.8): Local search around known molecules
- Wide (0.3-0.9): Broader exploration
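For example, a moderately wide range that balances the two extremes above (the value shown is illustrative):

```yaml
sim_range: [0.4, 0.9]  # Tanimoto similarities sampled at random from this range
```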
Generation Parameters
Number of molecules to generate in parallel per batch
Constraint: Should match or exceed num_gens_per_iter for efficiency.
Temperature range for generation (start, end)
Temperature increases linearly from start to end during optimization.
Trade-offs:
- Lower (0.8-1.0): More conservative, higher quality
- Higher (1.2-1.5): More diverse, creative solutions
End-of-sequence token for the model
Don’t change unless using a different base model.
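A sketch of the generation parameters (the eos_token value shown is a placeholder; use the token that ships with your base model):

```yaml
generation_batch_size: 200          # should match or exceed num_gens_per_iter
generation_temperature: [1.0, 1.5]  # rises linearly from start to end
eos_token: "</s>"                   # placeholder; depends on the base model
```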
generation_config
These parameters are passed directly to HuggingFace’s model.generate():
Maximum number of tokens to generate
Recommendation: 100 is sufficient for most molecules (SMILES rarely exceed this).
Whether to use sampling (vs. greedy decoding)
Must be true for molecular optimization to work.
Penalty for repeating tokens (1.0 = no penalty)
Recommendation: Keep at 1.0. Higher values can break SMILES syntax.
Token ID for end-of-sequence
Don’t change unless using a different tokenizer.
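These map onto standard HuggingFace generate() keyword arguments; a sketch follows. max_new_tokens is the assumed key for the token limit, and the eos_token_id value is a placeholder for your tokenizer's actual ID.

```yaml
generation_config:
  max_new_tokens: 100      # SMILES rarely need more
  do_sample: true          # must be true for molecular optimization
  repetition_penalty: 1.0  # higher values can break SMILES syntax
  eos_token_id: 2          # placeholder; use your tokenizer's real EOS ID
```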
Optimization Strategy
Optimization strategy to use
Options:
- [default] - No fine-tuning, use the pre-trained model only
- [rej-sample-v2] - Adaptive fine-tuning on high-scoring molecules
Recommendation: Use [rej-sample-v2] for best results.
Fine-tuning Configuration (rej_sample_config)
These parameters control the adaptive fine-tuning process:
Training Trigger
Number of iterations without improvement before triggering fine-tuning
Trade-offs:
- Lower (1-2): Frequent fine-tuning, slower iterations
- Higher (4-6): More generation, less adaptation
Learning Rate
Peak learning rate for fine-tuning
Recommendation: 1e-4 works well. Decrease to 5e-5 for larger models.
Final learning rate (polynomial decay)
Recommendation: Keep at 0.
Number of warmup steps for the learning rate schedule
Recommendation: 10 for small pool sizes.
Batch Size and Gradient Accumulation
Per-device batch size during fine-tuning
Constraint: Limited by GPU memory.
Recommendation:
- 125M model: 2-4
- 1.3B model: 1-2
- 2B model: 1
Number of steps to accumulate gradients
Effective batch size = train_batch_size × gradient_accumulation_steps
Recommendation: Adjust to achieve an effective batch size of 8-16 (e.g., train_batch_size 2 × gradient_accumulation_steps 8 = 16).
Regularization
L2 regularization strength
Recommendation: 0.1 prevents overfitting to the small pool.
Maximum gradient norm for clipping
Recommendation: 1.0 for stable training.
Training Duration
Number of epochs to fine-tune per round
Trade-offs:
- Fewer (2-3): Faster, less adaptation
- More (5-10): Better adaptation, risk of overfitting
Other Settings
Directory to save fine-tuning checkpoints
Number of workers for data loading
Maximum sequence length for training
Whether to pack multiple examples into one sequence
Recommendation: Keep false for molecular optimization.
Adam optimizer beta1 parameter
Adam optimizer beta2 parameter
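Collecting the fine-tuning settings above into one sketch. Only strategy, rej_sample_config, train_tol_level, train_batch_size, gradient_accumulation_steps, and num_train_epochs are named on this page; every key marked "assumed" is an illustrative guess, and the values are examples consistent with the recommendations above, not the repository defaults.

```yaml
strategy: [rej-sample-v2]
rej_sample_config:
  train_tol_level: 3              # iterations without improvement before fine-tuning
  max_learning_rate: 1.0e-4       # assumed key: peak LR (5e-5 for larger models)
  final_learning_rate: 0          # assumed key: polynomial-decay target
  warmup_steps: 10                # assumed key
  train_batch_size: 2             # per-device; limited by GPU memory
  gradient_accumulation_steps: 8  # effective batch = 2 x 8 = 16
  weight_decay: 0.1               # assumed key: L2 regularization
  max_grad_norm: 1.0              # assumed key: gradient clipping
  num_train_epochs: 5
  packing: false                  # keep false for molecular optimization
  adam_beta1: 0.9                 # assumed key
  adam_beta2: 0.999               # assumed key
```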
Example Configurations
Fast Exploration (Default Strategy)
- Quick experiments
- Simple objectives
- Limited GPU memory
- Objectives similar to pre-training
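A hedged sketch of what a fast-exploration profile could look like, assembled only from the trade-offs discussed above (values are illustrative, not the repository defaults):

```yaml
strategy: [default]                 # no fine-tuning
pool_size: 10                       # small, focused pool
num_gens_per_iter: 100              # fast iterations
sim_range: [0.6, 0.8]               # local search around known molecules
generation_temperature: [0.8, 1.0]  # conservative, higher-quality sampling
```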
High-Performance (Rejection Sampling)
- Challenging benchmarks (PMO, docking)
- Novel objectives
- Maximum performance
- When you have GPU resources
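Likewise, a hedged sketch of a high-performance profile built from the same trade-offs (values illustrative, not the repository defaults):

```yaml
strategy: [rej-sample-v2]           # adaptive fine-tuning on high scorers
pool_size: 30                       # more diversity
num_gens_per_iter: 300              # more generation per round
sim_range: [0.3, 0.9]               # broader exploration
generation_temperature: [1.0, 1.5]  # more diverse, creative solutions
```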
Multi-Objective Optimization
- Multiple competing objectives
- Pareto frontier exploration
- Highly constrained search spaces
Loading Configuration
Load your configuration in Python:
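A minimal sketch using PyYAML; the inline string stands in for the file contents, and in practice you would open chemlactica_125m_hparams.yaml instead. The keys shown are ones referenced on this page.

```python
# Minimal sketch: parse hyperparameters with PyYAML.  In practice,
# replace the inline string with open("chemlactica_125m_hparams.yaml").
import yaml

config_text = """
strategy: [rej-sample-v2]
pool_size: 10
sim_range: [0.4, 0.9]
"""

hparams = yaml.safe_load(config_text)
print(hparams["strategy"])   # -> ['rej-sample-v2']
print(hparams["pool_size"])  # -> 10
```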
Hyperparameter Tuning Tips
Start with defaults
The provided chemlactica_125m_hparams.yaml works well for most tasks. Start there and adjust only if needed.
Monitor intermediate results
Check oracle logs during optimization. If scores plateau early, try:
- Increasing generation_temperature[1] for more exploration
- Decreasing train_tol_level for earlier fine-tuning
- Increasing pool_size for more diversity
GPU memory issues
If you run out of memory:
- Use a smaller model (125M instead of 1.3B)
- Reduce train_batch_size and increase gradient_accumulation_steps
- Reduce generation_batch_size
- Use torch_dtype=torch.bfloat16 for model loading
Slow optimization
To speed up:
- Use strategy: [default] (no fine-tuning)
- Increase generation_batch_size (if memory allows)
- Reduce num_train_epochs in fine-tuning
- Use a smaller model (125M)
Poor diversity
If getting too many similar molecules:
- Increase generation_temperature[1]
- Widen sim_range (e.g., [0.3, 0.95])
- Increase pool_size
- Decrease num_similars
Next Steps
See Complete Examples
Explore full working examples with different configurations
Understand the Algorithm
Learn how hyperparameters affect the optimization process