How It Works
The optimization process uses the model’s ability to generate molecules conditioned on:
- Similar molecules with specified Tanimoto similarity scores
- Desired properties like SAS, QED, molecular weight, etc.
- Oracle scores indicating the quality of generated molecules
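As a reminder of what a Tanimoto similarity score measures, here is a minimal sketch that computes it on fingerprints represented as sets of "on" bit indices. The fingerprints below are made-up values, not real molecules, and the set-based representation is an assumption for illustration. In practice a cheminformatics toolkit would compute the fingerprints.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


# Toy fingerprints (hypothetical bit indices, not derived from real molecules).
fp_query = {1, 4, 7, 9}
fp_similar = {1, 4, 7, 12}
fp_distant = {2, 3, 12}

sim_close = tanimoto(fp_query, fp_similar)  # 3 shared bits / 5 distinct bits
sim_far = tanimoto(fp_query, fp_distant)    # 0 shared bits / 7 distinct bits
print(sim_close, sim_far)
```

A score of 1.0 means identical fingerprints and 0.0 means no shared bits, so a prompt that asks for a high similarity to a reference molecule is asking for a close structural analogue.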
The optimization loop:
- Generates new molecules from prompts based on high-scoring molecules in the pool
- Evaluates them using a custom oracle function
- Maintains a pool of top-performing molecules
- Fine-tunes the model on successful molecules (optional)
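The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not ChemLactica's implementation: `optimize`, `toy_generate`, and `toy_oracle` are hypothetical names, the "molecules" are short strings, and the generator is a random mutation standing in for the language model. The optional fine-tuning step is left as a comment.

```python
import random
from typing import Callable


def optimize(
    generate: Callable[[list[str]], list[str]],
    oracle: Callable[[str], float],
    seed_pool: list[str],
    pool_size: int = 3,
    iterations: int = 5,
) -> list[tuple[str, float]]:
    """Generate, score, and keep the best candidates over several rounds."""
    scored = {mol: oracle(mol) for mol in seed_pool}  # everything scored so far
    for _ in range(iterations):
        # The current top scorers act as references for the next generation.
        pool = sorted(scored, key=scored.get, reverse=True)[:pool_size]
        for candidate in generate(pool):
            if candidate not in scored:  # never spend an oracle call twice
                scored[candidate] = oracle(candidate)
        # An optional fine-tuning step on high scorers would go here.
    pool = sorted(scored, key=scored.get, reverse=True)[:pool_size]
    return [(mol, scored[mol]) for mol in pool]


# Hypothetical stand-ins for the model and the objective.
rng = random.Random(0)

def toy_generate(references: list[str]) -> list[str]:
    return [ref + rng.choice("CNO") for ref in references]

def toy_oracle(mol: str) -> float:
    return mol.count("C") / len(mol)  # fraction of 'C' characters

result = optimize(toy_generate, toy_oracle, ["CN", "CO", "NO"])
print(result)
```

The returned list is the final pool, best candidate first. Because the pool only ever admits higher-scoring candidates, its best score can never fall below that of the best seed.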
Benchmark Results
ChemLactica achieves state-of-the-art performance across multiple optimization tasks:
Practical Molecular Optimization (PMO)
- ChemLactica: 17.5 average score
- Previous SOTA: 16.2 (Genetic-guided GFlowNets)
Docking Optimization
AutoDock Vina Optimization
3-4x fewer oracle calls to generate 100 good molecules compared to Beam Enumeration (previous SOTA)
QED Optimization
From the RetMol paper benchmark:
- ChemLactica-125M: 99% success rate with 10K oracle calls
- RetMol (Original): 96% success rate with 50K oracle calls
Key Features
Custom Oracles
Define any objective function to optimize molecules for your specific use case
Flexible Prompting
Control molecular properties through structured prompts with similarity constraints
Adaptive Fine-tuning
Optional rejection sampling strategy that fine-tunes the model during optimization
Efficient Search
Genetic-like algorithm with diversity filtering to explore chemical space effectively
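To make the custom-oracle idea concrete, the sketch below wraps an arbitrary scoring function with a call budget and a cache, since benchmark results are reported per oracle call. The class name `BudgetedOracle`, its attributes, and `toy_score` are hypothetical and chosen for illustration. They are not the interface ChemLactica expects, so consult the custom-oracle guide for the actual contract.

```python
from typing import Callable


class BudgetedOracle:
    """Wraps a scoring function with a call budget and a result cache."""

    def __init__(self, score_fn: Callable[[str], float], max_calls: int):
        self.score_fn = score_fn
        self.max_calls = max_calls
        self.calls = 0
        self.cache: dict[str, float] = {}

    def __call__(self, molecule: str) -> float:
        if molecule in self.cache:
            return self.cache[molecule]  # repeated queries cost nothing
        if self.calls >= self.max_calls:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        score = self.score_fn(molecule)
        self.cache[molecule] = score
        return score

    @property
    def should_stop(self) -> bool:
        """True once the budget has been spent."""
        return self.calls >= self.max_calls


def toy_score(molecule: str) -> float:
    # Placeholder objective: fraction of 'O' characters in the string.
    return molecule.count("O") / len(molecule)


oracle = BudgetedOracle(toy_score, max_calls=2)
first = oracle("CCO")
repeat = oracle("CCO")   # served from the cache, budget unchanged
second = oracle("CCN")
print(oracle.calls, oracle.should_stop)
```

Any objective, such as a docking score or a property predictor, could be dropped in as `score_fn` without changing the bookkeeping.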
Optimization Strategies
ChemLactica supports two optimization strategies:
Default Strategy
Uses the pre-trained model without additional fine-tuning:
- Faster iteration times
- No need for training infrastructure
- Works well for objectives similar to pre-training data
- May not adapt to highly specialized objectives
Rejection Sampling (rej-sample-v2)
Fine-tunes the model during optimization on high-scoring molecules:
- Adapts to specific optimization objectives
- Achieves better results on challenging tasks
- Model learns to generate molecules matching the oracle
- Requires GPU memory for training
- Slower iteration times
- Needs careful hyperparameter tuning
The rej-sample-v2 strategy is recommended for achieving state-of-the-art results on challenging benchmarks. Use the default strategy for faster exploration or when GPU memory is limited.
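The core idea behind rejection sampling is to keep only the best-scoring generations as fine-tuning data. The sketch below shows that selection step alone. The function name, the `keep_fraction` and `min_score` parameters, and the sample data are assumptions for illustration and do not reflect the actual rej-sample-v2 settings. The fine-tuning step itself is omitted.

```python
def select_for_finetuning(
    scored: list[tuple[str, float]],
    keep_fraction: float = 0.25,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Keep the top fraction of unique molecules that clear a score floor."""
    unique: dict[str, float] = {}
    for mol, score in scored:
        # Deduplicate, keeping the best score seen for each molecule.
        unique[mol] = max(score, unique.get(mol, float("-inf")))
    ranked = sorted(unique.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return [(mol, score) for mol, score in ranked[:n_keep] if score >= min_score]


# Made-up (molecule, oracle score) pairs; "A" appears twice on purpose.
samples = [
    ("A", 0.9), ("B", 0.2), ("C", 0.7), ("D", 0.4), ("A", 0.9),
    ("E", 0.1), ("F", 0.5), ("G", 0.3), ("H", 0.6),
]
training_set = select_for_finetuning(samples, keep_fraction=0.25)
print(training_set)
```

A smaller `keep_fraction` pushes the model harder toward the oracle's preferences but leaves less data per fine-tuning round, which is one reason this kind of strategy needs careful hyperparameter tuning.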
Quick Start
Here’s a minimal example to get started:
Next Steps
Design Custom Oracles
Learn how to implement custom oracle functions for your objectives
Algorithm Details
Understand the optimization algorithm workflow
Configure Hyperparameters
Tune the optimization process for your use case
See Examples
Explore complete working examples