The generate() function provides a simple interface for generating molecules from text prompts using ChemLactica models.
Function Signature
generation/generation.py:9
Parameters
Input prompt(s) for molecule generation. Can be a single string or a list of strings. Each prompt should follow the ChemLactica format, with property tags and the [START_SMILES] token.
The loaded ChemLactica model (e.g., from AutoModelForCausalLM.from_pretrained()).
Generation parameters passed to model.generate(). Common parameters:
- max_new_tokens (int): Maximum number of tokens to generate
- temperature (float): Sampling temperature; must be greater than 0. Values below 1.0 make sampling more conservative, values above 1.0 make it more random
- do_sample (bool): Whether to sample instead of using greedy decoding
- num_return_sequences (int): Number of sequences to generate per prompt
- repetition_penalty (float): Penalty for repeating tokens
- top_p (float): Nucleus sampling threshold
- top_k (int): Top-k sampling threshold
Returns
Dictionary mapping each prompt to a list of generated completions. Each completion includes the full generated text from the model.
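To make the return shape concrete, here is a minimal sketch of a wrapper with the documented behavior. It is not the library's actual implementation: the name generate_sketch, the explicit tokenizer argument, and the per-prompt loop are all assumptions made for illustration.

```python
from typing import Dict, List, Union


def generate_sketch(
    prompts: Union[str, List[str]],
    model,
    tokenizer,  # assumption: passed explicitly; the real function may obtain it differently
    **gen_kwargs,
) -> Dict[str, List[str]]:
    """Hypothetical stand-in that mirrors the documented behavior of generate()."""
    # Accept a single prompt as well as a list of prompts.
    if isinstance(prompts, str):
        prompts = [prompts]

    results: Dict[str, List[str]] = {}
    for prompt in prompts:
        # Tokenize the prompt and move the tensors to the model's device.
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        # Keyword arguments are forwarded unchanged to model.generate().
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            **gen_kwargs,
        )
        # One decoded string per returned sequence: the prompt followed by
        # the generated continuation.
        results[prompt] = tokenizer.batch_decode(output_ids)
    return results
```

With num_return_sequences=3, each list in the returned dictionary would hold three strings.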
Basic Usage
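A minimal sketch of a single-prompt call. The import path, the checkpoint name, the example prompt, and the positional argument order (prompt first, model second) are assumptions based on the parameter descriptions above; check them against your installation and the Prompting Guide.

```python
# Assumed checkpoint name; replace with the model you actually use.
MODEL_ID = "yerevann/chemlactica-125m"
# Illustrative prompt: one property tag followed by the [START_SMILES] token.
PROMPT = "</s>[QED]0.90[/QED][START_SMILES]"


def run_basic_example() -> None:
    # Imports live inside the function so the module loads without these packages.
    from transformers import AutoModelForCausalLM
    from chemlactica.generation.generation import generate  # assumed import path

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Extra keyword arguments are passed through to model.generate().
    results = generate(PROMPT, model, max_new_tokens=50, do_sample=False)

    # The result maps each prompt to its list of completions.
    for prompt, completions in results.items():
        for completion in completions:
            print(completion)
```

Call run_basic_example() to run it; the first call downloads the checkpoint.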
Batch Generation
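Passing a list of prompts returns one dictionary entry per prompt. The sketch below flattens that dictionary into (prompt, completion) pairs. The helper run_batch is hypothetical, and it takes the generation function as an argument so that it can be tried with a stand-in before a model is loaded.

```python
from typing import Callable, Dict, List, Tuple


def run_batch(
    generate_fn: Callable[..., Dict[str, List[str]]],
    model,
    prompts: List[str],
    **gen_kwargs,
) -> List[Tuple[str, str]]:
    """Generate for several prompts at once and flatten the result."""
    results = generate_fn(prompts, model, **gen_kwargs)
    pairs: List[Tuple[str, str]] = []
    # The result is keyed by prompt, so repeated prompts share one entry;
    # dict.fromkeys removes the repeats while keeping the input order.
    for prompt in dict.fromkeys(prompts):
        for completion in results.get(prompt, []):
            pairs.append((prompt, completion))
    return pairs
```

In real use, generate_fn would be the library's generate() and model the loaded ChemLactica model.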
Generation Parameters
Temperature
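Temperature divides the model's logits before the softmax, so lower values concentrate probability on the most likely tokens and higher values spread it out. The standalone sketch below shows the effect on toy logits; it does not call ChemLactica, and the logit values are made up for illustration.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

In a generate() call, temperature only takes effect together with do_sample=True.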
Multiple Sequences
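To get several candidates per prompt, set num_return_sequences together with do_sample=True, since plain greedy decoding in Hugging Face cannot return more than one sequence. The values below are illustrative, and unique_completions is a hypothetical helper for removing repeated samples.

```python
from typing import Dict, List

# Keyword arguments to pass through generate() to model.generate().
MULTI_SEQUENCE_KWARGS = {
    "max_new_tokens": 50,
    "do_sample": True,           # required for more than one sampled sequence
    "temperature": 0.8,
    "num_return_sequences": 5,   # completions per prompt
}


def unique_completions(results: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop repeated completions per prompt, keeping first-seen order."""
    return {
        prompt: list(dict.fromkeys(completions))
        for prompt, completions in results.items()
    }
```

Sampling can return the same molecule more than once, so deduplicating before scoring avoids redundant work.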
Extracting SMILES
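A minimal sketch of the extraction step, assuming each generated molecule is closed by an [END_SMILES] token. The helper name extract_smiles is hypothetical, and completions cut off before the closing token are skipped rather than guessed at.

```python
import re
from typing import List

# Assumption: the molecule sits between [START_SMILES] and [END_SMILES].
SMILES_PATTERN = re.compile(r"\[START_SMILES\](.*?)\[END_SMILES\]", re.DOTALL)


def extract_smiles(completion: str) -> List[str]:
    """Return every non-empty SMILES string found between the two tokens."""
    matches = SMILES_PATTERN.findall(completion)
    return [m.strip() for m in matches if m.strip()]


example = "</s>[QED]0.90[/QED][START_SMILES]CCO[END_SMILES]</s>"
print(extract_smiles(example))
```

The extracted strings are not checked for chemical validity; parse them with a cheminformatics toolkit before using them downstream.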
The generated text includes the full completion, so the SMILES strings have to be extracted from it.
Notes
The generate() function is a simple wrapper around Hugging Face’s model.generate(). For production use in molecular optimization, consider the optimize() function, which adds features such as pool management and oracle scoring.
Command-Line Usage
The generation module also includes a command-line interface.
See Also
- optimize() - Full molecular optimization with oracle scoring
- Prompting Guide - How to format prompts
- Sampling Strategies - Generation parameter tuning