Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators (2022 NeurIPS Workshop on Machine Learning and the Physical Sciences)

The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task. While such approaches have become the norm in fields such as natural language processing, implementation and evaluation of transfer learning approaches for chemistry are in the early stages. In this work, we demonstrate finetuning for downstream tasks on a graph neural network (GNN) trained over a molecular database containing 2.7 million water clusters. The use of Graphcore IPUs as an AI accelerator for training molecular GNNs reduces training time from a reported 2.7 days on 0.5M clusters to 1.2 hours on 2.7M clusters. Finetuning the pretrained model for downstream tasks of molecular dynamics and transfer to a different potential energy surface took only 8.3 hours and 28 minutes, respectively, on a single GPU.

Conda Environment

Using pip with conda

Not all packages are available through conda. To make sure pip installs into a conda environment, first run conda install pip inside that environment. Pip will then be installed in your anaconda (or conda or miniconda) directory under the name of your environment (something like /anaconda/envs/env_name/). In all subsequent pip installs, replace pip with /anaconda/envs/env_name/bin/pip.
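As a small illustration of the path substitution described above, the following Python sketch builds the environment-specific pip path from a conda root and an environment name. The function name env_pip and the example values are hypothetical, and the sketch assumes the POSIX directory layout (Windows environments use a different layout that is not handled here).

```python
import posixpath


def env_pip(conda_root: str, env_name: str) -> str:
    """Return the pip executable path inside a named conda environment.

    Assumes the POSIX layout <conda_root>/envs/<env_name>/bin/pip.
    """
    return posixpath.join(conda_root, "envs", env_name, "bin", "pip")


if __name__ == "__main__":
    # Example values only; substitute your own conda root and environment name.
    print(env_pip("/anaconda", "env_name"))
```

Running python -m pip install with the environment's own interpreter is another common way to target the same environment.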

PyTorch 1.9.0 with CUDA 11.1

This installation was used for training on NVIDIA P100 and RTX 2080 Ti GPUs.

conda install pytorch==1.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install pyg -c pyg
conda install -c conda-forge tensorboard ase fair-research-login h5py tqdm gdown

Note that it may be necessary to downgrade setuptools if tensorboard throws an error:

pip install setuptools==59.5.0

PyTorch 1.12.0 with CUDA 11.3

This installation was used for training on NVIDIA A100 GPUs.

conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install torch-scatter torch-sparse torch-cluster torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
conda install -c conda-forge tensorboard ase fair-research-login h5py tqdm gdown

Note that installing torch-spline-conv will likely produce a GLIBC error. It is safe to pip uninstall torch-spline-conv if the error occurs.
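The wheel index URL passed to -f in the pip command above encodes both the PyTorch version and the CUDA tag. The sketch below shows how that URL is assembled, which may help when adapting the install to another version pair. The helper names cuda_tag and pyg_wheel_index are hypothetical, and wheels are not necessarily published for every combination, so treat the result as a URL to check rather than a guarantee.

```python
def cuda_tag(cuda_version: str) -> str:
    """Turn a CUDA version such as '11.3' into a wheel tag such as 'cu113'."""
    return "cu" + cuda_version.replace(".", "")


def pyg_wheel_index(torch_version: str, cuda_version: str) -> str:
    """Build the wheel index URL for a given PyTorch and CUDA version pair."""
    return (
        "https://data.pyg.org/whl/"
        f"torch-{torch_version}+{cuda_tag(cuda_version)}.html"
    )


if __name__ == "__main__":
    # Reproduces the index URL used in the PyTorch 1.12.0 install command.
    print(pyg_wheel_index("1.12.0", "11.3"))
```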

Data, Models, and Results

Preprocessed datasets, including the database of nonminima computed with the TTM2.1-F potential and the database of minima computed with the MB-pol potential, split files, trained models, ASE databases for MD simulations, and results of the downstream tasks can be downloaded at https://data.pnnl.gov/group/nodes/dataset/33283.

Downstream Tasks

Data Space Expansion

Finetuning the pretrained model on a dataset of nonminima computed with the TTM2.1-F potential and including a force term in the loss function produces a neural network potential (NNP) able to drive molecular dynamics simulations.

python train.py --savedir ./results/data_space_transfer_finetune --args data_space_transfer_args.json
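To make the idea of a force term in the loss concrete, here is a minimal pure-Python sketch of a combined energy and force objective. It is not taken from train.py: the use of mean squared error, the function names, and the force_weight parameter are all assumptions chosen for illustration, and the repository's actual loss may differ.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def energy_force_loss(
    e_pred: Sequence[float],
    e_true: Sequence[float],
    f_pred: Sequence[Sequence[float]],
    f_true: Sequence[Sequence[float]],
    force_weight: float = 1.0,  # hypothetical weighting, not from the repository
) -> float:
    """Energy MSE plus a weighted MSE over all force components.

    f_pred and f_true hold one (fx, fy, fz) vector per atom.
    """
    flat_pred = [c for atom in f_pred for c in atom]
    flat_true = [c for atom in f_true for c in atom]
    return mse(e_pred, e_true) + force_weight * mse(flat_pred, flat_true)


if __name__ == "__main__":
    loss = energy_force_loss(
        e_pred=[1.0, 2.0],
        e_true=[1.0, 4.0],
        f_pred=[(0.0, 0.0, 0.0)],
        f_true=[(1.0, 0.0, 0.0)],
        force_weight=3.0,
    )
    print(loss)
```

In a real training loop the predicted forces would typically be obtained as the negative gradient of the predicted energy with respect to atomic positions, so that both terms constrain the same model.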

Molecular dynamics simulations can be performed using md_run.py.
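For readers unfamiliar with how a potential drives a simulation, the sketch below shows a one-dimensional velocity Verlet integrator that takes forces from a callback. It does not reproduce md_run.py, whose interface is not documented here. The integrator choice, the function names, and the toy harmonic force standing in for the trained NNP are all assumptions.

```python
from typing import Callable, Tuple


def velocity_verlet(
    x: float,
    v: float,
    force_fn: Callable[[float], float],
    mass: float,
    dt: float,
    n_steps: int,
) -> Tuple[float, float]:
    """Advance position x and velocity v by n_steps of size dt.

    force_fn plays the role of the potential: it returns the force at x.
    """
    f = force_fn(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force_fn(x)                   # force at the new position
        v = v_half + 0.5 * dt * f / mass  # second half-step velocity update
    return x, v


def harmonic_force(x: float, k: float = 1.0) -> float:
    """Toy stand-in for a learned potential: F = -k * x."""
    return -k * x


if __name__ == "__main__":
    x_final, v_final = velocity_verlet(
        x=1.0, v=0.0, force_fn=harmonic_force, mass=1.0, dt=0.01, n_steps=1000
    )
    total_energy = 0.5 * v_final**2 + 0.5 * x_final**2
    print(x_final, v_final, total_energy)
```

With a trained model, the callback would instead return the model's predicted forces for the current atomic configuration.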

PES Transfer

Finetuning the pretrained model on a small dataset of minima computed with the MB-pol potential allows the network to provide energy predictions comparable to MB-pol.

python train.py --savedir ./results/PES_transfer_finetune --args PES_transfer_args.json

Energy predictions can be obtained using static_audit.py.

Visualizing Results

Visualizations of the downstream tasks are demonstrated in plot_results.ipynb.

Citation

Jenna A. Bilbrey, Kristina M. Herman, Henry Sprueill, Sotiris S. Xantheas, Payel Das, Manuel Lopez Roldan, Mike Kraus, Hatem Helal and Sutanay Choudhury, "Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators," (2022) NeurIPS Workshop on Machine Learning and the Physical Sciences arXiv
