pnnl/downstream_mol_gnn

Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators (2022 NeurIPS Workshop on Machine Learning and the Physical Sciences)

The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task. While such approaches have become the norm in fields such as natural language processing, implementation and evaluation of transfer learning approaches for chemistry are in the early stages. In this work, we demonstrate finetuning for downstream tasks on a graph neural network (GNN) trained over a molecular database containing 2.7 million water clusters. The use of Graphcore IPUs as an AI accelerator for training molecular GNNs reduces training time from a reported 2.7 days on 0.5M clusters to 1.2 hours on 2.7M clusters. Finetuning the pretrained model for downstream tasks of molecular dynamics and transfer to a different potential energy surface took only 8.3 hours and 28 minutes, respectively, on a single GPU.

Conda Environment

Using pip with conda

Not all packages are available through conda. To make sure pip installs packages into your conda environment rather than somewhere else, first run conda install pip inside that environment. The environment's pip is placed in your Anaconda (or Miniconda) directory under the environment's name (for example, /anaconda/envs/env_name/). For all subsequent pip installs, replace pip with the full path /anaconda/envs/env_name/bin/pip.
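As a minimal sketch of the advice above, the first helper builds the environment-specific pip path. It assumes a POSIX layout, where a conda environment keeps its executables under bin/. The second helper shows a commonly used alternative that is not taken from this repository: running pip as python -m pip through the interpreter of the active environment, which installs into that same environment. Both function names are hypothetical.

```python
import os
import sys
from typing import List


def env_pip_path(env_prefix: str) -> str:
    """Path to the pip executable of a conda env (POSIX layout assumed)."""
    # e.g. /anaconda/envs/env_name -> /anaconda/envs/env_name/bin/pip
    return os.path.join(env_prefix, "bin", "pip")


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script.

    `python -m pip` uses the pip belonging to sys.executable, so when this
    runs from an activated conda env, packages install into that env.
    """
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    # `conda activate` sets CONDA_PREFIX; fall back to this interpreter's prefix.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    print(env_pip_path(prefix))
    print(" ".join(pip_install_command("setuptools==59.5.0")))
```

To actually run an install, the list returned by pip_install_command can be passed to subprocess.check_call.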

PyTorch 1.9.0 with CUDA 11.1

This installation was used for training across NVIDIA P100s and RTX 2080 Ti GPUs.

conda install pytorch==1.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install pyg -c pyg
conda install -c conda-forge tensorboard ase fair-research-login h5py tqdm gdown

Note that it may be necessary to downgrade setuptools if tensorboard throws an error:

pip install setuptools==59.5.0
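A quick way to confirm that an environment matches the recipe above is to query the installed package versions. The sketch below uses the standard-library importlib.metadata (Python 3.8+). The helper names are hypothetical, and it assumes that conda's pytorch package registers under the distribution name torch. Pip wheels often carry a local build tag such as +cu111, which is dropped before comparing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every pin that is not satisfied."""
    installed = installed_versions(expected)
    bad: Dict[str, Optional[str]] = {}
    for name, want in expected.items():
        have = installed[name]
        # Drop a local build tag such as "+cu111" before comparing.
        if have is None or have.split("+")[0] != want:
            bad[name] = have
    return bad


if __name__ == "__main__":
    print(version_mismatches({"torch": "1.9.0"}) or "torch pin satisfied")
    # Useful when deciding whether the setuptools downgrade above is needed.
    print("setuptools:", installed_versions(["setuptools"])["setuptools"])
```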

PyTorch 1.12.0 with CUDA 11.3

This installation was used for training across NVIDIA A100s.

conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install torch-scatter torch-sparse torch-cluster torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
conda install -c conda-forge tensorboard ase fair-research-login h5py tqdm gdown

Note that installing torch-spline-conv will likely produce a GLIBC error. If it does, it is safe to remove the package with pip uninstall torch-spline-conv.
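The wheel index URL in the pip command above encodes both the PyTorch version and the CUDA version, and it must match the installed build. As a hedged sketch, the hypothetical helper below reproduces that URL pattern so the index can be adjusted for a different PyTorch/CUDA pairing. The +cpu suffix for CPU-only builds follows PyG's published naming, but check the PyG installation documentation for the combinations that are actually provided.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel index URL of the form used with `pip install -f`.

    Pattern (from the command above):
        https://data.pyg.org/whl/torch-<torch>+cu<CUDA without dot>.html
    Pass cuda_version=None for a CPU-only build ("+cpu").
    """
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{torch_version}+{tag}.html"


if __name__ == "__main__":
    url = pyg_wheel_index("1.12.0", "11.3")
    print("pip install torch-scatter torch-sparse torch-cluster torch-geometric -f " + url)
```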

Data, Models, and Results

The following can be downloaded at https://data.pnnl.gov/group/nodes/dataset/33283: preprocessed datasets (the database of nonminima computed with the TTM2.1-F potential and the database of minima computed with the MB-pol potential), split files, trained models, ASE databases for MD simulations, and results of the downstream tasks.

Downstream Tasks

Data Space Expansion

Finetuning the pretrained model on a dataset of nonminima computed with the TTM2.1-F potential, with a force term included in the loss function, produces a neural network potential (NNP) capable of driving molecular dynamics simulations.

python train.py --savedir ./results/data_space_transfer_finetune --args data_space_transfer_args.json 
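A force term is commonly combined with the energy error as a weighted sum, L = MSE(E) + w * MSE(F), where the predicted forces are the negative gradient of the predicted energy with respect to the atomic positions (in PyTorch, typically obtained with autograd). The NumPy sketch below only illustrates that combination. The weight w (force_weight) and the use of mean-squared error for both terms are assumptions, not values read from data_space_transfer_args.json or train.py.

```python
import numpy as np


def energy_force_loss(e_pred, e_true, f_pred, f_true, force_weight=1.0):
    """Weighted sum of energy and force mean-squared errors.

    e_pred, e_true: (n_clusters,) total energies
    f_pred, f_true: (n_atoms, 3) Cartesian force components
    force_weight:   hypothetical weight on the force term
    """
    e_mse = np.mean((np.asarray(e_pred) - np.asarray(e_true)) ** 2)
    # Average over every atom and every Cartesian component.
    f_mse = np.mean((np.asarray(f_pred) - np.asarray(f_true)) ** 2)
    return float(e_mse + force_weight * f_mse)
```

Setting force_weight to 0 reduces this to an energy-only loss.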

Molecular dynamics simulations can be performed using md_run.py.
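An NNP drives MD by supplying forces to a time integrator. As a generic illustration, and not a description of what md_run.py does internally, the sketch below implements the standard velocity Verlet scheme with NumPy for any force function. ASE, already a dependency here, ships its own integrators (for example ase.md.verlet.VelocityVerlet). No unit conversion is done: positions, masses, forces, and the time step must share a consistent unit system.

```python
import numpy as np


def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Integrate Newton's equations with the velocity Verlet scheme.

    positions, velocities: (n_atoms, 3) arrays; masses: (n_atoms,) array
    force_fn: maps positions -> (n_atoms, 3) forces (e.g. from a trained NNP)
    Returns the final positions and velocities; inputs are not modified.
    """
    x = np.array(positions, dtype=float)
    v = np.array(velocities, dtype=float)
    inv_m = 1.0 / np.asarray(masses, dtype=float)[:, None]
    f = force_fn(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f * inv_m  # half-step velocity update
        x += dt * v                # full-step position update
        f = force_fn(x)            # forces at the new positions
        v += 0.5 * dt * f * inv_m  # second half-step velocity update
    return x, v
```

A harmonic force such as lambda r: -r is a convenient sanity check: the total energy should stay nearly constant over a full oscillation period.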

PES Transfer

Finetuning the pretrained model on a small dataset of minima computed with the MB-pol potential allows the network to provide energy predictions comparable to those of MB-pol.

python train.py --savedir ./results/PES_transfer_finetune --args PES_transfer_args.json 

Energy predictions can be obtained using static_audit.py.
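Judging whether finetuned predictions are comparable to MB-pol comes down to error statistics against the reference energies. The hypothetical helper below computes the mean absolute error (MAE) and root-mean-square error (RMSE) between two equal-length lists of energies, for example predictions from static_audit.py and the corresponding MB-pol values. The output format of static_audit.py is not assumed here, so loading the numbers is left to the reader.

```python
import math
from typing import Sequence, Tuple


def energy_errors(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, RMSE) between predicted and reference energies (same units)."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and equal in length")
    diffs = [p - r for p, r in zip(pred, ref)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

RMSE penalizes large outliers more heavily than MAE, so reporting both shows whether the errors are uniform or dominated by a few clusters.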

Visualizing Results

Visualizations of the downstream task results are provided in plot_results.ipynb.

Citation

Jenna A. Bilbrey, Kristina M. Herman, Henry Sprueill, Sotiris S. Xantheas, Payel Das, Manuel Lopez Roldan, Mike Kraus, Hatem Helal, and Sutanay Choudhury, "Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators," NeurIPS Workshop on Machine Learning and the Physical Sciences (2022), arXiv.
