Skip to content

Commit

Permalink
Initial commit
Browse files Browse the repository at this point in the history
  • Loading branch information
SimonBoothroyd committed Oct 7, 2021
1 parent cc2c2d2 commit 7bd59b7
Show file tree
Hide file tree
Showing 15 changed files with 446 additions and 0 deletions.
62 changes: 62 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
Double exponential vdW functional form
======================================

This repository contains the scripts, inputs and the results generated as part of the *...* publication.

**Warning:** This repository, its structure and contents, are currently in a state of flux and incompleteness while the
study is ongoing. We *do not* guarantee the scientific correctness of anything found within, nor do we yet recommend
using any force field parameters found here.

### Structure

This repository is structured into four main directories:

* `data-set-curation` - contains the script used to curate the training and test data sets.

* `inputs-and-results` - contains the *most up to date* input files required to reproduce this study. **See the Reproduction** section of this README for more information. The project structure was for the most part generated automatically using the [`nonbonded`](https://github.com/SimonBoothroyd/nonbonded) package.

* `schema` - contains schemas which define most parts of the project, including definitions of
which optimizations and benchmarks were performed.

* `scripts` - contains the script used to generate the input schemas / files, and scripts which perform ancillary data
analysis.

### Experimental Data Sets

The experimental data sets used in this project were curated from the [NIST ThermoML](https://trc.nist.gov/ThermoML.html)
archive. The citations for the individual measurements can be found in `DATA_CITATIONS.bib`

### Reproduction

The exact inputs used and outputs reported (including the conda environment used to generate them) in the publication
have been included as tagged releases to this repository.

For those looking to reproduce the study, the required dependencies may be obtained directly using conda:

```bash
conda env create --name double-exp-vdw --file environment.yaml
```

#### Optimizations

In most cases the optimizations can be re-run using the following commands

```bash
cd inputs-and-results/optimizations/XXX/
nonbonded optimization run
nonbonded optimization analyze
```

while the QM optimizations may be re-run, e.g., according to

```bash
cd inputs-and-results/optimizations/vdw-v1-td-opt-vib-v1/
ForceBalance optimize.in
```

A more complete set of instructions for performing QM fits with ForceBalance can be found [here](https://github.com/openforcefield/openforcefield-forcebalance)

#### Benchmarks

A comprehensive set of instructions for re-running the benchmarks can be found in the `inputs-and-results/benchmarks`
directory.
11 changes: 11 additions & 0 deletions data-set-curation/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Physical Property Data Sets

Within these directories are the scripts used to curate the experimental physical property
data sets that the vdW force field parameters were trained and tested against.

In particular:

* ...

All data sets were curated using the utilities provided by the `openff-evaluator` package and stored
for easy access in both `nonbonded` data set objects and pandas csv files.
10 changes: 10 additions & 0 deletions data-set-curation/physical-property/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Physical Property Data Sets

Within these directories are the scripts used to curate the experimental physical property
data sets that the vdW force field parameters were trained (`optimizations/`) and tested (`benchmarks/`)
against.

All data sets were curated using the utilities provided by the `openff-evaluator` package and stored
for easy access in both `nonbonded` data set objects and pandas csv files.

Instructions for curating the sets can be found in each of the sub-directories.
24 changes: 24 additions & 0 deletions data-set-curation/physical-property/benchmarks/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Physical Property Benchmark Data Sets

Within this directories are the scripts used to curate the experimental solvation / transfer free energy physical
property data sets that the vdW force field parameters were tested against.

The data is sourced from both the FreeSolv and the Minnesota Solvation (MNSol) Databases. Due to how the MNSol is
licensed we do not include it here, however it can easily be obtained from the [main website](https://conservancy.umn.edu/handle/11299/213300).

## Curating the data sets

After obtaining a copy of the MNSol set:

1) Run the `mnsol-to-csv.py` script to convert the MNSol data from an XLS file to a more easily
manipulable CSV file.

2) Select a diverse and representative subset of the full MNSol by running the `curate-mnsol-test-set.py`.
This will create both a `sage-mnsol-test-v1.json` and a `sage-mnsol-test-v1.csv` file in the `data-sets`
directory which contain the selected test set data in two convenient formats. For completeness, we have
included a `sage-mnsol-test-v1.txt` file which contains exactly the substances used in our test set but
which excludes the actual values due to licensing concerns.

3) Select a subset of FreeSolv whereby each selected solute also appears as a solute in the selected MNSol
set. This allows non-aques to aqueous transfer free energies to be computed. This will create both a
`sage-fsolv-test-v1.json` and a `sage-fsolv-test-v1.csv` file in the `data-sets` directory.
7 changes: 7 additions & 0 deletions data-set-curation/physical-property/optimizations/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Physical Property Optimization Data Sets

Within this directories are the scripts used to curate the experimental density and enthalpy of mixing physical
property data sets that the vdW force field parameters were trained against.

The data is sourced from the [ThermoML archive](https://www.nist.gov/mml/acmd/trc/thermoml) using the
`openff-evaluator` packages data curation tools.
9 changes: 9 additions & 0 deletions data-set-curation/quantum-chemical/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Quantum Chemical Data Sets

This directory contains the scripts used to curate the different quantum chemical *training* sets from
data stored in the [QCArchive](https://qcarchive.molssi.org/), as well as the final curated data sets.

**Note**: As the QCArchive is a constantly evolving and updating collection it is possible that running the
scripts provided here will yield slightly different curated collections depending on if new results
have become available. For full reproducibility we have in the `data-sets` directory provided JSON
records which encode the **exact** ids of QC record used during training.
Empty file.
25 changes: 25 additions & 0 deletions environment.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: double-exp-vdw

channels:
- simonboothroyd
- openeye
- conda-forge
- defaults

dependencies:

- python
- pip

- openeye-toolkits

# Common dependencies
- tqdm
- seaborn

# vdW fitting
- openff-evaluator >=0.3.6
- nonbonded >= 0.0.1b1
- distributed ==2.30.1
- mdtraj ==1.9.4
- smirnoff-plugins >=0.0.2
7 changes: 7 additions & 0 deletions inputs-and-results/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Inputs and Results

This directory contains the main input files required to reproduce the performed optimizations and benchmarks
which were carried out while training the Sage line of OpenFF force fields.

The individual `optimizations` and `benchmarks` sub-directories contain detailed instructions on how the
input scripts and files should be used.
9 changes: 9 additions & 0 deletions inputs-and-results/benchmarks/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Benchmarks

Contained within this directory are the main inputs (or instructions on how they may be generated)
for performing benchmarks of the produced force fields.

## Manifest

* `transfer-free-energy` - a directory containing the inputs required to perform solvation / transfer free energy
calculations of a diverse set of solutes / solvents using the `openff-evaluator` package.
Empty file.
29 changes: 29 additions & 0 deletions inputs-and-results/optimizations/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Optimizations

Contained within this directory are the main inputs (or instructions on how they may be generated)
for training the matrix of force fields against both physical property and QC data.

Currently, the inputs for the vdW fits against physical property data sets are provided, while the
inputs for the valence fits will need to be generated manually due to their size.

## Running the vdW fits

Each of the physical property fits can be re-run using the

```shell
nonbonded optimization run --restart true
```

command. The ``--restart true`` flag will ensure that the optimization is correctly restarted if you
are trying to continue from a previous attempt which failed for whatever reason.

The outputs of the fit can be analyzed and plotted using:

```shell
nonbonded optimization analyze
nonbonded optimization plot
```

which will produce a set of analyzed outputs in a new `analysis` directory, and a set of corresponding
plots, including a trace of the objective function and RMSE metrics from before and after the fit, will
be generated in a new `plots` directory.
127 changes: 127 additions & 0 deletions scripts/build_initial_ff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
"""Convert openff-2.0.0 to a double exponential style force field with a custom tip4p-de
water model that has been pre-fit."""

from openff.toolkit.typing.engines.smirnoff import ForceField, ParameterList
from simtk import unit
from smirnoff_plugins.handlers.nonbonded import DoubleExponential


def add_tip4p_double_exp(force_field: ForceField):

double_exp_handler = force_field.get_parameter_handler("DoubleExponential")

# now add in pre-fit tip4p-de water model parameters
de_o = double_exp_handler.DEType(
"[#1]-[#8X2H2+0:1]-[#1]",
epsilon=0.21104 * unit.kilocalorie_per_mole,
r_min=3.5204 * unit.angstrom,
id="tip4p-de-O",
)
de_h = double_exp_handler.DEType(
"[#1:1]-[#8X2H2+0]-[#1]",
epsilon=0 * unit.kilocalorie_per_mole,
r_min=1 * unit.angstrom,
id="tip4p-de-H",
)

double_exp_handler._parameters.append(de_o)
double_exp_handler._parameters.append(de_h)

# add a library charge to zero the water and then use the virtual site to adjust the
# charges
lb_charges = force_field.get_parameter_handler("LibraryCharges")
lb_charges._parameters = ParameterList(
[
lb_charge
for lb_charge in force_field.get_parameter_handler("LibraryCharges")
if lb_charge.id is None or "tip3p" not in lb_charge.id
]
)
lb_charges.add_parameter(
{
"smirks": "[#1:1]-[#8X2H2+0:2]-[#1:3]",
"id": "tip4p-de",
"charge1": 0 * unit.elementary_charge,
"charge2": 0 * unit.elementary_charge,
"charge3": 0 * unit.elementary_charge,
}
)

virtual_site_handler = force_field.get_parameter_handler("VirtualSites")
virtual_site_handler.add_parameter(
{
"smirks": "[#1:1]-[#8X2H2+0:2]-[#1:3]",
"type": "DivalentLonePair",
"distance": -0.010743 * unit.nanometers,
"outOfPlaneAngle": 0.0 * unit.degrees,
"match": "once",
"charge_increment1": 0.53254 * unit.elementary_charge,
"charge_increment2": 0.0 * unit.elementary_charge,
"charge_increment3": 0.53254 * unit.elementary_charge,
}
)
# Currently required due to OpenFF issue #884
virtual_site_handler._parameters = ParameterList(virtual_site_handler._parameters)

# Setup the needed constraints
constraints = force_field.get_parameter_handler("Constraints")

for parameter in constraints.parameters:
parameter.id = parameter.id.replace("tip3p", "tip4p")


def build_double_exp_force_field(
initial_force_field_path: str = "openff-2.0.0.offxml", water_model: str = "tip4p-de"
) -> ForceField:

water_models = ["tip3p", "tip4p-de"]
assert water_model in water_models, f"the allowed water models are {water_models}"

ff = ForceField(initial_force_field_path, load_plugins=True)

# create the new double exponential handler and configure with the alpha and beta
# from the water fits
double_exp_handler = DoubleExponential(version="0.3")
double_exp_handler.scale14 = 0.5
double_exp_handler.alpha = 16.789
double_exp_handler.beta = 4.592

# loop over the vdw and transfer the parameters and zero out the old LJ 12-6 terms
vdw = ff.get_parameter_handler("vdW")

for lj_parameter in vdw.parameters:

if "tip3p" in lj_parameter.id: # Skip TIP3P so we can use the custom TIP4P-DE
continue

de_parameter = double_exp_handler.DEType(
lj_parameter.smirks,
r_min=(lj_parameter.sigma ** 6 * 2) ** (1 / 6),
epsilon=lj_parameter.epsilon,
)
double_exp_handler._parameters.append(de_parameter)

ff.deregister_parameter_handler("vdW")

# we have to modify this parameter manually as this can cause the long range
# correction in openmm to fail. This parameter is a place holder with almost zero
# values, with epsilon mostly transferred to the oxygen here we make the r_min
# bigger to work but we could probably zero out this term completely?
problem_parameter = double_exp_handler.parameters["[#1:1]-[#8]"]
problem_parameter.r_min = 1.0 * unit.angstrom

ff.register_parameter_handler(double_exp_handler)

if water_model == "tip4p-de":
add_tip4p_double_exp(ff)
elif water_model == "tip3p":
pass
else:
raise NotImplementedError()

return ff


if __name__ == "__main__":
ff = build_double_exp_force_field()
ff.to_file("double-exp-ff.offxml")
Loading

0 comments on commit 7bd59b7

Please sign in to comment.