-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
cc2c2d2
commit 7bd59b7
Showing
15 changed files
with
446 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
Double exponential vdW functional form | ||
====================================== | ||
|
||
This repository contains the scripts, inputs and the results generated as part of the *...* publication. | ||
|
||
**Warning:** This repository, its structure and contents, are currently in a state of flux and incompleteness while the | ||
study is ongoing. We *do not* guarantee the scientific correctness of anything found within, nor do we yet recommend | ||
using any force field parameters found here. | ||
|
||
### Structure | ||
|
||
This repository is structured into four main directories: | ||
|
||
* `data-set-curation` - contains the script used to curate the training and test data sets. | ||
|
||
* `inputs-and-results` - contains the *most up to date* input files required to reproduce this study. **See the Reproduction** section of this README for more information. The project structure was for the most part generated automatically using the [`nonbonded`](https://github.com/SimonBoothroyd/nonbonded) package. | ||
|
||
* `schema` - contains schemas which define most parts of the project, including definitions of | ||
which optimizations and benchmarks were performed. | ||
|
||
* `scripts` - contains the script used to generate the input schemas / files, and scripts which perform ancillary data | ||
analysis. | ||
|
||
### Experimental Data Sets | ||
|
||
The experimental data sets used in this project were curated from the [NIST ThermoML](https://trc.nist.gov/ThermoML.html) | ||
archive. The citations for the individual measurements can be found in `DATA_CITATIONS.bib` | ||
|
||
### Reproduction | ||
|
||
The exact inputs used and outputs reported (including the conda environment used to generate them) in the publication | ||
have been included as tagged releases to this repository. | ||
|
||
For those looking to reproduce the study, the required dependencies may be obtained directly using conda: | ||
|
||
```bash | ||
conda env create --name double-exp-vdw --file environment.yaml | ||
``` | ||
|
||
#### Optimizations | ||
|
||
In most cases the optimizations can be re-run using the following commands | ||
|
||
```bash | ||
cd inputs-and-results/optimizations/XXX/ | ||
nonbonded optimization run | ||
nonbonded optimization analyze | ||
``` | ||
|
||
while the QM optimizations may be re-run, e.g., according to | ||
|
||
```bash | ||
cd inputs-and-results/optimizations/vdw-v1-td-opt-vib-v1/ | ||
ForceBalance optimize.in | ||
``` | ||
|
||
A more complete set of instructions for performing QM fits with ForceBalance can be found [here](https://github.com/openforcefield/openforcefield-forcebalance) | ||
|
||
#### Benchmarks | ||
|
||
A comprehensive set of instructions for re-running the benchmarks can be found in the `inputs-and-results/benchmarks` | ||
directory. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Physical Property Data Sets | ||
|
||
Within these directories are the scripts used to curate the experimental physical property | ||
data sets that the vdW force field parameters were trained and tested against. | ||
|
||
In particular: | ||
|
||
* ... | ||
|
||
All data sets were curated using the utilities provided by the `openff-evaluator` package and stored | ||
for easy access in both `nonbonded` data set objects and pandas csv files. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Physical Property Data Sets | ||
|
||
Within these directories are the scripts used to curate the experimental physical property | ||
data sets that the vdW force field parameters were trained (`optimizations/`) and tested (`benchmarks/`) | ||
against. | ||
|
||
All data sets were curated using the utilities provided by the `openff-evaluator` package and stored | ||
for easy access in both `nonbonded` data set objects and pandas csv files. | ||
|
||
Instructions for curating the sets can be found in each of the sub-directories. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Physical Property Benchmark Data Sets | ||
|
||
Within this directories are the scripts used to curate the experimental solvation / transfer free energy physical | ||
property data sets that the vdW force field parameters were tested against. | ||
|
||
The data is sourced from both the FreeSolv and the Minnesota Solvation (MNSol) Databases. Due to how the MNSol is | ||
licensed we do not include it here, however it can easily be obtained from the [main website](https://conservancy.umn.edu/handle/11299/213300). | ||
|
||
## Curating the data sets | ||
|
||
After obtaining a copy of the MNSol set: | ||
|
||
1) Run the `mnsol-to-csv.py` script to convert the MNSol data from an XLS file to a more easily | ||
manipulable CSV file. | ||
|
||
2) Select a diverse and representative subset of the full MNSol by running the `curate-mnsol-test-set.py`. | ||
This will create both a `sage-mnsol-test-v1.json` and a `sage-mnsol-test-v1.csv` file in the `data-sets` | ||
directory which contain the selected test set data in two convenient formats. For completeness, we have | ||
included a `sage-mnsol-test-v1.txt` file which contains exactly the substances used in our test set but | ||
which excludes the actual values due to licensing concerns. | ||
|
||
3) Select a subset of FreeSolv whereby each selected solute also appears as a solute in the selected MNSol | ||
set. This allows non-aques to aqueous transfer free energies to be computed. This will create both a | ||
`sage-fsolv-test-v1.json` and a `sage-fsolv-test-v1.csv` file in the `data-sets` directory. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Physical Property Optimization Data Sets | ||
|
||
Within this directories are the scripts used to curate the experimental density and enthalpy of mixing physical | ||
property data sets that the vdW force field parameters were trained against. | ||
|
||
The data is sourced from the [ThermoML archive](https://www.nist.gov/mml/acmd/trc/thermoml) using the | ||
`openff-evaluator` packages data curation tools. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Quantum Chemical Data Sets | ||
|
||
This directory contains the scripts used to curate the different quantum chemical *training* sets from | ||
data stored in the [QCArchive](https://qcarchive.molssi.org/), as well as the final curated data sets. | ||
|
||
**Note**: As the QCArchive is a constantly evolving and updating collection it is possible that running the | ||
scripts provided here will yield slightly different curated collections depending on if new results | ||
have become available. For full reproducibility we have in the `data-sets` directory provided JSON | ||
records which encode the **exact** ids of QC record used during training. |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
name: double-exp-vdw | ||
|
||
channels: | ||
- simonboothroyd | ||
- openeye | ||
- conda-forge | ||
- defaults | ||
|
||
dependencies: | ||
|
||
- python | ||
- pip | ||
|
||
- openeye-toolkits | ||
|
||
# Common dependencies | ||
- tqdm | ||
- seaborn | ||
|
||
# vdW fitting | ||
- openff-evaluator >=0.3.6 | ||
- nonbonded >= 0.0.1b1 | ||
- distributed ==2.30.1 | ||
- mdtraj ==1.9.4 | ||
- smirnoff-plugins >=0.0.2 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Inputs and Results | ||
|
||
This directory contains the main input files required to reproduce the performed optimizations and benchmarks | ||
which were carried out while training the Sage line of OpenFF force fields. | ||
|
||
The individual `optimizations` and `benchmarks` sub-directories contain detailed instructions on how the | ||
input scripts and files should be used. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Benchmarks | ||
|
||
Contained within this directory are the main inputs (or instructions on how they may be generated) | ||
for performing benchmarks of the produced force fields. | ||
|
||
## Manifest | ||
|
||
* `transfer-free-energy` - a directory containing the inputs required to perform solvation / transfer free energy | ||
calculations of a diverse set of solutes / solvents using the `openff-evaluator` package. |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Optimizations | ||
|
||
Contained within this directory are the main inputs (or instructions on how they may be generated) | ||
for training the matrix of force fields against both physical property and QC data. | ||
|
||
Currently, the inputs for the vdW fits against physical property data sets are provided, while the | ||
inputs for the valence fits will need to be generated manually due to their size. | ||
|
||
## Running the vdW fits | ||
|
||
Each of the physical property fits can be re-run using the | ||
|
||
```shell | ||
nonbonded optimization run --restart true | ||
``` | ||
|
||
command. The ``--restart true`` flag will ensure that the optimization is correctly restarted if you | ||
are trying to continue from a previous attempt which failed for whatever reason. | ||
|
||
The outputs of the fit can be analyzed and plotted using: | ||
|
||
```shell | ||
nonbonded optimization analyze | ||
nonbonded optimization plot | ||
``` | ||
|
||
which will produce a set of analyzed outputs in a new `analysis` directory, and a set of corresponding | ||
plots, including a trace of the objective function and RMSE metrics from before and after the fit, will | ||
be generated in a new `plots` directory. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
"""Convert openff-2.0.0 to a double exponential style force field with a custom tip4p-de | ||
water model that has been pre-fit.""" | ||
|
||
from openff.toolkit.typing.engines.smirnoff import ForceField, ParameterList | ||
from simtk import unit | ||
from smirnoff_plugins.handlers.nonbonded import DoubleExponential | ||
|
||
|
||
def add_tip4p_double_exp(force_field: ForceField): | ||
|
||
double_exp_handler = force_field.get_parameter_handler("DoubleExponential") | ||
|
||
# now add in pre-fit tip4p-de water model parameters | ||
de_o = double_exp_handler.DEType( | ||
"[#1]-[#8X2H2+0:1]-[#1]", | ||
epsilon=0.21104 * unit.kilocalorie_per_mole, | ||
r_min=3.5204 * unit.angstrom, | ||
id="tip4p-de-O", | ||
) | ||
de_h = double_exp_handler.DEType( | ||
"[#1:1]-[#8X2H2+0]-[#1]", | ||
epsilon=0 * unit.kilocalorie_per_mole, | ||
r_min=1 * unit.angstrom, | ||
id="tip4p-de-H", | ||
) | ||
|
||
double_exp_handler._parameters.append(de_o) | ||
double_exp_handler._parameters.append(de_h) | ||
|
||
# add a library charge to zero the water and then use the virtual site to adjust the | ||
# charges | ||
lb_charges = force_field.get_parameter_handler("LibraryCharges") | ||
lb_charges._parameters = ParameterList( | ||
[ | ||
lb_charge | ||
for lb_charge in force_field.get_parameter_handler("LibraryCharges") | ||
if lb_charge.id is None or "tip3p" not in lb_charge.id | ||
] | ||
) | ||
lb_charges.add_parameter( | ||
{ | ||
"smirks": "[#1:1]-[#8X2H2+0:2]-[#1:3]", | ||
"id": "tip4p-de", | ||
"charge1": 0 * unit.elementary_charge, | ||
"charge2": 0 * unit.elementary_charge, | ||
"charge3": 0 * unit.elementary_charge, | ||
} | ||
) | ||
|
||
virtual_site_handler = force_field.get_parameter_handler("VirtualSites") | ||
virtual_site_handler.add_parameter( | ||
{ | ||
"smirks": "[#1:1]-[#8X2H2+0:2]-[#1:3]", | ||
"type": "DivalentLonePair", | ||
"distance": -0.010743 * unit.nanometers, | ||
"outOfPlaneAngle": 0.0 * unit.degrees, | ||
"match": "once", | ||
"charge_increment1": 0.53254 * unit.elementary_charge, | ||
"charge_increment2": 0.0 * unit.elementary_charge, | ||
"charge_increment3": 0.53254 * unit.elementary_charge, | ||
} | ||
) | ||
# Currently required due to OpenFF issue #884 | ||
virtual_site_handler._parameters = ParameterList(virtual_site_handler._parameters) | ||
|
||
# Setup the needed constraints | ||
constraints = force_field.get_parameter_handler("Constraints") | ||
|
||
for parameter in constraints.parameters: | ||
parameter.id = parameter.id.replace("tip3p", "tip4p") | ||
|
||
|
||
def build_double_exp_force_field( | ||
initial_force_field_path: str = "openff-2.0.0.offxml", water_model: str = "tip4p-de" | ||
) -> ForceField: | ||
|
||
water_models = ["tip3p", "tip4p-de"] | ||
assert water_model in water_models, f"the allowed water models are {water_models}" | ||
|
||
ff = ForceField(initial_force_field_path, load_plugins=True) | ||
|
||
# create the new double exponential handler and configure with the alpha and beta | ||
# from the water fits | ||
double_exp_handler = DoubleExponential(version="0.3") | ||
double_exp_handler.scale14 = 0.5 | ||
double_exp_handler.alpha = 16.789 | ||
double_exp_handler.beta = 4.592 | ||
|
||
# loop over the vdw and transfer the parameters and zero out the old LJ 12-6 terms | ||
vdw = ff.get_parameter_handler("vdW") | ||
|
||
for lj_parameter in vdw.parameters: | ||
|
||
if "tip3p" in lj_parameter.id: # Skip TIP3P so we can use the custom TIP4P-DE | ||
continue | ||
|
||
de_parameter = double_exp_handler.DEType( | ||
lj_parameter.smirks, | ||
r_min=(lj_parameter.sigma ** 6 * 2) ** (1 / 6), | ||
epsilon=lj_parameter.epsilon, | ||
) | ||
double_exp_handler._parameters.append(de_parameter) | ||
|
||
ff.deregister_parameter_handler("vdW") | ||
|
||
# we have to modify this parameter manually as this can cause the long range | ||
# correction in openmm to fail. This parameter is a place holder with almost zero | ||
# values, with epsilon mostly transferred to the oxygen here we make the r_min | ||
# bigger to work but we could probably zero out this term completely? | ||
problem_parameter = double_exp_handler.parameters["[#1:1]-[#8]"] | ||
problem_parameter.r_min = 1.0 * unit.angstrom | ||
|
||
ff.register_parameter_handler(double_exp_handler) | ||
|
||
if water_model == "tip4p-de": | ||
add_tip4p_double_exp(ff) | ||
elif water_model == "tip3p": | ||
pass | ||
else: | ||
raise NotImplementedError() | ||
|
||
return ff | ||
|
||
|
||
if __name__ == "__main__": | ||
ff = build_double_exp_force_field() | ||
ff.to_file("double-exp-ff.offxml") |
Oops, something went wrong.