Issues with Gromacs Input Files #289
Comments
Hi there, For part of question 1): I'm not sure why this happens but I've recently fixed this (last) week by simply re-trying the I should be able to check the other issues at some point tomorrow. Cheers. |
Hi, Thanks for the quick response. Great - I'll update to the latest version. Cheers |
Out of interest, is there a reason why you need to use ParmEd to convert the GROMACS files, rather than using BioSimSpace directly, i.e.:
import BioSimSpace as BSS
# Load GROMACS system.
system = BSS.IO.readMolecules(["jnk1_complex.gro", "jnk1_complex.top"])
# Write to AMBER format.
BSS.IO.saveMolecules("test", system, ["prm7", "rst7"])
(This is what would be done behind the scenes if you loaded GROMACS files and tried to run a simulation with AMBER.) This works, but the resulting topology file looks quite different to the one generated by ParmEd. A single-point calculation would test whether the information is self-consistent, despite the differences. |
The original issue I was having was instability during equilibration using sander through BSS, and @jmichel80 mentioned that there had been issues previously with file conversion. To rule this out, I wanted to repeat the file conversion using ParmEd and re-run the equilibration to see if I got the same issues. Unfortunately I've not been able to run a single-point calculation using the BSS-converted files (see the first part of gh_issue.ipynb), or to load in the files produced by ParmEd to run a single-point calculation on those. |
Hi again, I'm really not sure what's going on with those GROMACS files provided in the repository you linked to. Out of interest, I looked at their example notebook which shows how the files are generated. If I reproduce their code, but write to AMBER format files rather than GROMACS, then everything works, i.e. I can load the files back with BioSimSpace and I get the same results when performing single-point energy calculations with AMBER and GROMACS. For reference, they are just parameterising the protein using FF14SB and "openff_unconstrained-2.0.0" for the ligand. It isn't possible to do this directly in BioSimSpace since the protein PDB contains missing atom types and the SDF file is needed to provide the stereochemistry information for the ligand. Here's my script. (You'll just need to clone the abfe-benchmark repo to your working directory.)
import BioSimSpace as BSS
import parmed
from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField as OFF_ForceField
try:
    from openmm import XmlSerializer, app, unit
    from openmm.app import HBonds, NoCutoff, PDBFile, ForceField
except ImportError:
    from simtk import unit
    from simtk.openmm import XmlSerializer, app
    from simtk.openmm.app import HBonds, NoCutoff, PDBFile
# Load in the ligand.
ligand_molecule = Molecule("abfe-benchmark/structures/jnk1/ligand1.sdf")
# Specify the "Sage" forcefield.
force_field = OFF_ForceField("openff_unconstrained-2.0.0.offxml")
# Parametrize the ligand molecule by creating a Topology object from it.
ligand_system = force_field.create_openmm_system(ligand_molecule.to_topology())
# Read in the coordinates of the ligand from the PDB file.
ligand_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/ligand1.pdb")
# Convert the ligand system to a ParmEd object.
ligand_parmed_structure = parmed.openmm.load_topology(
    ligand_pdbfile.topology,
    ligand_system,
    ligand_pdbfile.positions)
# Write the ligand structure to AMBER format.
ligand_parmed_structure.save("ligand1.rst7", overwrite=True)
ligand_parmed_structure.save("ligand1.parm7", overwrite=True)
# Parse the protein PDB file.
protein_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/protein.pdb")
# Load the AMBER protein force field through OpenMM.
omm_forcefield = app.ForceField("amber14/protein.ff14SB.xml", "amber14/tip3p.xml")
# Parameterize the protein.
protein_system = omm_forcefield.createSystem(
    protein_pdbfile.topology,
    nonbondedCutoff=1*unit.nanometer,
    nonbondedMethod=app.NoCutoff,
    constraints=None,
    rigidWater=False)
# Convert the protein System into a ParmEd Structure.
protein_parmed_structure = parmed.openmm.load_topology(
    protein_pdbfile.topology,
    protein_system,
    xyz=protein_pdbfile.positions)
# Write the protein structure to AMBER format.
protein_parmed_structure.save("protein.rst7", overwrite=True)
protein_parmed_structure.save("protein.parm7", overwrite=True)
# Load the ligand and protein with BioSimSpace.
ligand = BSS.IO.readMolecules("ligand1.*7")[0]
protein = BSS.IO.readMolecules("protein.*7")[0]
# Combine to form a complex.
complx = (ligand + protein).toSystem()
# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)
# Create processes for AMBER and GROMACS simulations.
process_amb = BSS.Process.Amber(complx, protocol)
process_gmx = BSS.Process.Gromacs(complx, protocol)
# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)
# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()
# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")
This outputs:
I'm not yet sure if the issue with the GROMACS files is our ability to read them, or whether they were invalid in the first place. I'll try to reproduce the above writing to GROMACS format, i.e. using the output of the script, rather than what was uploaded to the benchmark repository. Cheers. |
I've just re-run the above script, instead using ParmEd to write to GROMACS format. As you've found, on conversion to AMBER format within BioSimSpace, any AMBER simulation fails with the
Given the inconsistencies, I'm not sure whether the issue is with ParmEd or us. |
Thanks very much @lohedges. @jmichel80 will do. |
@jmichel80 I can confirm that the AMBER files produced by @lohedges's script work with SOMD. However, test alchemical simulations crash when I use the same config file I'm using for my current work, giving "NaN or Inf has been generated along the simulation" during minimisation. This turned out to be due to the use of "constraint = allbonds". Do you have any suggestions as to why this is happening, given that the system appeared well equilibrated? (I used an equilibration scheme almost identical to the one that produced input which works with this config file.) All relevant input is here. I have:
- Produced AMBER format files for the brd4, lig1 complex using @lohedges's script. (This failed at first due to this issue: despite having compatible versions of ParmEd and OpenMM in my environment, an incompatible version of ParmEd was bundled with AmberTools. It worked after manually modifying decorators.py.)
- Equilibrated using pmemd and pmemd.cuda (see input).
- Run a short production run using SOMD through BSS (successful).
- Run single alchemical windows using my standard config file with (failure) and without (success) constraint set to "allbonds".
Thanks |
This is odd, constraints are disabled before energy minimisation. See https://github.com/michellab/Sire/blob/devel/wrapper/Tools/OpenMMMD.py#L1507-L1520 Also, your single-point energy before minimisation is identical between the two runs, so I expect the same input was passed to integrator.minimiseEnergy(). Is the error reproducible? You could try disabling minimisation; that step may not be necessary for an ABFE run started from a structure equilibrated at discharge = 0.0 |
I've repeated the calculations, only adding or removing "set_constraints = allbonds" to the config. Adding this consistently causes the simulations to crash during minimisation. I'm running my modified version of SOMD with Boresch restraints, but I've repeated this with the latest version of sire as contained within BSS-dev - I observe the same thing. When disabling minimisation, I get:
Very similar to with minimisation, only no output related to setting up the system is displayed. |
Out of interest, do the original GROMACS files work with SOMD? (For example, if you just perform minimisation and equilibration with GROMACS rather than PMEMD.) It would be interesting to know if you experience the same constraint related crashes with those, too. |
Could you set up the input files from scratch with BioSimSpace (starting from PDB files extracted from the pmemd equilibration and solvating in BioSimSpace) to verify whether SOMD runs without errors in that case? If so, comparing the prm7 files should indicate a discrepancy in the handling of bonded terms. |
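For the prm7 comparison suggested above, a minimal sketch of one way to narrow down where two files differ is to group the data lines by their %FLAG name and list the flags whose contents are not identical. The function names are hypothetical and the two fragments are made-up data, only there to show the output shape. A raw text comparison like this assumes both files list atoms and parameters in the same order, which may not hold for files written by different tools.

```python
from typing import Dict, List


def read_prm7_sections(text: str) -> Dict[str, List[str]]:
    """Group the data lines of an AMBER prm7 file by their %FLAG name."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("%FLAG"):
            current = line.split()[1]
            sections[current] = []
        elif line.startswith("%"):
            # %VERSION, %FORMAT and %COMMENT lines carry no parameters.
            continue
        elif current is not None:
            sections[current].append(line.rstrip())
    return sections


def differing_flags(a: str, b: str) -> List[str]:
    """Return the sorted %FLAG names whose data differ between two files."""
    sa, sb = read_prm7_sections(a), read_prm7_sections(b)
    return sorted(f for f in set(sa) | set(sb) if sa.get(f) != sb.get(f))


# Tiny made-up fragments, not taken from the files in this issue.
file_a = """%VERSION  VERSION_STAMP = V0001.000
%FLAG BOND_FORCE_CONSTANT
%FORMAT(5E16.8)
  5.53000000E+02  3.40000000E+02
%FLAG BOND_EQUIL_VALUE
%FORMAT(5E16.8)
  9.57200000E-01  1.09000000E+00
"""
file_b = file_a.replace("3.40000000E+02", "3.37000000E+02")

print(differing_flags(file_a, file_b))
```

If only bonded flags show up in the result, that would point at the bonded terms; if nearly every flag differs, the ordering assumption probably does not hold and a per-atom comparison would be needed instead.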
@lohedges I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files. |
I tried this myself, but the original PDB files provided in the repository contained unknown atom types, so parametrisation with ff14sb failed. (Perhaps it would work with PDBs extracted from the AMBER files, but that would rely on the AMBER files being correct in the first place, which we're currently unsure about.) I also couldn't parameterise the ligand within BioSimSpace since the SDF was required for stereochemistry. However, the parametrisation of the ligand should be identical in either case. (Same sequence of commands used.) The protein might differ though, since they are using OpenMM to parameterise, rather than AmberTools.
Thanks for confirming. Cheers. |
@jmichel80, it looks like constraints aren't disabled before energy minimisation in runFreeNrg(), only in run(): https://github.com/michellab/Sire/blob/6ff6b04d56a3b76fb18f7ba2317557d15cd41cb6/wrapper/Tools/OpenMMMD.py#L1686-L1695 . However, even when I modify my version of Sire to disable constraints before minimisation in runFreeNrg(), I still get the same failure during minimisation:
|
To follow up on this. The code to disable constraints in run() or runFreeNrg() is buggy. Changing constraint parameters via e.g. integrator.setConstraintType("none") does not cause the OpenMM system to be reinitialised, because Integrator_OpenMM.initialise() is not subsequently called, so the change has no effect. It is likely that the crash during energy minimisation is due to an incompatibility between the way bonded parameters are defined in the parameter file "openff_unconstrained-2.0.0.offxml" here and the code that sets up constraints/bonded terms for the solute in SOMD: https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1802-L1814 |
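To make the "setter without reinitialisation" problem concrete, here is a toy model of the pattern. It is not Sire code: ToyIntegrator and its methods are hypothetical stand-ins, and the only point is that changing a stored option does nothing until the cached object is rebuilt.

```python
class ToyIntegrator:
    """Toy stand-in for an integrator that caches a built simulation context."""

    def __init__(self, constraint_type: str = "allbonds") -> None:
        self._constraint_type = constraint_type
        self._context = None  # built by initialise()

    def set_constraint_type(self, value: str) -> None:
        # Only the stored option changes; the cached context is untouched.
        self._constraint_type = value

    def initialise(self) -> None:
        # Rebuild the context from the options currently stored.
        self._context = {"constraints": self._constraint_type}

    def active_constraints(self) -> str:
        if self._context is None:
            self.initialise()
        return self._context["constraints"]


integrator = ToyIntegrator("allbonds")
integrator.initialise()

integrator.set_constraint_type("none")
stale = integrator.active_constraints()   # context was built with "allbonds"

integrator.initialise()                   # explicit rebuild picks up the change
fresh = integrator.active_constraints()

print(stale, fresh)
```

In this toy, the fix is either to call initialise() after every setter or to have the setter invalidate the cached context. Which of those suits the real integrator is a design decision for the maintainers.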
Hello @lohedges , As Bayer does not have AMBER, I've been asked to check the consistency between revised input files: The AMBER inputs produce stable MD with sander (10 ps) and pmemd.cuda (1 ns) for both the complexes. The GROMACS input for the ligand 1 complex also seems to produce stable MD (currently at 0.5 ns). However, when I attempt to load any of the GROMACS inputs into BSS (2022.1.0+5.g1a606e31, but I've also tried with 2022.2.1), I get the following error:
The missing atom number changes if I run the command repeatedly. When I check the single-point energies for the ligand 1 complex for AMBER, I get:
For GROMACS, (bypassing BSS), I get the same angle and dihedral energies, but a very different bond energy:
Do you know what might be causing the above error and discrepancies? Thanks very much. |
Thanks for this, I should be able to take a look later. The atom number might change because the parsing is done in parallel, so the order might not be reproducible. (Although things are sorted afterwards.) |
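As an illustration of why the reported atom number can change between runs, here is a small sketch with a thread pool. It is not the Sire parser: check_atom and missing_atoms are hypothetical helpers, and the set of "known" atoms is made up. Results are collected in completion order, so the first failure reported is not guaranteed to be the same each time, while the sorted list is.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional, Set


def check_atom(index: int, known: Set[int]) -> Optional[int]:
    """Return the atom index if it cannot be matched, otherwise None."""
    return None if index in known else index


def missing_atoms(indices: List[int], known: Set[int]) -> List[int]:
    """Check atoms in parallel and collect unmatched ones in completion order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(check_atom, i, known) for i in indices]
        results = [f.result() for f in as_completed(futures)]
    return [r for r in results if r is not None]


known_atoms = set(range(100)) - {7, 42, 63}
found = missing_atoms(list(range(100)), known_atoms)

# The first element may differ from run to run; the sorted list does not.
print("first reported:", found[0])
print("all missing:", sorted(found))
```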
I think the issue is that the atom names in the gro
top
Now look at any of the examples from the archive you uploaded, e.g. gro
top
None of the dummy types, i.e.
I'll see if I can modify the files for consistency in order to get them to load. (For reference, I'm not sure if GROMACS cares about anything in the |
I just modified the name of one of the atoms in the ethane-methanol gro file and it does give an error, although a different and more informative one than what you are seeing:
It looks like this issue would trip you up, but isn't the source of the exact error that you are seeing. |
It may be failing because of the weird formatting of the state B information, e.g.:
I imagine that our parser isn't just ignoring stuff beyond the semi-colon, rather expecting valid information related to state B. Having fixed the names in the gro file, I'll try reading it as a single state file to see if that works. |
Oh, and just to note that we currently don't have read support for GROMACS FEP files, we can only write to them. The information would only ever be read from the information corresponding to state A. |
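One quick way to test the hypothesis about text after the semicolon would be to strip trailing comments from a copy of the topology before loading it. A minimal sketch, assuming the extra state B information really does sit after a ';' (the offending lines are not shown in this thread, so that is a guess), with hypothetical helper names and a made-up fragment:

```python
def strip_top_comment(line: str) -> str:
    """Drop everything from the first ';' onwards, plus trailing whitespace."""
    return line.split(";", 1)[0].rstrip()


def strip_top_comments(text: str) -> str:
    """Strip ';' comments line by line, keeping the original line count."""
    return "\n".join(strip_top_comment(line) for line in text.splitlines())


# Made-up fragment, not copied from the files discussed here.
sample = (
    "[ atoms ]\n"
    "; nr type resnr residue atom cgnr charge mass\n"
    " 1 c3 1 LIG C1 1 -0.1 12.01 ; state B\n"
)

print(strip_top_comments(sample))
```

Comment-only lines become empty rather than being removed, so line numbers in any later error message still match the original file. If state B is encoded in extra columns rather than in comments, this will not change anything.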
Okay, I've fixed it. The GROMACS files are formatted in a way which can't be parsed by Sire due to the inconsistent naming between gro and top and the use of whitespace in molecule type and residue names. For example the In order to get things to read I needed to:
Here is an archive with the (hopefully) fixed GROMACS files for ligand1. This will allow you to load state A. I'm not sure how these input files were originally generated, but the files look like they are pushing the limits of the GROMACS formatting and naming conventions. I didn't think that GROMACS used fixed width, so am surprised that things like Cheers. |
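The two problems described above (whitespace in molecule type names, and names that differ between gro and top) could be caught before loading with a small pre-flight check. This is only a sketch: the helper names are hypothetical, the data is made up, and it assumes the documented fixed-column .gro layout (residue name in columns 6-10) and a whitespace-separated [ atoms ] section with the residue name in the fourth field. Molecules pulled in through #include (water, ions) are not followed, so they would show up as false positives.

```python
from typing import Iterator, List, Set


def gro_residue_names(gro_text: str) -> Set[str]:
    """Collect residue names from the fixed-width atom lines of a .gro file."""
    lines = gro_text.splitlines()
    n_atoms = int(lines[1])
    return {line[5:10].strip() for line in lines[2:2 + n_atoms]}


def section_lines(top_text: str, wanted: str) -> Iterator[str]:
    """Yield comment-free, non-empty lines inside every [ wanted ] section."""
    active = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            active = line.strip("[] ").lower() == wanted
            continue
        if active:
            yield line


def naming_problems(gro_text: str, top_text: str) -> List[str]:
    """List naming issues that are likely to confuse a strict parser."""
    problems = []
    for line in section_lines(top_text, "moleculetype"):
        # Expected layout is "<name> <nrexcl>", i.e. exactly two fields.
        if len(line.split()) != 2:
            problems.append(f"moleculetype line is not '<name> <nrexcl>': {line!r}")
    top_residues = {
        fields[3]
        for fields in (line.split() for line in section_lines(top_text, "atoms"))
        if len(fields) >= 4
    }
    missing = sorted(gro_residue_names(gro_text) - top_residues)
    if missing:
        problems.append(f"residue names in gro but not in top: {missing}")
    return problems


def gro_atom(resnr: int, resname: str, atom: str, nr: int,
             x: float, y: float, z: float) -> str:
    """Format one atom line using the fixed-width .gro columns."""
    return f"{resnr:5d}{resname:<5}{atom:>5}{nr:5d}{x:8.3f}{y:8.3f}{z:8.3f}"


# Made-up two-atom system.
GRO = "\n".join([
    "toy system",
    "    2",
    gro_atom(1, "LIG", "C1", 1, 0.1, 0.2, 0.3),
    gro_atom(1, "LIG", "H1", 2, 0.15, 0.25, 0.35),
    "   1.00000   1.00000   1.00000",
])
BAD_TOP = """[ moleculetype ]
; name  nrexcl
ligand 1   3

[ atoms ]
;  nr type resnr residue atom cgnr charge mass
    1  c3   1     L1      C1   1   -0.10  12.010
    2  hc   1     L1      H1   2    0.10   1.008
"""
GOOD_TOP = BAD_TOP.replace("ligand 1", "ligand1").replace("L1", "LIG")

for problem in naming_problems(GRO, BAD_TOP):
    print(problem)
```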
Brilliant, thank you very much for sorting this out! I'll send Bayer a link to this issue. |
No problem. If you could check that files loaded after modification are sane, that would be great. |
Could we also capture in this issue how the input files were generated by Bayer ? |
The files seem to be sane. Comparing the single-point energies of ligand1 from your fixed input and the AMBER input gives:
Bond energies... AMBER = 12.1355 kcal/mol, GROMACS = 12.9934 kcal/mol
Although when I convert the fixed GROMACS inputs to AMBER, process_amber.getBondEnergy() gives None for some reason:
Outputs
|
@jmichel80 I'm afraid I don't yet have any more information on how the inputs were generated beyond what was in Joseph's email - that this is an altered workflow which is yet to be pushed to github. |
@lohedges , I was also wondering about comparing bond energies. In your comment above (#289 (comment)) you use these when comparing single-point energies. However, when I convert one of these systems (ligand 1 complex) loaded from the same (AMBER) input to GROMACS, I find that the bond energy changes by over 100 kcal / mol (although the angle and dihedral energies are unaffected).
Is this unexpected or does this just depend on the details of the interconversion? Thanks. |
Bond energies only compare in vacuum, or for comparable water models. AMBER and GROMACS handle rigid waters differently, one with explicit H-H bonds, the other with constraints. |
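A toy calculation shows why the reported bond energy depends on the water representation. Summing a harmonic term over an explicit bond list gives a non-zero contribution for a water described by three bond terms (O-H, O-H, H-H), and nothing for a water whose geometry is held by constraints and so has no bond terms at all. The coordinates are made up and the parameters are illustrative TIP3P-like numbers, an assumption rather than values read from the files in this issue.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Bond = Tuple[str, str, float, float]  # atom i, atom j, force constant k, length r0


def bond_energy(coords: Dict[str, Vec], bonds: List[Bond]) -> float:
    """Sum E = k * (r - r0)**2 over an explicit list of bond terms."""
    return sum(
        k * (math.dist(coords[i], coords[j]) - r0) ** 2 for i, j, k, r0 in bonds
    )


# One slightly distorted water molecule (made-up coordinates, in angstrom).
water = {"O": (0.0, 0.0, 0.0), "H1": (0.97, 0.0, 0.0), "H2": (-0.25, 0.93, 0.0)}

# Illustrative TIP3P-like parameters (kcal/mol/A^2 and A); an assumption.
explicit_bonds: List[Bond] = [
    ("O", "H1", 553.0, 0.9572),
    ("O", "H2", 553.0, 0.9572),
    ("H1", "H2", 553.0, 1.5136),
]
constrained_bonds: List[Bond] = []  # geometry handled by constraints instead

e_explicit = bond_energy(water, explicit_bonds)
e_constrained = bond_energy(water, constrained_bonds)
print(f"three bond terms: {e_explicit:.4f}, no bond terms: {e_constrained:.4f}")
```

Multiplied over thousands of waters, even a small per-molecule term adds up, which is consistent with bond energies only being comparable in vacuum or for matching water models.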
OK, thank you. |
I'm not sure why the AMBER process is returning
import BioSimSpace as BSS
s_gro = BSS.IO.readMolecules("fixed_ligand1/*")
# Check bonds in the ligand.
s_gro[0]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 59 )
# Check bonds in the first water molecule. (No explicit H-H bond for GROMACS, rigid water handled by SETTLE.)
s_gro[1]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 0 )
# Convert to AMBER format.
BSS.IO.saveMolecules("tmp", s_gro, ["prm7", "rst7"])
# Reload.
s_amb = BSS.IO.readMolecules("tmp.*")
# Check bonds in the ligand. (Same as before.)
s_amb[0]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 59 )
# Check bonds in the first water molecule. (AMBER rigid water has three bonds.)
s_amb[1]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 3 ) |
Just to say that I don't think the file is valid. If I look at the output files (both GROMACS and AMBER) it looks like the energy has blown up. Some terms appear sensible (and similar), e.g. angle and dihedral, but the total energy is huge or NaN. For AMBER, you are seeing amber.nrg_
For GROMACS, I needed to do some monkeying to get things to run due to an invalid decomposition on my laptop. The log file gives: gromacs.log
Note the huge potential energy. I wonder whether one/some of the atoms should actually be dummies, i.e. not part of the energy calculation. |
I'd forgotten to update the names for hydrogens in the water molecules, so I think it was matching against the wrong atom type. Will do that now to see if it fixes things. |
Bah, doing that ends up back with the original error. I think that atom names were correct in the original file, I had been misreading names as types. However, I think it's the ligand name that's causing issues, so will try with unchanged atom names, and updated molecule name, i.e. |
Nope, still the original error. Sorry, I think the files that I sent were wrong, but somehow magically loaded. I'll add some debugging statements to the Apologies for going round in circles. The joys of debugging at the end of the day! |
Not at all - thanks very much for all the work on this! |
Bah, it was a Sire file caching issue. Changing the ligand names in the topology from |
Yes, that fixes things. (Thank goodness.)
process_gmx.getAngleEnergy().kj_per_mol()
451.8140 kJ/mol
process_amb.getAngleEnergy().kj_per_mol()
451.8063 kJ/mol
process_gmx.getDihedralEnergy().kj_per_mol()
81.0024 kJ/mol
process_amb.getDihedralEnergy().kj_per_mol()
81.0022 kJ/mol
And the log files: GROMACS:
AMBER:
So, the answer is to not have multi-word residue names in the topology file, and to match the residue name between the gro and top files. Should be something that is easy to enforce at their end, or to patch afterwards. |
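For anyone repeating this check on other ligands, the agreement above can be tested with a relative tolerance rather than by eye. A minimal sketch using the angle and dihedral values reported in this comment. The helper names and the 1e-4 tolerance are assumptions, not a BioSimSpace convention, and the conversion helper is included only because this thread mixes kcal/mol and kJ/mol.

```python
import math

KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(value_kcal: float) -> float:
    """Convert an energy from kcal/mol to kJ/mol."""
    return value_kcal * KJ_PER_KCAL


def energies_agree(a_kj: float, b_kj: float, rel_tol: float = 1e-4) -> bool:
    """True if two energies match to within a relative tolerance."""
    return math.isclose(a_kj, b_kj, rel_tol=rel_tol)


# (GROMACS, AMBER) values in kJ/mol, as reported above.
reported = {
    "angle": (451.8140, 451.8063),
    "dihedral": (81.0024, 81.0022),
}

for term, (gmx, amb) in reported.items():
    print(term, "agree" if energies_agree(gmx, amb) else "differ")
```

Bond energies are left out on purpose: they only compare in vacuum or for equivalent water models.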
Brilliant, thanks very much for sorting this. Will pass that on. |
Hello,
I'm running BSS version 2022.1.0+8.g3f6b1c18 on Linux, installed using conda. I'm attempting to run a short simulation with the GROMACS input files supplied here. All relevant inputs/notebooks are here.
I've run into a few issues:
1) Failure of getSystem(block=True). During equilibration steps (see bdr4_equilibration.ipynb) this command sometimes fails the first time it is run. However, it runs correctly the second time (without repeating the simulation). The system was also unexpectedly unstable (LINCS warnings) and exploded during "PMEMD NPT equilibration for 2ns without restraints" (Fatal error: 3 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x. This usually means that your system is not well equilibrated.). I had similar issues when using sander. The input was the ligand 1 complex for brd4.
2) Single-point energy calculation works with GROMACS but fails with AMBER (see gh_issue.ipynb). The error reads: "EXTRA POINTS: nnb too small!". The input was the ligand 1 complex for jnk1.
3) BSS fails to read AMBER inputs generated with ParmEd (see bottom of gh_issue.ipynb). I converted the jnk1 input (ligand 1 complex) from gro/top to parm7/rst7 according to the discussion here. However, upon attempting to reload the parm7/rst7 files, I get:
OSError: Failed to read molecules from ['jnk1_system.parm7', 'jnk1_system.rst7']. It looks like you failed to include a topology file.
Thanks very much.
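Until the first issue is understood, the "run it a second time" workaround can be wrapped in a small retry helper. This is a generic sketch, not part of BioSimSpace: retry_until_result is a hypothetical name, and it assumes a failed call shows up either as a None return value or as an exception.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_result(
    fetch: Callable[[], Optional[T]], attempts: int = 3, delay: float = 0.0
) -> T:
    """Call fetch() until it returns something other than None."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            if result is not None:
                return result
        except Exception as error:  # remember the failure and try again
            last_error = error
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"no result after {attempts} attempts") from last_error


# Stand-in for a call that fails the first time and succeeds the second.
calls = {"n": 0}


def flaky_get_system() -> Optional[str]:
    calls["n"] += 1
    return None if calls["n"] < 2 else "system"


print(retry_until_result(flaky_get_system))
```

With a real process object this would be used as retry_until_result(lambda: process.getSystem(block=True)), where process is whatever BSS process was run for the equilibration step.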