Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with Gromacs Input Files #289

Open
fjclark opened this issue Mar 29, 2022 · 41 comments
Open

Issues with Gromacs Input Files #289

fjclark opened this issue Mar 29, 2022 · 41 comments
Labels

Comments

@fjclark
Copy link
Contributor

fjclark commented Mar 29, 2022

Hello,

I'm running BSS version 2022.1.0, 8.g3f6b1c18 on linux, installed using conda. I'm attempting to run a short simulation with the gromacs input files supplied here. All relevant inputs/ notebooks are here.

I've run into a few issues:

  1. Failure of getSystem(block=True). During equilibration steps (see bdr4_equilibration.ipynb) this command sometimes fails the first time it is run. However, it runs correctly the second time (without repeating the simulation). The system was also unexpectedly unstable (LINCS warnings) and exploded during "PMEMD NPT equilibration for 2ns without restraints" (Fatal error: 3 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x. This usually means that your system is not well equilibrated.) . I had similar issues when using sander. The input was the ligand 1 complex for brd4.

  2. Single point energy calculation works with Gromacs but fails with Amber (see gh_issue.ipynb). Error reads: "EXTRA POINTS: nnb too small!". Input was the ligand 1 complex for jnk1.

  3. BSS fails to read Amber inputs generated with ParmEd (see bottom of gh_issue.ipynb). I converted the jnk1 input (ligand 1 complex) from gro/top to parm7/rst7 according to the discussion here. However, upon attempting to reload the parm7/rst files, I get [OSError: Failed to read molecules from ['jnk1_system.parm7', 'jnk1_system.rst7']. It looks like you failed to include a topology file.]()

Thanks very much.

@fjclark fjclark added the bug label Mar 29, 2022
@lohedges
Copy link
Member

Hi there,

For part of question 1): I'm not sure why this happens but I've recently fixed this (last) week by simply re-trying the trjconv command if it happens to fail the first time. (Weirdly the failures only ever happen within getSystem() when calling a subprocesses, i.e. using the exact same commands on the command-line works every time!).

I should be able to check the other issues at some point tomorrow.

Cheers.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 29, 2022

Hi,

Thanks for the quick response. Great - I'll update to the latest version.

Cheers

@lohedges
Copy link
Member

Out of interest, is there are reason why you need to use ParmEd to convert the GROMACS files, rather than using BioSImSpace directly, i.e.:

import BioSImSpace as BSS

# Load GROMACS system.
system = BSS.IO.readMolecules(["jnk1_complex.gro", "jnk1_complex.top"])

# Write to AMBER format.
BSS.IO.saveMolecules("test", system, ["prm7", "rst7"])

(This is what would be done behind the scenes if you loaded GROMACS files and tried to run a simulation with AMBER.)

This works, but the resulting topology file looks quite different to the one generated by ParmEd. A single-point calculation would test whether the information is self-consistent, despite the differences.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 29, 2022

The original issue I was having was instability during equilibration using sander through BSS and @jmichel80 mentioned that there had been issues previously with file conversion. To rule this out, I wanted to repeat the file conversion using ParmEd and re-run the equilibration to see if I got the same issues. Unfortunately I've not been able to run a single-point calculation using the BSS-converted files (see first part of see gh_issue.ipynb), or to load in the files produced by ParmEd to run a single-point calculation on those.

@lohedges
Copy link
Member

lohedges commented Mar 30, 2022

Hi again,

I'm really not sure what's going on with those GROMACS files provided in the repository you linked to. Out of interest, I looked at their example notebook which shows how the files are generated. If I reproduce their code, but instead write to AMBER format files rather than GROMACS, then everything works, i.e. I can load back the files with BioSimSpace and I get the same results when performing single-point energy calculations with AMBER and GROMACS.

For reference, they are just parameterising the protein using FF14SB and "openff_unconstrained-2.0.0" for the ligand. It isn't possible to do this directly in BioSimSpace since the protein PDB contains missing atom types and the SDF file is required in order to provide the required stereochemistry information for the ligand.

Here's my script. (You'll just need to clone the abfe-benchmark repo to your working directory.)

import BioSimSpace as BSS

import parmed

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField as OFF_ForceField

try:
    from openmm import XmlSerializer, app, unit
    from openmm.app import HBonds, NoCutoff, PDBFile, ForceField
except ImportError:
    from simtk import unit
    from simtk.openmm import XmlSerializer, app
    from simtk.openmm.app import HBonds, NoCutoff, PDBFile

# Load in the ligand.
ligand_molecule = Molecule("abfe-benchmark/structures/jnk1/ligand1.sdf")

# Specify the "Sage" forcefield.
force_field = OFF_ForceField("openff_unconstrained-2.0.0.offxml")

# Parametrize the ligand molecule by creating a Topology object from it.
ligand_system = force_field.create_openmm_system(ligand_molecule.to_topology())

# Read in the coordinates of the ligand from the PDB file.
ligand_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/ligand1.pdb")

# Convert the ligand system to a ParmEd object.
ligand_parmed_structure = parmed.openmm.load_topology(ligand_pdbfile.topology,
											          ligand_system,
											          ligand_pdbfile.positions)

# Write the ligand structure to AMBER format.
ligand_parmed_structure.save("ligand1.rst7", overwrite=True)
ligand_parmed_structure.save("ligand1.parm7", overwrite=True)

# Parse the protein PDB file.
protein_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/protein.pdb")

# Load the AMBER protein force field through OpenMM.
omm_forcefield = app.ForceField("amber14/protein.ff14SB.xml", "amber14/tip3p.xml")

# Parameterize the protein.
protein_system = omm_forcefield.createSystem(protein_pdbfile.topology,
                                             nonbondedCutoff=1*unit.nanometer,
                                             nonbondedMethod=app.NoCutoff,
                                             constraints=None,
                                             rigidWater=False)

# Convert the protein System into a ParmEd Structure.
protein_parmed_structure = parmed.openmm.load_topology(
                                        protein_pdbfile.topology,
                                        protein_system,
                                        xyz=protein_pdbfile.positions)

# Write the protein structure to AMBER format.
protein_parmed_structure.save("protein.rst7", overwrite=True)
protein_parmed_structure.save("protein.parm7", overwrite=True)

# Load the ligand and protein with BioSimSpace.
ligand = BSS.IO.readMolecules("ligand1.*7")[0]
protein = BSS.IO.readMolecules("protein.*7")[0]

# Combine to form a complex.
complx = (ligand + protein).toSystem()

# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create processes for AMBER and GROMACS simulations.
process_amb = BSS.Process.Amber(complx, protocol)
process_gmx = BSS.Process.Gromacs(complx, protocol)

# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()

# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")

This outputs:

Bond energies... AMBER = 202.0904 kcal/mol, GROMACS = 202.0662 kcal/mol
Angle energies... AMBER = 1111.9015 kcal/mol, GROMACS = 1111.8045 kcal/mol
Dihedral energies... AMBER = 4336.2379 kcal/mol, GROMACS = 4315.6788 kcal/mol

I'm not yet sure if the issue with the GROMACS files is our ability to read them, or whether they were invalid in the first place. I'll try to reproduce the above writing to GROMACS format, i.e. using the output of the script, rather what was uploaded to the benchmark repository.

Cheers.

@lohedges
Copy link
Member

lohedges commented Mar 30, 2022

I've just re-run the above script, instead using ParmEd to write to GROMACS format. As you've found, on conversion to AMBER format within BioSimSpace, any AMBER simulation fails with the EXTRA POINTS: nnb too small!" message. While the GROMACS simulations work, the single point energies are different (but close) to those that I obtained above using the the AMBER files written by ParmEd. The output was as follows:

Bond energies... AMBER = None, GROMACS = 241.7304 kcal/mol
Angle energies... AMBER = None, GROMACS = 1124.9594 kcal/mol
Dihedral energies... AMBER = None, GROMACS = 4337.7390 kcal/mol

Given the inconsistencies, I'm not sure whether the issue is with ParmEd or us.

@jmichel80
Copy link
Contributor

That's very helpful @lohedges ! @fjclark could you check whether the AMBER prm files produced by @lohedges 's script work with SOMD ? We can feedback to Bayer next week the above issues.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 30, 2022

Thanks very much @lohedges. @jmichel80 will do.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 31, 2022

@jmichel80 I can confirm that the AMBER files produced by @lohedges's script work with SOMD. However, test alchemical simulations crash when I use the same config file I'm using for my current work, giving "NaN or Inf has been generated along the simulation" during minimisation. This turned out to be due to the use of "constraint = allbonds". Do you have any suggestions as to why this is happening, given that the system appeared well equilibrated (I have used an equilibration scheme almost identical to the one which produced input which works with this config file).

All relevant input is here. I have:

-Produced AMBER format files for brd4, lig1 complex using @lohedges 's script (this failed at first due to this issue - despite having compatible versions of ParmEd and OpenMM in my environment, an incompatible version of ParmEd was bundled with AmberTools, which worked after manually modifying decorators.py).

-Equilibrated using pmemd and pmemd.cuda - see input.

-Run a short production run using SOMD through BSS - successful

-Run single alchemical windows using my standard config file with (failure) and without (success) constraint set to "allbonds".

Thanks

@jmichel80
Copy link
Contributor

This is odd, constraints are disabled before energy minimisation. See https://github.com/michellab/Sire/blob/devel/wrapper/Tools/OpenMMMD.py#L1507-L1520

Also your single point energy before minimisation is identical before minimisation between the two runs, so I expect the same input was passed to integrator.minimiseEnergy()

Is the error reproducible ?

You could try disabling minimisation, that step may not be necessary for an ABFE run started from a structure equilibrated at discharge = 0.0

@fjclark
Copy link
Contributor Author

fjclark commented Mar 31, 2022

I've repeated the calculations, only adding or removing "set_constraints = allbonds" to the config. Adding this consistently causes the simulations to crash during minimisation. I'm running my modified version of SOMD with Boresch restraints, but I've repeated this with the latest version of sire as contained within BSS-dev - I observe the same thing.

When disabling minimisation, I get:

  2 NaN or Inf has been generated along the simulation
  3 Starting /home/finlayclark/sire_boresch_2022.app/bin/somd-freenrg: number of threads equals 20
  4 srun: error: ishikawa: task 0: Exited with exit code 255

Very similar to with minimisation, only no output related to setting up the system is displayed.

@lohedges
Copy link
Member

Out of interest, do the original GROMACS files work with SOMD? (For example, if you just perform minimisation and equilibration with GROMACS rather than PMEMD.) It would be interesting to know if you experience the same constraint related crashes with those, too.

@jmichel80
Copy link
Contributor

Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace) to verify if SOMD runs without errors in that case ? If so comparing the prm7 files should indicate a discrepancy in the handling of bonded terms.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 31, 2022

@lohedges I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.

@lohedges
Copy link
Member

Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace.

I tried this myself, but the original PDB files provided in the repository contained unknown atom types, so parametrisation with ff14sb failed. (Perhaps it would work with PDBs extracted from the AMBER files, but that would rely on the AMBER files being correct in the first place, which we're currently unsure about.) I also couldn't parameterise the ligand within BioSimSpace since the SDF was required for stereochemistry. However, the parametrisation of the ligand should be identical in either case. (Same sequence of commands used.) The protein might differ though, since they are using OpenMM to parameterise, rather than AmberTools.

I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.

Thanks for confirming.

Cheers.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 31, 2022

@jmichel80, it looks like constraints aren't disabled before energy minimisation in runFreeNrg(), only run() https://github.com/michellab/Sire/blob/6ff6b04d56a3b76fb18f7ba2317557d15cd41cb6/wrapper/Tools/OpenMMMD.py#L1686-L1695 .

However, even when I modify the my version of Sire to disable constraints before minimisation in runFreeNrg(), I still get the same failure during minimisation:

CONSTRAINT OFF 
Starting /home/finlayclark/sire_no_constraint_minim.app/bin/somd-freenrg: number of threads equals 20
Traceback (most recent call last):
  File "/home/finlayclark/sire_no_constraint_minim.app/share/Sire/scripts/somd-freenrg.py", line 146, in <module>
    OpenMMMD.runFreeNrg(params)
  File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/__init__.py", line 176, in inner
    retval = func()
  File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/OpenMMMD.py", line 1696, in runFreeNrg
    system = integrator.minimiseEnergy(system, minimise_tol.val, minimise_max_iter.val)
RuntimeError: Particle coordinate is nan 
srun: error: ishikawa: task 0: Exited with exit code 1

@jmichel80
Copy link
Contributor

To follow up on this. The code to disable constraints in run() or runFreeNrg() is buggy. Changing constraint parameters via e.g. integrator.setConstraintType("none") does not cause the openMM system to be reinitialised because Integrator_OpenMM.initialise() is not subsequently called so no effect.

It is likely that the crash during energy minimisation is due to the an incompatibility between the way bonded parameters are defined in the parameter file "openff_unconstrained-2.0.0.offxml" here

and code to setup constraints/bonded terms for the solute in SOMD

https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1802-L1814
vs
https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1868-L1905

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

Hello @lohedges ,

As Bayer does not have AMBER, I've been asked to check the consistency between revised input files:
brd4_inputs.tar.gz

The AMBER inputs produce stable MD with sander (10 ps) and pmemd.cuda (1 ns) for both the complexes. The GROMACS input for the ligand 1 complex also seems to produce stable MD (currently at 0.5 ns).

However, when I attempt to load any of the GROMACS inputs into BSS (2022.1.0+5.g1a606e31, but I've also tried with 2022.2.1), I get the following error:

---------------------------------------------------------------------------
UserWarning                               Traceback (most recent call last)
File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:377, in readMolecules(files, property_map)
    376 try:
--> 377     system = _SireIO.MoleculeParser.read(files, property_map)
    378 except Exception as e:

UserWarning: Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const
__Backtrace__
(  0) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e1ae42] ++0x42)
  -- SireError::getBackTrace()

(  1) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e18dbe] ++0xae)
  -- SireError::exception::exception(QString, QString)

/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x199c8e) [0x7fd2603e1c8e]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x145104) [0x7fd26038d104]
(  4) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd2603f89cc] ++0x3c)
  -- SireMol::AtomNum::map(SireMol::MolInfo const&) const

(  5) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260559851] ++0x31)
  -- SireMol::MoleculeInfoData::atomIdx(SireMol::AtomID const&) const

(  6) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260552706] ++0x26)
  -- SireMol::MoleculeInfo::atomIdx(SireMol::AtomID const&) const

(  7) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd6ce7] ++0x247)
  -- SireIO::GroTop::getBondProperties(SireMol::MoleculeInfo const&, SireIO::GroMolType const&) const

/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x2205cb) [0x7fd1fabd75cb]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x21f262) [0x7fd1fabd6262]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x16092) [0x7fd29fb1c092]
( 11) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd8156] ++0xb66)
  -- SireIO::GroTop::createMolecule(QString, QStringList&, SireBase::PropertyMap const&) const
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x221373) [0x7fd1fabd8373]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x22178d) [0x7fd1fabd878d]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x23302) [0x7fd29fb29302]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x1e794) [0x7fd29fb24794]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fd2cc2d6609]
( 17) /lib/x86_64-linux-gnu/libc.so.6 ([0x7fd2cc095163] ++0x43)
  -- clone

__EndTrace__
Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb Cell 17 in <module>
----> [1](vscode-notebook-cell://ssh-remote%2Bishikawa.chem.ed.ac.uk/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0) a = BSS.IO.readMolecules(["ligand1/complex.top", f"ligand1/complex.gro"])

File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:397, in readMolecules(files, property_map)
    395 msg = "Failed to read molecules from: %s" % files
    396 if _isVerbose():
--> 397     raise IOError(msg) from e
    398 else:
    399     raise IOError(msg) from None

OSError: Failed to read molecules from: ['ligand1/complex.top', 'ligand1/complex.gro']

The missing atom number changes if I run the command repeatedly.

When I check the single-point energies for the ligand 1 complex for AMBER, I get:

Bond energies... AMBER = 1357.79 kJ/mol
Angle energies... AMBER = 1763.71 kJ/mol
Dihedral energies... AMBER = 6153.36 kJ/mol

For GROMACS, (bypassing BSS), I get the same angle and dihedral energies, but a very different bond energy:

Bond                                        354.189   kJ/mol
Angle                                       1763.71   kJ/mol
Dih. (proper+ improper)          6153.37   kJ/mol

Do you know what might be causing the above error and discrepancies?

Thanks very much.

@lohedges
Copy link
Member

Thanks for this, I should be able to take a look later. The atom number might change because the parsing is done in parallel, so the order might not be reproducible. (Although things are sorted afterwards.)

@lohedges
Copy link
Member

I think the issue is that the atom names in the *.gro files don't correspond to those from state A in the topology, so the Sire parser isn't finding an appropriate match. For example, here's a merge of ethane and methanol written to file by BSS. (State A = ethane, state B = methanol.)

gro

BioSimSpace System
    8
    1LIG      C    1   1.349   1.431   1.352
    1LIG      C    2   1.502   1.428   1.361
    1LIG      H    3   1.311   1.369   1.434
    1LIG      H    4   1.304   1.386   1.263
    1LIG      H    5   1.307   1.530   1.367
    1LIG      H    6   1.553   1.335   1.387
    1LIG      H    7   1.521   1.495   1.445
    1LIG      H    8   1.549   1.469   1.271
   0.00000   0.00000   0.00000

top

...
[ atoms ]
;   nr   type0  resnr residue  atom   cgnr    charge0        mass0   type1    charge1        mass1
     1      c3      1     LIG     C      1  -0.094350    12.010000      oh  -0.598800    16.000000
     2      c3      1     LIG     C      2  -0.094350    12.010000      c3   0.116700    12.010000
     3      hc      1     LIG     H      3   0.031450     1.008000      ho   0.396000     1.008000
     4      hc      1     LIG     H      4   0.031450     1.008000   hc_du   0.000000     1.008000
     5      hc      1     LIG     H      5   0.031450     1.008000   hc_du   0.000000     1.008000
     6      hc      1     LIG     H      6   0.031450     1.008000      h1   0.028700     1.008000
     7      hc      1     LIG     H      7   0.031450     1.008000      h1   0.028700     1.008000
     8      hc      1     LIG     H      8   0.031450     1.008000      h1   0.028700     1.008000
...

Now look at any of the examples from the archive you uploaded, e.g. ligand1/ligand.{gro/top}:

gro

GROningen MAchine for Chemical Simulation
 3530
    1Ligan  C1x    1   2.990   1.896   0.172
    1Ligan  C2x    2   2.520   1.512   0.139
    1Ligan  C3x    3   2.568   1.493  -0.169
    1Ligan  C4x    4   2.923   1.312  -0.546
    1Ligan  C5x    5   3.065   1.112  -0.469
    1Ligan  C6x    6   2.854   1.174  -0.351
    1Ligan  O1x    7   3.211   1.386  -0.471
    1Ligan Cl1x    8   2.482   1.848  -0.677
...

top

...
[ atoms ]
;   nr       type  resnr residue  atom   cgnr    charge       mass  typeB    chargeB      massB
; residue    1 Ligand 2 rtp Ligand 2 q 0.0
    1         C1      1 Ligand 2    C1x      1 -0.12445362  12.010780   ; qtot -0.124454
    2         C1      1 Ligand 2    C2x      2 -0.02585362  12.010780   ; qtot -0.150307
    3         C1      1 Ligand 2    C3x      3 -0.04085362  12.010780   ; qtot -0.191161
    4         C1      1 Ligand 2    C4x      4 -0.11282062  12.010780   ; qtot -0.303981
    5         C1      1 Ligand 2    C5x      5 -0.11282062  12.010780   ; qtot -0.416802
    6         C1      1 Ligand 2    C6x      6 -0.11282062  12.010780   ; qtot -0.529623
    7         O1      1 Ligand 2    O1x      7 -0.53605362  15.999430   ; qtot -1.065676
    8        Cl1      1 Ligand 2   Cl1x      8 -0.08845362  35.453200   ; qtot -1.154130
...

None of the dummy types, i.e. *x are present in the atomtypes section either, e.g.:

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.33795318     0.45538912
O1             8  15.999430  0.00000000  A     0.30398122     0.87950233
Cl1           17  35.453200  0.00000000  A     0.33075278      1.1112708
C2             6  12.010780  0.00000000  A     0.34806469     0.36350306
N1             7  14.006720  0.00000000  A      0.3206876      0.7016213
O2             8  15.999430  0.00000000  A     0.30251065     0.70485815
S1            16  32.065500  0.00000000  A     0.35635949          1.046
H1             1   1.007947  0.00000000  A     0.26445434    0.066021356
H2             1   1.007947  0.00000000  A     0.25725815     0.06531786
H3             1   1.007947  0.00000000  A     0.25832257    0.068656285
O3             8  15.999430  0.00000000  A     0.31507524       0.635968
H4             1   1.007947  0.00000000  A              1              0
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
Cl2           17  35.453200  0.00000000  A     0.44010397         0.4184

I'll see if I can modify the files for consistency in order to get them to load.

(For reference, I'm not sure if GROMACS cares about anything in the gro file, since it likely just matches things up by number. For our parsers, we have a few self-consistency checks when combining files to create a system, since the coordinates need not come from a GROMACS file at all, i.e. we need some way to be able to map entries in one file to the information in the topology, and line-by-line doesn't always work.)

@lohedges
Copy link
Member

I just modified the name of one of the atoms in the ethane-methanol gro file and it does give an error, although different and more informative to what you are seeing:

BioSimSpace System
    8
    1LIG      C1   1   1.349   1.431   1.352
    1LIG      C    2   1.502   1.428   1.361
    1LIG      H    3   1.311   1.369   1.434
    1LIG      H    4   1.304   1.386   1.263
    1LIG      H    5   1.307   1.530   1.367
    1LIG      H    6   1.553   1.335   1.387
    1LIG      H    7   1.521   1.495   1.445
    1LIG      H    8   1.549   1.469   1.271
   0.00000   0.00000   0.00000
UserWarning: SireError::incompatible_error: Incompatibility between the files, could not match all atoms to the corresponding topology file. Check atom and residue names! (call sire.error.get_last_error_details() for more info)

It looks like this issue would trip you up, but isn't the source of the exact error that you are seeing.

@lohedges
Copy link
Member

It may be failing because of the weird formatting of the state B information, e.g.:

 1         C1      1 Ligand 2    C1x      1 -0.12445362  12.010780   ; qtot -0.124454

I imagine that our parser isn't just ignoring stuff beyond the semi-colon, rather expecting valid information related to state B. Having fixed the names in the gro file, I'll try reading it as a single state file to see if that works.

@lohedges
Copy link
Member

Oh, and just to note that we currently don't have read support for GROMACS FEP files, we can only write to them. The information would only ever be read from the information corresponding to state A.

@lohedges
Copy link
Member

Okay, I've fixed it. The GROMACS files are formatted in a way which can't be parsed by Sire due to the inconsistent naming between gro and top and the use of whitespace in molecule type and residue names. For example the atomtypes line has a Ligand 2 entry for the residue, which is being broken into two records. In the corresponding gro file, the residue is named Ligan due to the space constraints.

In order to get things to read I needed to:

  1. Update all of the atom names in the gro file to match the atom names in state A from the atomtypes section for Ligand 2.
  2. Replace all instances of Ligand 2 in the top file with Ligan to avoid splitting on whitespace and to match the residue name in the gro file.

Here is an archive with the (hopefully) fixed GROMACS files for ligand1. This will allow you to load state A.

I'm not sure how these input files were originally generated, but the files look like they are pushing the limits of the GROMACS formatting and naming conventions. I didn't think that GROMACS used fixed width, so am surprised that things like Ligand 2 aren't causing issues elsewhere.

Cheers.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

Brilliant, thank you very much for sorting this out! I'll send Bayer a link to this issue.

@lohedges
Copy link
Member

No problem. If you could check that files loaded after modification are sane, that would be great.

@jmichel80
Copy link
Contributor

Could we also capture in this issue how the input files were generated by Bayer ?

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

The files seem to be sane. Comparing the single-point energies of ligand1 from your fixed input and the AMBER input gives:

Bond energies... AMBER = 12.1355 kcal/mol, GROMACS = 12.9934 kcal/mol
Angle energies... AMBER = 107.9002 kcal/mol, GROMACS = 107.9861 kcal/mol
Dihedral energies... AMBER = 19.3379 kcal/mol, GROMACS = 19.3600 kcal/mol

Although when I convert the fixed GROMACS inputs to AMBER, process_amber.getBondEnergy() gives None for some reason:

lig_gmx = BSS.IO.readMolecules([f"fixed_input/fixed_ligand1/ligand.gro", f"fixed_input/fixed_ligand1/ligand.top"])

# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create processes for AMBER and GROMACS simulations.
process_gmx = BSS.Process.Gromacs(lig_gmx, protocol)
process_amb = BSS.Process.Amber(lig_gmx, protocol)

# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()

# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")

Outputs

Bond energies... AMBER = None, GROMACS = 12.9934 kcal/mol
Angle energies... AMBER = 107.9843 kcal/mol, GROMACS = 107.9861 kcal/mol
Dihedral energies... AMBER = 19.3600 kcal/mol, GROMACS = 19.3600 kcal/mol

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

@jmichel80 I'm afraid I don't yet have any more information on how the inputs were generated beyond what was in Joseph's email - that this is an altered workflow which is yet to be pushed to github.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

@lohedges , I was also wondering about comparing bond energies. In your comment above (#289 (comment)) you use these when comparing single-point energies. However, when I convert one of these systems (ligand 1 complex) loaded from the same (AMBER) input to GROMACS, I find that the bond energy changes by over 100 kcal / mol (although the angle and dihedral energies are unaffected).

Bond energies... AMBER = 324.5188 kcal/mol, GROMACS = 84.6532 kcal/mol
Angle energies... AMBER = 421.5360 kcal/mol, GROMACS = 421.5368 kcal/mol
Dihedral energies... AMBER = 1470.6892 kcal/mol, GROMACS = 1464.7347 kcal/mol

Is this unexpected or does this just depend on the details of the interconversion?

Thanks.

@lohedges
Copy link
Member

Bond energies only compare in vacuum, or for comparable water models. AMBER and GROMACS handle rigid waters differently, one with explicit H-H bonds, the other with constraints.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 19, 2022

OK, thank you.

@lohedges
Copy link
Member

I'm not sure why the AMBER process is returning None for the bond energy, will try to debug that tomorrow. Other than water molecules, the number of bonded terms are the same, i.e.:

import BioSimSpace as BSS

s_gro = BSS.IO.readMolecules("fixed_ligand1/*)

# Check bonds in the ligand.
s_gro[0]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 59 )

# Check bonds in the first water molecule. (No explicit H-H bond for GROMACS, rigid water handled by SETTLE.)
s_gro[1]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 0 )

# Convert to AMBER format.
BSS.IO.saveMolecules("tmp", s_gro, ["prm7", "rst7"])

# Reload.
s_amb = BSS.IO.readMolecules("tmp.*")

# Check bonds in the ligand. (Same as before.)
s_amb[0]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 59 )

# Check bonds in the first water molecule. (AMBER rigid water has three bonds.)
s_amb[1]._sire_object.property("bond")
TwoAtomFunctions( nFunctions() == 3 

@lohedges
Copy link
Member

Just to say that I don't think the file is valid. If I look at the output files (both GROMACS and AMBER) it looks like the energy has blown up. Some terms appear sensible (and similar), e.g. angle and dihedral, but the total energy is huge or NaN. For AMBER, you are seeing None for the bond energy because the energy record is just asterisks, i.e.:

anber.nrg_

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1              NaN            NaN     2.8472E+05     O        2736

 BOND    = *************  ANGLE   =      107.9843  DIHED      =       19.3600
 VDWAALS =           NaN  EEL     =           NaN  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000

For GROMACS, I needed to do some monkeying to get things to run due to an invalid decomposition on my laptop. The log file gives:

gromacs.log

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.        LJ (SR)  Disper. corr.
    5.43645e+01    4.51814e+02    8.10024e+01    3.55627e+17   -2.30155e+02
   Coulomb (SR)   Coul. recip.      Potential Pres. DC (bar) Pressure (bar)
   -2.82337e+04    5.48558e+03    3.55627e+17   -1.01028e+02   -3.15806e+18

Note the huge potential energy. I wonder one/some of the atoms should actually be dummies, i.e. not part of the energy calculation.

@lohedges
Copy link
Member

I'd forgotten to update the names for hydrogens in the water molecules, so I think it was matching against the wrong atom type. Will do that now to see if it fixes things.

@lohedges
Copy link
Member

Bah, doing that ends up back with the original error. I think that atom names were correct in the original file, I had been misreading names as types. However, I think it's the ligand name that's causing issues, so will try with unchanged atom names, and updated molecule name, i.e. Ligand.

@lohedges
Copy link
Member

Nope, still the original error. Sorry, I think the files that I sent were wrong, but somehow magically loaded. I'll add some debugging statements to the Sire.IO.GroTop parser to work out where the crash occurs. That should (hopefully) tell us which record is causing the issue. The problem occurs with the topology file alone (if you do things just in Sire), so I assume that it's a formatting issue there (or something that Sire can't parser.)

Apologies for going round in circles. The joys of debugging at the end of the day!

@fjclark
Copy link
Contributor Author

fjclark commented Oct 20, 2022

Not at all - thanks very much for all the work on this!

@lohedges
Copy link
Member

Bah, it was a Sire file caching issue. Changing the ligand names in the topology from Ligand 2 and Ligand 8 to Ligan fixes things, i.e. it loads. Will check the energies after lunch 🤦‍♂️

@lohedges
Copy link
Member

Yes, that fixes things. (Thank goodness.)

process_gmx.getAngleEnergy().kj_per_mol()
451.8140 kJ/mol

process_amb.getAngleEnergy().kj_per_mol()
451.8063 kJ/mol

process_gmx.getDihedralEnergy().kj_per_mol()
81.0024 kJ/mol

process_amb.getDihedralEnergy().kj_per_mol()
81.0022 kJ/mol

And the log files:

GROMACS:

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.        LJ (SR)  Disper. corr.
    5.43645e+01    4.51814e+02    8.10024e+01    3.84003e+04   -2.30155e+02
   Coulomb (SR)   Coul. recip.      Potential Pres. DC (bar) Pressure (bar)
   -4.76565e+04    1.50650e+03   -7.62286e+03   -1.01028e+02    7.40276e+04

AMBER:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -1.6796E+03     1.5635E+02     3.5811E+03     O        3294

 BOND    =       46.1225  ANGLE   =      107.9843  DIHED      =       19.3600
 VDWAALS =     9169.2413  EEL     =   -11022.3213  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000

So, the answer is to not have multi-word residue names in the topology file, and to match the residue name between the gro and top files. Should be something that is easy to enforce at their end, or to patch afterwards.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 20, 2022

Brilliant, thanks very much for sorting this. Will pass that on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants