Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken dummy atoms logic in SOMD #408

Open
jmichel80 opened this issue Oct 7, 2022 · 23 comments
Open

Broken dummy atoms logic in SOMD #408

jmichel80 opened this issue Oct 7, 2022 · 23 comments
Assignees
Labels
2023.0 Work related to the 2023 release bug

Comments

@jmichel80
Copy link
Contributor

The move to 2023 seems to have broken SOMD. The example below attempts to carry out an RBFE calculation on a protein-ligand system.

I am using feature_pmefep_2023 ; Sire version: 2023.0.0 (28ff974/GitHub/2022-08-25T07:52:15+01:00)

To reproduce the runtime error extract the attached inputs, with that Sire version in your environement

debug.zip

unzip debug.zip 
cd debug/output/lambda-0.00
bash RUNME

Generates on my test computer


(...)
### Running Single Topology Molecular Dynamics Free Energy (v0.2) on node01 ###
###================Setting up calculation=====================###
New run. Loading input and creating restart
lambda is 0.0
Create the System...
Selecting dummy groups
Traceback (most recent call last):
  File "/export/users/julien/miniforge3/envs/sire-dev/share/Sire/scripts/somd-freenrg.py", line 161, in <module>
    OpenMMMD.runFreeNrg(params)
  File "/export/users/julien/miniforge3/envs/sire-dev/lib/python3.9/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner
    retval = func()
  File "/export/users/julien/miniforge3/envs/sire-dev/lib/python3.9/site-packages/sire/legacy/Tools/OpenMMMD.py", line 2275, in runFreeNrg
    system = createSystemFreeEnergy(molecules)
  File "/export/users/julien/miniforge3/envs/sire-dev/lib/python3.9/site-packages/sire/legacy/Tools/OpenMMMD.py", line 1273, in createSystemFreeEnergy
    solute_ref_todummy = solute_ref_todummy.add(
UserWarning: SireError::incompatible_error: The molecules "", number 0, and "LIG", number 1004, are different, and therefore incompatible. (call sire.error.get_last_error_details() for more info)

Manually editing the pert file in debug/input/system.pert to reset all atom types chante to 'du' to a non dummy type resolves the runtime error.

The runtime error is caused by the following line of code

       for x in range(0, ndummies):
            dummy_index = dummies[x].index()
            solute_ref_hard = solute_ref_hard.subtract(
                solute.select(dummy_index)
            )
            solute_ref_todummy = solute_ref_todummy.add(
                solute.select(dummy_index)
            )

@annamherz - could you confirm the provide inputs run without error on a pre-2023 build ?

@jmichel80 jmichel80 added bug 2023.0 Work related to the 2023 release labels Oct 7, 2022
@jmichel80
Copy link
Contributor Author

@annamherz note that to run the code in another branch you may have to comment out the line

cutoff type = PME

in the sim.cfg file

I have reproduced the error using the standard cutoffperiodic option so I do not believe the issue is caused by PME specific code changes in that feature branch.

@lohedges
Copy link
Member

lohedges commented Oct 7, 2022

Thanks, @jmichel80. I should be able to take a look later today.

@lohedges
Copy link
Member

lohedges commented Oct 7, 2022

Here's a minimal example of the issue using BioSimSpace:

Sire < 2023:

import BioSimSpace as BSS
from Sire.Mol import AtomIdx, Selector_Atom_

# Load an example system and get the first molecule.
mol = BSS.IO.readMolecules("amber/ala/ala*")[0]

# Create an empty selector by selecting all atoms and inverting, as is done by SOMD.
selector0 = m.selectAllAtoms().invert()
print(selector0)
Selector<SireMol::Atom>( [  ] )

# What molecule does this correspond to?
print(selector0.molecule())
Molecule( 2 version 21 : nAtoms() = 22, nResidues() = 3 )

# Create an empty atom selector by constructing the object directly.
selector1 = Selector_Atom()
print(selector1)
Selector<SireMol::Atom>( [  ] )

# What molecule does this correspond to?
print(selector1.molecule())
Molecule( 0 version 0 : nAtoms() = 0, nResidues() = 0 )

# Are these the same?
selector0 == selector1
False

# Add the first molecule to the selector. This works.
selector0.add(mol.getAtoms()[0]._sire_object))
Selector<SireMol::Atom>( [ AtomIdx(0) ] )

# Now try adding to the second selector. This fails with the error you are seeing.
selector1.add(mol.getAtoms()[0]._sire_object))
...
Exception 'SireError::incompatible_error' thrown by the thread 'master'.
The molecules "", number 0, and "", number , are different, and therefore incompatible.

Sire 2023:

import BioSimSpace as BSS
from sire.legacy.Mol import AtomIdx, Selector_Atom_

# Load an example system and get the first molecule.
mol = BSS.IO.readMolecules("amber/ala/ala*")[0]

# Create an empty selector by selecting all atoms and inverting, as is done by SOMD.
selector0 = mol._sire_object.selectAllAtoms().invert()
print(selector0)
Selector<SireMol::Atom>::empty

# What molecule does this correspond to?
print(selector0.molecule())
Molecule( :0      num_atoms=0 num_residues=0 )

# Create an empty atom selector by constructing the object directly.
selector1 = Selector_Atom()
print(selector1)
Selector<SireMol::Atom>::empty

# What molecule does this correspond to?
print(selector1.molecule())
Molecule( :0      num_atoms=0 num_residues=0 )

# Are these the same?
selector0 == selector1
True

# Add the first molecule to the selector. This now fails.
selector0.add(mol.getAtoms()[0]._sire_object))
UserWarning: SireError::incompatible_error: The molecules "", number 0, and "", number 633, are different, and therefore incompatible. (call sire.error.get_last_error_details() for more info)

# Now try adding to the second selector. This still fails.
selector1.add(mol.getAtoms()[0]._sire_object))
...
UserWarning: SireError::incompatible_error: The molecules "", number 0, and "", number 633, are different, and therefore incompatible. (call sire.error.get_last_error_details() for more info)

It looks like an empty selection on a molecule is being converted to a null Selector_Atom_, so is losing reference to the molecule to which the selection was applied.

@jmichel80
Copy link
Contributor Author

Something doesn't add up since this particular code in SOMD has been around for several years and used for many thousands of RBFEs without this error showing up previously, it's very odd to see a failure on Sire < 2023

@lohedges
Copy link
Member

lohedges commented Oct 7, 2022

Sorry, it works on < 2023. Apologies if that wasn't clear in the example, or if I made a typo. In my first example, it works if the selector is made from a molecule, as is done in SOMD. For 2023, it fails regardless of how the selector was created.

@lohedges
Copy link
Member

lohedges commented Oct 10, 2022

Okay, I've made some modifications to SOMD and it now appears to be working. (It starts up and runs without crashing, at least.) Could you check this at your end (perhaps there are other surprises) and if everything is okay then I'll push a fix.

diff --git a/wrapper/Tools/OpenMMMD.py b/wrapper/Tools/OpenMMMD.py
index 74fa1ba3..35c0a0e7 100644
--- a/wrapper/Tools/OpenMMMD.py
+++ b/wrapper/Tools/OpenMMMD.py
@@ -307,10 +307,10 @@ def getSolute(system):
     # matching the perturbed_resnum.val.
 
     # Create the query string.
-    query = f"mol with resnum {perturbed_resnum.val}"
+    query = f"resnum {perturbed_resnum.val}"
 
     # Perform the search.
-    search = system.search(query)
+    search = system.search(query).molecules()
 
     # Make sure there is only one result.
     if len(search) != 1:
@@ -858,6 +858,8 @@ def getDummies(molecule):
 
     from_dummies = None
     to_dummies = None
+    from_non_dummies = []
+    to_non_dummies = []
 
     for x in range(0, natoms):
         atom = atoms[x]
@@ -866,13 +868,17 @@ def getDummies(molecule):
                 from_dummies = molecule.selectAll(atom.index())
             else:
                 from_dummies += molecule.selectAll(atom.index())
-        elif atom.property("final_ambertype") == "du":
+        else:
+            from_non_dummies.append(atom.index())
+        if atom.property("final_ambertype") == "du":
             if to_dummies is None:
                 to_dummies = molecule.selectAll(atom.index())
             else:
                 to_dummies += molecule.selectAll(atom.index())
+        else:
+            to_non_dummies.append(atom.index())
 
-    return to_dummies, from_dummies
+    return to_dummies, from_dummies, to_non_dummies, from_non_dummies
 
 
 def createSystemFreeEnergy(molecules):
@@ -946,10 +952,10 @@ def createSystemFreeEnergy(molecules):
     solute_grp_ref_fromdummy = MoleculeGroup("solute_ref_fromdummy")
 
     solute_ref_hard = solute.selectAllAtoms()
-    solute_ref_todummy = solute_ref_hard.invert()
-    solute_ref_fromdummy = solute_ref_hard.invert()
+    solute_ref_todummy = solute.selectAllAtoms()
+    solute_ref_fromdummy = solute.selectAllAtoms()
 
-    to_dummies, from_dummies = getDummies(solute)
+    to_dummies, from_dummies, to_non_dummies, from_non_dummies = getDummies(solute)
 
     if to_dummies is not None:
         ndummies = to_dummies.count()
@@ -958,7 +964,9 @@ def createSystemFreeEnergy(molecules):
         for x in range(0, ndummies):
             dummy_index = dummies[x].index()
             solute_ref_hard = solute_ref_hard.subtract(solute.select(dummy_index))
-            solute_ref_todummy = solute_ref_todummy.add(solute.select(dummy_index))
+
+    for non_dummy in to_non_dummies:
+        solute_ref_todummy = solute_ref_todummy.subtract(solute.select(non_dummy))
 
     if from_dummies is not None:
         ndummies = from_dummies.count()
@@ -967,7 +975,9 @@ def createSystemFreeEnergy(molecules):
         for x in range(0, ndummies):
             dummy_index = dummies[x].index()
             solute_ref_hard = solute_ref_hard.subtract(solute.select(dummy_index))
-            solute_ref_fromdummy = solute_ref_fromdummy.add(solute.select(dummy_index))
+
+    for non_dummy in from_non_dummies:
+        solute_ref_fromdummy = solute_ref_fromdummy.subtract(solute.select(non_dummy))
 
     solute_grp_ref_hard.add(solute_ref_hard)
     solute_grp_ref_todummy.add(solute_ref_todummy)

The trick was to ensure that we never used an empty selection, which lost reference to the original solute. Instead, we use a full atom selection on the solute and subtract non-dummy atoms in the initial and final state to generate, e.g., solute_ref_todummy. I also needed to update to search string used to extract the perturbable molecule, since this has also changed in 2023. I'll add a single lambda leg FEP test into BioSimSpace so we catch issues like this in future. (SOMD isn't tested as part of Sire itself.)

Cheers.

lohedges added a commit to michellab/BioSimSpace that referenced this issue Oct 10, 2022
@lohedges
Copy link
Member

lohedges commented Oct 10, 2022

I've now added a basic SOMD FEP test to BioSimSpace on my current feature branch here. This fails with the current Sire 2023, but passes with the application of the diff. I'll update the test/s if we find other parts of SOMD that no longer work as expected.

(This does nothing to check that the output is valid, only that it's possible to configure and run a SOMD process without error.)

@lohedges
Copy link
Member

lohedges commented Oct 10, 2022

I also discovered an issue with the generateDistanceRestraintsDict function which was (incorrectly) assuming that the solute would be MolNum(1). I've switched this to using the getSolute function, which will be correct regardless of the MolNum associated with the solute. The generateDistanceRestraintsDict function was also broken if distance restraints were applied to a single solute system, i.e, it would assume that there were multiple solutes and fail. I've now added error handling for this.

@jmichel80
Copy link
Contributor Author

I think this feature was not updated in Hannes's branch as work started on a version where the perturbed solute was. Looping @fjclark for awareness as this may become relevant when we migrate the current ABFE work to Sire >=2023.*

@fjclark
Copy link
Contributor

fjclark commented Oct 10, 2022

Thanks @lohedges and @jmichel80. Sorry, I missed this when I encountered this issue originally (#374) because I never use generateDistanceRestraintsDict (I always supply my own dictionary), so this has not been updated anywhere.

@lohedges
Copy link
Member

No problem. My edits are now available in this fix branch. @jmichel80: Let me know when you're happy and I'll create a PR against devel.

Cheers.

@lohedges
Copy link
Member

lohedges commented Oct 10, 2022

It is also possible to work around the issue by calling .selection() on the solute, then calling .selectNone(), the using .select() to add the atoms of interest. For example:

import BioSimSpace as BSS
from sire.legacy.Mol import AtomIdx, Selector_Atom_

# Load an example system and get the first molecule.
mol = BSS.IO.readMolecules("amber/ala/ala*")[0]

# Create an empty selector by selecting all atoms and inverting, as was done by SOMD.
selector = mol._sire_object.selectAllAtoms().invert()
print(selector)
Selector<SireMol::Atom>::empty

# We've now lost the relationship between molecule and selection.
print(selector.molecule())
Molecule( :0      num_atoms=0 num_residues=0 )

# Add the first molecule to the selector. This now fails.
selector.add(mol.getAtoms()[0]._sire_object))
UserWarning: SireError::incompatible_error: The molecules "", number 0, and "", number 633, are different, and therefore incompatible. (call sire.error.get_last_error_details() for more info)

# Use an AtomSelection instead.
atom_selection = mol._sire_object.selection()

# Zero the selection.
atom_selection.selectNone()

# Select the first atom. (Just repeat for any other atoms we want.)
atom_selection.select(AtomIdx(0))

# Convert to a Selector_Atom, which is what was used previously.
selector_atom = Selector_Atom_(mol._sire_object, atom_selection)
print(selector_atom)
Selector<SireMol::Atom>( size=1
0:  Atom( HH31:1  [  13.68,   13.15,   15.27] )
)

# By going this route, we preserve the relationship between the selection and the molecule.
print(selector_atom.molecule())
Molecule( :2      num_atoms=22 num_residues=3 )

I don't understand the full details of what's going on behind the scenes, but it's confusing that these different selection approaches give different results. It would be good to make things consistent (or at least understandable) before the full 2023 release.

@lohedges
Copy link
Member

I've updated the legacy SireUnitTests to use sire.legacy from the 2023 API here. These all still pass, which is good news 👍

@lohedges
Copy link
Member

Actually, it turns out that the only test of SOMD (OpenMMMD.py) in the suite was removed due to difficulty porting to the new API. The commit comment even implies that the selection behaviour might be problematic.

The challenge is that it uses the search syntax to assemble
selections of atoms in a different way to what was expected.

Reinstating this test does indeed lead to the same failure as in the original post, i.e:

Create the System...
Selecting dummy groups
Traceback (most recent call last):
  File "/home/lester/Code/SireUnitTests/unittests/SireMove/test_combRules.py", line 82, in <module>
    test_combining_rules("arithmetic", True)
  File "/home/lester/Code/SireUnitTests/unittests/SireMove/test_combRules.py", line 43, in test_combining_rules
    system = OpenMMMD.createSystemFreeEnergy(molecules)
  File "/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/sire/legacy/Tools/OpenMMMD.py", line 958, in createSystemFreeEnergy
    solute_ref_todummy = solute_ref_todummy.add(solute.select(dummy_index))
UserWarning: SireError::incompatible_error: The molecules "", number 0, and "LIG", number 2, are different, and therefore incompatible. (call sire.error.get_last_error_details() for more info)

(This is the only failure in the entire test suite.)

I've now reinstated the test (using the approach outlined here) and have updated all other unit tests to run against the sire.legacy API. These can be found on the sire-legacy branch of SireUnitTests.

I'd like to keep these tests for posterity. It would also be good to discuss how we should run them to ensure backwards compatibility going forward. (Do we want to reinstate as part of the CI, or only test against them periodically.) Looking at the size of the input files that are used, it would probably be unwise to move these into the main Sire repo.

Cheers.

@chryswoods chryswoods self-assigned this Oct 24, 2022
@chryswoods
Copy link
Member

Assigning myself as this is tricky, and I want to remind myself that we should have a design discussion about this (especially the difference between a selection returning nothing - an empty selection - and a single AtomSelection from a molecule being set to select no atoms). There are a lot of edge cases that cause issues if empty selections remember the molecule, especially when we search across multiple molecules. It would be good to talk through this in detail, and to come up with a way to get the functionality that is desired using an API that would not have surprising or inconsistent behaviour.

@lohedges
Copy link
Member

Thanks, it would be great to have a design discussion about this. I totally agree that the new approach makes sense, and is what you would need with multi molecule searches.l, etc.

Is the fix that I've implemented correct, i.e. avoiding empty selections, or is there an easier way to do this, or other edge cases that I might have missed? It would be good to push out a fix so that we can begin testing SOMD against the 2023 release.

Cheers.

@chryswoods
Copy link
Member

chryswoods commented Oct 26, 2022

I am looking through the OpenMMMD.py code and have some thoughts.

I think that the getDummies function can be simplified from

def getDummies(molecule):
    print ("Selecting dummy groups")
    natoms = molecule.nAtoms()
    atoms = molecule.atoms()

    from_dummies = None
    to_dummies = None
    from_non_dummies = []
    to_non_dummies = []

    for x in range(0, natoms):
        atom = atoms[x]
        if atom.property("initial_ambertype") == "du":
            if from_dummies is None:
                from_dummies = molecule.selectAll(atom.index())
            else:
                from_dummies += molecule.selectAll(atom.index())
        else:
            from_non_dummies.append(atom.index())
        if atom.property("final_ambertype") == "du":
            if to_dummies is None:
                to_dummies = molecule.selectAll(atom.index())
            else:
                to_dummies += molecule.selectAll(atom.index())
        else:
            to_non_dummies.append(atom.index())

    return to_dummies, from_dummies, to_non_dummies, from_non_dummies

to

def getDummies(molecule):
    print ("Selecting dummy groups")

    try:
        from_dummies = molecule["atom property initial_ambertype == du"]
    except Exception:
        from_dummies = None

    try:
        to_dummies = molecule["atom property final_ambertype == du"]
    except Exception:
        to_dummies = None

    try:
        from_non_dummies = molecule["atom property initial_ambertype != du"]
    except Exception:
        from_non_dummies = []

    try:
        to_non_dummies = molecule["atom property final_ambertype != du"]
    except Exception:
        to_non_dummies = []

    if len(from_non_dummies) == 1:
        from_non_dummies = [from_non_dummies.index()]
    else:
        from_non_dummies = from_non_dummies.indexes()

    if len(to_non_dummies) == 1:
        to_non_dummies = [to_non_dummies.index()]
    else:
        to_non_dummies = to_non_dummies.indexes()

    return to_dummies, from_dummies, to_non_dummies, from_non_dummies

This should be quicker, as it uses the internal search code in C++, and won't involve a loop over atoms in Python.

I think then that the logic of using these groups to subtract from a whole molecule works and makes more sense than trying to build things up.

For these, you could use the set-based add and subtract functions on the selection results, e.g.

solute_ref_hard = solute.atoms()

if to_dummies is not None:
    solute_ref_hard = solute_ref_hard.subtract(to_dummies)

if from_dummies is not None:
    solute_ref_hard = solute_ref_hard.subtract(from_dummies)

etc etc

Indeed, it may make things easier to read by directly using the search term in the subtraction and removing the getDummies function e.g.

solute_ref_hard = solute.atoms()

try:
    solute_ref_hard = solute_ref_hard.subtract(
        solute["atom property initial_ambertype == du "
               "or atom property final_ambertype == du"])
except Exception:
    # there are no dummies?
    # should we raise an error or exit if there are no dummies?
    solute_ref_hard = None

try:
    solute_ref_to_dummy = solute["atom property final_ambertype == du"]
except Exception:
    solute_ref_to_dummy = None

try:
    solute_ref_from_dummy = solute["atom property initial_ambertype == du"]
except Exception:
    solute_ref_from_dummy = None

try:
    solute_ref_allways_dummy = solute[
                 "atom property initial_ambertype == du "
                 "and atom property final_ambertype == du"]
    # there are some atoms that begin and end as dummies.
    # should we ignore these?
except Exception:
    solute_ref_always_dummy = None

@jmichel80
Copy link
Contributor Author

I would err on not touching too much this code to avoid accidentally breaking production pipelines ... I don't see improving readability of that code as a high priority given our roadmap.

@chryswoods
Copy link
Member

Don't worry - we won't change anything now that is working. This was a suggestion after the design discussion meeting I'd had with Lester, about how we could do these kind of selections in future (e.g. somd2)

@lohedges
Copy link
Member

lohedges commented Nov 2, 2022

@jmichel80: Is it possible to merge my fix branch into devel so we have some working SOMD FEP functionality in 2023. I've checked the resulting selection for a few example systems and get the same result after applying the fix, i.e. compared to Sire 2022 without the fix. Whether there are other issues is another question, but we can revisit as those come up. At present there is no way for anyone to run any SOMD FEP simulation with Sire 2023, so we won't know what issues (if any) there are.

Cheers.

@jmichel80
Copy link
Contributor Author

I have run several FEP calcs with reaction field and PME in the pmefep 2023 branch over the past two weeks. I am happy with the code so far. I suggest to make a PR from that branch to devel to merge in the PME functionality.

@lohedges
Copy link
Member

lohedges commented Nov 2, 2022

Ah brilliant. Shall I do the PR, or is this something that you wanted to do?

@jmichel80
Copy link
Contributor Author

Go ahead this will pull in quite a lot of code that you may want to review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2023.0 Work related to the 2023 release bug
Projects
None yet
Development

No branches or pull requests

5 participants