Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

My plan of representing the restraint #1

Closed
wants to merge 32 commits into from
Closed

My plan of representing the restraint #1

wants to merge 32 commits into from

Conversation

xiki-tempula
Copy link

@xiki-tempula xiki-tempula commented May 24, 2022

Please don't merge this PR yet.
After the discussion of michellab#321, I have tried to show my plan of representing the restraint https://github.com/fjclark/BioSimSpace/pull/1/files#diff-7ea50c3160a47ea78b09fa99cd48c243de7b49074fc5e62313221400c834ac21R40-R56
And the definition for each field
https://github.com/fjclark/BioSimSpace/pull/1/files#diff-7ea50c3160a47ea78b09fa99cd48c243de7b49074fc5e62313221400c834ac21R59-R98
@fjclark Do you mind have a look and tell me your thought? Thanks.

@fjclark
Copy link
Owner

fjclark commented May 25, 2022

Thanks for this.

In terms of your modifications to _restraint.py, I agree that using the production protocol makes sense.

I think the tests look good for your suggested representation. However, to avoid user error, it might be good to provide a setter method for the restraints, e.g.:

restraint.setBoresch("anchor_points":{"r1":receptor1, "r2":receptor2, 
"r3":receptor3, "l1":ligand1, "l2":ligand2, "l3":ligand3},
 "equilibrium_values":{"r0":7.84, "thetaA0":0.81, "thetaB0":1.74,"phiA0":2.59, 
"phiB0":-1.20, "phiC0":2.63},
 "force_constants":{"kr":10, "kthetaA":10, "kthetaB":10, "kphiA":10,
 "kphiB":10, "kphiC":10})

Where this could use default units (but allow for other units to be passed in). This might prevent users, for example, using the anchor points inconsistently in the Boresch DOF (e.g. using an atom in the bond which is not present in the angles). It might also be good to add a test to check for this. Thanks

@xiki-tempula
Copy link
Author

xiki-tempula commented May 25, 2022

@fjclark You are right, I will adopt this format.

it might be good to provide a setter method for the restraints

So my thought is that we would have

restraint_routine = BSS.FreeEnergy.Restraint(...)
best_frame, restraint = restraint_routine.analysis()
ABFE = BSS.FreeEnergy.Relative(best_frame, property_map={'restraint': restraint}

So we would have the check for restraint in BSS.FreeEnergy.Relative and will not set the restraint in the BSS.FreeEnergy.Restraint

@xiki-tempula
Copy link
Author

@fjclark So currently, I will be working on implementing the test for this data structure and the downstream writing for Gromacs in this PR.
You will be working from the other side where you will ensure that when the restraint is generated it will be stored in restraint._Boresch

@fjclark
Copy link
Owner

fjclark commented May 25, 2022

@xiki-tempula this sounds good. However, what do you think of my suggestion to have both BSS.FreeEnergy.RestraintSearch (what is currently BSS.FreeEnergy.Restraint), and BSS.FreeEnergy.Restraint (to hold an engine-agnostic representation of the restraints with potentials as algebraic expressions) (see michellab#321)?

If you agree that this is a sensible approach, I think a sensible work plan from my side would be:

  1. Implement BSS.FreeEnergy.Restraint so that it can hold restraints and the whole workflow can be tested with user-selected restraints
  2. Implement BSS.FreeEnergy.RestraintSearch so that the above can be repeated with restraints selected from simulation.

What do you think? Thanks

@xiki-tempula
Copy link
Author

@fjclark I think

Implement BSS.FreeEnergy.RestraintSearch so that the above can be repeated with restraints selected from simulation.

Is quite sensible.

Implement BSS.FreeEnergy.Restraint so that it can hold restraints and the whole workflow can be tested with user-selected restraints

I think this is also quite sensible. I will work on this in this PR where I will implement the checker and the toString method for altering the Gromacs topology.

For now I think we can have the data structure as #1 (comment). I'm an industry worker so the deadline for implementing a ABFE prototype is the end of this month and so (to hold an engine-agnostic representation of the restraints with potentials as algebraic expressions) might be a bit too much for the moment.

The deadline of getting a ABFE prototype by the end of this month is a deadline for me so don't worry if you cannot get BSS.FreeEnergy.RestraintSearch done, I could swap in the MDRestrintGenerator for now.

@fjclark
Copy link
Owner

fjclark commented May 25, 2022

@xiki-tempula OK - I will work exclusively on BSS.FreeEnergy.RestraintSearch now, based on the data structure you highlighted.

I appreciate your need to work quickly and I'm sorry that I have not had as much time as I would have liked to work on this. I will do my best with BSS.FreeEnergy.RestraintSearch, but I have some other work to complete this week, and am away/ at a conference from the 28th - 8th, so it is likely I will not complete this before the end of the month - sorry about that.

@xiki-tempula
Copy link
Author

@fjclark No worry, we had anticipated this and the origin plan is that I will get this whole thing done and then you will be join us so your help is very appreciated.

@xiki-tempula
Copy link
Author

@fjclark I have finished the data structure of the Restraint, do you mind have a look?


'''
def __init__(self, system, restraint_dict, type='Boresch'):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to avoid passing in the system with the restraint, if we are planning to do this:

restraint , best_frame = restraint_search.analyse() # where restraint is of the type BSS.FreeEnergy.Restraint
protocol = BSS.Protocol.FreeEnergy(restraint=restraint, ...) 
ABFE = BSS.FreeEnergy.Absolute(system, protocol, work_dir="output") # Handle writing restraints here based on the engine

Because storing the system in the restraint class would mean effectively passing in the system twice to BSS.FreeEnergy.Absolute . Maybe it would be more robust to take the system as an argument to toString, so in BSS.FreeEnergy.Absolute(system, protocol, work_dir="output") (or relative), only one system is considered, and there can never be any issues with differing systems. BSS.FreeEnergy.Absolute could then call Restraint.toString(system,...) internally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because storing the system in the restraint class would mean effectively passing in the system twice to BSS.FreeEnergy.Absolute.

This is a tricky issue, I do agree that passing the system twice is not optimal but I think passing system to toString is not intuitive.

I realised that I forget to implement this but I will also need to check if the ligand atoms are in the decoupled molecule, which is a property of the system.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be optimal to check if the ligand atoms are in the decoupled molecule in Restraint, and I agree that passing system to toString is not the clearest.

However, I find would find it most intuitive if Restraint was as separate as possible from the system. For example, maybe the same restraints would be used for different systems (maybe starting from different conformations without bothering to run another RestraintSearch):

ABFE1 = BSS.FreeEnergy.Absolute(system_conformation1, protocol, work_dir="output")
ABFE2 = BSS.FreeEnergy.Absolute(system_conformation2, protocol, work_dir="output")

In this case, it might cause some confusion if ABFE2 has effectively been passed two different systems.

If I understand correctly: I was thinking that BSS.Free.Energy.Absolute (or relative) might be the best place to check if the ligand atoms are in the decoupled molecule, because we will have to check here anyway (in case the user has accidentally passed in a different system with a ligand marked for decoupling) and might as well not duplicate the check. However, I can see benefits of also checking in Restraint (e.g. if a user wants to run the subsequent calculations outside of BSS), so happy to stick with the current approach if you prefer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have give it some thought but I cannot think of an easy way out.

So in the later stage, the BSS.FreeEnergy.Restraint object will be part of the BSS._SireWrappers.System. This should take care of the topology writer issue as well. However, this could be a large task and might take some time.

My vision of the current version of the BSS.FreeEnergy.Restraint object is that it will be a patch to the System to hold the Restraint.

With regard to the issue of maybe starting from different conformations without bothering to run another RestraintSearch
One solution could be that give BSS.FreeEnergy.Restraint a method update_system(), to update the underlaying system without changing the restraint.

f'yet. Only boresch restraint is supported.')

def toString(self, engine='Gromacs'):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly pass the system in here (see previous comment).

import pytest

import BioSimSpace.Sandpit.Exscientia as BSS
from BioSimSpace.Sandpit.Exscientia.Align._decouple import decouple
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be _decouple or decouple? My understanding was that the user would have to directly use decouple, so would from BioSimSpace.Sandpit.Exscientia.Align.decouple import decouple be better?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, I guess I copied and pasted this command from somewhere so hadn't changed it.

self._system = system.copy()

if type.lower() == 'boresch':
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use rest_type instead of type to be consistent with RestraintSearch, and to stay away from type()?

@xiki-tempula
Copy link
Author

@fjclark Thanks for the review and comments, I have made the changes. Do you mind have a look, please? Thank you.
Mainly

  • add the update_system method
  • Change the name of the type
  • Optimise the write topology method

@fjclark
Copy link
Owner

fjclark commented May 26, 2022

@xiki-tempula Thanks for the changes. I only have a few minor comments:

else:
raise NotImplementedError(f'Restraint type {type} not implemented '
f'yet. Only boresch restraint is supported.')

self._content = restraint_dict
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would self._restraint_dict be clearer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

atom = self._content['anchor_points'][key]
if not atom._sire_object.molecule().number() == decoupled_mol._sire_object.number():
raise ValueError(
f'The ligand atom {key} is not from decoupled moleucle.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - "moleucle"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch


# Store a copy of solvated system.
self._system = system.copy()

def toString(self, engine='Gromacs'):
'''The method for convert the restraint to a format that could be used
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise you're in a rush, but if you have time later, maybe make the docstrings more complete in general (e.g. include parameters)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I will add them now

(decoupled_mol,) = system.getDecoupledMolecules()
for key in ['l1', 'l2', 'l3']:
atom = self._content['anchor_points'][key]
if not atom._sire_object.molecule().number() == decoupled_mol._sire_object.number():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to loop over l1, 2, and 3, if we expect them all to have the same molecule number, and they are all compared to the same number (decoupled_mol._sire_object.number())?

Would it be more robust and specific to do something like:

for key in ['l1', 'l2', 'l3']:
   atom = self._content['anchor_points'][key]
   if not atom in decoupled_mol:
...

Could also check the protein anchor atoms generally with if atom in system, if I am understanding this correctly: michellab#331.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjclark This is a new feature
Do you mind merge the michellab/BioSimSpace/devel into this devel so I could test it? Thanks.

@xiki-tempula
Copy link
Author

xiki-tempula commented May 26, 2022

@fjclark Many thanks for the help.
I will try to merge this branch to my branch and work from there.

@fjclark
Copy link
Owner

fjclark commented May 26, 2022

@xiki-tempula , great, and thank you.

@xiki-tempula
Copy link
Author

@fjclark When you are back, just give me a shout and I can point you towards the lates version.

@xiki-tempula
Copy link
Author

xiki-tempula commented Jun 6, 2022

@fjclark So I think my version is stable now. Please see the branch https://github.com/Exscientia/BioSimSpace/tree/feat_gen_rest

I have incorporated your changes so I would recommend that you could reset to my branch and work from that to avoid conflict hell.

So I setup my environment as

conda create -n bss_dev python=3.7
conda activate bss_dev
conda install -c conda-forge -c michellab/label/dev biosimspace python=3.7 --only-deps --experimental-solver=libmamba
conda install -c conda-forge ambertools alchemlyb --experimental-solver=libmamba
conda install -c bioconda gromacs --experimental-solver=libmamba
python -m pip install git+https://github.com/IAlibay/MDRestraintsGenerator.git 

The complete API to run the restraint generation is (assume that the ligand and protein is the BSS.molecule which has been parameterised).

import BioSimSpace.Sandpit.Exscientia as BSS
from BioSimSpace.Sandpit.Exscientia.Align import decouple
# Decouple the ligand
decoupled_ligand = decouple(ligand)
new_system = (protein + decoupled_ligand).toSystem()
# Solvation
solvated = BSS.Solvent.tip3p(molecule=new_system, box=3 * [6.5 * BSS.Units.Length.nanometer])
# Energy Minimisation
protocol = BSS.Protocol.Minimisation()
process = BSS.Process.Gromacs(solvated, protocol, work_dir='em')
process.start()
em = process.getSystem(block=True)
# Equilibration
em.repartitionHydrogenMass()
protocol = BSS.Protocol.Equilibration()
process = BSS.Process.Gromacs(em, protocol, work_dir='equil')
process.setArg("-update", "gpu")
process.start()
equil = process.getSystem(block=True)
# Production run to get the restraint
protocol = BSS.Protocol.Production(timestep=2*BSS.Units.Time.femtosecond, runtime=15*BSS.Units.Time.nanosecond)
restraint_search = BSS.FreeEnergy.RestraintSearch(equil, protocol=protocol, engine='GROMACS', gpu_support=True, ignore_warnings=True, work_dir='restraint')
restraint_search.start()
restraint = restraint_search.analyse(method='MDRestraintsGenerator', block=True)
# Run the protein leg of the ABFE
protocol = BSS.Protocol.FreeEnergyMinimisation(
    lam=pd.Series(data={'bonded': 0.0, 'coul': 0.0, 'vdw': 0.0}),
    lam_vals=pd.DataFrame(
                data={'bonded': [0.0, 0.01, 0.025, 0.05, 0.075, 0.1, 0.2, 0.35, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                      'coul': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.16, 0.33, 0.5, 0.67, 0.83, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                      'vdw':  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]}
                         ))
protein_em = BSS.FreeEnergy.Relative(restraint.system, protocol, engine='GROMACS', restraint=restraint, work_dir='protein_em')

# Set up the ligand leg
# Solvation
solvated = BSS.Solvent.tip3p(molecule=decoupled_ligand, box=3 * [3.06 * BSS.Units.Length.nanometer])
protocol = BSS.Protocol.FreeEnergyMinimisation(
    lam=pd.Series(data={'coul': 0.0, 'vdw': 0.0}),
    lam_vals=pd.DataFrame(
                data={'coul': [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                      'vdw':  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]}
                         ))
ABFE_em = BSS.FreeEnergy.Relative(solvated, protocol, engine='GROMACS', work_dir='ligand_em')

@fjclark
Copy link
Owner

fjclark commented Jun 8, 2022

@xiki-tempula brilliant. Thanks very much for the information. I will reset to your branch and open a pull request to feat_gen_restraint when I continue work on this.

To update you on my work plan: my main focus for this month is analysing/ running some more simulations and producing a report for a deadline at the end of the month, so I am unlikely to do much work on BSS. However, from early July, I plan to work on implementing ABFE for SOMD.

Thanks

@xiki-tempula
Copy link
Author

@fjclark Hi, I have polished up the ABFE generation and the pipeline is in https://github.com/Exscientia/BioSimSpace devel branch now. When you start the development. Do you mind reset to the devel branch? Thanks.

@fjclark
Copy link
Owner

fjclark commented Jul 4, 2022

@xiki-tempula thanks for the update - I will do. Thanks.

@xiki-tempula xiki-tempula deleted the feat_restraint branch July 5, 2022 14:10
fjclark pushed a commit that referenced this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants