Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

difference of reporting experiment results #7

Open
yufengwhy opened this issue Jun 23, 2022 · 27 comments
Open

difference of reporting experiment results #7

yufengwhy opened this issue Jun 23, 2022 · 27 comments

Comments

@yufengwhy
Copy link

There is difference between reporting experiment results of paper and results from github code.
Is the paper results from TBind v1.0.1, the github from TBind v0.5.0 ???

@luwei0917
Copy link
Owner

luwei0917 commented Jun 24, 2022

the result from the GitHub model should be about the same as the one reported in preprint. we only slightly updated the coordinates generation part of the code. The v1.0.1 is not released yet.

@yufengwhy
Copy link
Author

@luwei0917
The model outputs the same affinity value for different structures of the same ligand: RDKit structure, PDBBind structure and predict structures of TankBind.
Is this normal?

@luwei0917
Copy link
Owner

Yes. In my opinion, that's the beauty of TankBind. You don't need prior knowledge of the complex structure to predict the affinity.

@yufengwhy
Copy link
Author

@luwei0917
That sounds awesome.
I am curious about how tankbind achieve that the predicted affinity is conformation-agnostic, that is, the input of coordinates of ligands do not influence the outputted affinity;
If so, why can tankbind update these coordinates of ligands to improve the docked pose of ligands.

@luwei0917
Copy link
Owner

The coordinates of the ligand is determined based on its interaction with the protein and constraint imposed by chemical bonds and excluded volume effects.

@yufengwhy
Copy link
Author

@luwei0917

  1. predict coord
    D^c_{j,k} in Equation (5) is the group truth distance from PDBBind?
    If so, the label is leaked?
  2. predict affinity
    D_{ij} in L_distance around Equation (4) is the group truth distance from PDBBind?
    If so, no matter what conformations input to the model, the best conformation can always be obtained, and the predict
    affinity is always based on the best conformation ?

@luwei0917
Copy link
Owner

luwei0917 commented Jun 27, 2022

It is based on the RDKit generated conformation, not PDBbind when testing or inferencing . It is based on PDBbind conformation during training.
A side note, even during training, under the self-dock setting, we also impose a LAS mask to the D^c_{j,k}.
2.
This is computing loss for training the model. We want our predicted D_{ij} to be as close to the ground truth as possible.
Affinity prediction is always based on the predicted interaction embedding.

The conformation of the ligand is predicted, not given.

@yufengwhy
Copy link
Author

@luwei0917

  1. "It is based on the RDKit generated conformation when testing or inferencing." This will make the conformation closer to RDKit, i.e., farther to PDB conformation?

@luwei0917
Copy link
Owner

The LAS mask exists in self dock. therefore, only the basic constraint is imposed. The general conformation is determined by the interaction.

@yufengwhy
Copy link
Author

@luwei0917
Is TankBind SE(3)-equivariant?

@luwei0917
Copy link
Owner

yes. I think so. relative distance is SE(3)-equivariant.

@yufengwhy
Copy link
Author

The LAS mask exists in self dock. therefore, only the basic constraint is imposed. The general conformation is determined by the interaction.

LAS includes rings, and the N-O-C ring in the predict example of your code, however, the N-O-C ring is not rigid, which should not in the LAS mask ?

@yufengwhy
Copy link
Author

yufengwhy commented Jun 28, 2022

protein's GVP embedding and ligand's torchdrug feature, is these features invariant or equivariant?

@luwei0917
Copy link
Owner

I used "Chem.GetSymmSSSR" function in RDKit to get the ring. Could be that some ring should be rigid but considered flexible, or other way around. If you have better solution, please let me know.

@yufengwhy
Copy link
Author

I used "Chem.GetSymmSSSR" function in RDKit to get the ring. Could be that some ring should be rigid but considered flexible, or other way around. If you have better solution, please let me know.

Chem.GetSymmSSSR is used to obtain symmetry smallest ring, not necessarily rigid. So we could choose from "Chem.GetSymmSSSR" rings that have only one RDKit conformation?

@luwei0917
Copy link
Owner

Right, but how could I know if the ring only have single RDKit conformation?

@yufengwhy
Copy link
Author

yufengwhy commented Jun 28, 2022

from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def generate_conformation(mol):
    mol = Chem.AddHs(mol)
    ps = AllChem.ETKDGv2()
    try:
        rid = AllChem.EmbedMolecule(mol, ps)
        AllChem.MMFFOptimizeMolecule(mol, maxIters=500, confId=0)
    except:
        mol.Compute2DCoords()
    mol = Chem.RemoveHs(mol)
    return mol

def generate_coords(smiles = 'c1ccccc1'):
    mol_from_rdkit = Chem.MolFromSmiles(smiles)
    mol_from_rdkit = generate_conformation(mol_from_rdkit)
    coords = mol_from_rdkit.GetConformer().GetPositions()
    return coords
    
def rigid_transform_3d(A, B):
    """ Find the rotation  and translation matrix of point set A and B
    :param A: A 3d "source" vector
    :param B: A 3d "destination" vector
    :return R: A rotation matrix (3x3)
    :return t: A translation vector
    Align A to B: np.matmul(A, R.T) + t.reshape([1, 3])
    """
    assert len(A) == len(B), "Invalid length"  # Length of two vector should be the same
    N = A.shape[0]  # total points
    centroid_A = np.mean(A, axis=0)
    centroid_B = np.mean(B, axis=0)
    # centre the points
    AA = A - np.tile(centroid_A, (N, 1))
    BB = B - np.tile(centroid_B, (N, 1))
    H = np.matmul(np.transpose(AA), BB)
    U, S, Vt = np.linalg.svd(H)
    R = np.matmul(Vt.T, U.T)
    # special reflection case
    if np.linalg.det(R) < 0:
        print("Reflection detected")
        Vt[2, :] *= -1
        R = np.matmul(Vt.T, U.T)
    t = -np.matmul(R, centroid_A) + centroid_B
    return R, t
    
def get_err(smiles = 'c1ccccc1'):
    A, B = generate_coords(smiles), generate_coords(smiles)
    R, t = rigid_transform_3d(A, B)
    err = B - np.matmul(A,R.T) - t.reshape([1, 3])
    return np.mean(np.abs(err))
    
def rigid_ring(smiles = 'c1ccccc1'):
    # get_err('c1ccccc1')  # 1e-6  
    # get_err('C1CNCOC1')  # 0.158  
    if get_err(smiles) > 1e-3:
        return False
    else:
        return True

@yufengwhy
Copy link
Author

yufengwhy commented Jun 29, 2022

the result from the GitHub model should be about the same as the one reported in preprint. we only slightly updated the coordinates generation part of the code. The v1.0.1 is not released yet.

The checkpoint and predict code cannot reproduce the results in the paper, especially the 'mean' column in Table 1. Have you test before? Any suggestions?
@luwei0917

@luwei0917
Copy link
Owner

luwei0917 commented Jun 29, 2022

Thanks for the "rigid_ring" function.
The number should be about the same, we tested it before.
I guess it could be atom index misalignment. the torchdrug input is the smiles. so the index might be different from the input sdf.
I used following script to re-order atoms.
sm = Chem.MolToSmiles(mol)
m_order = list(mol.GetPropsAsDict(includePrivate=True, includeComputed=True)['_smilesAtomOutputOrder'])
mol = Chem.RenumberAtoms(mol, m_order)

If this is not the problem, can you send me your folder/scripts to [email protected] and I will take a look.

@Anfankus
Copy link

I have solved the re-order problem, but still can't reproduce the result. The email with the code and the reproduced results has been sent to your email address.

Hope to get your suggestion.

@Anfankus
Copy link

Anfankus commented Jul 2, 2022

I have solved the re-order problem, but still can't reproduce the result. The email with the code and the reproduced results has been sent to your email address.

Hope to get your suggestion.

Thanks for your suggestion email. There are still some diffs in the new reproduced results and the details have been sent to your email.

Hope to get your reply.

@Yuki0613
Copy link

I am hitting analogous issue.
There is a gap between the results we reproduced and those reported in the paper, especially for the metric RMSD.
@Anfankus @yufengwhy Could you reproduce the results later?

@luwei0917
Copy link
Owner

@Yuki0613 I uploaded the notebook for evaluating the test set.

@yufengwhy
Copy link
Author

@Yuki0613 I uploaded the notebook for evaluating the test set.
I think the main difference is the data process, because if i use processed data of mine, there is still the performance diff. @Yuki0613
Can the code to process pdbbind to the test data be provided? @luwei0917

  1. I use the data process code in this code base, but got 3265 pocket samples in the test set, however your notebook shows 2879 pocket samples.
  2. what's the process of 'renumber_atom_index_same_as_smiles' in the notebook?

@luwei0917
Copy link
Owner

I used top10 p2rank pockets. you probably used all pockets.
"renumber_atom_index_same_as_smiles" folder is created by cell10 of
https://github.com/luwei0917/TankBind/blob/ed4c87f47b8268e7ee4f4dc824656fdfe6883593/examples/construction_PDBbind_training_and_test_dataset.ipynb

@yufengwhy
Copy link
Author

Can the training code be updated?
For example in the data.py, many names of pt files are mismatched with those in contruction_dataset notebook.
Moreover, there are two compound_dict in new_dataset.compound_dict and all_pocket_test.compound_dict, but these is only one compound_dict in the contruction_dataset notebook?
@luwei0917

@QizhiPei
Copy link

QizhiPei commented Nov 16, 2022

Can the training code be updated? For example in the data.py, many names of pt files are mismatched with those in contruction_dataset notebook. Moreover, there are two compound_dict in new_dataset.compound_dict and all_pocket_test.compound_dict, but these is only one compound_dict in the contruction_dataset notebook? @luwei0917

I think new_dataset.compound_dict is built by construction_PDBbind_training_and_test_dataset.ipynb notebook, which is named compound_torchdrug_features.pt. The other two dicts (should be the same), all_pocket_test.compound_dict and all_pocket_test.compound_dict, is built by testset_evaluation_cleaned.ipynb notebook, which is named pdbbind_test_compound_dict_based_on_rdkit.pt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants