
ipTM scores are on the higher side for interactors and noninteractors #78

Open
rakeshr10 opened this issue Mar 3, 2025 · 2 comments

@rakeshr10

Hi,

I am observing high ipTM scores for both known interactors and noninteractors when Protenix is run on a couple of examples from this paper: https://pubs.acs.org/doi/10.1021/acs.jcim.3c01805.

Ideally, for most interacting proteins the ipTM score should be > 0.5, and < 0.5 for noninteracting ones. I have observed this for AF2.3 and Chai1 on the examples mentioned in the paper, whereas for Protenix the scores are always > 0.5.

I was wondering whether there is any difference in the way the ipTM confidence head has been trained for Protenix versus the other models that is leading to consistently higher ipTM scores. I would appreciate your comments on this observation.

Regards
Rakesh

@gieses

gieses commented Mar 5, 2025

There is an issue about this, #4. I also saw similar behavior.

@ChengYueGong10032

Thanks for your feedback, @rakeshr10.
As pointed out by @gieses, this is related to the over-confidence problem.
For this model, the "native threshold" for separating positive from negative classes is not 0.5. Instead, it can be a higher value, such as 0.8 (over-confidence); in other cases it may be lower, e.g. 0.3, which we would call under-confidence.
When developing the published version, our focus was more on correlation, so looking at the raw value alone can be misleading. In the upcoming checkpoint we have resolved this problem and ensured that the prediction distribution aligns with the ground-truth distribution, which should be more user-friendly.

Regarding your problem, I have another suggestion. When using ipTM for binary classification on different tasks, I recommend tuning the threshold (in your case, 0.5) to find the most suitable value for whichever model you intend to use. Firstly, different models may be over-confident or under-confident depending on the use case (e.g., different types of interfaces). Secondly, even for a general binary classifier, 0.5 is not always the optimal threshold. For example, when evaluating a binary classifier with AUROC (a widely used metric in machine learning), we do not rely on a single threshold such as the commonly assumed 0.5; AUROC considers the classifier's performance across all possible classification thresholds, because the choice of threshold strongly influences metrics such as precision, recall, and F1-score. A minimal sketch of such a threshold sweep is shown below.

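To make the threshold-sweeping idea concrete, here is a minimal sketch. It is not part of Protenix; the ipTM values, the interactor labels, and the choice of Youden's J statistic for picking a cutoff are illustrative assumptions, and you would substitute scores from your own benchmark of known interactors and non-interactors.

```python
# Minimal sketch: tune an ipTM cutoff on a labeled benchmark instead of assuming 0.5.
# The labels and ipTM values below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# 1 = known interactor, 0 = known non-interactor (hypothetical example data)
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
iptm   = np.array([0.92, 0.88, 0.81, 0.76, 0.71, 0.65, 0.58, 0.44])

# AUROC integrates performance over all possible thresholds, so it is threshold-free.
print("AUROC:", roc_auc_score(labels, iptm))

# Sweep all candidate thresholds and pick the one maximizing Youden's J (TPR - FPR).
fpr, tpr, thresholds = roc_curve(labels, iptm)
best = np.argmax(tpr - fpr)
print("Best ipTM cutoff for this model/benchmark:", thresholds[best])
```

With a separate sweep per model (AF2.3, Chai1, Protenix), each model gets its own calibrated cutoff, which sidesteps the differences in over- or under-confidence between their confidence heads.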