
ipTM scores are on the higher side for interactors and noninteractors #78

Open
rakeshr10 opened this issue Mar 3, 2025 · 2 comments

@rakeshr10

Hi,

I am observing high ipTM scores for both known interactors and noninteractors when Protenix is run on a couple of examples from this paper: https://pubs.acs.org/doi/10.1021/acs.jcim.3c01805.

Ideally, for most interacting proteins the ipTM score should be > 0.5, and < 0.5 for noninteracting ones. I have observed this for AF2.3 and Chai1 on the examples mentioned in the paper, whereas for Protenix the scores are always > 0.5.

I was wondering whether there is any difference in the way the ipTM confidence head has been trained for Protenix versus the other models that is leading to consistently higher ipTM scores. I would appreciate your comments on this observation.

Regards
Rakesh

@gieses

gieses commented Mar 5, 2025

There is an issue about this, #4. I also saw similar behavior.

@ChengYueGong10032

Thanks for your feedback, @rakeshr10.
As pointed out by @gieses, this is related to the over-confidence problem.
For this model, the "native threshold" for separating positive from negative classes is not 0.5. Instead, it can be a higher value, such as 0.8 (over-confidence); in other cases it may be lower, e.g. 0.3, which we would call under-confidence.
When developing the published version, our focus was more on correlation, so looking at the raw value alone can be misleading. In the upcoming checkpoint we have resolved this problem and ensured that the prediction distribution aligns with the ground-truth distribution, which should be more user-friendly.

Regarding your problem, I have another suggestion. When using ipTM for binary classification on different tasks, I recommend tuning the threshold (in your case, 0.5) to find the most suitable value for whichever model you intend to use. Firstly, different models may be over-confident or under-confident depending on the use case (e.g., different types of interfaces). Secondly, even for a general binary classifier, 0.5 is not always the optimal threshold. For example, when evaluating a binary classifier with AUROC (a widely used metric in machine learning), we do not rely on a single threshold such as the commonly assumed 0.5; AUROC considers the classifier's performance across all possible classification thresholds, because the choice of threshold strongly influences metrics such as precision, recall, and F1-score. A minimal sketch of such a threshold sweep is shown below.

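To make the threshold-sweeping idea concrete, here is a minimal sketch. It is not part of Protenix; the ipTM values, the interactor labels, and the choice of Youden's J statistic for picking a cutoff are illustrative assumptions, and you would substitute scores from your own benchmark of known interactors and non-interactors.

```python
# Minimal sketch: tune an ipTM cutoff on a labeled benchmark instead of assuming 0.5.
# The labels and ipTM values below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# 1 = known interactor, 0 = known non-interactor (hypothetical example data)
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
iptm   = np.array([0.92, 0.88, 0.81, 0.76, 0.71, 0.65, 0.58, 0.44])

# AUROC integrates performance over all possible thresholds, so it is threshold-free.
print("AUROC:", roc_auc_score(labels, iptm))

# Sweep all candidate thresholds and pick the one maximizing Youden's J (TPR - FPR).
fpr, tpr, thresholds = roc_curve(labels, iptm)
best = np.argmax(tpr - fpr)
print("Best ipTM cutoff for this model/benchmark:", thresholds[best])
```

With a separate sweep per model (AF2.3, Chai1, Protenix), each model gets its own calibrated cutoff, which sidesteps the differences in over- or under-confidence between their confidence heads.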