Why should the number of SAP representation protein sequence file lines and the number of Canonical compound SMILE file lines match? #9

CallMeDek · 2022-09-22T02:34:47Z

Hi,

I am trying to get results of my own data with your model.

(1) According to the file "DeepAffinity_inference.sh", it seems that the number of lines in the input protein sequence file and in the compound file must match.

Does this mean that the number of entities in the two files has to match, or literally that the number of lines in the two files has to match?

(2) I got two files for my own data after following your manual.
Could you tell me whether the structure of their entries is correct for model input?

CID_Smi_Feature:
protein_grouped_finalPresentation

Thank you,
CallMeDek

Shen-Lab · 2022-09-23T12:49:04Z

The protein file and the compound file must have the same number of rows because we make predictions for given pairs of proteins and compounds: the k-th row of the protein file is paired with the k-th row of the compound file. If you are interested in cross prediction for all combinations of the given proteins and compounds, you can write a simple script that prepares the two files (with repeats) without having to change our scripts. Otherwise, you can modify the for loop in the script to use nested for loops instead.
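As a rough illustration of the "simple script" idea, here is a minimal Python sketch that expands two input files into row-aligned files covering every protein-compound combination. It assumes one entity per row in each file, as described above. The function names and any file paths passed to them are hypothetical and not part of the DeepAffinity scripts.

```python
from itertools import product
from pathlib import Path


def cross_pair(proteins, compounds):
    """Return two equal-length lists covering every (protein, compound) pair.

    Row k of the first list is meant to be paired with row k of the second.
    """
    pairs = list(product(proteins, compounds))
    return [p for p, _ in pairs], [c for _, c in pairs]


def read_rows(path):
    # Assumption: one entity per row; blank rows are skipped.
    return [row for row in Path(path).read_text().splitlines() if row.strip()]


def write_cross_files(protein_in, compound_in, protein_out, compound_out):
    # All four paths are hypothetical placeholders chosen by the caller.
    proteins = read_rows(protein_in)
    compounds = read_rows(compound_in)
    paired_proteins, paired_compounds = cross_pair(proteins, compounds)
    Path(protein_out).write_text("\n".join(paired_proteins) + "\n")
    Path(compound_out).write_text("\n".join(paired_compounds) + "\n")
    return len(paired_proteins)  # number of rows written to each output file
```

With 2 proteins and 3 compounds, each output file would contain 6 rows, so the two files again have matching row counts and could be passed to the inference script unchanged.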

I am not sure what exactly you are asking in the second question. Could you please describe it in more detail? @AstroSign, could you please follow up if possible?

AstroSign · 2022-09-23T14:43:13Z

For the second question, your data looks good to me. Let me know if you encounter further issues.

