Adding Documentation #1

Open. Wants to merge 2 commits into main.
35 changes: 35 additions & 0 deletions README.md

**[[2]](/references/2018/20180108_coley_c_w_et_al.md)** Coley, C.W., Rogers, L., Green, W.H., and Jensen, K.F.
**SCScore: Synthetic Complexity Learned from a Reaction Corpus**. _J. Chem. Inf. Model._, 2018, 58, 2, 252-261.

**[[3]](/references/2020/20200520_vorsilak_colar_cmelo_and_svozil.md)** Voršilák, M., Kolář, M., Čmelo, I. and Svozil, D.
**SYBA: Bayesian estimation of synthetic accessibility of organic compounds**. _J. Cheminform._, 2020, 12, 35.

**[[4]](/references/2021/20210307_thakkar_chadimova_bjerrum_engkvist_reymond.md)** Thakkar, A., Chadimová, V., Bjerrum, E.J., Engkvist, O. and Reymond, J.L.
**Retrosynthetic accessibility score (RAscore) – rapid machine learned synthesizability classification from AI driven retrosynthetic planning**. _Chem. Sci._, 2021, 12, 3339-3349.

**[[5]](/references/2022/20220203_li_and_chen.md)** Li, B. and Chen, H. **Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph**. _Molecules_, 2022, 27, 1039.

**[[6]](/references/2022/20220422_liu_korablyov_jastrzebski_pruszynski_bengio_segler.md)** Liu, C.-H., Korablyov, M., Jastrzębski, S., Włodarczyk-Pruszyński, P., Bengio, Y., and Segler, M.
**RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software**. _J. Chem. Inf. Model._, 2022, 62, 10, 2293-2300.

**[[7]](/references/2022/20220608_yu_wang_zhao_gao_kang_cao_wang_hou.md)** Yu, J., Wang, J., Zhao, H., Gao, J., Kang, Y., Cao, D., Wang, Z. and Hou, T.
**Organic Compound Synthetic Accessibility Prediction Based on the Graph Attention Mechanism**. _J. Chem. Inf. Model._, 2022, 62, 12, 2973-2986.

**[[8]](/references/2023/20230831_kim_lee_kim_lim_kim.md)** Kim, H., Lee, K., Kim, C., Lim, J. and Kim, W.Y.
**DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening**. _J. Chem. Inf. Model._, 2024, 64, 7, 2432-2444.

**[[9]](/references/2023/20230919_parrot_tajmouati-dasilva_atwood_fourcade_mathe_huu_perron.md)** Parrot, M., Tajmouati, H., Ribeiro da Silva, V.B., Atwood, B.R., Fourcade, R., Gaston-Mathé, Y., Do Huu, N., and Perron, Q.
**Integrating synthetic accessibility with AI-based generative drug design**. _J. Cheminform._, 2023, 15, 83.

**[[10]](/references/2023/20231102_s.wang_l.wang_li_bai.md)** Wang, S., Wang, L., Li, F. and Bai, F.
**DeepSA: a deep-learning driven predictor of compound synthesis accessibility**. _J. Cheminform._, 2023, 15, 103.

**[[11]](/references/2024/20240723_chen_jung.md)** Chen, S., and Jung, Y.
**Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore**. _J. Cheminform._, 2024, 16, 83.

**[[12]](/references/2024/20241018_neeser_correia_schwaller.md)** Neeser, R.M., Correia, B., and Schwaller, P.
**FSscore: A Personalized Machine Learning-Based Synthetic Feasibility Score**. _Chem. Methods_, 2024, 4, e202400024.

**[[13]](/references/2023/20230114_skoraczynski_kitlas_miasojedow_gambin.md)** Skoraczyński, G., Kitlas, M., Miasojedow, B., and Gambin, A.
**Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning**. _J. Cheminform._, 2023, 15, 6.

**[[14]](/references/2024/20240520_raghavan_rago_verma_hassan_goshu_dombrowski_pandey_coley_wang.md)** Raghavan, P., Rago, A.J., Verma, P., Hassan, M.M., Goshu, G.M., Dombrowski, A.W., Pandey, A., Coley, C.W., and Wang, Y.
**Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie’s 15-Year Parallel Library Data Set**. _J. Am. Chem. Soc._, 2024, 146, 22, 15070-15084.
18 changes: 18 additions & 0 deletions references/2020/20200520_vorsilak_colar_cmelo_and_svozil.md
# Overview
**Title:** SYBA: Bayesian estimation of synthetic accessibility of organic compounds<br>
**Authors:** Milan Voršilák, Michal Kolář, Ivan Čmelo, Daniel Svozil<br>
**Publication Date:** 2020/05/20<br>
**Publication Link:** [BMC Journal of Cheminformatics](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00439-2)

# Abstract
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic
compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that
is used to assign SYBA score contributions to individual fragments based on their frequencies in the database
of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules
generated by the Nonpher methodology. SYBA was compared with a random forest, used as a baseline
method, and with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used
with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms
SAScore and SCScore. However, after optimization of the SAScore threshold (which changes from 6.0 to ~4.5),
SAScore yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for
the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly
available at https://github.com/lich-uct/syba under the GNU General Public License.
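The fragment-scoring idea above can be illustrated with a minimal Bernoulli naive Bayes sketch. It assumes molecules have already been broken into fragments represented as plain strings (SYBA derives its fragments from structures with RDKit, which is skipped here) and uses Laplace smoothing plus textbook absent-fragment terms; SYBA's exact fragment definition, smoothing, and scoring may differ. The function names are hypothetical and are not the `syba` package API.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical helpers, not the syba package API. Fragments are plain strings
# (e.g. precomputed substructure keys); real SYBA computes them with RDKit.
FragmentScores = Dict[str, Tuple[float, float]]  # fragment -> (present, absent) log-odds


def fragment_counts(molecules: List[Set[str]]) -> Counter:
    """Number of molecules in which each fragment occurs (presence, not multiplicity)."""
    counts: Counter = Counter()
    for fragments in molecules:
        counts.update(set(fragments))
    return counts


def train_fragment_scores(es: List[Set[str]], hs: List[Set[str]],
                          alpha: float = 1.0) -> FragmentScores:
    """Bernoulli naive Bayes log-odds per fragment, with Laplace smoothing `alpha`."""
    es_counts, hs_counts = fragment_counts(es), fragment_counts(hs)
    scores: FragmentScores = {}
    for frag in set(es_counts) | set(hs_counts):
        p_es = (es_counts[frag] + alpha) / (len(es) + 2 * alpha)
        p_hs = (hs_counts[frag] + alpha) / (len(hs) + 2 * alpha)
        present = math.log(p_es / p_hs)             # contribution if the fragment is in the molecule
        absent = math.log((1 - p_es) / (1 - p_hs))  # contribution if it is not
        scores[frag] = (present, absent)
    return scores


def sa_score(molecule: Set[str], scores: FragmentScores, prior: float = 0.0) -> float:
    """Sum of fragment contributions; positive leans ES, negative leans HS.

    `prior` is log(P(ES) / P(HS)); 0.0 corresponds to balanced training classes.
    Fragments never seen in training are ignored.
    """
    total = prior
    for frag, (present, absent) in scores.items():
        total += present if frag in molecule else absent
    return total


if __name__ == "__main__":
    easy = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}]
    hard = [{"D", "E"}, {"D", "B"}, {"D", "E", "F"}]
    table = train_fragment_scores(easy, hard)
    print(sa_score({"A", "B"}, table), sa_score({"D", "E"}, table))
```

Because every fragment carries its own "present" contribution, sorting `table` by that value shows which fragments push a molecule toward ES or HS, which is the kind of per-fragment analysis the abstract mentions.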
17 changes: 17 additions & 0 deletions references/2021/20210307_thakkar_chadimova_bjerrum_engkvist_reymond.md
# Overview
**Title:** Retrosynthetic accessibility score (RAscore) – rapid machine learned synthesizability classification from AI driven retrosynthetic planning<br>
**Authors:** Amol Thakkar, Veronica Chadimová, Esben Jannik Bjerrum, Ola Engkvist, Jean-Louis Reymond<br>
**Publication Date:** 2021/03/07<br>
**Publication Link:** [Chemical Science](https://pubs.rsc.org/en/content/articlelanding/2021/sc/d0sc05401a)

# Abstract
Computer aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI) based
tools that are able to propose synthesis routes to a wide range of compounds. However, at present
they are too slow to be used to screen the synthetic feasibility of millions of generated or enumerated
compounds before identification of potential bioactivity by virtual screening (VS) workflows. Herein
we report a machine learning (ML) based method capable of classifying whether a synthetic route can be
identified for a particular compound or not by the CASP tool AiZynthFinder. The resulting ML models return
a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute it at least 4500 times
faster than retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for
pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic
accessibility, and for producing higher-quality databases for virtual screening of biological activity.
19 changes: 19 additions & 0 deletions references/2022/20220203_li_and_chen.md
# Overview
**Title:** Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph<br>
**Authors:** Baiqing Li, Hongming Chen<br>
**Publication Date:** 2022/02/03<br>
**Publication Link:** [Molecules](https://www.mdpi.com/1420-3049/27/3/1039)

# Abstract
With the increasing application of deep-learning-based generative models for de novo molecule design,
the quantitative estimation of molecular synthetic accessibility (SA) has become a crucial factor for
prioritizing the structures generated from generative models. It is also useful for helping in the
prioritization of hit/lead compounds and guiding retrosynthesis analysis. In this study, based on the
USPTO and Pistachio reaction datasets, a chemical reaction network was constructed for the identification
of the shortest reaction paths (SRP) needed to synthesize compounds, and different SRP cut-offs were then
used as thresholds to classify an organic compound as belonging to either the easy-to-synthesize (ES) or the
hard-to-synthesize (HS) class. Two synthesis accessibility models (DNN-ECFP model and graph-based CMPNN model)
were built using deep learning/machine learning algorithms. Compared to other existing synthesis accessibility
scoring schemes, such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs
better than SYBA (ROC AUC: 0.76), albeit marginally, and outperforms SAScore and SCScore. Our prediction models
based on historical reaction knowledge could be a potential tool for estimating molecule SA.
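A shortest-reaction-path style label can be sketched on a small reaction network. This sketch assumes reactions are given as (reactants, product) pairs and that a product's depth is one more than its deepest reactant along the cheapest known route (i.e. the longest linear sequence); Li and Chen's exact SRP definition, their network construction from USPTO and Pistachio, and their cut-off values may differ. The function names and the toy network are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Reaction = Tuple[Tuple[str, ...], str]  # (reactant ids, product id)


def shortest_reaction_paths(reactions: List[Reaction],
                            starting_materials: Iterable[str]) -> Dict[str, int]:
    """Minimum number of steps needed to reach each compound from the starting materials.

    A reaction can fire only once all of its reactants are reachable; its product's
    depth is 1 + the deepest reactant. Relaxation repeats until no depth improves
    (Bellman-Ford style), so cycles in the network are handled.
    """
    depth: Dict[str, int] = {m: 0 for m in starting_materials}
    changed = True
    while changed:
        changed = False
        for reactants, product in reactions:
            if all(r in depth for r in reactants):
                candidate = 1 + max((depth[r] for r in reactants), default=0)
                if candidate < depth.get(product, candidate + 1):
                    depth[product] = candidate
                    changed = True
    return depth


def classify(compound: str, depth: Dict[str, int], cutoff: int) -> str:
    """'ES' if the compound is reachable within `cutoff` steps, otherwise 'HS'."""
    return "ES" if depth.get(compound, cutoff + 1) <= cutoff else "HS"


if __name__ == "__main__":
    network = [
        (("A", "B"), "D"),  # D needs both A and B
        (("D", "C"), "E"),
        (("E",), "F"),      # long route to F ...
        (("A",), "F"),      # ... and a one-step route
        (("X",), "G"),      # X is not a starting material, so G is unreachable
    ]
    depth = shortest_reaction_paths(network, ["A", "B", "C"])
    print(depth, [classify(c, depth, cutoff=1) for c in ("D", "E", "F", "G")])
```

Unreachable compounds fall into the HS class here by construction; whether the original study treats them that way is an assumption of this sketch.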
21 changes: 21 additions & 0 deletions references/2022/20220422_liu_korablyov_jastrzebski_pruszynski_bengio_segler.md
# Overview
**Title:** RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software<br>
**Authors:** Cheng-Hao Liu, Maksym Korablyov, Stanisław Jastrzębski, Paweł Włodarczyk-Pruszyński, Yoshua Bengio, Marwin Segler<br>
**Publication Date:** 2022/04/22<br>
**Publication Link:** [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.jcim.1c01476)

# Abstract
De novo molecule design algorithms often result in chemically unfeasible or synthetically inaccessible molecules.
A natural idea to mitigate this problem is to bias these algorithms toward more easily synthesizable molecules
using a proxy score for synthetic accessibility. However, using currently available proxies can still result in
highly unrealistic compounds. Here, we propose a novel approach, RetroGNN, to estimate synthesizability. First,
we search for routes using synthesis planning software for a large number of random molecules. This information
is then used to train a graph neural network to predict the outcome of the synthesis planner given the target
molecule; the resulting regression model can be used as a synthesizability scorer. We highlight how RetroGNN can
be used in generative molecule-discovery pipelines together with other scoring functions. We evaluate our approach
on several QSAR-based molecule design benchmarks, for which we find synthesizable molecules with state-of-the-art
scores. Compared to the virtual screening of 5 million existing molecules from the ZINC database, using RetroGNNScore
with a simple fragment-based de novo design algorithm finds molecules predicted to be more likely to possess the desired
activity exponentially faster, while maintaining good druglike properties and being easier to synthesize. Importantly,
our deep neural network can successfully filter out hard-to-synthesize molecules while achieving a 10<sup>5</sup> times speedup
over using retrosynthesis planning software.
20 changes: 20 additions & 0 deletions references/2022/20220608_yu_wang_zhao_gao_kang_cao_wang_hou.md
# Overview
**Title:** Organic Compound Synthetic Accessibility Prediction Based on the Graph Attention Mechanism<br>
**Authors:** Jiahui Yu, Jike Wang, Hong Zhao, Junbo Gao, Yu Kang, Dongsheng Cao, Zhe Wang, Tingjun Hou<br>
**Publication Date:** 2022/06/08<br>
**Publication Link:** [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.jcim.2c00038)

# Abstract
Accurate estimation of the synthetic accessibility of small molecules is needed in many phases of drug discovery.
Several expert-crafted scoring methods and descriptor-based quantitative structure–activity relationship (QSAR)
models have been developed for synthetic accessibility assessment, but their practical applications in drug discovery
are still quite limited because of relatively low prediction accuracy and poor model interpretability. In this study,
we propose a data-driven, interpretable prediction framework called GASA (Graph Attention-based assessment of
Synthetic Accessibility) that evaluates the synthetic accessibility of small molecules by classifying compounds
as easy- (ES) or hard-to-synthesize (HS). GASA is a graph neural network (GNN) architecture that learns its own features
by applying an attention mechanism to automatically capture the most important structural features related to
synthetic accessibility. Sampling around the hypothetical classification boundary was used to improve the ability of
GASA to distinguish structurally similar molecules. GASA was extensively evaluated and compared with two descriptor-based
machine learning methods (random forest, RF; eXtreme gradient boosting, XGBoost) and four existing scores
(SYBA: SYnthetic Bayesian Accessibility; SCScore: Synthetic Complexity score; RAscore: Retrosynthetic Accessibility score;
SAscore: Synthetic Accessibility score).
20 changes: 20 additions & 0 deletions references/2023/20230114_skoraczynski_kitlas_miasojedow_gambin.md
# Overview
**Title:** Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning<br>
**Authors:** Grzegorz Skoraczyński, Mateusz Kitlas, Błażej Miasojedow, Anna Gambin<br>
**Publication Date:** 2023/01/14<br>
**Publication Link:** [BMC](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00678-z)

# Abstract
Modern computer-assisted synthesis planning tools provide strong support for synthesis route planning. However, they are still limited
by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis
heuristic. A wide range of machine learning scoring approaches is available; however, their applicability and correctness
have been studied only to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores under
common test conditions. In the present work, we assess whether synthetic accessibility scores can reliably predict the outcomes
of retrosynthesis planning. Using a specially prepared compound database, we examine the outcomes of the retrosynthetic
tool AiZynthFinder. We test whether the synthetic accessibility scores SAscore, SYBA, SCScore, and RAscore accurately predict
the results of retrosynthesis planning. Furthermore, we investigate whether synthetic accessibility scores can speed up
retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search
space. For that purpose, we analyze the AiZynthFinder partial-solution search trees, their structure, and complexity
parameters such as the number of nodes or treewidth. We confirm that synthetic accessibility scores in most cases
discriminate well between feasible and infeasible molecules and can potentially boost retrosynthesis planning tools.
Moreover, we show the current challenges of designing computer-assisted synthesis planning tools.
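How well a score separates molecules the planner solves from those it does not is commonly summarized with ROC AUC, and using a score as a filter also requires a cut-off. The sketch below computes both from a list of scores and binary planner outcomes (1 = route found). It is a generic evaluation helper, not the authors' analysis code, and the example SAscore values and outcomes are hypothetical. Because a lower SAscore means easier to synthesize, the scores are negated so that higher means more likely solvable.

```python
from typing import Optional, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive scores higher than a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Cut-off t maximizing accuracy of the rule 'predict solvable if score >= t'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == (y == 1) for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    sascore = [2.1, 3.0, 4.8, 6.5, 7.2]  # hypothetical values; lower means easier
    solved = [1, 1, 0, 0, 0]             # hypothetical planner outcomes
    higher_is_easier = [-s for s in sascore]
    print(roc_auc(higher_is_easier, solved), best_threshold(higher_is_easier, solved))
```

The pairwise loop is quadratic, which is fine for a sketch; for large benchmark sets a rank-based formulation of the same statistic is the usual choice. Accuracy is only one possible criterion for the cut-off; balanced accuracy is a common alternative when solved and unsolved molecules are imbalanced.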
18 changes: 18 additions & 0 deletions references/2023/20230831_kim_lee_kim_lim_kim.md
# Overview
**Title:** DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening<br>
**Authors:** Hyeongwoo Kim, Kyunghoon Lee, Chansu Kim, Jaechang Lim, Woo Youn Kim<br>
**Publication Date:** 2023/08/31<br>
**Publication Link:** [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.jcim.3c01134)

# Abstract
Recently emerging generative AI models enable us to produce a vast number of compounds for potential applications. While
they can provide novel molecular structures, the synthetic feasibility of the generated molecules is often questioned.
To address this issue, a few recent studies have attempted to use deep learning models to estimate the synthetic accessibility
of many molecules rapidly. However, the retrosynthetic analysis tools used to train these models rely on reaction templates that are
automatically extracted from large reaction databases, are not domain-specific, and may exhibit low chemical correctness. To overcome
this limitation, we introduce DFRscore (Drug-Focused Retrosynthetic score), a deep-learning-based approach for a more practical
assessment of synthetic accessibility in drug discovery. The DFRscore model is trained exclusively on drug-focused reactions
and predicts the minimum number of synthetic steps required for each compound. This approach enables practitioners to
filter out compounds that do not meet their desired level of synthetic accessibility at an early stage of high-throughput virtual
screening for accelerated drug discovery. The proposed strategy can be easily adapted to other domains by adjusting the synthesis
planning setup of the reaction templates and starting materials.
21 changes: 21 additions & 0 deletions references/2023/20230919_parrot_tajmouati-dasilva_atwood_fourcade_mathe_huu_perron.md
# Overview
**Title:** Integrating synthetic accessibility with AI-based generative drug design<br>
**Authors:** Maud Parrot, Hamza Tajmouati, Vinicius Barros Ribeiro da Silva, Brian Ross Atwood, Robin Fourcade, Yann Gaston-Mathé, Nicolas Do Huu, Quentin Perron<br>
**Publication Date:** 2023/09/19<br>
**Publication Link:** [BMC](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00742-8)

# Abstract
Generative models are frequently used for de novo design in drug discovery projects to propose new molecules. However,
the question of whether or not the generated molecules can be synthesized is not systematically taken into account
during generation, even though being able to synthesize the generated molecules is a fundamental requirement for such
methods to be useful in practice. Methods have been developed to estimate molecule “synthesizability”, but, so far,
there is no consensus on whether or not a molecule is synthesizable. In this paper we introduce the Retro-Score (RScore),
which computes a synthetic accessibility score of molecules by performing a full retrosynthetic analysis through our
data-driven synthetic planning software Spaya, and its dedicated API: Spaya-API (https://spaya.ai). We start by comparing
several synthetic accessibility scores to a binary “chemist score” assigned by chemists to a set of generated molecules,
as a first experimental validation that the RScore is a reliable synthetic accessibility score. We then describe a pipeline
to generate molecules that satisfy a list of targets while still being easy to synthesize. We extend this idea by performing
experiments comparing molecular generator outputs across a range of constraints and conditions. We show that the RScore can
be learned by a Neural Network, which leads to a new score: RSPred. We demonstrate that using the RScore or RSPred as a
constraint during molecular generation enables our molecular generators to produce more synthesizable solutions,
with higher diversity.
17 changes: 17 additions & 0 deletions references/2023/20231102_s.wang_l.wang_li_bai.md
# Overview
**Title:** DeepSA: a deep-learning driven predictor of compound synthesis accessibility<br>
**Authors:** Shihang Wang, Lin Wang, Fenglei Li, Fang Bai<br>
**Publication Date:** 2023/11/02<br>
**Publication Link:** [BMC](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00771-3)

# Abstract
With the continuous development of artificial intelligence technology, more and more computational models for generating
new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy
or difficult to synthesize, that is, about their synthetic accessibility. In this study, a deep-learning-based
computational model called DeepSA was proposed to predict the synthesis accessibility of compounds, providing a useful
tool for choosing molecules. DeepSA is a chemical language model that was developed by training on a dataset of 3,593,053 molecules
using various natural language processing (NLP) algorithms, offering advantages over state-of-the-art methods and having a much
higher area under the receiver operating characteristic curve (AUROC), i.e., 89.6%, in discriminating those molecules that are
difficult to synthesize. This helps users select less expensive molecules for synthesis, reducing the time and cost required
for drug discovery and development. Interestingly, a comparison of DeepSA with a Graph Attention-based method shows that
using SMILES alone can also efficiently extract and visualize a compound’s informative features.