Commit

Update README.md
SeonghwanSeo committed Oct 11, 2024
1 parent df41275 commit 38176b9
Showing 20 changed files with 2,321 additions and 52 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -14,6 +14,7 @@ slurm-*
wandb*
nogit*/
unidock_2024*
weights/

# Byte-compiled / optimized / DLL files
__pycache__/
131 changes: 123 additions & 8 deletions README.md
@@ -1,33 +1,36 @@
[![arXiv](https://img.shields.io/badge/arXiv-2410.04542-b31b1b.svg)](https://arxiv.org/abs/2410.04542)
[![Python versions](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/downloads/)
[![license: MIT](https://img.shields.io/badge/License-MIT-purple.svg)](LICENSE)

# RxnFlow: Generative Flows on Synthetic Pathway for Drug Design

Official implementation of ***Generative Flows on Synthetic Pathway for Drug Design*** by Seonghwan Seo, Minsu Kim, Tony Shen, Martin Ester, Jinkyoo Park, Sungsoo Ahn, and Woo Youn Kim. [[arXiv](https://arxiv.org/abs/2410.04542)]

RxnFlow is a synthesis-oriented generative framework that aims to discover diverse drug candidates through the GFlowNet objective and a large action space.

- RxnFlow can operate on large synthetic action spaces comprising 1.2M building blocks and 71 reaction templates without compute overhead.
- RxnFlow can explore a broader chemical space in fewer reaction steps, resulting in higher diversity, higher potency, and lower synthetic complexity of the generated molecules.
- RxnFlow can generate molecules with expanded or modified building block libraries without retraining.

The implementation of this project builds upon [recursionpharma/gflownet](https://github.com/recursionpharma/gflownet) (MIT license). This repository was developed for research; the code for real-world drug discovery will be released later.

## Setup

### Install

```bash
# python: 3.10
conda install openbabel # For PharmacoNet
pip install -e . --find-links https://data.pyg.org/whl/torch-2.3.1+cu121.html

# For UniDock
conda install openbabel unidock
pip install -e '.[unidock]' --find-links https://data.pyg.org/whl/torch-2.3.1+cu121.html
```

### Data

To construct the synthetic action space, RxnFlow requires the reaction template set and the building block library.

The reaction template set used in this paper contains 13 unimolecular and 58 bimolecular reactions (71 in total), constructed by [Cretu et al.](https://github.com/mirunacrt/synflownet). The template set is available at [data/template/hb_edited.txt](data/template/hb_edited.txt).
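
As a quick sanity check on the 13/58 split, the sketch below tallies templates by their number of reactants. It assumes (not confirmed here) that the template file holds one reaction SMARTS per line, with reactants separated by a top-level `.` before the `>` arrow; `count_reactants` and `summarize_templates` are hypothetical helpers, not part of the repository.

```python
from collections import Counter
from collections.abc import Iterable


def count_reactants(smarts: str) -> int:
    """Count reactant templates on the left of a reaction SMARTS ('A.B>>C' -> 2)."""
    reactants = smarts.split(">")[0]  # '>' only appears in the reaction arrow
    depth, count = 0, 1
    for ch in reactants:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "." and depth == 0:  # a top-level '.' separates two reactants
            count += 1
    return count


def summarize_templates(lines: Iterable[str]) -> Counter:
    """Tally templates by arity, skipping blank lines and '#' comment lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            counts[count_reactants(line)] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "[c:1][Br]>>[c:1]",  # unimolecular
        "[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]",  # bimolecular
    ]
    print(summarize_templates(demo))
```

On a file in that format, passing an open handle for `data/template/hb_edited.txt` to `summarize_templates` would return the per-arity counts. Dots inside parentheses are treated as component-level grouping, so `([C:1].[C:2])>>[C:1][C:2]` counts as one reactant.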

@@ -50,3 +53,115 @@ The Enamine building block library is available upon request at [https://enamine
```bash
python scripts/b2_smi_to_env.py -b <SMILES-FILE> -d ./envs/<ENV> --cpu <CPU>
```

## Experiments

### Docking-QED multi-objective optimization with GPU-accelerated UniDock

Multi-objective optimization of docking score and QED with a [multi-objective GFlowNet](https://arxiv.org/abs/2210.12765), using GPU-accelerated [UniDock](https://pubs.acs.org/doi/10.1021/acs.jctc.2c01145) for docking.

```bash
python script/opt_unidock.py -h
python script/opt_unidock.py \
-p <Protein PDB path> \
-c <Center X> <Center Y> <Center Z> \
-l <Reference ligand; required if center is not given> \
-o <Output directory> \
-n <Num Oracles (default: 1000)> \
--batch_size <Num generations per oracle; default: 64> \
--env_dir <Environment directory> \
--subsample_ratio <Subsample ratio; memory-variance trade-off; default: 0.01>
```
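
The `--subsample_ratio` option is described only as a memory-variance trade-off. The sketch below illustrates that trade-off generically and is not RxnFlow's implementation (the paper's estimator may differ): it estimates `log(sum(exp(logits)))` over a large action set from a uniform random subset of k out of N actions, rescaling the partial sum by N/k so that it is an unbiased estimate of the full sum. The logits and sizes are made up for illustration.

```python
import math
import random


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def subsampled_logsumexp(logits: list[float], ratio: float, rng: random.Random) -> float:
    """Estimate logsumexp(logits) from a uniform random subset of about ratio * N actions.

    Scaling the subset's sum of exp-logits by N / k gives an unbiased estimate of the
    full sum; smaller ratios score fewer actions (less memory) but are noisier.
    """
    n = len(logits)
    k = max(1, min(n, round(ratio * n)))
    subset = rng.sample(logits, k)
    return logsumexp(subset) + math.log(n / k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for 10,000 candidate building blocks
    logits = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    exact = logsumexp(logits)
    for ratio in (0.01, 0.1, 0.5):
        estimates = [subsampled_logsumexp(logits, ratio, rng) for _ in range(100)]
        mean = sum(estimates) / len(estimates)
        std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
        print(f"ratio={ratio:<5} exact={exact:.3f} mean_est={mean:.3f} std={std:.4f}")
```

The printed `std` column should shrink as `ratio` grows, while the number of actions scored per estimate grows in proportion.
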
**Example (KRAS G12C mutation)**
- Use center coordinates
```bash
python script/opt_unidock.py -p ./data/examples/6oim_protein.pdb -c 1.872 -8.260 -1.361 -o ./log/kras
```
- Use center of the reference ligand
```bash
python script/opt_unidock.py -p ./data/examples/6oim_protein.pdb -l ./data/examples/6oim_ligand.pdb -o ./log/kras
```
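
When only a reference ligand is given, the pocket center has to be derived from it. A minimal sketch, assuming the center is the unweighted mean of the ligand's atom coordinates (the script's exact rule is not documented here); `ligand_center` is a hypothetical helper that reads the fixed-width coordinate columns of the PDB format.

```python
def ligand_center(pdb_text: str) -> tuple[float, float, float]:
    """Unweighted centroid of all ATOM/HETATM coordinates in a PDB-format string."""
    xs: list[float] = []
    ys: list[float] = []
    zs: list[float] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed PDB columns (1-based): x = 31-38, y = 39-46, z = 47-54
            xs.append(float(line[30:38]))
            ys.append(float(line[38:46]))
            zs.append(float(line[46:54]))
    if not xs:
        raise ValueError("no ATOM/HETATM records found")
    n = len(xs)
    return (round(sum(xs) / n, 3), round(sum(ys) / n, 3), round(sum(zs) / n, 3))
```

Averaging the 41 HETATM records in `data/examples/6oim_ligand.pdb` this way gives (1.872, -8.260, -1.361) after rounding, the same center passed with `-c` in the first example. That is consistent with, but does not prove, the script using the unweighted ligand centroid.
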
### Zero-shot sampling with Pharmacophore-based QuickVina Proxy
Sample high-affinity molecules. The QuickVina docking score is estimated by a pharmacophore-based proxy model [[github](https://github.com/SeonghwanSeo/PharmacoNet/tree/main/src/pmnet_appl)].
```bash
python script/sampling_zeroshot.py -h
python script/sampling_zeroshot.py \
-p <Protein PDB path> \
-c <Center X> <Center Y> <Center Z> \
-l <Reference ligand; required if center is not given> \
-o <Output path: `smi|csv`> \
-n <Num samples (default: 100)> \
--env_dir <Environment directory> \
--model_path <Checkpoint path; default: None (auto-downloaded)> \
--subsample_ratio <Subsample ratio; memory-variance trade-off; default: 0.01> \
--cuda
```
**Example (KRAS G12C mutation)**
- csv file: Save molecules with their rewards (GPU is recommended for reward calculation)
```bash
python script/sampling_zeroshot.py -o out.csv -p ./data/examples/6oim_protein.pdb -l ./data/examples/6oim_ligand.pdb --cuda
```
- smi file: Save molecules only (CPU: 0.06s/mol, GPU: 0.04s/mol)
```bash
python script/sampling_zeroshot.py -o out.smi -p ./data/examples/6oim_protein.pdb -c 1.872 -8.260 -1.361
```
### Custom optimization
To train RxnFlow with a custom reward function, use the base classes from `gflownet.base`. The reward must be **non-negative**.
- Example (QED)
```python
import torch
from gflownet.base import SynthesisTrainer, SynthesisGFNSampler, BaseTask
from gflownet.trainer import FlatRewards
from rdkit.Chem import Mol as RDMol, QED


class QEDTask(BaseTask):
    def compute_flat_rewards(self, mols: list[RDMol], batch_idx: list[int]) -> tuple[FlatRewards, torch.Tensor]:
        # One non-negative reward per molecule (QED lies in [0, 1]), shaped (num_mols, 1)
        fr = torch.tensor([QED.qed(mol) for mol in mols], dtype=torch.float).reshape(-1, 1)
        # Mark every molecule as valid
        is_valid_t = torch.ones((len(mols),), dtype=torch.bool)
        return FlatRewards(fr), is_valid_t


class QEDSynthesisTrainer(SynthesisTrainer):  # For online training
    def setup_task(self):
        self.task: QEDTask = QEDTask(cfg=self.cfg, rng=self.rng, wrap_model=self._wrap_for_mp)


class QEDSynthesisSampler(SynthesisGFNSampler):  # Sampling with a pre-trained GFlowNet
    def setup_task(self):
        self.task: QEDTask = QEDTask(cfg=self.cfg, rng=self.rng, wrap_model=self._wrap_for_mp)
```
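
QED already lies in [0, 1], but many rewards do not: docking scores, for example, are more negative for stronger predicted binding. A minimal sketch of turning such scores into non-negative rewards and checking them before they are returned from a task, assuming a simple negate-and-clip transform (a hypothetical choice, not necessarily the one used in the paper's experiments); both helpers are illustrative and not part of `gflownet.base`.

```python
import math


def docking_to_reward(scores: list[float]) -> list[float]:
    """Map docking scores (kcal/mol; more negative = stronger predicted binding) to rewards.

    Hypothetical transform: negate each score and clip unfavorable (positive) scores
    to 0, so every reward is non-negative.
    """
    return [max(0.0, -s) for s in scores]


def assert_valid_rewards(rewards: list[float]) -> None:
    """Raise if any reward is negative or NaN."""
    for i, r in enumerate(rewards):
        if math.isnan(r) or r < 0:
            raise ValueError(f"reward {i} is {r}; rewards must be non-negative")


if __name__ == "__main__":
    rewards = docking_to_reward([-9.1, -6.4, 1.3])
    assert_valid_rewards(rewards)
    print(rewards)  # [9.1, 6.4, 0.0]
```
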
### Reproducing experimental results
All scripts to reproduce the results of the paper are in `./experiments/`.
The dataset is available on [Google Drive](https://drive.google.com/drive/folders/1ZngDj3-b8ZLcR9J4ekIrGpxTklMXNIn-); decompress it into `./data/experiments/`.
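
A small sketch of the decompression step, assuming the Drive files were saved into a local folder (the `downloads` name is hypothetical) and use an archive format Python's `shutil` recognizes (zip, tar, tar.gz, and so on); the actual file names and formats of the dataset are not listed here.

```python
import shutil
from pathlib import Path


def extract_all(download_dir: Path, target_dir: Path) -> list[Path]:
    """Unpack every recognized archive in download_dir into target_dir; return those archives."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # File suffixes shutil can unpack, e.g. '.zip', '.tar', '.tar.gz'
    suffixes = tuple(ext for _, exts, _ in shutil.get_unpack_formats() for ext in exts)
    handled = []
    for archive in sorted(download_dir.iterdir()):
        if archive.is_file() and archive.name.endswith(suffixes):
            shutil.unpack_archive(str(archive), str(target_dir))
            handled.append(archive)
    return handled


if __name__ == "__main__":
    # Hypothetical download folder; adjust to wherever the Drive files were saved
    downloads = Path("downloads")
    if downloads.is_dir():
        for archive in extract_all(downloads, Path("data/experiments")):
            print(f"extracted {archive.name}")
```
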
## Citation
If you use this code in your research, please cite:
```bibtex
@article{seo2024rxnflow,
title={Generative Flows on Synthetic Pathway for Drug Design},
author={Seonghwan Seo and Minsu Kim and Tony Shen and Martin Ester and Jinkyoo Park and Sungsoo Ahn and Woo Youn Kim},
journal={arXiv preprint arXiv:2410.04542},
year={2024},
}
```
## Related Works
- [GFlowNet](https://arxiv.org/abs/2106.04399) [github: [recursionpharma/gflownet](https://github.com/recursionpharma/gflownet)]
- [TacoGFN](https://arxiv.org/abs/2310.03223) [github: [tsa87/TacoGFN-SBDD](https://github.com/tsa87/TacoGFN-SBDD)]
- [PharmacoNet](https://arxiv.org/abs/2310.00681) [github: [SeonghwanSeo/PharmacoNet](https://github.com/SeonghwanSeo/PharmacoNet)]
- [UniDock](https://pubs.acs.org/doi/10.1021/acs.jctc.2c01145) [github: [dptech-corp/Uni-Dock](https://github.com/dptech-corp/Uni-Dock)]
5 changes: 0 additions & 5 deletions data/README.md

This file was deleted.

84 changes: 84 additions & 0 deletions data/examples/6oim_ligand.pdb
@@ -0,0 +1,84 @@
CRYST1 40.868 58.417 65.884 90.00 90.00 90.00 P 21 21 21 1
HETATM 1 C1 MOV A 303 1.642 -7.717 -1.656 1.00 25.39 C
HETATM 2 N1 MOV A 303 3.595 -9.086 -1.744 1.00 25.47 N
HETATM 3 O1 MOV A 303 -0.606 -11.017 -1.525 1.00 29.03 O
HETATM 4 C2 MOV A 303 2.258 -8.966 -1.680 1.00 25.81 C
HETATM 5 N2 MOV A 303 -0.516 -6.457 -1.522 1.00 25.26 N
HETATM 6 O2 MOV A 303 -2.563 -2.305 0.154 1.00 25.22 O
HETATM 7 C3 MOV A 303 4.408 -8.024 -1.795 1.00 26.04 C
HETATM 8 N3 MOV A 303 -0.485 -8.794 -1.541 1.00 26.50 N
HETATM 9 O3 MOV A 303 5.556 -9.536 -3.856 1.00 29.54 O
HETATM 10 C4 MOV A 303 3.854 -6.742 -1.782 1.00 25.72 C
HETATM 11 N4 MOV A 303 1.468 -10.112 -1.635 1.00 26.41 N
HETATM 12 C5 MOV A 303 2.467 -6.591 -1.708 1.00 25.83 C
HETATM 13 N5 MOV A 303 2.882 -13.267 -0.492 1.00 28.14 N
HETATM 14 C6 MOV A 303 5.789 -8.217 -1.865 1.00 26.76 C
HETATM 15 N6 MOV A 303 -2.128 -4.145 -1.011 1.00 25.51 N
HETATM 16 C7 MOV A 303 0.232 -7.655 -1.586 1.00 25.83 C
HETATM 17 C8 MOV A 303 0.088 -10.003 -1.564 1.00 27.12 C
HETATM 18 C9 MOV A 303 2.047 -11.378 -1.648 1.00 27.82 C
HETATM 19 C10 MOV A 303 2.320 -12.047 -0.454 1.00 27.79 C
HETATM 20 C11 MOV A 303 3.178 -13.859 -1.638 1.00 28.92 C
HETATM 21 C12 MOV A 303 2.924 -13.240 -2.857 1.00 28.92 C
HETATM 22 C13 MOV A 303 2.347 -11.976 -2.860 1.00 28.87 C
HETATM 23 C14 MOV A 303 2.006 -11.420 0.913 1.00 27.52 C
HETATM 24 C15 MOV A 303 3.303 -11.287 1.711 1.00 27.35 C
HETATM 25 C16 MOV A 303 0.986 -12.270 1.683 1.00 27.17 C
HETATM 26 C17 MOV A 303 0.121 -5.160 -1.149 1.00 25.83 C
HETATM 27 C18 MOV A 303 -0.822 -4.289 -0.337 1.00 25.87 C
HETATM 28 C19 MOV A 303 -2.541 -5.147 -2.020 1.00 25.62 C
HETATM 29 C20 MOV A 303 -1.985 -6.526 -1.667 1.00 25.42 C
HETATM 30 C21 MOV A 303 -2.610 -7.064 -0.371 1.00 25.93 C
HETATM 31 C22 MOV A 303 2.037 -11.248 -4.165 1.00 29.50 C
HETATM 32 C23 MOV A 303 -2.919 -3.127 -0.691 1.00 25.94 C
HETATM 33 C24 MOV A 303 -4.277 -2.959 -1.369 1.00 25.39 C
HETATM 34 C25 MOV A 303 -5.364 -2.168 -0.643 1.00 25.97 C
HETATM 35 C26 MOV A 303 6.343 -8.975 -2.901 1.00 27.88 C
HETATM 36 C27 MOV A 303 7.719 -9.162 -2.966 1.00 27.85 C
HETATM 37 C28 MOV A 303 8.550 -8.602 -2.005 1.00 27.87 C
HETATM 38 C29 MOV A 303 8.011 -7.850 -0.968 1.00 27.43 C
HETATM 39 C30 MOV A 303 6.635 -7.660 -0.895 1.00 26.79 C
HETATM 40 F1 MOV A 303 4.658 -5.663 -1.836 1.00 26.54 F
HETATM 41 F2 MOV A 303 6.144 -6.931 0.129 1.00 26.28 F
CONECT 1 12 4 16
CONECT 2 7 4
CONECT 3 17
CONECT 4 1 2 11
CONECT 5 16 26 29
CONECT 6 32
CONECT 7 2 14 10
CONECT 8 17 16
CONECT 9 35
CONECT 10 7 40 12
CONECT 11 4 18 17
CONECT 12 1 10
CONECT 13 19 20
CONECT 14 7 35 39
CONECT 15 27 32 28
CONECT 16 5 1 8
CONECT 17 8 3 11
CONECT 18 11 19 22
CONECT 19 13 18 23
CONECT 20 13 21
CONECT 21 20 22
CONECT 22 18 21 31
CONECT 23 19 24 25
CONECT 24 23
CONECT 25 23
CONECT 26 5 27
CONECT 27 15 26
CONECT 28 15 29
CONECT 29 5 28 30
CONECT 30 29
CONECT 31 22
CONECT 32 15 6 33
CONECT 33 32 34
CONECT 34 33
CONECT 35 14 9 36
CONECT 36 35 37
CONECT 37 36 38
CONECT 38 37 39
CONECT 39 14 38 41
CONECT 40 10
CONECT 41 39
END