shimo-lab/Drug-Gene-Analogy

Drug-Gene-Analogy

Predicting drug–gene relations via analogy tasks with word embeddings
Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Yihua Zhu, Mariko Okada, Hidetoshi Shimodaira
Scientific Reports 15, 17240 (2025) [arXiv]

Fig. 2

Setup

The code is intended to run inside Docker. For any other setup, install the packages listed in requirements.txt.

Build the Docker image

$ bash scripts/docker/build.sh

Run the Docker container

$ bash scripts/docker/run.sh

Embeddings

BioConceptVec

Download the BioConceptVec skip-gram embeddings:

$ mkdir -p data/embeddings
$ wget -c -O data/embeddings/concept_skipgram.json https://ftp.ncbi.nlm.nih.gov/pub/lu/BioConceptVec/concept_skip.json
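
Once the download finishes, the file can be inspected from Python. The following is a minimal sketch, not code from this repository: it assumes the JSON is a single object mapping concept IDs to lists of floats, and that IDs carry type prefixes such as `Gene_` or `Chemical_`. The helper names `load_embeddings` and `split_by_prefix` are hypothetical. The file may be large, so loading it whole may need a lot of memory.

```python
import json
from pathlib import Path


def load_embeddings(path):
    """Load a JSON file assumed to map concept IDs to lists of floats."""
    with Path(path).open(encoding="utf-8") as f:
        raw = json.load(f)
    return {concept: [float(x) for x in vec] for concept, vec in raw.items()}


def split_by_prefix(embeddings, prefix):
    """Keep only concepts whose ID starts with the given prefix (assumed naming scheme)."""
    return {k: v for k, v in embeddings.items() if k.startswith(prefix)}


path = Path("data/embeddings/concept_skipgram.json")
if path.exists():  # only runs after the download above has completed
    emb = load_embeddings(path)
    genes = split_by_prefix(emb, "Gene_")
    print(f"{len(emb)} concepts loaded, {len(genes)} with the 'Gene_' prefix")
```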

Our skip-gram embeddings

Skip-gram embeddings trained on PubMed abstracts in 5-year windows from 1970 are available on Google Drive.

Generating the experimental dataset

See README.prepare.md for the full preprocessing pipeline.

Pre-generated data are already included in the data/ directory.

Code

PCA

Fig. 2a: PCA of drugs and genes for a randomly selected relation

$ python Fig2a.py
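
For readers who want to see the projection step in isolation, here is a minimal PCA sketch using NumPy's SVD. It is an illustration under stated assumptions, not the contents of Fig2a.py: the function name `pca_2d` is hypothetical and the drug and gene vectors are random toy data.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each dimension
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


# Toy data: five "drug" vectors and five "gene" vectors shifted by a common offset.
rng = np.random.default_rng(0)
drugs = rng.normal(size=(5, 8))
genes = drugs + rng.normal(loc=1.0, scale=0.1, size=(5, 8))

coords = pca_2d(np.vstack([drugs, genes]))
drug_xy, gene_xy = coords[:5], coords[5:]
print(drug_xy.shape, gene_xy.shape)
```

Fitting the PCA on drugs and genes together, as done here, keeps both groups in the same 2D coordinate system so that drug-to-gene offsets can be compared visually.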

Fig. 2b: PCA of drugs and genes classified into the ErbB signaling pathway

$ python Fig2b.py

Evaluation

$ python eval_analogy.py
$ python eval_analogy_Y1.py
$ python eval_analogy_Y2.py
$ python eval_analogy_P1Y1_and_P2Y1.py
$ python eval_analogy_P1Y2_and_P2Y2.py
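
As a rough picture of what an analogy task computes, the sketch below estimates a relation vector as the mean gene embedding minus the mean drug embedding over known pairs, adds it to a query drug, and ranks genes by cosine similarity. This is one common formulation written in plain Python for illustration; whether the evaluation scripts compute it exactly this way is an assumption, and all names and toy vectors here are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relation_vector(pairs, drug_emb, gene_emb):
    """Mean gene vector minus mean drug vector over known (drug, gene) pairs."""
    mean_gene = mean_vector([gene_emb[g] for _, g in pairs])
    mean_drug = mean_vector([drug_emb[d] for d, _ in pairs])
    return [g - d for g, d in zip(mean_gene, mean_drug)]


def predict_genes(drug, rel, drug_emb, gene_emb, k=1):
    """Return the k genes closest (by cosine) to drug + relation vector."""
    query = [a + b for a, b in zip(drug_emb[drug], rel)]
    ranked = sorted(gene_emb, key=lambda g: cosine(query, gene_emb[g]), reverse=True)
    return ranked[:k]


def top_k_accuracy(test_pairs, rel, drug_emb, gene_emb, k):
    """Fraction of test pairs whose gene appears among the top-k predictions."""
    hits = sum(g in predict_genes(d, rel, drug_emb, gene_emb, k) for d, g in test_pairs)
    return hits / len(test_pairs)


# Toy 2D embeddings (hypothetical values).
drug_emb = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [2.0, 1.0]}
gene_emb = {"g1": [2.0, 1.0], "g2": [1.0, 2.0], "g3": [3.0, 2.0]}

rel = relation_vector([("d1", "g1"), ("d2", "g2")], drug_emb, gene_emb)
print(predict_genes("d3", rel, drug_emb, gene_emb, k=1))
print(top_k_accuracy([("d3", "g3")], rel, drug_emb, gene_emb, k=1))
```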

Results produced with the OpenAI API are stored under output/analogy_API/ and are loaded by default:

$ python eval_OpenAI_API.py

If you wish to rerun the API experiments, adjust the scripts as needed.

Plotting year-wise scores (Fig. 3 / Fig. S4)

$ python Fig3_FigS4.py

Predictions for the ErbB signaling pathway (Table 4)

$ python Table4.py

Analogy vs. TransE

Generate analogy-based predictions using 10%, 20%, …, 60% of the training data:

$ python eval_analogy_for_comparing_with_TransE.py
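
The idea of evaluating with growing fractions of the training data can be sketched as follows. This is only an illustration: it assumes the subsets are drawn uniformly at random with a fixed seed, which may differ from how the script above selects them, and the pair names are placeholders.

```python
import random


def subsample(pairs, fraction, seed=0):
    """Draw a random subset containing roughly `fraction` of the pairs."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = max(1, round(len(pairs) * fraction))
    return rng.sample(pairs, n)


# Placeholder training pairs; real pairs would come from the prepared dataset.
pairs = [(f"drug{i}", f"gene{i}") for i in range(50)]
fractions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
subsets = {f: subsample(pairs, f) for f in fractions}

for f in fractions:
    print(f"{f:.0%} of training data: {len(subsets[f])} pairs")
```

Each subset would then be used to estimate the relation vector before running the same evaluation on the held-out pairs.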

Compare them with TransE results:

$ python Fig4.py

(TransE scores are currently hard-coded inside Fig4.py; a dedicated script will be released later.)

Reference

  • Chen et al. BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale. PLoS Comput Biol. (2020).

Appendix

Distribution of answer-set sizes:

$ python FigS2_S3.py

Search-result rank by answer-set size:

$ python FigS5.py

Weighted correlations are computed with the WeightedCorr repository.
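
For reference, a weighted Pearson correlation can be written in a few lines of plain Python. This sketch follows the standard definition (weighted covariance divided by the product of weighted standard deviations) and is not the WeightedCorr implementation; the function names are hypothetical.

```python
import math


def weighted_mean(x, w):
    """Mean of x with non-negative weights w."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_cov(x, y, w):
    """Weighted covariance of x and y around their weighted means."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sum(w)


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation coefficient."""
    return weighted_cov(x, y, w) / math.sqrt(weighted_cov(x, x, w) * weighted_cov(y, y, w))


# With equal weights this reduces to the ordinary Pearson correlation.
print(weighted_pearson([1, 2, 3], [1, 3, 2], [1, 1, 1]))
```

A weight of zero removes a point from the calculation entirely, so the weights control how much each observation contributes to the correlation.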

Citation

@article{Yamagiwa2025,
  title        = {Predicting drug–gene relations via analogy tasks with word embeddings},
  author       = {Yamagiwa, Hiroaki and Hashimoto, Ryoma and Arakane, Kiwamu and Murakami, Ken and Soeda, Shou and Oyama, Momose and Zhu, Yihua and Okada, Mariko and Shimodaira, Hidetoshi},
  journal      = {Scientific Reports},
  volume       = {15},
  number       = {1},
  pages        = {17240},
  year         = {2025},
  month        = {May},
  doi          = {10.1038/s41598-025-01418-z},
  url          = {https://doi.org/10.1038/s41598-025-01418-z},
  issn         = {2045-2322}
}

Notes

  • Embedding URLs may change; please refer to the GitHub repository rather than the raw download link.
  • This directory was created by Hiroaki Yamagiwa.
  • Embeddings were trained by Ryoma Hashimoto.
  • KEGG ID to MeSH ID conversion (prepare_kegg2mesh.ipynb) was implemented by Kiwamu Arakane.
  • TransE prediction experiments were conducted by Yihua Zhu.
