Drug-Gene-Analogy

Predicting drug–gene relations via analogy tasks with word embeddings
Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Yihua Zhu, Mariko Okada, Hidetoshi Shimodaira
Scientific Reports 15, 17240 (2025) [arXiv]

Setup

The code is meant to run inside Docker. If you prefer other setups, install the packages listed in requirements.txt.

Build the Docker image

$ bash scripts/docker/build.sh

Run the Docker container

$ bash scripts/docker/run.sh

Embeddings

BioConceptVec

Download the BioConceptVec skip-gram embeddings:

$ mkdir -p data/embeddings
$ wget -c -O data/embeddings/concept_skipgram.json https://ftp.ncbi.nlm.nih.gov/pub/lu/BioConceptVec/concept_skip.json
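
Once downloaded, the embeddings can be read into NumPy. The snippet below is a minimal sketch, not the repository's loader: it assumes the JSON file is a single object that maps each concept ID to a list of floats, and the helper name `load_embeddings` and the toy concept IDs are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def load_embeddings(path):
    """Read a JSON object {concept_id: [floats]} and return (ids, matrix)."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    ids = list(raw)  # insertion order of the JSON object is preserved
    matrix = np.asarray([raw[i] for i in ids], dtype=np.float32)
    return ids, matrix


# Toy file standing in for data/embeddings/concept_skipgram.json
# (the IDs and values below are made up for illustration).
toy = {"Chemical_X": [0.1, 0.2, 0.3], "Gene_1": [0.4, 0.5, 0.6]}
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.json")
    with open(toy_path, "w", encoding="utf-8") as f:
        json.dump(toy, f)
    ids, matrix = load_embeddings(toy_path)

print(ids, matrix.shape)
```

For the real file, pass data/embeddings/concept_skipgram.json as the path; the file is large, so loading may take a while and need several gigabytes of memory.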

Our skip-gram embeddings

Skip-gram embeddings trained on PubMed abstracts in 5-year windows from 1970 are available on Google Drive.

Generating the experimental dataset

See README.prepare.md for the full preprocessing pipeline.

Pre-generated data are already placed in the data/ directory.

Code

PCA

Fig. 2a: PCA of drugs and genes for a randomly selected relation

$ python Fig2a.py

Fig. 2b: PCA of drugs and genes belonging to the ErbB signaling pathway

$ python Fig2b.py
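
For readers unfamiliar with the projection behind these figures, here is a minimal sketch of a 2-D PCA computed with an SVD. It uses random toy vectors rather than the real embeddings, and the function name `pca_2d` is hypothetical; Fig2a.py and Fig2b.py may compute and plot the projection differently.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
drugs = rng.normal(size=(5, 8))
# Toy "genes": each drug vector shifted by a roughly constant offset.
genes = drugs + rng.normal(loc=1.0, scale=0.1, size=(5, 8))
points = pca_2d(np.vstack([drugs, genes]))
print(points.shape)  # (10, 2)
```

The first five rows of `points` correspond to the toy drugs and the last five to the toy genes, so the two groups can be plotted with different markers.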

Evaluation

$ python eval_analogy.py
$ python eval_analogy_Y1.py
$ python eval_analogy_Y2.py
$ python eval_analogy_P1Y1_and_P2Y1.py
$ python eval_analogy_P1Y2_and_P2Y2.py
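
The scripts above evaluate analogy-based predictions. As a rough illustration of the idea, the sketch below uses the common vector-offset formulation: average the gene-minus-drug differences over known pairs, add that offset to a query drug, and rank genes by cosine similarity. The vectors are toy values and the names (`predict_genes`, `known_pairs`) are hypothetical; this is not the code in eval_analogy.py.

```python
import numpy as np


def predict_genes(drug_vec, relation_vec, gene_matrix, top_k=1):
    """Return indices of the top_k genes closest (by cosine) to drug + relation."""
    query = drug_vec + relation_vec
    query = query / np.linalg.norm(query)
    genes = gene_matrix / np.linalg.norm(gene_matrix, axis=1, keepdims=True)
    scores = genes @ query
    return np.argsort(-scores)[:top_k]


# Toy 3-D embeddings (made up for illustration).
drug_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
gene_vecs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, -1.0]])

# Known (drug index, gene index) pairs used to estimate the relation vector.
known_pairs = [(0, 0)]
relation_vec = np.mean([gene_vecs[g] - drug_vecs[d] for d, g in known_pairs], axis=0)

# Predict the target gene of drug 1, which is not among the known pairs.
top = predict_genes(drug_vecs[1], relation_vec, gene_vecs)
print(top)
```

In this toy setup the offset learned from drug 0 points drug 1 toward gene 1, which is the analogy "drug 0 is to gene 0 as drug 1 is to gene 1".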

Results produced with the OpenAI API are stored under output/analogy_API/ and are loaded by default:

$ python eval_OpenAI_API.py

If you wish to rerun the API experiments, adjust the scripts as needed.

Plotting year-wise scores (Fig. 3 / Fig. S4)

$ python Fig3_FigS4.py

Predictions for the ErbB signaling pathway (Table 4)

$ python Table4.py

Analogy vs. TransE

Generate analogy-based predictions using 10%, 20%, …, 60% of the training data:

$ python eval_analogy_for_comparing_with_TransE.py

Compare them with TransE results:

$ python Fig4.py

(TransE scores are currently hard-coded inside Fig4.py; a dedicated script will be released later.)

Reference

Chen et al. BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale. PLoS Comput Biol. (2020).

Appendix

Distribution of answer-set sizes:

$ python FigS2_S3.py

Search-result rank by answer-set size:

$ python FigS5.py

Weighted correlations are computed with the WeightedCorr repository.
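
For reference, a weighted Pearson correlation can be sketched in a few lines of NumPy. This is one standard definition, shown under the assumption that a Pearson-type coefficient is meant; it is not the code in WeightedCorr.py, and the function name `weighted_pearson` is hypothetical.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    var_x = np.average((x - mx) ** 2, weights=w)
    var_y = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0]
r_equal = weighted_pearson(x, y, [1, 1, 1, 1])  # equal weights: plain Pearson
r_weighted = weighted_pearson(x, y, [1, 2, 3, 4])
print(r_equal, r_weighted)
```

With equal weights the result coincides with the ordinary Pearson coefficient; unequal weights let some observations count more than others.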

Citation

@article{Yamagiwa2025,
  title        = {Predicting drug–gene relations via analogy tasks with word embeddings},
  author       = {Yamagiwa, Hiroaki and Hashimoto, Ryoma and Arakane, Kiwamu and Murakami, Ken and Soeda, Shou and Oyama, Momose and Zhu, Yihua and Okada, Mariko and Shimodaira, Hidetoshi},
  journal      = {Scientific Reports},
  volume       = {15},
  number       = {1},
  pages        = {17240},
  year         = {2025},
  month        = {May},
  doi          = {10.1038/s41598-025-01418-z},
  url          = {https://doi.org/10.1038/s41598-025-01418-z},
  issn         = {2045-2322}
}

Notes

Embedding URLs may change; please refer to the GitHub repository rather than the raw download link.
This directory was created by Hiroaki Yamagiwa.
Embeddings were trained by Ryoma Hashimoto.
KEGG ID to MeSH ID conversion (prepare_kegg2mesh.ipynb) was implemented by Kiwamu Arakane.
TransE prediction experiments were conducted by Yihua Zhu.
