A Multi-modal LLM-Knowledge Fusion Framework for Predicting Single-cell Genetic Perturbation Effects

scPert is a multi-modal transformer framework that integrates large language model embeddings with structured biological knowledge to predict single-cell transcriptomic responses to genetic perturbations. Through hierarchical fusion of knowledge graph representations, contextual embeddings from foundation models, and gene-specific encodings, scPert achieves significant performance improvements over existing methods on both single-gene and combinatorial perturbations. In cancer-relevant applications, scPert can reveal p53 pathway dynamics and immune checkpoint regulatory mechanisms. A systematic evaluation on 42 cancer dependency genes demonstrates scPert's ability to identify potential therapeutic targets. The framework provides a computational foundation for virtual cell construction and can accelerate drug target discovery.

Installation

Install PyG (PyTorch Geometric) first, then run pip install scpert

Requirements

anndata==0.9.2
scanpy==1.9.8
torch==2.3.0
torch-geometric==2.6.1
scvi-tools==0.20.3
pandas==2.0.3
numpy==1.24.4
scipy==1.10.1
cell-gears==0.0.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cudnn-cu12==8.9.2.26
flash_attn==0.2.8
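
The pinned versions above can be compared against the current environment before training. The following is a minimal sketch that uses only the standard library; the helper names parse_pins and check_pins are hypothetical and are not part of scPert.

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = """
anndata==0.9.2
scanpy==1.9.8
torch==2.3.0
torch-geometric==2.6.1
scvi-tools==0.20.3
pandas==2.0.3
numpy==1.24.4
scipy==1.10.1
cell-gears==0.0.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cudnn-cu12==8.9.2.26
flash_attn==0.2.8
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins, lookup=metadata.version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(parse_pins(PINNED)).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty report means every pinned package is installed at exactly the listed version.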

Usage

embedding_dir: Directory containing gene embedding files (.npy)

data_path: Base directory for perturbation datasets

model_path: Pretrained model directory containing model.pt

pert_file: CSV file specifying perturbation pairs with columns: gene1,gene2
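
As an illustration of the pert_file layout described above, the sketch below writes a CSV with the gene1,gene2 header using the standard library. The helper name write_pert_file, the output file name, and the gene symbols are placeholders chosen for this example; how single-gene perturbations should be encoded in this file is not specified here, so only pairs are shown.

```python
import csv
from pathlib import Path


def write_pert_file(pairs, path):
    """Write gene pairs to a CSV file with the header gene1,gene2."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene1", "gene2"])  # column names expected in pert_file
        writer.writerows(pairs)
    return path


if __name__ == "__main__":
    # Illustrative gene symbols only; use genes present in your dataset.
    example_pairs = [("CBL", "CNN1"), ("KLF1", "MAP2K6")]
    print(write_pert_file(example_pairs, "perturbations.csv"))
```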

You can train scPert on your perturbation dataset by running the Python script:

python ./scripts/train.py

You can use scPert to predict single-gene or gene-pair perturbations by running the script:

python ./scripts/infer.py

Using the API:

from scpert import predict, train, ProcePertdata, scpert

pertData = ProcePertdata(data_path)
pertData.load(DataName='norman')

# training
train(data_path, DataName, model_save_path, embedding_path)

# predict
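# embedding_path is passed by keyword because a positional argument cannot follow
# keyword arguments in Python; the keyword name is assumed, so check it against predict()
predict(model_path, data_path, pert_file_path, model_name='norman', device='auto', embedding_path=embedding_path, output_dir='./results')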
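
Before calling predict or running infer.py, it can help to confirm that the inputs match the layout described under Usage: .npy files in the embedding directory, model.pt in the model directory, and gene1,gene2 columns in the perturbation CSV. The sketch below is a hypothetical pre-flight check and not part of the scPert API; check_inputs is a name invented for this example.

```python
import csv
from pathlib import Path


def check_inputs(embedding_dir, model_path, pert_file):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # embedding_dir should contain at least one .npy gene embedding file.
    if not any(Path(embedding_dir).glob("*.npy")):
        problems.append(f"no .npy files found in {embedding_dir}")

    # model_path should be a directory containing model.pt.
    if not (Path(model_path) / "model.pt").is_file():
        problems.append(f"model.pt not found in {model_path}")

    # pert_file should be a CSV whose header includes gene1 and gene2.
    pert_file = Path(pert_file)
    if not pert_file.is_file():
        problems.append(f"{pert_file} does not exist")
    else:
        with pert_file.open(newline="") as handle:
            header = next(csv.reader(handle), [])
        missing = {"gene1", "gene2"} - {col.strip() for col in header}
        if missing:
            problems.append(f"{pert_file} is missing columns: {sorted(missing)}")

    return problems
```

A typical use is to print each returned problem and skip inference if the list is not empty.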

Cite Us

This work is currently under peer review.

DearMingkun/scPert_gene
