AlphaDTA

AlphaDTA is a deep learning framework for protein-ligand binding affinity prediction that leverages AlphaFold3 embeddings and 3D complex structure information.

Environment Setup

1. Create Conda Environment

conda env create -f env.yml

2. Install DGL

Download the DGL wheel that matches the environment (DGL 1.0.2, CUDA 11.3, CPython 3.7) from data.dgl.ai/wheels/repo.html, then install it with pip:

# Download dgl-1.0.2+cu113-cp37-cp37m-manylinux1_x86_64.whl
pip install dgl-1.0.2+cu113-cp37-cp37m-manylinux1_x86_64.whl
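The wheel filename encodes the interpreter and platform it was built for, and pip refuses a wheel whose tags do not match the active environment. The following is a minimal sketch for checking this before installing, assuming the standard wheel naming scheme (name, version, Python tag, ABI tag, platform tag). The helper names `parse_wheel_tags` and `interpreter_tag` are hypothetical and not part of AlphaDTA or DGL.

```python
import sys

WHEEL = "dgl-1.0.2+cu113-cp37-cp37m-manylinux1_x86_64.whl"


def parse_wheel_tags(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The last three fields are always python tag, ABI tag, platform tag;
    # an optional build tag may sit between the version and the python tag.
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def interpreter_tag():
    """CPython-style tag for the running interpreter, e.g. 'cp37'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    tags = parse_wheel_tags(WHEEL)
    print(tags)
    if tags["python"] != interpreter_tag():
        print(f"wheel targets {tags['python']}, interpreter is {interpreter_tag()}")
```

Run it inside the activated conda environment; a mismatch message suggests that either the environment or the downloaded wheel needs to change.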

3. Install PyTorch and Additional Dependencies

# Install PyTorch with CUDA support
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia

# Install OpenBabel
conda install -c conda-forge openbabel=2.4.1

# Install other dependencies
pip install seaborn
conda install -c conda-forge biopython
conda install -c conda-forge pymol-open-source
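After the installs above, a quick check that the packages are visible to Python can save a failed run later. This is a minimal sketch; the import names in `MODULES` are assumptions inferred from the package names (for example, Biopython is normally imported as `Bio`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names are assumptions based on the packages installed above.
MODULES = ["torch", "dgl", "openbabel", "seaborn", "Bio", "pymol"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("missing:", missing if missing else "none")
```

The check only confirms that each module can be located; it does not verify versions or CUDA support.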

4. Install Chimera

Install UCSF Chimera (v1.17.3) for structure visualization and processing.

Quick Start

Download Data

Run the data download script to fetch the preprocessed dataset:

python utils/download_data.py

This will create a data folder in the repository. The dataset is hosted on Hugging Face.

Pre-trained model checkpoints are stored in checkpoints/alphadta as .pth files.

Evaluation

LP-PDBbind

# Replace {model_path} with the path to a pre-trained checkpoint (see checkpoints/alphadta)
python protocols/lp_pdbbind/evaluate.py --model_path {model_path}
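To evaluate several checkpoints in one go, the command can be generated for each .pth file. The sketch below rests on two assumptions that the README does not confirm: that `--model_path` accepts a single .pth file, and that checkpoints may sit in subfolders of checkpoints/alphadta (hence the recursive search). `evaluation_commands` is a hypothetical helper; point `checkpoint_dir` at whichever subfolder holds the LP-PDBbind checkpoints.

```python
from pathlib import Path


def evaluation_commands(checkpoint_dir="checkpoints/alphadta"):
    """Build one LP-PDBbind evaluation command per .pth file found."""
    checkpoints = sorted(Path(checkpoint_dir).rglob("*.pth"))
    return [
        ["python", "protocols/lp_pdbbind/evaluate.py", "--model_path", str(ckpt)]
        for ckpt in checkpoints
    ]


if __name__ == "__main__":
    # Print the commands for review rather than running them directly.
    for cmd in evaluation_commands():
        print(" ".join(cmd))
```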

PDBbind CleanSplit (CASF2016)

python protocols/cleansplit/evaluate_casf2016.py \
    --csv_path data/csv/cleansplit/casf2016.csv \
    --graph_dir data/interaction_graph/test/crystal/casf2016_graph_ls \
    --embedding_dir data/af3_embedding/casf2016 \
    --model_dir checkpoints/alphadta/cleansplit

Training

LP-PDBbind

python protocols/lp_pdbbind/train.py \
    --config configs/alphadta.yaml \
    --lr 5e-4 \
    --seed 2 \
    --batch_size 32

Training results will be saved to output/lp_pdbbind.
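If you want to repeat the LP-PDBbind run with several seeds, the command above can be assembled programmatically. This is a minimal sketch using only the flags shown above; the seed list and the helper names `train_command` and `run_seeds` are illustrative assumptions. Whether runs with different seeds are written to separate locations under output/lp_pdbbind is not stated here, so check that before launching multiple runs.

```python
import subprocess


def train_command(seed, lr="5e-4", batch_size=32, config="configs/alphadta.yaml"):
    """Assemble the LP-PDBbind training command for one seed."""
    return [
        "python", "protocols/lp_pdbbind/train.py",
        "--config", config,
        "--lr", str(lr),
        "--seed", str(seed),
        "--batch_size", str(batch_size),
    ]


def run_seeds(seeds, dry_run=True):
    """Print (and, when dry_run is False, execute) one training run per seed."""
    for seed in seeds:
        cmd = train_command(seed)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Seeds chosen for illustration only; dry_run=True just prints the commands.
    run_seeds([0, 1, 2])
```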

PDBbind CleanSplit

python protocols/cleansplit/train.py \
    --config configs/alphadta.yaml \
    --csv_path data/csv/cleansplit/train-validation.csv \
    --split_dir protocols/cleansplit/cv_split \
    --graph_dir data/interaction_graph/train-valid/cleansplit_graph_ls \
    --embedding_dir data/af3_embedding/pdbcleansplit_only data/af3_embedding/shared \
    --lr 1e-4 \
    --batch_size 64 \
    --seed 2

Training results will be saved to output/cleansplit.

Data Preprocessing for New Complexes

To run AlphaDTA on new protein-ligand complexes, follow these preprocessing steps:

1. Run AlphaFold3

First, generate AlphaFold3 predictions for your protein-ligand complexes. Refer to the AlphaFold3 repository and the Supplementary Materials of the AlphaDTA paper for detailed setup instructions.

Important (AlphaFold3 input JSON order)
When creating AlphaFold3 input JSON files, order the entries of the sequences list as follows:
(1) protein sequence → (2) ligand SMILES

Example

{
  "name": "abemaciclib",
  "modelSeeds": [42],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQRSPLEKASVVSKLFFSWTR...",
        "templates": []
      }
    },
    {
      "ligand": {
        "id": "L",
        "smiles": "CCN1CCN(CC1)CC2=CN=C(NC3=NC=C(F)C(=N3)C4=CC(=C5N=C(C)[N](C(C)C)C5=C4)F)C=C2"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 2
}
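When preparing many complexes, generating these files from code helps keep the protein-then-ligand order consistent. The sketch below mirrors the fields of the example above; `build_af3_input` and `has_expected_order` are hypothetical helpers, the chain IDs "A" and "L" and the seed 42 are copied from the example, and the sequence and SMILES in the usage section are placeholders.

```python
import json


def build_af3_input(name, protein_sequence, ligand_smiles, seed=42):
    """Return an AlphaFold3 input dict with the protein entry before the ligand."""
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_sequence, "templates": []}},
            {"ligand": {"id": "L", "smiles": ligand_smiles}},
        ],
        "dialect": "alphafold3",
        "version": 2,
    }


def has_expected_order(af3_input):
    """True when sequences is exactly [protein, ligand], in that order."""
    kinds = [next(iter(entry)) for entry in af3_input["sequences"]]
    return kinds == ["protein", "ligand"]


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    job = build_af3_input("example", "MKT", "CCO")
    assert has_expected_order(job)
    print(json.dumps(job, indent=2))
```

`has_expected_order` can also be applied to existing JSON files loaded with `json.load` to catch entries written in the wrong order.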

2. Prepare Input Structure

Organize your AlphaFold3 input JSON files and output folders. For example:

Input JSON files:

CFTR/af_input/abemaciclib.json
CFTR/af_input/acebutolol_hcl.json

AlphaFold3 outputs (embeddings and CIF files), one folder per complex:

CFTR/af_output/abemaciclib/
CFTR/af_output/acebutolol_hcl/
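Before preprocessing, it can help to confirm that every input JSON has a corresponding output folder. This is a minimal sketch, assuming the layout shown above, where each folder under af_output carries the same name as the JSON file without its extension; `unmatched_inputs` is a hypothetical helper, not part of the AlphaDTA scripts.

```python
from pathlib import Path


def unmatched_inputs(dataset_root):
    """Return names of input JSON files that lack a matching af_output folder."""
    root = Path(dataset_root)
    names = sorted(p.stem for p in (root / "af_input").glob("*.json"))
    return [n for n in names if not (root / "af_output" / n).is_dir()]


if __name__ == "__main__":
    missing = unmatched_inputs("CFTR")
    print("inputs without outputs:", missing if missing else "none")
```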

3. Preprocess Embeddings

python preprocess/preprocess_pt.py --dataset_root "CFTR"

This generates the processed_emb directory and a CSV file.

4. Preprocess Structures

python preprocess/preprocess_structure.py \
    --dataset_dir CFTR \
    --label_csv /path/to/labels.csv \
    --num_process 12 \
    --verbose

This generates processed_structure/graph_ls, which contains the interaction graphs.

You can now use the preprocessed embeddings and graphs as input to AlphaDTA.

Citation

If you use AlphaDTA in your research, please cite:

@article{abramson2024accurate,
  title={Accurate structure prediction of biomolecular interactions with AlphaFold 3},
  author={Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick, Joshua and others},
  journal={Nature},
  volume={630},
  number={8016},
  pages={493--500},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

@article{wang2004pdbbind,
  title={The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures},
  author={Wang, Renxiao and Fang, Xueliang and Lu, Yipin and Wang, Shaomeng},
  journal={Journal of medicinal chemistry},
  volume={47},
  number={12},
  pages={2977--2980},
  year={2004},
  publisher={ACS Publications}
}

@article{li2026leak,
  title={Leak Proof PDBBind: A Reorganized Data Set of Protein--Ligand Complexes for More Generalizable Binding Affinity Prediction},
  author={Li, Jie and Guan, Xingyi and Zhang, Oufan and Sun, Kunyang and Wang, Yingze and Bagni, Dorian and Head-Gordon, Teresa},
  journal={The Journal of Physical Chemistry B},
  volume={130},
  number={2},
  pages={730--740},
  year={2026},
  publisher={ACS Publications}
}

@article{graber2025resolving,
  title={Resolving data bias improves generalization in binding affinity prediction},
  author={Graber, David and Stockinger, Peter and Meyer, Fabian and Mishra, Siddhartha and Horn, Claus and Buller, Rebecca},
  journal={Nature Machine Intelligence},
  pages={1--13},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
