Hi! Thank you for visiting our repository. This is a Django project designed to facilitate the exploration of proteins at the domain level. We work with the DPCFam and DPCStruct datasets, which provide clusterings of protein sequences and protein structures, respectively, with the aim of classifying protein domains on a large scale.

| Dataset | Description | Zenodo |
|---|---|---|
| DPCFam | Sequence-based domain clusters | |
| DPCStruct | Structure-based domain clusters | |

The project currently consists of two Django applications, `dpcfam` and `dpcstruct`, corresponding to the two datasets above. To reproduce the current state of this project (which is still under development), follow the steps below.

- 1. Prerequisites
- 2. Clone the Repository
- 3. Installation
- 4. Database Initialization
- 5. Migrations
- 6. Run the Server
- 7. Usage
- References

## 1. Prerequisites

Our development environment uses:

- Ubuntu 24.04.3 LTS
- Python 3.12.3
- Visual Studio Code 1.109.3
- Git 2.43.0
- PostgreSQL 16.11

In addition, you will need:

- CSV data files: available upon request; place them in `static/dataframes/`.
- Static files: to enable the "Downloads" feature and serve domain data, organize the `static/` directory as follows:

```text
static/
├── downloads/
│   ├── dpcfam/
│   │   ├── alphafolddb_reps.zip
│   │   ├── dpcfamb_dataset.zip
│   │   ├── dpcfam_full_seeds.zip
│   │   ├── dpcfam_hmm_profiles.zip
│   │   └── dpcfam_msa_profiles.zip
│   └── dpcstruct/
│       ├── dpcstruct_reps_pdbs.tar.gz
│       └── dpcstruct_reps_seqs.tar.gz
└── production_files/
    ├── dpcfam/
    │   ├── metaclusters_fasta/          # MCID.fasta files
    │   ├── metaclusters_hmms/           # MCID.hmm files
    │   └── metaclusters_msas_cdhit/     # MCID.msa files
    └── dpcstruct/
        ├── dpcstruct_reps_seqs/         # MCID.fasta files (representatives only)
        ├── dpcstruct_reps_pdbs_zipped/  # MCID_pdb.zip files (representatives only)
        └── dpcstruct_reps_pdbs/         # MCID.pdb files (representatives only)
```
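
To catch a missing archive or folder before starting the server, you can compare `static/` against the tree above. The following is a minimal sketch and is not part of the repository: the script name (say `check_static_layout.py`) is hypothetical, and it only checks that each listed file or directory exists, not the individual `MCID.*` files inside the `production_files/` folders.

```python
from pathlib import Path

# Entries transcribed from the directory tree above, relative to static/.
# A trailing "/" marks a directory; everything else is a regular file.
EXPECTED_ENTRIES = [
    "downloads/dpcfam/alphafolddb_reps.zip",
    "downloads/dpcfam/dpcfamb_dataset.zip",
    "downloads/dpcfam/dpcfam_full_seeds.zip",
    "downloads/dpcfam/dpcfam_hmm_profiles.zip",
    "downloads/dpcfam/dpcfam_msa_profiles.zip",
    "downloads/dpcstruct/dpcstruct_reps_pdbs.tar.gz",
    "downloads/dpcstruct/dpcstruct_reps_seqs.tar.gz",
    "production_files/dpcfam/metaclusters_fasta/",
    "production_files/dpcfam/metaclusters_hmms/",
    "production_files/dpcfam/metaclusters_msas_cdhit/",
    "production_files/dpcstruct/dpcstruct_reps_seqs/",
    "production_files/dpcstruct/dpcstruct_reps_pdbs_zipped/",
    "production_files/dpcstruct/dpcstruct_reps_pdbs/",
]


def missing_static_entries(static_root: Path) -> list[str]:
    """Return the expected entries that do not exist under static_root."""
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = static_root / entry.rstrip("/")
        # Directories must be directories and files must be regular files.
        found = path.is_dir() if entry.endswith("/") else path.is_file()
        if not found:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    missing = missing_static_entries(Path("static"))
    if missing:
        print("Missing under static/:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("static/ matches the expected layout.")
```

Run it from the repository root with `python3 check_static_layout.py`; an empty report means every entry in the tree is in place.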

## 2. Clone the Repository

If this is your first time, clone the project:

```bash
git clone https://github.com/emmanuelnyandukagarabi/dpc_fam_and_struct_webapp
cd dpc_fam_and_struct_webapp
```

Otherwise, pull the latest changes:

```bash
cd dpc_fam_and_struct_webapp
git pull
```

## 3. Installation

Create (first-time users only) and activate a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```
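
Before running `pip install`, you may want to confirm that the virtual environment is actually active. The snippet below is a minimal sketch (the file name, e.g. `check_venv.py`, is hypothetical and not part of the repository); it relies on the documented behavior of Python's `venv` module that inside a virtual environment `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv (e.g. one made with `python3 -m venv`)."""
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```

After `source .venv/bin/activate`, `python3 check_venv.py` should report an active environment.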

## 4. Database Initialization

Start the PostgreSQL service:

```bash
sudo service postgresql start
```

Use the provided script to set up the PostgreSQL user and database:

```bash
sudo -u postgres psql -f static/scripts/create_a_user_and_a_database.sql
```

Run the following script to create the `dpcfam` tables and indexes:

```bash
PGPASSWORD="EmmaPSQL2026" psql -U enyanduk -h localhost -d dpcfam_mcs_db -f static/scripts/dpcfam/create_dpcfam_tables.sql
```

Run the following script to populate the `dpcfam` tables with data loaded from the CSV files (this takes a while; please wait until the process completes):

```bash
PGPASSWORD="EmmaPSQL2026" psql -U enyanduk -h localhost -d dpcfam_mcs_db -f static/scripts/dpcfam/populate_dpcfam_tables.sql
```

Run the following script to create the `dpcstruct` tables and indexes:

```bash
PGPASSWORD="EmmaPSQL2026" psql -U enyanduk -h localhost -d dpcfam_mcs_db -f static/scripts/dpcstruct/create_dpcstruct_tables.sql
```

Run the following script to populate the `dpcstruct` tables with data loaded from the CSV files:

```bash
PGPASSWORD="EmmaPSQL2026" psql -U enyanduk -h localhost -d dpcfam_mcs_db -f static/scripts/dpcstruct/populate_dpcstruct_tables.sql
```
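
The four `psql` commands above can also be run in sequence from a single script. This is a hypothetical helper (e.g. `load_databases.py`), not something shipped with the project. It reuses the user, host, database name, and script paths from the commands above, and it adds psql's `ON_ERROR_STOP=1` variable (not present in the original commands) so that a failing statement makes psql exit with a non-zero status instead of carrying on. By default it only prints the commands; pass `--execute`, a flag defined by this sketch, to actually run them.

```python
import subprocess
import sys
from pathlib import Path

# Connection settings copied from the commands above.
DB_USER = "enyanduk"
DB_HOST = "localhost"
DB_NAME = "dpcfam_mcs_db"

# Order matters: each app's tables are created before they are populated.
SQL_SCRIPTS = [
    Path("static/scripts/dpcfam/create_dpcfam_tables.sql"),
    Path("static/scripts/dpcfam/populate_dpcfam_tables.sql"),
    Path("static/scripts/dpcstruct/create_dpcstruct_tables.sql"),
    Path("static/scripts/dpcstruct/populate_dpcstruct_tables.sql"),
]


def psql_command(script: Path) -> list[str]:
    """Same psql call as above, plus ON_ERROR_STOP so a failing statement aborts."""
    return [
        "psql", "-U", DB_USER, "-h", DB_HOST, "-d", DB_NAME,
        "-v", "ON_ERROR_STOP=1", "-f", str(script),
    ]


def run_all(execute: bool) -> None:
    for script in SQL_SCRIPTS:
        cmd = psql_command(script)
        print(" ".join(cmd))
        if execute:
            # psql reads PGPASSWORD from the inherited environment;
            # check=True stops at the first script that exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --execute to actually run the scripts.
    run_all(execute="--execute" in sys.argv[1:])
```

For example, `PGPASSWORD="<your password>" python3 load_databases.py --execute`. Passing the password through the environment mirrors how the commands above hand it to `psql`, and `psql` must be on your `PATH`.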

## 5. Migrations

All migrations have already been created and pushed to this repository. Optionally, you may run:

```bash
python3 manage.py makemigrations
python3 manage.py migrate
```

## 6. Run the Server

```bash
python3 manage.py runserver
```

## 7. Usage

Visit the following URL in your web browser (Chrome is my friend!):

http://127.0.0.1:8000/
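
If the page does not load, a small probe can tell whether the development server is listening at all. The sketch below uses only the standard library; the file name (e.g. `probe_server.py`) and the 3-second timeout are arbitrary assumptions, and the URL is the default `runserver` address above.

```python
import urllib.error
import urllib.request

DEV_SERVER_URL = "http://127.0.0.1:8000/"


def server_is_up(url: str = DEV_SERVER_URL, timeout: float = 3.0) -> bool:
    """Return True if something at `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (e.g. 404 or 500) still means the server is running.
        # HTTPError subclasses URLError, so it must be caught first.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return False


if __name__ == "__main__":
    print("server is up" if server_is_up() else "server is not reachable")
```

A `False` result usually means `runserver` is not running or is bound to a different address or port.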

Note: Congratulations, you made it! To stop the server, press `Ctrl+C`. To stop PostgreSQL, run `sudo service postgresql stop`. Once the database has been populated successfully, you may delete the CSV files in `static/dataframes/` to save space. For any feedback, reach out to us at any of the addresses on our profile. More features are coming soon!

## References

If you use this project or the associated datasets, please cite:

- Barone, F., Laio, A., Punta, M., Cozzini, S., Ansuini, A., & Cazzaniga, A. (2025). Unsupervised domain classification of AlphaFold2-predicted protein structures. PRX Life, 3(2), 023009. https://doi.org/10.1103/PRXLife.3.023009
- Russo, E. T., Barone, F., Bateman, A., Cozzini, S., Punta, M., & Laio, A. (2022). DPCfam: Unsupervised protein family classification by density peak clustering of large sequence datasets. PLOS Computational Biology, 18(10), e1010610. https://doi.org/10.1371/journal.pcbi.1010610

BibTeX:

```bibtex
@article{barone2025unsupervised,
  title={Unsupervised domain classification of {AlphaFold2}-predicted protein structures},
  author={Barone, Federico and Laio, Alessandro and Punta, Marco and Cozzini, Stefano and Ansuini, Alessio and Cazzaniga, Alberto},
  journal={PRX Life},
  volume={3},
  number={2},
  pages={023009},
  year={2025},
  publisher={APS}
}

@article{russo2022dpcfam,
  title={{DPCfam}: Unsupervised protein family classification by density peak clustering of large sequence datasets},
  author={Russo, Elena Tea and Barone, Federico and Bateman, Alex and Cozzini, Stefano and Punta, Marco and Laio, Alessandro},
  journal={PLOS Computational Biology},
  volume={18},
  number={10},
  pages={e1010610},
  year={2022},
  publisher={Public Library of Science San Francisco, CA USA}
}
```