2 changes: 2 additions & 0 deletions docs/Storage/Databases.md
Many research projects use reference or external databases.
This page describes the databases maintained on Mahuika and gives recommendations for obtaining data from some specific external databases.

## Maintained databases

Some databases are readable by all users on Mahuika.
These databases can be found under `/opt/nesi/db`.

Some environment modules depend on these databases and locate these directories automatically.
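
To check which database directories are present before pointing a tool at one, you can list the contents of the shared root. The following is a minimal sketch that assumes only what is stated above, namely that each database sits in its own directory under `/opt/nesi/db`. The helper name `list_databases` is hypothetical and is not part of any NeSI tool. It uses only the Python standard library and loads no modules.

```python
from pathlib import Path

# Shared, read-only database root on Mahuika (path taken from the text above).
DB_ROOT = Path("/opt/nesi/db")


def list_databases(root: Path = DB_ROOT) -> list[str]:
    """Return the sorted names of database directories directly under `root`.

    Hypothetical helper: it only inspects the filesystem. It returns an
    empty list when `root` does not exist, e.g. when run off-cluster.
    """
    if not root.is_dir():
        return []
    # Keep directories only, sorted case-insensitively to match the table order.
    return sorted(
        (entry.name for entry in root.iterdir() if entry.is_dir()),
        key=str.lower,
    )


if __name__ == "__main__":
    for name in list_databases():
        print(DB_ROOT / name)
```

The same result can be had from `ls /opt/nesi/db` on a login node. The function form is only useful if a Python workflow needs to validate a path before submitting jobs.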

Dataset | Path | Licence Status | Notes
-- | -- | -- | --
[AlphaFold](https://alphafold.ebi.ac.uk/) | /opt/nesi/db/alphafold_db | CC-BY-4.0 | Predicted protein structures generated by AlphaFold
[BLAST](https://blast.ncbi.nlm.nih.gov/) | /opt/nesi/db/blast | Public | NCBI BLAST nucleotide and protein databases
[cartopy](https://github.com/SciTools/cartopy) | /opt/nesi/db/cartopy | BSD-3-Clause | Databases for cartopy module
[centrifuge](https://ccb.jhu.edu/software/centrifuge/) | /opt/nesi/db/centrifuge | GPL-3.0 | Databases for centrifuge module
[CheckM2](https://github.com/chklovski/CheckM2) | /opt/nesi/db/CheckM2_DB | GPL-3.0 | Database for CheckM2 module
[CheckM](https://github.com/Ecogenomics/CheckM) | /opt/nesi/db/CheckM_DB | GPL-3.0 | Database for CheckM module
… | … | … | …
[gtdbtk_214](https://gtdb.ecogenomic.org/) | /opt/nesi/db/gtdbtk_214 | CC-BY-SA 4.0 | Genome Taxonomy Database release 214 used by GTDB-Tk module
[gtdbtk_220](https://gtdb.ecogenomic.org/) | /opt/nesi/db/gtdbtk_220 | CC-BY-SA 4.0 | Genome Taxonomy Database release 220 used by GTDB-Tk module
[HUMAnN](https://huttenhower.sph.harvard.edu/humann/) | /opt/nesi/db/Humann | MIT | Databases for HUMAnN module
[JGI IMG](https://img.jgi.doe.gov/) | /opt/nesi/db/JGI-IMG | Public | JGI IMG/M databases
[Kaiju](https://github.com/bioinformatics-centre/kaiju) | /opt/nesi/db/Kaiju | GPL-3.0 | Database index for Kaiju module
[Kraken2](https://github.com/DerrickWood/kraken2) | /opt/nesi/db/Kraken2 | MIT | Databases for Kraken2 module
[megaX](https://www.megasoftware.net/) | /opt/nesi/db/megaX | Free for academics | Evolutionary analysis reference data for MegaX module
… | … | … | …
[VEP](https://www.ensembl.org/info/docs/tools/vep/) | /opt/nesi/db/VariantEffectPredictor | No restrictions | Ensembl annotation data for variant effect prediction
[VIBRANT](https://github.com/AnantharamanLab/VIBRANT) | /opt/nesi/db/VIBRANT_v1.2.1_databases | GPL-3.0 | Viral genome and HMM databases used by the VIBRANT environment module
[VirSorter](https://github.com/simroux/VirSorter) | /opt/nesi/db/VirSorter | GPL-2.0 | Viral hallmark gene and profile databases
[VOGDB](https://vogdb.org/) | /opt/nesi/db/VOGDB | Public | The Virus Orthologous Groups Database (VOGDB), a multi-layer database that progressively clusters viral genes into groups connected by increasingly remote homology
[waafle](https://github.com/biobakery/waafle) | /opt/nesi/db/waafle | MIT | Reference sets for gene neighborhood analysis used by the waafle environment module
[checkv](https://bitbucket.org/berkeleylab/checkv) | /opt/nesi/db/checkv-db-v0.6 | [BSD 3-Clause-style](https://bitbucket.org/berkeleylab/checkv/src/master/LICENSE.txt) | Viral genome completeness and contamination database for the CheckV environment module
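
When a tool is run directly rather than through its environment module, you will typically need to supply the database path yourself. The sketch below copies a few rows from the table above into a dictionary and resolves a dataset name case-insensitively. The `DATABASES` mapping and `resolve_database` function are hypothetical illustrations that cover only the rows shown, so check the table (or the directory itself) for anything else.

```python
from pathlib import Path
from typing import NamedTuple


class Database(NamedTuple):
    path: Path
    licence: str


# Hypothetical lookup table: a handful of rows copied from the table above.
DATABASES = {
    "alphafold": Database(Path("/opt/nesi/db/alphafold_db"), "CC-BY-4.0"),
    "blast": Database(Path("/opt/nesi/db/blast"), "Public"),
    "kraken2": Database(Path("/opt/nesi/db/Kraken2"), "MIT"),
    "gtdbtk_220": Database(Path("/opt/nesi/db/gtdbtk_220"), "CC-BY-SA 4.0"),
    "checkv": Database(Path("/opt/nesi/db/checkv-db-v0.6"), "BSD 3-Clause-style"),
}


def resolve_database(name: str) -> Database:
    """Look up a dataset by name, ignoring case and surrounding whitespace."""
    key = name.strip().lower()
    try:
        return DATABASES[key]
    except KeyError:
        known = ", ".join(sorted(DATABASES))
        raise KeyError(f"unknown database {name!r}; known: {known}") from None


if __name__ == "__main__":
    db = resolve_database("Kraken2")
    # Report rather than fail when run off-cluster, where /opt/nesi/db is absent.
    status = "found" if db.path.is_dir() else "not found on this machine"
    print(f"{db.path} ({db.licence}): {status}")
```

Keeping the licence next to the path is a deliberate choice in this sketch: several datasets carry different terms (for example CC-BY-SA 4.0 versus "Free for academics"), and having the licence at hand makes it easier to check attribution requirements before publishing results.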

!!! note "Requesting new or updated databases"
    If there is a database you think may be useful to many Mahuika users, or if you would like an updated version of one of the maintained databases, please {% include "partials/support_request.html" %} with details about the source and version of the database of interest.

## External databases

### JGI Portals

The [Joint Genome Institute](https://jgi.doe.gov/) has many databases and data portals available.
To download or access files from JGI, you need to register for an account.
We recommend using the [Globus endpoint provided by JGI](https://genome.jgi.doe.gov/portal/help/download.jsf#/globus) to transfer files directly from the JGI servers to Mahuika.

For more information about using Globus on Mahuika, see [the Globus documentation](../Data_Transfer/Globus/Globus_Overview.md).

### NCBI