KAPT

Automated inference of the Kappaphycus alvarezii supertranscriptome through de novo transcriptome assembly and comprehensive functional annotation.

prerequisites

KAPT.nf requires Nextflow 23.04.4 or higher.

Follow this tutorial to install Nextflow.

We use Conda to avoid dependency conflicts and ensure reproducible environments across different systems.
Make sure you have installed Conda 4.12.0 or higher.

Follow this tutorial to install Miniconda, a miniature version of Anaconda.

installing and configuring

Clone this repository to install KAPT.nf:

git clone https://github.com/felipevzps/KAPT.git
cd KAPT

KAPT.nf requires an input sample in CSV format containing the following information:

run: SRA Accession ID of RNA-seq data
sample_name: Optional but recommended for readability

See an example below:

cat samples/PRJNA971596_BHL.csv

run,sample_name
SRR24507659,Brown phenotype - High Light (BHL)
SRR24507661,Brown phenotype - High Light (BHL)
SRR24507678,Brown phenotype - High Light (BHL)
SRR24507679,Brown phenotype - High Light (BHL)

Tip

This is an example of publicly available RNA-seq data from Kappaphycus alvarezii (brown phenotype) grown under high light conditions (910 mmol/m-2S-1 light).

All samples used in this study are available in the KAPT/samples/ directory.

database installation

KAPT.nf uses eggNOG-mapper for fast functional annotation of protein sequences.
This approach provides comprehensive orthology-based annotation with COG, KEGG, and GO terms.

Below is an example for downloading the eggNOG-mapper database:

eggnog_db_path="/path/to/eggnog_database/"
download_eggnog_data.py --data_dir $eggnog_db_path -y

Follow this tutorial for complete installation of the eggNOG-mapper database.

Since we are working with transcriptome assembly of red algae, we need the gene content of near-universal single-copy orthologs of Rhodophyta to assess assembly quality with BUSCO.

Below is an example for downloading the Rhodophyta BUSCO database:

busco_db_path="/path/to/busco_downloads/"
busco --datasets_version odb12 --download rhodophyta_odb12 --download_path $busco_db_path

After downloading the databases, make sure to update the configuration file with your database paths.

See an example below:

head config/nextflow.config
// config file for defining module parameters

params {
    // paths to databases 
    eggnog_db_path = "/path/to/eggnog_database"
    busco_db_path = "/path/to/busco_downloads/"

Tip

This is the configuration file used in this study.

running the pipeline

From the pipeline root directory, execute:

cd KAPT

# example for running the pipeline inside a SGE HPC cluster 
nextflow run workflows/KAPT.nf -c config/nextflow.config -profile sge --samples_csv samples/PRJNA971596_BHL.csv --output_dir "../results/PRJNA971596_BHL" --report_dir "report/PRJNA971596_BHL"

# you can use the -resume flag to re-run the pipeline if some step failed
nextflow run workflows/KAPT.nf -c config/nextflow.config -resume -profile sge --samples_csv samples/PRJNA971596_BHL.csv --output_dir "../results/PRJNA971596_BHL" --report_dir "report/PRJNA971596_BHL"

questions?

For suggestions, bug reports, or collaboration, feel free to open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 145 Commits
bin		bin
config		config
envs		envs
modules		modules
samples		samples
workflows		workflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KAPT

prerequisites

installing and configuring

database installation

running the pipeline

questions?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

KAPT

prerequisites

installing and configuring

database installation

running the pipeline

questions?

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages