This repository accompanies the research paper on BAKIR, the Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool. It contains the scripts and workflows needed to reproduce the data analysis and benchmarking results presented in the paper. The BAKIR tool itself is maintained separately in the BAKIR repository.
Before cloning the repository, ensure you have the following prerequisites installed:
To clone the repository and all submodules, use the following command:
git clone --recursive git@github.com:michael-ford/bakir-paper-methods.git
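If the repository was cloned without `--recursive`, the submodules can still be fetched afterwards. The following is a minimal sketch using standard Git commands; the helper name `fetch_submodules` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: fetch submodules that a non-recursive clone left empty.
# The argument is the path to the clone; it defaults to the current directory.
fetch_submodules() {
    repo_dir="${1:-.}"
    # Initialize and check out every submodule, including nested ones.
    git -C "$repo_dir" submodule update --init --recursive || return 1
    # A leading "-" in the status output marks an uninitialized submodule.
    if git -C "$repo_dir" submodule status --recursive | grep -q '^-'; then
        echo "some submodules are still uninitialized" >&2
        return 1
    fi
    echo "submodules ready"
}
```

For example, `fetch_submodules bakir-paper-methods` would be run from the parent directory, assuming the clone used Git's default directory name.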
To run the analysis and reproduce the results, execute the run_analysis.sh script. This script will:
- Set up the required Conda environments.
- Run a series of Nextflow workflows to fetch data.
- Run the BAKIR, Immunannot, and Skirt annotations and the accompanying analysis.
./run_analysis.sh
The following output files and directories are generated by the script:
- Notebooks:
  - figures-data-generation.ipynb: Jupyter notebook for generating figures and additional data insights.
- Images:
  - gene_allele_comparison.png: Visualization of gene allele comparisons.
- Data Files:
  - all_discordant_alleles.csv: CSV file containing all discordant allele comparisons.
- Annotation Directories:
  - HPRC-assemblies: Directory containing all raw assembly files.
  - HPRC-assemblies-annotations: Contains BAKIR annotations.
  - HPRC-Immunannot-annotations: Contains Immunannot annotations.
  - HPRC-Skirt-annotations: Contains Skirt annotations.
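
After a run, a quick check that the outputs listed above exist can catch a partially failed workflow. The sketch below is an illustration only: the helper name `check_outputs` is hypothetical, and it assumes all outputs are written to a single directory (the repository root by default), which this README does not state.

```shell
# Hypothetical helper: report which expected outputs are missing.
# Assumption: all outputs live directly under the directory given as the
# first argument (defaults to the current directory).
check_outputs() {
    out_dir="${1:-.}"
    missing=0
    for path in \
        figures-data-generation.ipynb \
        gene_allele_comparison.png \
        all_discordant_alleles.csv \
        HPRC-assemblies \
        HPRC-assemblies-annotations \
        HPRC-Immunannot-annotations \
        HPRC-Skirt-annotations
    do
        # -e is true for both regular files and directories.
        if [ ! -e "$out_dir/$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of 7 expected outputs missing"
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Calling `check_outputs` from the repository root prints one line per missing entry and returns a non-zero status if anything is absent, so it can be chained as `./run_analysis.sh && check_outputs`.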