Natural amino-acid frequencies in Spike

The raw sequences in Spikes.fasta are Spike amino-acid sequences from various SARS-related coronaviruses. The file gives the common name and accession number (NCBI or GISAID) for each isolate's genome. The current sequence set includes:

- The curated set of 30 sequences from Letko, Marzi, and Munster (2020), which contained all unique RBD sequences known at the time of its publication. See Extended Data Fig. 1 of that paper for more information.
- RaTG13 and RmYN02, two newly described bat CoV sequences that are the most closely related strains to SARS-CoV-2 currently known.
- Two recent pangolin CoV sequences from Lam et al. (2020): the infectious virus isolated from a pangolin seized in Guangxi, and the consensus sequence of two closely related isolates from pangolins seized in Guangdong.
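To work with these records outside the pipeline, a FASTA file can be read with a few lines of plain Python. The sketch below is not the Snakefile's own parser. It assumes that each header carries the common name followed by the accession as the last whitespace-separated token; `parse_fasta` and `split_header` are hypothetical helper names, and the header layout should be checked against the actual Spikes.fasta.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequences may wrap over several lines; lines before the
            # first header are ignored.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header: str) -> Tuple[str, str]:
    """Split a header into (common name, accession).

    ASSUMPTION: the accession is the last whitespace-separated token.
    The real header layout of Spikes.fasta has not been verified.
    """
    name, _, accession = header.rpartition(" ")
    return name.strip(), accession
```

Because `parse_fasta` accepts any iterable of lines, an open file handle can be passed directly, for example `parse_fasta(open("Spikes.fasta"))`.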

We use the PDB structures 6vyb.pdb for the full Spike trimer and 6m0j.pdb for the ACE2-bound RBD.
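Using a structure alongside an alignment usually requires the residue numbering and sequence of each chain. The following minimal sketch reads them from the fixed-column ATOM records of a PDB-format file, keeping one C-alpha per residue. It is an assumption about how the structures might be used, not a description of what the Snakefile does. `chain_residues` is a hypothetical name, and the code deliberately skips HETATM records (so modified residues are dropped), alternate locations other than blank or "A", and every model after the first.

```python
from typing import Dict, Iterable, List, Set, Tuple

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(pdb_lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map chain ID -> [(residue number + insertion code, one-letter aa)]."""
    chains: Dict[str, List[Tuple[str, str]]] = {}
    seen: Set[Tuple[str, str]] = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model
        if not line.startswith("ATOM  "):
            continue  # HETATM, HEADER, etc. are ignored
        # Columns 13-16: atom name; column 17: alternate location.
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue
        chain = line[21]                 # column 22: chain ID
        resnum = line[22:27].strip()     # columns 23-27: resSeq + iCode
        key = (chain, resnum)
        if key in seen:
            continue
        seen.add(key)
        aa = THREE_TO_ONE.get(line[17:20].strip(), "X")  # columns 18-20
        chains.setdefault(chain, []).append((resnum, aa))
    return chains
```

Joining the one-letter codes of a chain, for example `"".join(aa for _, aa in residues)`, gives its modeled sequence. Gaps in the residue numbers show where the structure has unresolved residues.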

The Snakefile reads the protein sequences and structures and generates the data files results/BloomLab_spike.csv and results/BloomLab_rbd.csv, along with alignments; all outputs are saved in ./results. To run the Snakemake pipeline, use the conda environment SpikeVariation. To build the environment, run conda env create -f env.yml.
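The Snakefile's exact frequency calculation is not described here. As a hedged sketch of what "natural amino-acid frequencies" means, the function below counts amino acids in each column of an alignment. Sites are numbered by the ungapped position of a reference sequence, which is assumed to be the first one. Gaps and non-standard characters are left out of the denominator. The function name `site_frequencies`, this gap handling, and the output columns (site, wildtype, n_seqs, one column per amino acid) are hypothetical and may not match the columns of results/BloomLab_spike.csv.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(aligned: List[str], reference_index: int = 0) -> List[Dict[str, object]]:
    """Per-site amino-acid frequencies over an alignment.

    Sites are numbered 1, 2, ... by the ungapped position of
    aligned[reference_index]; columns where the reference has a gap are
    skipped. Only the 20 standard amino acids count toward n_seqs.
    """
    if not aligned:
        raise ValueError("empty alignment")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    ref = aligned[reference_index]
    rows: List[Dict[str, object]] = []
    site = 0
    for col in range(length):
        if ref[col] == "-":
            continue  # insertion relative to the reference
        site += 1
        counts = Counter(seq[col].upper() for seq in aligned)
        n = sum(counts[aa] for aa in AMINO_ACIDS)
        row: Dict[str, object] = {"site": site, "wildtype": ref[col].upper(), "n_seqs": n}
        for aa in AMINO_ACIDS:
            row[aa] = counts[aa] / n if n else 0.0
        rows.append(row)
    return rows
```

Each row is a flat dictionary, so the result can be written to CSV with the standard library's csv.DictWriter. Numbering by a reference sequence keeps the sites comparable with structure residue numbers, provided that the reference matches the sequence in the PDB file.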
