Annotation From Protein Level
Protein level inputs are handled by the panno subcommand.
To use a UniProt id as the protein name, first download the UniProt id map with
transvar config --download_idmap
A UniProt id can then be used in place of a gene name by passing the --uniprot option to TransVar. For example,
$ transvar panno --ccds -i 'Q5VUM1:47' --uniprot
Q5VUM1:47 CCDS4972 (protein_coding) C6ORF57 +
chr6:g.71289191_71289193/c.139_141/p.47S cds_in_exon_2
protein_sequence=S;cDNA_sequence=TCC;gDNA_sequence=TCC;source=CCDS
TransVar uses the keyword extension ref, as in Q5VUM1:p.47refS, to distinguish a reference-sequence query from the synonymous mutation Q5VUM1:p.47S. The former notation specifies that the reference protein sequence is S, while the latter specifies that the target protein sequence is S.
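To make the distinction between the two notations concrete, here is a minimal sketch of how such identifiers could be split into their parts. The regular expression and the parse_query helper are hypothetical, written only for illustration; they are not TransVar's own parser and cover only the forms shown on this page.

```python
import re

# Hypothetical pattern, for illustration only (not TransVar's parser):
#   NAME:[p.][REF]START[_END][ref<MOTIF> | <ALT>]
_QUERY = re.compile(
    r"^(?P<name>[^:]+):(?:p\.)?"
    r"(?P<ref>[A-Z])?"
    r"(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"(?:ref(?P<refseq>[A-Zx]+)|(?P<alt>[A-Z*]))?$"
)


def parse_query(text):
    """Split a protein-level identifier into name, range, reference and target."""
    m = _QUERY.match(text)
    if m is None:
        raise ValueError("unrecognised protein-level identifier: %r" % text)
    start = int(m.group("start"))
    end = int(m.group("end")) if m.group("end") else start
    return {
        "name": m.group("name"),
        "start": start,
        "end": end,
        # 'ref' keyword form (p.47refS) or leading residue form (p.E545K)
        "reference": m.group("refseq") or m.group("ref"),
        # trailing residue without 'ref' is the target (p.47S, p.E545K)
        "target": m.group("alt"),
    }
```

With this sketch, Q5VUM1:p.47refS yields a reference of S and no target, while Q5VUM1:p.47S yields a target of S and no reference.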
For example, one can find the genomic location of a DRY motif in protein P28222 by issuing the following command,
$ transvar panno -i 'P28222:p.146_148refDRY' --uniprot --ccds
P28222:p.146_148refDRY CCDS4986 (protein_coding) HTR1B -
chr6:g.78172677_78172685/c.436_444/p.D146_Y148 cds_in_exon_1
protein_sequence=DRY;cDNA_sequence=GACCGCTAC;gDNA_sequence=GTAGCGGTC;source=CCDS
One can also use the wildcard x (lowercase) in the motif.
$ transvar panno -i 'HTR1B:p.365_369refNPxxY' --ccds
HTR1B:p.365_369refNPxxY CCDS4986 (protein_coding) HTR1B -
chr6:g.78172014_78172028/c.1093_1107/p.N365_Y369 cds_in_exon_1
protein_sequence=NPIIY;cDNA_sequence=AAC..TAT;gDNA_sequence=ATA..GTT;source=CCDS
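The wildcard behaviour can be sketched as a simple pattern match, where a lowercase x stands for any single residue. The helper names and the toy protein sequence below are assumptions made for illustration; this is not how TransVar itself implements motif matching.

```python
import re


def motif_to_regex(motif):
    """Turn a motif such as 'NPxxY' into a regex; lowercase x matches any residue."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def motif_matches(motif, start, end, protein):
    """Check the motif against protein[start..end] (1-based, inclusive)."""
    segment = protein[start - 1:end]
    if len(segment) != len(motif):
        return False
    return motif_to_regex(motif).fullmatch(segment) is not None


# Toy sequence (hypothetical, not the real HTR1B protein)
toy_protein = "MKNPIIYAA"
print(motif_matches("NPxxY", 3, 7, toy_protein))
```

Only the lowercase x is treated as a wildcard here, so an uppercase X in the motif would be compared literally.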
A range of residues can also be queried without a motif. For example,
$ transvar panno --ccds -i 'ABCB11:p.200_400'
outputs
ABCB11:p.200_400 CCDS46444 (protein_coding) ABCB11 -
chr2:g.169833195_169851872/c.598_1200/p.T200_K400 cds_in_exons_[6,7,8,9,10,11]
protein_sequence=TRF..DRK;cDNA_sequence=ACA..AAA;gDNA_sequence=TTT..TGT;source=CCDS
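The cDNA coordinates in the outputs above follow from the three-bases-per-residue arithmetic: residue n occupies coding positions 3(n-1)+1 through 3n. The small helper below is a hypothetical illustration of that arithmetic, not part of TransVar.

```python
def protein_range_to_cdna(start, end):
    """Map a 1-based inclusive residue range to 1-based inclusive cDNA positions."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # The first base of residue `start` and the last base of residue `end`
    return 3 * (start - 1) + 1, 3 * end


print(protein_range_to_cdna(200, 400))  # (598, 1200), as in c.598_1200
print(protein_range_to_cdna(146, 148))  # (436, 444), as in c.436_444
```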
Mutation formats acceptable in TransVar are PIK3CA:p.E545K, or the same without the reference or alternative amino acid identity, e.g., PIK3CA:p.545K or PIK3CA:p.E545. TransVar takes native HGVS format inputs and outputs. The reference amino acid is used to narrow the search scope of candidate transcripts. The alternative amino acid is used to infer the nucleotide change that results in that amino acid.
$ transvar panno -i PIK3CA:p.E545K --ensembl
outputs
PIK3CA:p.E545K ENST00000263967 (protein_coding) PIK3CA +
chr3:g.178936091G>A/c.1633G>A/p.E545K cds_in_exon_10
reference_codon=GAG;candidate_codons=AAG,AAA;candidate_mnv_variants=chr3:g.178936091_178936093delGAGinsAAA;dbsnp=rs104886003(chr3:178936091G>A);missense;aliases=ENSP00000263967;source=Ensembl
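The last field of the output is a semicolon-separated list of key=value pairs mixed with bare flags such as missense. For downstream scripting, it could be unpacked along the following lines. The parse_info helper is a hypothetical sketch based only on the output shown on this page; it assumes the wrapped field has been joined back into one string.

```python
def parse_info(info):
    """Split a TransVar-style info string into key/value fields and bare flags."""
    fields = {}
    flags = []
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            # Entries without '=' (e.g. 'missense') are treated as flags
            flags.append(key)
        elif key.startswith("candidate_"):
            # Candidate lists are comma-separated
            fields[key] = value.split(",")
        else:
            fields[key] = value
    return fields, flags


info = (
    "reference_codon=GAG;candidate_codons=AAG,AAA;"
    "candidate_mnv_variants=chr3:g.178936091_178936093delGAGinsAAA;"
    "dbsnp=rs104886003(chr3:178936091G>A);missense;"
    "aliases=ENSP00000263967;source=Ensembl"
)
fields, flags = parse_info(info)
print(fields["candidate_codons"], flags)
```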
One may encounter ambiguous cases where multiple nucleotide substitutions can explain the amino acid change. For example,
$ transvar panno -i ACSL4:p.R133R --ccds
ACSL4:p.R133R CCDS14548 (protein_coding) ACSL4 -
chrX:g.108926078G>T/c.399C>A/p.R133R cds_in_exon_2
reference_codon=CGC;candidate_codons=AGG,AGA,CGA,CGG,CGT;candidate_snv_variants=chrX:g.108926078G>C,chrX:g.108926078G>A;candidate_mnv_variants=chrX:g.108926078_108926080delGCGinsCCT,chrX:g.108926078_108926080delGCGinsTCT;synonymous;source=CCDS
In those cases, TransVar prioritizes all the candidate base changes by minimizing the edit distance between the reference codon sequence and the target codon sequence. One of the optimal base changes is arbitrarily chosen as the default, and all the candidates are included in the appended CddMuts entry.
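The prioritization idea can be sketched as follows: enumerate every codon that encodes the target amino acid and sort them by the number of bases that differ from the reference codon. This is only an illustration under stated assumptions, not TransVar's implementation: it uses the standard genetic code, uses Hamming distance as the edit distance (codons have equal length), and breaks ties alphabetically, so the order within a tie may differ from TransVar's output.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_target_codons(reference_codon, target_aa):
    """Codons encoding target_aa (excluding the reference), closest first."""
    candidates = [
        codon
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa and codon != reference_codon
    ]
    # Sort by distance to the reference, then alphabetically (assumed tie-break)
    return sorted(candidates, key=lambda c: (hamming(reference_codon, c), c))


print(rank_target_codons("CGC", "R"))  # ['CGA', 'CGG', 'CGT', 'AGA', 'AGG']
print(rank_target_codons("GAG", "K"))  # ['AAG', 'AAA']
```

For ACSL4:p.R133R, the three single-base candidates (CGA, CGG, CGT) come before the two-base candidates (AGA, AGG), which is consistent with the output above reporting a single-nucleotide change as the default and the remaining ones as candidates.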
One can annotate SNPs with dbSNP ids after downloading the dbSNP files. This can be done by
transvar config --download_dbsnp
TransVar automatically downloads the dbSNP file corresponding to the current default reference version (as set in transvar.cfg). This also sets the dbsnp entry in transvar.cfg.
With the dbSNP file downloaded, TransVar automatically looks up dbSNP ids when performing annotation.
$ transvar panno -i 'A1CF:p.A309A' --ccds
A1CF:p.A309A CCDS7243 (protein_coding) A1CF -
chr10:g.52576004T>G/c.927A>C/p.A309A cds_in_exon_7
reference_codon=GCA;candidate_codons=GCC,GCG,GCT;candidate_snv_variants=chr10:g.52576004T>C,chr10:g.52576004T>A;dbsnp=rs201831949(chr10:52576004T>G);synonymous;source=CCDS
Note that in order to use dbSNP, one must either download the dbSNP database through transvar config --download_dbsnp, or configure the dbsnp slot in the configuration file via transvar config -k dbsnp -v [path to dbSNP VCF]. A manually specified dbSNP file must be tabix indexed.
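When pointing TransVar at a manually obtained dbSNP file, it may help to check for the tabix index before running the config command. The sketch below is a hypothetical helper, not part of TransVar. It assumes the index sits next to the VCF with a .tbi suffix, which is tabix's default naming, and it only builds the argument list for the command shown above without running it.

```python
import os


def has_tabix_index(vcf_path):
    """Return True if a tabix index (<vcf_path>.tbi) exists next to the file."""
    return os.path.isfile(vcf_path + ".tbi")


def dbsnp_config_command(vcf_path):
    """Build the 'transvar config' argument list for a manually supplied dbSNP VCF."""
    if not os.path.isfile(vcf_path):
        raise FileNotFoundError("dbSNP VCF not found: %s" % vcf_path)
    if not has_tabix_index(vcf_path):
        raise FileNotFoundError(
            "no tabix index found for %s (expected %s.tbi)" % (vcf_path, vcf_path)
        )
    # Same command as in the text, with the path filled in
    return ["transvar", "config", "-k", "dbsnp", "-v", vcf_path]
```

The returned list can be passed to subprocess.run on a machine where TransVar is installed.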