Annotation From cDNA Level
Annotation from the cDNA level is handled by the canno subcommand.
TransVar infers the nucleotide mutation from a cDNA identifier such as PIK3CA:c.1633G>A. Note that the nucleotide identity follows the natural sequence of the transcript, i.e., if the transcript is interpreted on the reverse-complementary strand, the base at the site needs to be reverse-complemented too.
$ transvar canno --ccds -i 'PIK3CA:c.1633G>A'
outputs
PIK3CA:c.1633G>A CCDS43171 (protein_coding) PIK3CA +
chr3:g.178936091G>A/c.1633G>A/p.E545K inside_[cds_in_exon_9]
CSQN=Missense;dbsnp=rs104886003(chr3:178936091G>A);reference_codon=GAG;altern
ative_codon=AAG;source=CCDS
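The protein-level change in the output above can be cross-checked by hand. The following is a minimal sketch, not TransVar's implementation: the helper names `codon_index` and `apply_snv` are hypothetical, and it assumes that the c. coordinate counts from the first base of the start codon and that the standard genetic code applies. The reference codon GAG is taken from the reference_codon field of the output.

```python
# Illustrative only: these helpers are hypothetical and not part of TransVar.
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_index(cdna_pos):
    """Return (codon number, 0-based offset in the codon) for a 1-based c. position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def apply_snv(ref_codon, offset, alt_base):
    """Replace the base at `offset` of `ref_codon` with `alt_base`."""
    return ref_codon[:offset] + alt_base + ref_codon[offset + 1:]


codon, offset = codon_index(1633)          # c.1633 is the first base of codon 545
alt_codon = apply_snv("GAG", offset, "A")  # GAG -> AAG
print(f"p.{CODON_TABLE['GAG']}{codon}{CODON_TABLE[alt_codon]}")  # prints p.E545K
```

The result agrees with the reference_codon, alternative_codon and p.E545K fields reported by TransVar for this example.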
The SNV can be in the intronic region, e.g.,
$ transvar canno --ccds -i 'ABCB11:c.1198-8C>A'
outputs
ABCB11:c.1198-8C>A CCDS46444 (protein_coding) ABCB11 -
chr2:g.169833205G>T/c.1198-8C>A/. inside_[intron_between_exon_10_and_11]
CSQN=IntronicSNV;source=CCDS
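ABCB11 lies on the minus strand, which is why c.1198-8C>A is reported as G>T at the gDNA level. A minimal sketch of the strand conversion follows; `reverse_complement` is a hypothetical helper written for illustration, not a TransVar function.

```python
# Illustrative only: not part of TransVar.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# cDNA-level C>A on a minus-strand transcript corresponds to gDNA-level G>T.
print(reverse_complement("C"), reverse_complement("A"))  # prints G T
```

For a single base the reversal has no effect, but for multi-base sequences on minus-strand transcripts both the order and the bases change.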
$ transvar canno --ccds -i 'ABCB11:c.1198-8_1202'
outputs
ABCB11:c.1198-8_1202 CCDS46444 (protein_coding) ABCB11 -
chr2:g.169833193_169833205GGTTTCTGGAGTG/c.1198-8_1202CACTCCAGAAACC/p.400_401KP from_[cds_in_exon_11]_to_[intron_between_exon_10_and_11]
C2=acceptor_splice_site_on_exon_11_at_chr2:169833198_included;source=CCDS
An insertion may result in: 1) a pure insertion of amino acids; 2) a block substitution of amino acids, when the insertion occurs after the 1st or 2nd base of a codon; or 3) a frame-shift. Following HGVS nomenclature, TransVar labels the first altered amino acid and the length of the peptide until the stop codon, assuming no change in splicing.
Example: to annotate an in-frame, in-phase insertion,
$ transvar canno --ccds -i 'ACIN1:c.1932_1933insATTCAC'
ACIN1:c.1932_1933insATTCAC CCDS9587 (protein_coding) ACIN1 -
chr14:g.23548785_23548786insGTGAAT/c.1932_1933insATTCAC/p.R644_S645insIH inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548785_23548786insGTGAAT;unalign_gD
NA=g.23548785_23548786insGTGAAT;left_align_cDNA=c.1932_1933insATTCAC;unalign_
cDNA=c.1932_1933insATTCAC;left_align_protein=p.R644_S645insIH;unalign_protein
=p.R644_S645insIH;phase=0;source=CCDS
ACIN1:c.1932_1933insATTCAC CCDS53889 (protein_coding) ACIN1 -
chr14:g.23548157_23548158insGTGAAT/c.1932_1933insATTCAC/p.P644_V645insIH inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548157_23548158insGTGAAT;unalign_gD
NA=g.23548157_23548158insGTGAAT;left_align_cDNA=c.1932_1933insATTCAC;unalign_
cDNA=c.1932_1933insATTCAC;left_align_protein=p.P644_V645insIH;unalign_protein
=p.P644_V645insIH;phase=0;source=CCDS
ACIN1:c.1932_1933insATTCAC CCDS55905 (protein_coding) ACIN1 -
chr14:g.23548785_23548786insGTGAAT/c.1932_1933insATTCAC/p.R644_S645insIH inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548785_23548786insGTGAAT;unalign_gD
NA=g.23548785_23548786insGTGAAT;left_align_cDNA=c.1932_1933insATTCAC;unalign_
cDNA=c.1932_1933insATTCAC;left_align_protein=p.R644_S645insIH;unalign_protein
=p.R644_S645insIH;phase=0;source=CCDS
phase = 0, 1 or 2 indicates whether the insertion happens after the 3rd, 1st or 2nd base of a codon, respectively. An in-phase insertion is one with phase=0.
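The phase and the frame of an insertion can be derived from its cDNA coordinates alone. The sketch below is an illustration under stated assumptions, not TransVar's code: the function names are hypothetical, the insertion is assumed to lie fully inside the coding region, and c. coordinates are assumed to count from the first base of the start codon.

```python
# Illustrative only: these helpers are hypothetical and not part of TransVar.
def insertion_phase(left_cdna_pos):
    """Phase of an insertion between c.N and c.N+1, given N (1-based).

    0 means after the 3rd base of a codon, 1 after the 1st, 2 after the 2nd.
    """
    return left_cdna_pos % 3


def is_in_frame(inserted_seq):
    """An insertion keeps the reading frame when its length is a multiple of 3."""
    return len(inserted_seq) % 3 == 0


for left, seq in [(1932, "ATTCAC"), (1930, "ATTCAC"), (1225, "G")]:
    print(left, seq, "phase:", insertion_phase(left), "in-frame:", is_in_frame(seq))
```

For the ACIN1 insertions in this section, c.1932_1933insATTCAC gives phase 0 and c.1930_1931insATTCAC gives phase 1, matching the phase fields in the TransVar output. A one-base insertion such as AAAS:c.1225_1226insG is not in-frame and is reported as a frame-shift.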
Example: to annotate an out-of-phase, in-frame insertion,
$ transvar canno --ccds -i 'ACIN1:c.1930_1931insATTCAC'
ACIN1:c.1930_1931insATTCAC CCDS9587 (protein_coding) ACIN1 -
chr14:g.23548792_23548793insTGTGAA/c.1930_1931insATTCAC/p.S643_R644insHS inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548787_23548788insGTGAAT;unalign_gD
NA=g.23548787_23548788insGTGAAT;left_align_cDNA=c.1925_1926insTTCACA;unalign_
cDNA=c.1930_1931insATTCAC;left_align_protein=p.R642_S643insSH;unalign_protein
=p.S643_R644insHS;phase=1;source=CCDS
ACIN1:c.1930_1931insATTCAC CCDS53889 (protein_coding) ACIN1 -
chr14:g.23548162_23548163insAATGTG/c.1930_1931insATTCAC/p.P643_P644insHS inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548159_23548160insGTGAAT;unalign_gD
NA=g.23548159_23548160insGTGAAT;left_align_cDNA=c.1927_1928insCACATT;unalign_
cDNA=c.1930_1931insATTCAC;left_align_protein=p.P643_P644insHS;unalign_protein
=p.P643_P644insHS;phase=1;source=CCDS
ACIN1:c.1930_1931insATTCAC CCDS55905 (protein_coding) ACIN1 -
chr14:g.23548792_23548793insTGTGAA/c.1930_1931insATTCAC/p.S643_R644insHS inside_[cds_in_exon_6]
CSQN=InFrameInsertion;left_align_gDNA=g.23548787_23548788insGTGAAT;unalign_gD
NA=g.23548787_23548788insGTGAAT;left_align_cDNA=c.1925_1926insTTCACA;unalign_
cDNA=c.1930_1931insATTCAC;left_align_protein=p.R642_S643insSH;unalign_protein
=p.S643_R644insHS;phase=1;source=CCDS
Reverse annotation can result in different identifiers after left/right alignments, e.g.,
$ transvar canno --ccds -i 'AATK:c.3976_3977insCGCCCA'
results in
AATK:c.3976_3977insCGCCCA CCDS45807 (protein_coding) AATK -
chr17:g.79093282_79093287dupTGGGCG/c.3988_3993dupACGCCC/p.T1330_P1331dupTP inside_[cds_in_exon_13]
CSQN=InFrameInsertion;left_align_gDNA=g.79093270_79093271insGGGCGT;unalign_gD
NA=g.79093282_79093287dupTGGGCG;left_align_cDNA=c.3976_3977insCGCCCA;unalign_
cDNA=c.3976_3977insCGCCCA;left_align_protein=p.A1326_P1327insPT;unalign_prote
in=p.A1326_P1327insPT;phase=1;source=CCDS
Note how the insertion switches to a duplication when the 5' flanking sequence is identical to the inserted sequence. This conforms to the HGVS recommendation to replace insertion notation with duplication notation when possible.
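To illustrate why one insertion can have several equivalent identifiers, the sketch below shifts an insertion through a repetitive region and switches to dup notation when the inserted bases repeat the bases immediately 5' of the insertion point. It is a toy model on a made-up reference sequence in transcript orientation; the function names, the 7-base sequence and the simplified output strings are assumptions for illustration, not TransVar's implementation.

```python
# Illustrative only: toy sequence and hypothetical helpers, not part of TransVar.
def right_align_insertion(ref, offset, ins):
    """Shift an insertion as far 3' as possible.

    `offset` is the number of reference bases that precede the insertion point.
    """
    while offset < len(ref) and ref[offset] == ins[0]:
        ins = ins[1:] + ins[0]   # rotate the inserted bases by one
        offset += 1
    return offset, ins


def left_align_insertion(ref, offset, ins):
    """Shift an insertion as far 5' as possible."""
    while offset > 0 and ref[offset - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        offset -= 1
    return offset, ins


def describe(ref, offset, ins):
    """Use dup notation when `ins` repeats the bases just 5' of the insertion point."""
    n = len(ins)
    if offset >= n and ref[offset - n:offset] == ins:
        if n == 1:
            return f"{offset}dup{ins}"
        return f"{offset - n + 1}_{offset}dup{ins}"
    return f"{offset}_{offset + 1}ins{ins}"


ref = "AACGCGT"                                   # made-up reference
print(describe(ref, 2, "CG"))                     # prints 2_3insCG
shifted = right_align_insertion(ref, 2, "CG")
print(describe(ref, *shifted))                    # prints 5_6dupCG
```

Both descriptions yield the same mutated sequence, AACGCGCGT. The same rotation explains the AATK example: shifting c.3976_3977insCGCCCA by 17 bases rotates the inserted sequence to ACGCCC, as in c.3988_3993dupACGCCC.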
Example: to annotate a frame-shift insertion. Frame-shift mutations have no alternative alignments at the protein level, hence only the cDNA and gDNA identifiers are reported in left-aligned and unaligned forms.
$ transvar canno --ccds -i 'AAAS:c.1225_1226insG'
results in
AAAS:c.1225_1226insG CCDS8856 (protein_coding) AAAS -
chr12:g.53702093dupC/c.1225dupG/p.E409Gfs*17 inside_[cds_in_exon_13]
CSQN=Frameshift;left_align_gDNA=g.53702089_53702090insC;unalign_gDNA=g.537020
89_53702090insC;left_align_cDNA=c.1221_1222insG;unalign_cDNA=c.1225dupG;sourc
e=CCDS
AAAS:c.1225_1226insG CCDS53797 (protein_coding) AAAS -
chr12:g.53701842_53701843insC/c.1225_1226insG/p.L409Rfs*54 inside_[cds_in_exon_13]
CSQN=Frameshift;left_align_gDNA=g.53701842_53701843insC;unalign_gDNA=g.537018
42_53701843insC;left_align_cDNA=c.1225_1226insG;unalign_cDNA=c.1225_1226insG;
source=CCDS
Example: to annotate an intronic insertion,
$ transvar canno --ccds -i 'ADAM33:c.991-3_991-2insC'
outputs
ADAM33:c.991-3_991-2insC CCDS13058 (protein_coding) ADAM33 -
chr20:g.3654151dupG/c.991-3dupC/. inside_[intron_between_exon_10_and_11]
CSQN=IntronicInsertion;left_align_gDNA=g.3654145_3654146insG;unalign_gDNA=g.3
654145_3654146insG;left_align_cDNA=c.991-9_991-8insC;unalign_cDNA=c.991-3dupC
;source=CCDS
In the case of intronic insertions, the amino acid identifier is not applicable and is represented by a '.'. The cDNA and gDNA identifiers are still right-aligned according to their natural order, respecting HGVS nomenclature.
Insertions can also occur at splice sites. TransVar identifies such cases, reports the splice site, and suppresses the prediction of the protein change.
$ transvar canno --ccds -i 'ADAM33:c.991_992insC'
results in
ADAM33:c.991_992insC CCDS13058 (protein_coding) ADAM33 -
chr20:g.3654142_3654143insG/c.991_992insC/. inside_[cds_in_exon_11]
CSQN=SpliceAcceptorInsertion;left_align_gDNA=g.3654142_3654143insG;unalign_gD
NA=g.3654142_3654143insG;left_align_cDNA=c.991_992insC;unalign_cDNA=c.991_992
insC;C2=acceptor_splice_site_on_exon_11_at_chr20:3654144_affected;source=CCDS
Similar to insertions, deletions can be in-frame or frame-shift. At the amino acid level, a deletion may appear as a simple deletion or as a block substitution (when an in-frame deletion is out of phase, i.e., it partially deletes codons).
Example: to annotate an in-frame deletion,
$ transvar canno --ccds -i 'A4GNT:c.694_696delTTG'
A4GNT:c.694_696delTTG CCDS3097 (protein_coding) A4GNT -
chr3:g.137843435_137843437delACA/c.694_696delTTG/p.L232delL inside_[cds_in_exon_2]
CSQN=InFrameDeletion;left_align_gDNA=g.137843433_137843435delCAA;unaligned_gD
NA=g.137843433_137843435delCAA;left_align_cDNA=c.692_694delTGT;unalign_cDNA=c
.694_696delTTG;left_align_protein=p.L232delL;unalign_protein=p.L232delL;sourc
e=CCDS
Example: to annotate an in-frame, out-of-phase deletion,
$ transvar canno --ccds -i 'ABHD15:c.431_433delGTG'
ABHD15:c.431_433delGTG CCDS32602 (protein_coding) ABHD15 -
chr17:g.27893552_27893554delCAC/c.431_433delGTG/p.C144_V145delinsF inside_[cds_in_exon_1]
CSQN=MultiAAMissense;left_align_gDNA=g.27893552_27893554delCAC;unaligned_gDNA
=g.27893552_27893554delCAC;left_align_cDNA=c.431_433delGTG;unalign_cDNA=c.431
_433delGTG;source=CCDS
Example: to annotate a frame-shift deletion,
$ transvar canno --ccds -i 'AADACL3:c.374delG'
AADACL3:c.374delG CCDS41252 (protein_coding) AADACL3 +
chr1:g.12785494delG/c.374delG/p.C125Ffs*17 inside_[cds_in_exon_3]
CSQN=Frameshift;left_align_gDNA=g.12785494delG;unaligned_gDNA=g.12785494delG;
left_align_cDNA=c.374delG;unalign_cDNA=c.374delG;source=CCDS
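The three deletion examples above differ only in their length and in where they start within a codon. The following sketch classifies a fully coding deletion from its c. coordinates; `classify_deletion` is a hypothetical helper, it assumes coordinates count from the first base of the start codon, and it does not inspect the sequence, so it only indicates which outcome is possible rather than predicting the exact protein change.

```python
# Illustrative only: hypothetical helper, not part of TransVar.
def classify_deletion(start, end):
    """Classify a coding deletion c.start_end (1-based, inclusive).

    Returns (category, first affected codon, last affected codon).
    """
    length = end - start + 1
    first_codon = (start - 1) // 3 + 1
    last_codon = (end - 1) // 3 + 1
    if length % 3 != 0:
        category = "frame-shift"
    elif (start - 1) % 3 == 0:
        category = "in-frame, in-phase"       # removes whole codons
    else:
        category = "in-frame, out-of-phase"   # partially deletes codons
    return category, first_codon, last_codon


print(classify_deletion(694, 696))  # A4GNT:c.694_696delTTG
print(classify_deletion(431, 433))  # ABHD15:c.431_433delGTG
print(classify_deletion(374, 374))  # AADACL3:c.374delG
```

The affected codons (232, 144 to 145, and 125) match the positions in p.L232delL, p.C144_V145delinsF and p.C125Ffs*17.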
Example: to annotate a deletion that spans from an intronic to a coding region. Protein prediction is suppressed due to the loss of the splice site.
$ transvar canno --ccds -i 'ABCB11:c.1198-8_1199delcactccagAA'
ABCB11:c.1198-8_1199delcactccagAA CCDS46444 (protein_coding) ABCB11 -
chr2:g.169833196_169833205delTTCTGGAGTG/c.1198-8_1199delCACTCCAGAA/. from_[cds_in_exon_11]_to_[intron_between_exon_10_and_11]
CSQN=SpliceAcceptorDeletion;left_align_gDNA=g.169833196_169833205delTTCTGGAGT
G;unaligned_gDNA=g.169833196_169833205delTTCTGGAGTG;left_align_cDNA=c.1198-8_
1199delCACTCCAGAA;unalign_cDNA=c.1198-8_1199delCACTCCAGAA;C2=acceptor_splice_
site_on_exon_11_at_chr2:169833198_lost;source=CCDS
Example: to annotate a block substitution in coding region,
$ transvar canno --ccds -i 'A1CF:c.508_509delinsTT'
A1CF:c.508_509delinsTT CCDS7241 (protein_coding) A1CF -
chr10:g.52595929_52595930delinsAA/c.508_509delinsTT/p.P170L inside_[cds_in_exon_4]
CSQN=Missense;codon_cDNA=508-509-510;source=CCDS
A1CF:c.508_509delinsTT CCDS7242 (protein_coding) A1CF -
chr10:g.52595929_52595930delinsAA/c.508_509delinsTT/p.P170L inside_[cds_in_exon_4]
CSQN=Missense;codon_cDNA=508-509-510;source=CCDS
A1CF:c.508_509delinsTT CCDS7243 (protein_coding) A1CF -
chr10:g.52595953_52595954delinsAA/c.508_509delinsTT/p.G170F inside_[cds_in_exon_4]
CSQN=Missense;codon_cDNA=508-509-510;source=CCDS
A block substitution at the nucleotide level does not necessarily result in a block substitution at the amino acid level. For example, the following substitution results in a deletion, for which alternative protein alignments are reported.
$ transvar canno --ccds -i 'CSRNP1:c.1212_1224delinsGGAGGAGGAA'
CSRNP1:c.1212_1224delinsGGAGGAGGAA CCDS2682 (protein_coding) CSRNP1 -
chr3:g.39185092_39185104delinsTTCCTCCTCC/c.1212_1224delinsGGAGGAGGAA/p.E411delE inside_[cds_in_exon_4]
CSQN=InFrameDeletion;begin_codon_cDNA=1210-1211-1212;end_codon_cDNA=1222-1223
-1224;left_align_protein=p.E405delE;unalign_protein=p.E408delE;source=CCDS
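Whether a block substitution behaves like a substitution, an insertion or a deletion depends on the net length change. The sketch below recomputes the codon boundaries and the net change for the examples above; the helper names are hypothetical, and it assumes a fully coding event with c. coordinates counted from the first base of the start codon.

```python
# Illustrative only: hypothetical helpers, not part of TransVar.
def codon_bases(cdna_pos):
    """Return the three 1-based c. positions of the codon containing `cdna_pos`."""
    first = 3 * ((cdna_pos - 1) // 3) + 1
    return first, first + 1, first + 2


def delins_summary(start, end, inserted_seq):
    """Summarize a coding block substitution c.start_end delins inserted_seq."""
    net_change = len(inserted_seq) - (end - start + 1)
    return {
        "begin_codon": codon_bases(start),
        "end_codon": codon_bases(end),
        "net_change": net_change,
        "in_frame": net_change % 3 == 0,
    }


print(delins_summary(1212, 1224, "GGAGGAGGAA"))  # CSRNP1: 13 bases out, 10 in
print(delins_summary(508, 509, "TT"))            # A1CF: 2 bases out, 2 in
```

For CSRNP1 the net change is -3 bases, i.e., one codon, consistent with CSQN=InFrameDeletion, and the codon boundaries agree with begin_codon_cDNA and end_codon_cDNA. For A1CF the net change is 0 and only codon 508-509-510 is touched, consistent with a single missense change.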
Likewise, a block substitution can occur in an intronic region,
$ transvar canno --ccds -i 'A1CF:c.1460+2_1460+3delinsCC'
A1CF:c.1460+2_1460+3delinsCC CCDS7241 (protein_coding) A1CF -
chr10:g.52570797_52570798delinsGG/c.1460+2_1460+3delinsCC/. inside_[intron_between_exon_9_and_10]
CSQN=IntronicBlockSubstitution;source=CCDS
When a block substitution occurs across a splice site, TransVar puts a tag in the info field and does not predict the amino acid change.
$ transvar canno --ccds -i 'A1CF:c.1459_1460+3delinsCC'
A1CF:c.1459_1460+3delinsCC CCDS7241 (protein_coding) A1CF -
chr10:g.52570797_52570801delinsGG/c.1459_1460+3delinsCC/. from_[intron_between_exon_9_and_10]_to_[cds_in_exon_9]
CSQN=SpliceDonorSubstitution;C2=donor_splice_site_on_exon_9_at_chr10:52570799
_lost;source=CCDS
A duplication can be thought of as a special insertion where the inserted sequence is identical to the sequence flanking the breakpoint. Similar to insertions, the annotation of a duplication may have alternative alignments.
Example: to annotate a duplication in a coding region,
$ transvar canno --ccds -i 'CHD7:c.1669_1674dup'
CHD7:c.1669_1674dup CCDS47865 (protein_coding) CHD7 +
chr8:g.61693564_61693569dupCCCGTC/c.1669_1674dup/p.P558_S559dupPS inside_[cds_in_exon_2]
CSQN=InFrameInsertion;left_align_gDNA=g.61693561_61693562insTCCCCG;unalign_gD
NA=g.61693562_61693567dupTCCCCG;left_align_cDNA=c.1668_1669insTCCCCG;unalign_
cDNA=c.1669_1674dupTCCCCG;left_align_protein=p.H556_S557insSP;unalign_protein
=p.S557_P558dupSP;phase=0;source=CCDS
Example: a duplication on the nucleotide level may lead to frame-shift or block substitution on the amino acid level,
$ transvar canno --ccds -i 'CHD7:c.1668_1669dup'
CHD7:c.1668_1669dup CCDS47865 (protein_coding) CHD7 +
chr8:g.61693561_61693562dupTT/c.1668_1669dup/p.S557Ffs*8 inside_[cds_in_exon_2]
CSQN=Frameshift;left_align_gDNA=g.61693560_61693561insTT;unalign_gDNA=g.61693
561_61693562dupTT;left_align_cDNA=c.1667_1668insTT;unalign_cDNA=c.1668_1669du
pTT;source=CCDS
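Because a duplication is a special insertion, its frame consequence follows from the length of the duplicated range. The sketch below parses a coding dup identifier and reports the equivalent insertion point; the regular expression and the function name are assumptions made for illustration and cover only plain coding coordinates (intronic offsets such as c.1666-5 are deliberately rejected).

```python
# Illustrative only: hypothetical parser, not part of TransVar.
import re

DUP_PATTERN = re.compile(
    r"^(?P<gene>\w+):c\.(?P<start>\d+)(?:_(?P<end>\d+))?dup[ACGT]*$"
)


def dup_as_insertion(identifier):
    """Describe a coding duplication as the equivalent insertion."""
    match = DUP_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unsupported identifier: {identifier}")
    start = int(match["start"])
    end = int(match["end"] or start)   # single-base dup has no end coordinate
    length = end - start + 1
    return {
        "gene": match["gene"],
        "insert_between": (end, end + 1),  # the copy is inserted right after the range
        "length": length,
        "in_frame": length % 3 == 0,
    }


print(dup_as_insertion("CHD7:c.1669_1674dup"))
print(dup_as_insertion("CHD7:c.1668_1669dup"))
```

The six-base duplication is in-frame and the two-base duplication is not, in line with CSQN=InFrameInsertion and CSQN=Frameshift in the two CHD7 outputs above.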
Example: to annotate a duplication in an intronic region,
$ transvar canno --ccds -i 'CHD7:c.1666-5_1666-3dup'
CHD7:c.1666-5_1666-3dup CCDS47865 (protein_coding) CHD7 +
chr8:g.61693554_61693556dupCTC/c.1666-5_1666-3dup/. inside_[intron_between_exon_1_and_2]
CSQN=IntronicInsertion;left_align_gDNA=g.61693553_61693554insCTC;unalign_gDNA
=g.61693554_61693556dupCTC;left_align_cDNA=c.1666-6_1666-5insCTC;unalign_cDNA
=c.1666-5_1666-3dupCTC;source=CCDS