<tool id="mutSpecannot" name="MutSpec Annot" version="0.1" hidden="false">
<description>Annotate variants with ANNOVAR and other databases</description>
<requirements>
<requirement type="set_environment">SCRIPT_PATH</requirement>
<requirement type="package" version="5.18.1">perl</requirement>
</requirements>
<command interpreter="bash">
mutspecAnnot_wrapper.sh
$output
--refGenome ${refGenome}
--AVDB ${refGenome.fields.path}
--interval $interval
--fullAnnotation ${annotation_type}
$input
</command>
<inputs>
<param name="input" type="data" format="txt" label="Input file" help="Select a single file, multiple files or a dataset collection"/>
<param name="refGenome" type="select" label="Reference genome" help="Select the reference genome that was used for generating your data">
<options from_data_table="annovar_index" />
</param>
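<!--
Illustrative sketch, not part of the original tool: the select above reads its
options from the annovar_index data table, and the command block passes the
selected row's "path" field as the AVDB argument. A matching entry in
tool_data_table_conf.xml could look like the fragment below. The column list
and the .loc file location are assumptions; adapt them to the local Galaxy setup.

<table name="annovar_index" comment_char="#">
    <columns>value, name, path</columns>
    <file path="tool-data/annovar_index.loc" />
</table>
-->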
<param name="interval" type="text" value="10" label="Sequence context of variants" help="Number of bases retrieved on each side (5' and 3') of each variant"/>
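<!--
Possible hardening, shown only as a sketch and not part of the original tool:
declaring the interval as an integer lets Galaxy reject non-numeric values
before the wrapper script runs. The lower bound of 0 is an assumption about
what the wrapper accepts.

<param name="interval" type="integer" value="10" min="0" label="Sequence context of variants" help="Number of bases retrieved on each side (5' and 3') of each variant"/>
-->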
<param name="annotation_type" type="boolean" checked="true" truevalue="yes" falsevalue="no" label="Complete annotations" help="Select No if you have a file with millions of variants and you are just interested in having a quick overview of the mutational spectrum. Only the annotation from refGene, the strand orientation and the sequence context will be added." />
</inputs>
<outputs>
<data name="output" type="data" format="tabular" label="${input.name} annotated" />
</outputs>
<stdio>
<regex match="Missing flag !" source="stderr" level="fatal" description="You have forgotten to specify one or more arguments" />
<regex match="Error message:" source="stderr" level="fatal" description="Read error message for more details" />
<regex match="ANNOVAR LOG FILE" source="stderr" level="fatal" description="Read Annovar log file for more information" />
</stdio>
<help>
**What it does**
MutSpec-Annot provides functional annotations from the `ANNOVAR software`__ (the Feb 2016 version is provided here), as well as the transcript strand orientation (from the refGene database) and the sequence context of variants (extracted from the selected reference genome).
.. __: http://www.openbioinformatics.org/annovar/
.. class:: infomark
MutSpec-Annot works for human, mouse and rat genomes.
--------------------------------------------------------------------------------------------------------------------------------------------------
**Input formats**
MutSpec-Annot accepts files in VCF (version 4.1 and 4.2) or in tab-delimited (TAB) format.
.. class:: infomark
TIP: If your data is not TAB delimited, use *Text manipulation -> convert*
.. class:: warningmark
Filenames must be at most 31 characters long.
.. class:: warningmark
Files should contain at least four columns that describe, for each variant, the chromosome, the genomic start position, the reference allele and the alternate allele(s). These columns can be in any order.
.. class:: warningmark
If multiple input files are specified, they should all come from the **same genome build** and be in the **same format**.
.. class:: warningmark
The tool supports different column names (**names are case-sensitive**) depending on the source file as follows:
**mutect** : contig position ref_allele alt_allele
**vcf** : version `4.1`__ and `4.2`__
.. __: https://samtools.github.io/hts-specs/VCFv4.1.pdf
.. __: https://samtools.github.io/hts-specs/VCFv4.2.pdf
**cosmic** : Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic
**icgc** : chromosome chromosome_start reference_genome_allele mutated_to_allele
**tcga** : Chromosome Start_position Reference_Allele Tumor_Seq_Allele2
**ionTorrent** : chr Position Ref Alt
**proton** : Chrom Position Ref Variant
**varScan2** : Chrom Position Ref VarAllele
**varScan2 somatic** : chrom position ref var
**annovar** : Chr Start Ref Obs
**custom** : Chromosome Start Wild_Type Mutant
.. class:: infomark
For MuTect and MuTect2 output files, only confident calls are considered, because other calls are very likely to be dubious or artefacts.
Variants that contain the string REJECT in the judgement column, or that do not pass the MuTect2 filters, are not annotated and are excluded from the MutSpec-Annot output.
.. class:: infomark
In COSMIC and ICGC files, the same variant can be reported on several transcripts. These duplicate variants must be removed before annotating the file.
--------------------------------------------------------------------------------------------------------------------------------------------------
**Output**
The output is a tabular text file that contains the retrieved annotations in the first columns, followed by all columns from the original file.
.. class:: infomark
Only classic chromosomes are annotated; all other chromosomes are excluded from the MutSpec-Annot output.
For example, for the human genome only chr1 to chrY are annotated.
The following annotations are retrieved:
**ANNOVAR annotations**
Examples of annotations retrieved by the tool (for the full list, please visit the Galaxy page `Annovar databases`__):
.. __: http://galaxy.iarc.fr/galaxy/u/ardinm/p/annovar-databases
Gene-based: RefSeqGene, UCSC Known Gene and Ensembl Gene
Region-based: location of the variant on a cytogenetic band (cytoBand), variants reported in genome-wide association studies (gwasCatalog) and variants mapped to segmental duplications (genomicSuperDups)
Filter-based:
- dbSNP: For the human genome there are two versions available: the default version (snp) and a pre-filtered version (snpNonFlagged). In the pre-filtered version, all SNPs with a minor allele frequency (MAF) below 1% (or unknown), mapping only once to the reference assembly, or flagged in dbSNP as clinically associated are removed from the full dbSNP database and are therefore not present in this version.
- 1000 Genomes Project (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian))
- ESP: Exome Sequencing Project (ALL, AA (African American), EA (European American))
- ExAC: Exome Aggregation Consortium (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (other), SAS (South Asian))
- LJB26: SIFT, PolyPhen-2 (HDIV and HVAR)
**Transcript orientation**
The strand annotation, which corresponds to the transcript orientation within genic regions, is retrieved from the RefSeqGene database.
**Sequence context**
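Flanking bases on both the 5' and 3' sides of the variant position, retrieved from the selected reference genome. The number of flanking bases on each side is set by the *Sequence context of variants* parameter.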
--------------------------------------------------------------------------------------------------------------------------------------------------
**Example**
Annotate the following file::
Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
chr7 121717919 121717920 - G
chr1 230846235 230846235 T A
chr14 33290999 33290999 A G
chr12 8082458 8082458 C T
chr4 70156391 70156391 T C
Will produce::
Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups snp138 1000g2014oct_all esp6500si_all Strand context Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
chr7 121717919 121717920 - G exonic AASS frameshift insertion AASS:NM_005763:exon23:c.2634dupC:p.A879fs NA rs147476318 NA NA - GCG chr7 121717919 121717920 - G
chr1 230846235 230846235 T A exonic AGT nonsynonymous SNV AGT:NM_000029:exon2:c.A362T:p.H121L NA NA NA NA - GTG chr1 230846235 230846235 T A
chr14 33290999 33290999 A G exonic AKAP6 nonsynonymous SNV AKAP6:NM_004274:exon13:c.A3980G:p.D1327G NA NA NA NA + GAC chr14 33290999 33290999 A G
chr12 8082458 8082458 C T exonic SLC2A3 nonsynonymous SNV SLC2A3:NM_006931:exon6:c.G683A:p.R228Q NA rs200481428 0.000199681 NA - CCG chr12 8082458 8082458 C T
chr4 70156391 70156391 T C exonic UGT2B28 nonsynonymous SNV UGT2B28:NM_053039:exon5:c.T1172C:p.V391A score=0.949699;Name=chr4:70035680 NA 0.000199681 NA + GTA chr4 70156391 70156391 T C
--------------------------------------------------------------------------------------------------------------------------------------------------
**Contact**
--------------------------------------------------------------------------------------------------------------------------------------------------
**Code**
The source code is available on `GitHub`__
.. __: https://github.com/IARCbioinfo/mutspec.git
</help>
<citations>
<citation type="bibtex">
@article{ardin_mutspec:_2016,
title = {{MutSpec}: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes},
volume = {17},
issn = {1471-2105},
doi = {10.1186/s12859-016-1011-z},
shorttitle = {{MutSpec}},
abstract = {{BACKGROUND}: The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called {MutSpec}.
{RESULTS}: {MutSpec} includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the {COSMIC} database and other sources. {MutSpec} offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. {MutSpec} may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool.
{CONCLUSIONS}: {MutSpec} offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. {MutSpec} can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.},
pages = {170},
number = {1},
journaltitle = {{BMC} Bioinformatics},
author = {Ardin, Maude and Cahais, Vincent and Castells, Xavier and Bouaoun, Liacine and Byrnes, Graham and Herceg, Zdenko and Zavadil, Jiri and Olivier, Magali},
date = {2016},
pmid = {27091472},
keywords = {Galaxy, Mutation signatures, Mutation spectra, Single base substitutions}
}
</citation>
</citations>
</tool>