-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathmutspecSplit.xml
122 lines (79 loc) · 9.24 KB
/
mutspecSplit.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
<tool id="mutSpecsplit" name="MutSpec Split" version="0.1" hidden="false" force_history_refresh="True">
<description>Split a tabular file by sample ID</description>
<requirements>
<requirement type="set_environment">SCRIPT_PATH</requirement>
<requirement type="package" version="5.18.1">perl</requirement>
</requirements>
<command interpreter="perl">
mutspecSplit.pl -f $input -c $column
</command>
<inputs>
<param name="input" type="data" format="tabular" label="Input file" help="If using the batch mode (multiple datasets), all files must contain the same sample id column. The tool doesn't support dataset list as input !" />
<param name="column" type="data_column" data_ref="input" label="Split by" use_header_names="true"/>
</inputs>
<outputs>
<collection name="splitted_output" type="list" label="collection">
<discover_datasets pattern="__name__" ext="tabular" directory="outputFiles"/>
</collection>
</outputs>
<help>
**What it does**
This tool splits a file into several files based on the content of the selected column.
It can be used for example to split a file that contains data on 10 samples into 10 files using the same sample ID column.
The resulting files are saved into a dataset list/collection.
--------------------------------------------------------------------------------------------------------------------------------------------------
**Input**
One or multiple tab delimited text files.
If multiple files are selected, they should all have the same column on which you want to do the split.
.. class:: warningmark
The tool doesn't support dataset list as input !!!
--------------------------------------------------------------------------------------------------------------------------------------------------
**Output**
A dataset list containing tab delimited text files resulting from splitting the input file(s).
.. class:: warningmark
If a large number of file are generated, you'll need to refresh the history to see all files included in the dataset list. The entire list of file may still not be correctly displayed due to a known bug in Galaxy that may be fixed in future versions.
--------------------------------------------------------------------------------------------------------------------------------------------------
**Example**
Split by sample ID the following file::
Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups 1000g2012apr_all snp137 esp6500si_all cosmic67 Strand Context Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic Sample_name Pubmed_PMID Age Comments
chr12 82752552 82752552 G A exonic METTL25 nonsynonymous SNV NM_032230:c.G208A:p.E70K NA NA NA NA NA + GTCGGAGACGGAGGCCCTGCC chr12 82752552 G A APA29 23913001 2 NA
chr11 86663436 86663436 C A exonic FZD4 nonsynonymous SNV NM_012193:c.G362T:p.C121F NA NA NA NA NA - GACTGAAAGACACATGCCGCC chr11 86663436 C A APA12 21311022 34 Tissue Remark Fixed:Remark
chr12 57872994 57872994 G A exonic ARHGAP9 nonsynonymous SNV NM_001080157:c.C196T:p.R66C NA NA NA 0.000077 ID=COSM431582;OCCURENCE=2(breast) - GCTTCTAGGCGTCTTGCCAAC chr12 57872994 G A APA12 21311022 34 Tissue Remark Fixed:Remark
Will create a dataset list with two dataset:
APA29::
Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups 1000g2012apr_all snp137 esp6500si_all cosmic67 Strand Context Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic Sample_name Pubmed_PMID Age Comments
chr12 82752552 82752552 G A exonic METTL25 nonsynonymous SNV NM_032230:c.G208A:p.E70K NA NA NA NA NA + GTCGGAGACGGAGGCCCTGCC chr12 82752552 G A APA29 23913001 2 NA
APA12::
Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups 1000g2012apr_all snp137 esp6500si_all cosmic67 Strand Context Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic Sample_name Pubmed_PMID Age Comments
chr11 86663436 86663436 C A exonic FZD4 nonsynonymous SNV NM_012193:c.G362T:p.C121F NA NA NA NA NA - GACTGAAAGACACATGCCGCC chr11 86663436 C A APA12 21311022 34 Tissue Remark Fixed:Remark
chr12 57872994 57872994 G A exonic ARHGAP9 nonsynonymous SNV NM_001080157:c.C196T:p.R66C NA NA NA 0.000077 ID=COSM431582;OCCURENCE=2(breast) - GCTTCTAGGCGTCTTGCCAAC chr12 57872994 G A APA12 21311022 34 Tissue Remark Fixed:Remark
--------------------------------------------------------------------------------------------------------------------------------------------------
**Contact**
--------------------------------------------------------------------------------------------------------------------------------------------------
**Code**
The source code is available on `GitHub`__
.. __: https://github.com/IARCbioinfo/mutspec.git
</help>
<citations>
<citation type="bibtex">
@article{ardin_mutspec:_2016,
title = {{MutSpec}: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes},
volume = {17},
issn = {1471-2105},
doi = {10.1186/s12859-016-1011-z},
shorttitle = {{MutSpec}},
abstract = {{BACKGROUND}: The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called {MutSpec}.
{RESULTS}: {MutSpec} includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the {COSMIC} database and other sources. {MutSpec} offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. {MutSpec} may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool.
{CONCLUSIONS}: {MutSpec} offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. {MutSpec} can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.},
pages = {170},
number = {1},
journaltitle = {{BMC} Bioinformatics},
author = {Ardin, Maude and Cahais, Vincent and Castells, Xavier and Bouaoun, Liacine and Byrnes, Graham and Herceg, Zdenko and Zavadil, Jiri and Olivier, Magali},
date = {2016},
pmid = {27091472},
keywords = {Galaxy, Mutation signatures, Mutation spectra, Single base substitutions}
}
</citation>
</citations>
</tool>