<!DOCTYPE html>
<html lang="en">
<head>
<title>Galaxy Europe</title>
<meta property="og:title" content="" />
<meta property="og:description" content="" />
<meta property="og:image" content="/assets/media/galaxy-eu-logo.512.png" />
<meta name="description" content="The European Galaxy Instance">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
<link rel="stylesheet" href="/assets/css/main.css">
<link rel="canonical" href="https://galaxyproject.eu/index-clip.html">
<link rel="shortcut icon" href="/assets/media/galaxy-eu-logo.64.png" type="image/x-icon" />
<link rel="alternate" type="application/rss+xml" title="Galaxy Europe" href="/feed.xml">
<link href="/assets/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous">
<script src="/assets/js/jquery-3.2.1.slim.min.js" integrity="sha256-k2WSCIexGzOj3Euiig+TlR8gA0EmPjuc79OEeY5L45g=" crossorigin="anonymous"></script>
<script src="/assets/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script>
</head>
<body>
<div id="wrap">
<div id="main">
<div class="container" id="maincontainer">
<div class="home">
<h1 id="galaxy-clip-explorer">Galaxy CLIP-Explorer</h1>
<p>Welcome to the Galaxy CLIP-Explorer – a webserver to process, analyse and visualise CLIP-Seq data.</p>
<p><img src="/assets/media/cover_design_clipseq.png" alt="" /></p>
<h2 id="1-getting-started-with-galaxy-clip-explorer">1. Getting Started with Galaxy CLIP-Explorer</h2>
<p><strong>Are you new to Galaxy?</strong> Are you returning after a long time and looking for help to get started? Then take <a target="_parent" href="https://hicexplorer.usegalaxy.eu/tours/core.galaxy_ui">a guided tour</a> through the user interface of Galaxy.</p>
<p>Do you have CLIP-Seq data but need some <strong>guidance for the CLIP-Seq data analysis</strong>? Take a look at the CLIP-Seq data analysis tutorial on the <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">Galaxy Training Network</a>, where you can analyse CLIP-Seq data of RBFOX2 from human liver cancer cells (Hep G2). The tutorial will help you to understand the analysis steps and the most important parameters and tools that are used in CLIP-Explorer.</p>
<p>The underlying workflow of the tutorial can be found <a target="_parent" href="https://github.com/galaxyproject/training-material/tree/master/topics/transcriptomics/tutorials/clipseq/workflows/">here</a>.</p>
<p>We recommend following the tutorial on <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html">FastQC</a> for quality checks and the tutorial on <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/introduction/tutorials/igv-introduction/tutorial.html">IGV</a> for data inspection.</p>
<p>The Galaxy Training Network tutorial uses eCLIP data from human liver cancer cells (Hep G2), which is hosted on Zenodo: <a target="_parent" href="https://zenodo.org/record/1327423"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.1327423.svg" alt="DOI" /></a></p>
<p>Galaxy CLIP-Explorer can process large eCLIP and iCLIP data sets. We processed eCLIP data with around 20 million reads from <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a>. CLIP-Explorer can handle multiplexed and de-multiplexed eCLIP and iCLIP data in FASTQ and FASTA format.</p>
<h2 id="2-galaxy-clip-explorer--many-possibilities">2. Galaxy CLIP-Explorer – Many Possibilities</h2>
<p><img src="/assets/media/content_design_clipseq.png" alt="" />
<b>(A)</b> Galaxy CLIP-Explorer workflows and tools. <b>(B)</b> Output of <code class="highlighter-rouge">multiBamSummary</code> and <code class="highlighter-rouge">plotCorrelation</code> comparing two biological replicates of a CLIP-Seq experiment and one control sample. <b>(C)</b> Output of <code class="highlighter-rouge">plotFingerprint</code> showing the read coverage for the CLIP-Seq and control samples. <b>(D)</b> Output of <code class="highlighter-rouge">CollectInsertSizeMetrics</code> estimating the insert size of the read libraries. <b>(E)</b> Output of <code class="highlighter-rouge">FastQC</code> showing the duplication levels of the read libraries. <b>(F)</b> Sequence motifs found by <code class="highlighter-rouge">MEME-ChIP</code> (DREME and MEME) in potential binding regions (peaks) obtained by a peak caller such as <code class="highlighter-rouge">PEAKachu</code>, <code class="highlighter-rouge">Piranha</code> or <code class="highlighter-rouge">PureCLIP</code>. <b>(G-I)</b> Example output of <code class="highlighter-rouge">RCAS</code> (RNA Centric Annotation System): <b>(G)</b> binding coverage for the transcript and the 5’ and 3’ UTRs, <b>(H)</b> binding coverage around the exon-intron boundaries, and <b>(I)</b> a target distribution plot showing which kinds of RNA the protein of interest predominantly binds.</p>
<h2 id="3-workflows">3. Workflows</h2>
<p>Use the following workflows for an automated analysis of iCLIP and eCLIP data. The data needs to be in <strong>FASTA or FASTQ format</strong> and can be either <strong>multiplexed or de-multiplexed</strong>. All workflows, except the robust peak analysis, require the <strong>data as a list of dataset pairs</strong>. A tutorial on creating a list of dataset pairs can be found in the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a> or <a href="https://galaxyproject.github.io/training-material/topics/galaxy-data-manipulation/tutorials/collections/tutorial.html">here</a>. Please keep in mind that all workflows need additional input files from the user.</p>
<h3 id="31-quick-example-run">3.1 Quick Example Run</h3>
<p>For a quick run with example data, download this example eCLIP data of RBFOX2 <a target="_parent" href="https://zenodo.org/record/1327423"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.1327423.svg" alt="DOI" /></a> and run the <a target="_parent" href="https://github.com/galaxyproject/training-material/tree/master/topics/transcriptomics/tutorials/clipseq/workflows/">workflow</a> of the CLIP-Seq training material on the <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">Galaxy Training Network</a>. Alternatively, use the <a href="https://clipseq.usegalaxy.eu/u/heylf/w/1clipseq-explorerdemultiplexedpeakachuecliphg19n5-1">workflow for the eCLIP data of Van Nostrand et al. (2016)</a>. Keep in mind that you have to provide the input data as a <strong>list of dataset pairs</strong>. A tutorial on creating a list of dataset pairs can be found in the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a> or <a href="https://galaxyproject.github.io/training-material/topics/galaxy-data-manipulation/tutorials/collections/tutorial.html">here</a>.</p>
<h3 id="32-from-scratch-to-de-multiplexed-fastq-files">3.2 From scratch to de-multiplexed FASTQ files</h3>
<p>If your data is not de-multiplexed yet, use the workflows in this section. You have to provide the in-line <strong>barcodes</strong> in a tab-delimited tabular format with one sample name and one barcode per row, for example:</p>
<ul>
<li>rep1 TTAG</li>
<li>rep2 TGGC</li>
<li>rep3 TTAA</li>
</ul>
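<p>The barcode table above can be written from a script. The following is a minimal Python sketch, assuming that the first column holds the sample name and the second column the barcode, as in the example list. The function name <code class="highlighter-rouge">write_barcode_table</code> is hypothetical and not part of CLIP-Explorer.</p>

```python
import csv
import io


def write_barcode_table(barcodes, handle):
    """Write (sample name, barcode) pairs as tab-delimited rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, barcode in barcodes:
        writer.writerow([name, barcode])


# Sample names and barcodes taken from the example list above.
barcodes = [("rep1", "TTAG"), ("rep2", "TGGC"), ("rep3", "TTAA")]

# An in-memory buffer keeps the sketch self-contained; to create a file
# for upload, pass an open file handle instead.
buffer = io.StringIO()
write_barcode_table(barcodes, buffer)
print(buffer.getvalue(), end="")
```

<p>Upload the resulting file to your Galaxy history as a tabular dataset before you start the de-multiplexing workflow.</p>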
<p>The raw data needs to be in FASTA or FASTQ format as a list of dataset pairs.</p>
<ul>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1demultiplexeclip">Workflow to de-multiplex eCLIP read library</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2demultiplexiclip">Workflow to de-multiplex iCLIP read library</a></li>
</ul>
<h3 id="33-from-scratch-with-de-multiplexed-fastq-files">3.3 From scratch with de-multiplexed FASTQ files</h3>
<p>You can choose between three different types of peak calling for the data analysis of eCLIP and iCLIP data. The data specification of each of the peak calling algorithms is listed below:</p>
<p><strong>Table 1</strong>: Data specification of the different peak calling algorithms.</p>
<table class="table table-striped">
<thead>
<tr>
<th>Tool</th>
<th style="text-align: center">Biological Replicates (Yes/No)</th>
<th style="text-align: center">Control Data (Yes/No)</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/tbischler/PEAKachu">PEAKachu</a></td>
<td style="text-align: center">Yes</td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1186/s13059-017-1364-2">PureCLIP</a></td>
<td style="text-align: center">No</td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1093/bioinformatics/bts569">Piranha</a></td>
<td style="text-align: center">No</td>
<td style="text-align: center">No</td>
</tr>
</tbody>
</table>
<h4 id="note-if-you-have-used-the-de-mutliplexing-workflows">Note if you have used the de-multiplexing workflows:</h4>
<p>If you used the preceding workflows for de-multiplexing, then remove the <code class="highlighter-rouge">Cutadapt</code> and <code class="highlighter-rouge">UMI-tools extract</code> steps from the following workflows before you analyse your data. Simply import the workflow into your account, remove the tools and connect the loose ends directly to the alignment step.</p>
<h4 id="note-if-you-use-eclip-data-of-nostrand-et-al-2016">Note if you use eCLIP data of Van Nostrand et al. (2016):</h4>
<p>The workflow for the eCLIP data of <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a> was used to analyse the data of RBFOX2. Take care when using other data from the study of <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a>, because the size of the unique molecular identifier (UMI) can be different. The workflow is set to a UMI of five nucleotides. You can change this by importing the workflow into your account and amending the parameter <code class="highlighter-rouge">Cut bases from reads before adapter trimming</code> of the second <code class="highlighter-rouge">Cutadapt</code> step for the CLIP and control data.</p>
<h4 id="eclip">eCLIP</h4>
<ul>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1clipseq-explorerdemultiplexedpeakachuecliphg19n5-1">Workflow for the eCLIP data of Van Nostrand et al. (2016)</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2clipseq-explorerdemultipeakachuecliphg19">Peak calling with PEAKachu</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/3clipseq-explorerdemultipureclipecliphg19">Peak calling with PureCLIP</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/4clipseq-explorerdemultipiranhaecliphg19">Peak calling with Piranha</a></li>
</ul>
<h4 id="iclip">iCLIP</h4>
<ul>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1clipseq-explorerdemultipeakachuicliphg19">Peak calling with PEAKachu</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2clipseq-explorerdemultipureclipicliphg19">Peak calling with PureCLIP</a></li>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/3clipseq-explorerdemultipiranhaicliphg19">Peak calling with Piranha</a></li>
</ul>
<h3 id="34-further-optional-peak-analysis">3.4 Further optional peak analysis</h3>
<p>The following workflow can be used if you have picked a peak calling algorithm that does not support biological replicates. The workflow finds and analyses robust binding regions shared between different peak files.</p>
<ul>
<li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/robustpeakanalysis">Robust peak analysis</a></li>
</ul>
<h2 id="4-remarks">4. Remarks</h2>
<p>Please follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a> for a deeper understanding of the tools of CLIP-Explorer.</p>
<h3 id="41-changing-workflows">4.1 Changing Workflows</h3>
<p>You can change the workflows at any time. Simply import the workflow into your account and change the necessary tools or tool parameters.</p>
<h3 id="42-adapter-sequences">4.2 Adapter sequences</h3>
<p>The workflows use <code class="highlighter-rouge">Cutadapt</code> to remove standard eCLIP and iCLIP adapter sequences. You need to change the <code class="highlighter-rouge">Cutadapt</code> parameters if your read library contains other adapter sequences. <code class="highlighter-rouge">Cutadapt</code> cannot automatically detect standard Illumina or other standard adapters. You have to provide the sequence.</p>
<h3 id="43-umi-and-in-line-barcodes">4.3 UMI and in-line barcodes</h3>
<p>The workflows use <code class="highlighter-rouge">Cutadapt</code> to trim the length of the UMI (plus barcode) from one mate of the read pair. The number of bases to trim depends on whether you use iCLIP, eCLIP or your own protocol. Please check or change the parameter in <code class="highlighter-rouge">Cutadapt</code> based on your UMI and in-line barcode. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>.</p>
<p>CLIP-Explorer uses <code class="highlighter-rouge">UMI-tools extract</code> to find the UMIs inside your reads. Change the pattern of <code class="highlighter-rouge">UMI-tools extract</code> based on your read library preparation.</p>
<h3 id="44-read-alignment">4.4 Read alignment</h3>
<p>We use <code class="highlighter-rouge">STAR</code> to do the read alignment. <code class="highlighter-rouge">STAR</code> combines genome and transcriptome data. CLIP-Explorer uses only uniquely mapped reads. Furthermore, <code class="highlighter-rouge">STAR</code> is executed with soft-clipping turned off. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>.</p>
<p>You can replace <code class="highlighter-rouge">STAR</code> with any other read mapper by importing the corresponding workflow into your account. <strong>Check the mapping quality</strong>: Look at the MultiQC report to assess the mapping quality.</p>
<p><code class="highlighter-rouge">STAR</code> has many parameters. We recommend leaving them at their defaults. However, <code class="highlighter-rouge">STAR</code> may report many reads as unmapped because they are too short. In that case you might want to lower the two parameters <strong>Minimum alignment score, normalized to read length</strong> (--outFilterScoreMinOverLread) and <strong>Minimum number of matched bases, normalized to read length</strong> (--outFilterMatchNminOverLread).</p>
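<p>To get a feeling for the two normalized parameters, the short Python sketch below converts a ratio into the absolute threshold it implies for a given read length. It assumes a ratio of 0.66, which to our knowledge is the <code class="highlighter-rouge">STAR</code> default for both parameters; please check the documentation of your <code class="highlighter-rouge">STAR</code> version. The function name is hypothetical, and the sketch ignores how <code class="highlighter-rouge">STAR</code> rounds the value internally.</p>

```python
def min_normalized_threshold(read_length, ratio=0.66):
    """Return the absolute threshold implied by a length-normalized ratio.

    read_length: length of the read after trimming; for paired-end data
                 use the summed length of both mates.
    ratio:       value of the normalized parameter (0.66 is an assumed default).
    """
    return ratio * read_length


# Shorter reads need fewer matched bases in absolute terms, but the
# required fraction stays the same until the ratio itself is lowered.
for length in (20, 30, 50):
    print(length, round(min_normalized_threshold(length), 1))
```

<p>Lowering the ratio relaxes the filter for all reads, so re-check the mapping quality afterwards.</p>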
<h3 id="45-peak-calling-with-peakachu-pureclip-and-piranha">4.5 Peak calling with PEAKachu, PureCLIP, and Piranha</h3>
<h4 id="peakachu">PEAKachu</h4>
<p>You need to specify the insert size of your paired-end reads for <code class="highlighter-rouge">PEAKachu</code>. Check the output image of <code class="highlighter-rouge">CollectInsertSizeMetrics</code> to get an estimate for that parameter.</p>
<p>The three parameters <strong>Mad Multiplier</strong> (default 2.0), <strong>Fold Change Threshold</strong> (default 2.0), and <strong>Adjusted p-value Threshold</strong> (default 0.05) are the primary filters to select significant peaks. Start with the defaults, then adjust them based on your research question.</p>
<h4 id="pureclip">PureCLIP</h4>
<p>PureCLIP works best with only one mate of the paired-end reads, namely the one where the cross-linking event occurs. Thus, CLIP-Explorer filters out the other mate before the peak calling. Remove the <code class="highlighter-rouge">Bam filter</code> tool to disable this behavior or change <code class="highlighter-rouge">Bam filter</code> to pick the correct mate.</p>
<p>Important parameters for PureCLIP are the <strong>Bandwidth for kernel density estimation used to access enrichment</strong> (-bw) and the <strong>Bandwidth for kernel density estimation used to estimate n for binomial distributions</strong> (-bwn). Choose these two parameters carefully, because they control the fitting of the model. Decreasing them results in overfitting.</p>
<p>If PureCLIP does not finish because of a memory error, or if PureCLIP takes too long, then try to apply the model just for a few chromosomes of the reference. Take a look at <strong>Genomic chromosomes to learn HMM parameters</strong> (-iv).</p>
<h4 id="piranha">Piranha</h4>
<p>For CLIP-Seq data, Piranha works best with a zero-truncated negative binomial (default) or with a negative binomial distribution. The selected distribution has a strong influence on the result. You can change it under <strong>Select distribution type</strong> (-d).</p>
<p>Another important parameter is <strong>Indicates that input is raw reads and should be binned into bins of this size</strong> (-b), which controls how closely the model fits the data. Decreasing this parameter results in overfitting. A good baseline for this parameter is a value around 50. The parameter <strong>Merge significant bins within certain distance?</strong> (-u) also controls overfitting. Set it to <strong>No</strong> to keep more detail. Set it to <strong>Yes</strong> and give it a value bigger than 0 to merge peaks that lie very close together. Also set the <strong>Significance threshold for sites</strong> (-p) to 0.05.</p>
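<p>The effect of merging nearby bins can be illustrated with a small Python sketch. This is a generic interval merge under our own assumptions, not Piranha's implementation: bins are half-open (start, end) pairs on one chromosome, and two bins are merged when the gap between them is at most <code class="highlighter-rouge">max_gap</code>. Whether Piranha treats the boundary case the same way is not verified here.</p>

```python
def merge_close_bins(bins, max_gap):
    """Merge (start, end) bins whose gap to the previous bin is at most max_gap."""
    merged = []
    for start, end in sorted(bins):
        # Gap between this bin and the last merged region.
        if merged and max_gap >= start - merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Three hypothetical significant bins of size 50.
significant_bins = [(0, 50), (60, 110), (300, 350)]
print(merge_close_bins(significant_bins, max_gap=10))
```

<p>With a gap of 10, the first two bins form one region while the distant third bin stays separate. A gap of 0 merges only bins that touch or overlap.</p>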
<h3 id="46-extension-of-the-binding-regions">4.6 Extension of the binding regions</h3>
<p>CLIP-Explorer uses <code class="highlighter-rouge">SlopBED</code> to extend the peaks by a few base pairs to the left and right, because the peak calling algorithms tend to underestimate the binding regions. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>. Remove the tool or change the parameter of <code class="highlighter-rouge">SlopBED</code> to change this behavior.</p>
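<p>The extension step can be sketched in a few lines of Python. The function below is an illustration under stated assumptions, not the code of <code class="highlighter-rouge">SlopBED</code>: coordinates are 0-based and half-open as in BED files, the same flank is added on both sides, and the result is clamped to the chromosome boundaries. The function name and the example values are hypothetical.</p>

```python
def extend_peak(start, end, flank, chrom_length):
    """Extend a BED-style interval by flank bases on both sides.

    The new interval is clamped so that it never starts before 0
    and never ends beyond the chromosome length.
    """
    new_start = max(0, start - flank)
    new_end = min(chrom_length, end + flank)
    return new_start, new_end


# A peak in the middle of a chromosome grows on both sides.
print(extend_peak(100, 150, flank=20, chrom_length=1000))
# A peak close to the chromosome ends is clamped.
print(extend_peak(5, 100, flank=10, chrom_length=105))
```

<p>A larger flank recovers more of a potentially underestimated binding region, but also adds more unbound sequence to the downstream motif analysis.</p>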
<h2>Our Data Policy</h2>
<h3>Registered Users</h3>
User data on UseGalaxy.eu (i.e. datasets, histories) will be available as long
as they are not deleted by the user. Once marked as deleted the datasets will
be permanently removed within 14 days. If the user "purges" the dataset in
Galaxy, it will be removed immediately and permanently.
An extended quota can be <a href="https://docs.google.com/forms/d/e/1FAIpQLSf9w2MOS6KOlu9XdhRSDqWnCDkzoVBqHJ3zH_My4p8D8ZgkIQ/viewform" target="_blank">requested</a>
for a limited time period in special cases.
<h3>Unregistered Users</h3>
Processed data will only be accessible during one browser session, using a
cookie to identify your data. This cookie is not used for any other purposes
(e.g. tracking or analytics).
If the UseGalaxy.eu service is not accessed for 90 days, those datasets will be
permanently deleted.
<h3>GDPR Compliance</h3>
The Galaxy service complies with the EU General Data Protection Regulation
(GDPR). You can read more about this on our
<a href="https://usegalaxy.eu/terms/">Terms and Conditions</a>.
<iframe style="border: 0px" width="100%" height="150px" src="https://stats.galaxyproject.eu/dashboard-solo/db/jobs-dashboard?panelId=1&orgId=1" ></iframe>
<div class="row">
<section class="section-content">
<div class="col-md-12">
</div>
</section>
</div>
</div>
</div>
</div>
</div>
<footer class="navbar-default">
<div class="container">
<div class="row">
<div class="col-lg-12" style="text-align:center">
<p>UseGalaxy.eu is maintained largely by the <a href="/freiburg/">Freiburg Galaxy Team</a> but also collectively by groups and individuals from across Europe. All of the member sites in this repository contribute to the European Galaxy Project.
All content on this site is available under <a href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0-1.0</a>, unless otherwise specified.</p>
</div>
</div>
<div class="row">
<div class="col-lg-12" style="text-align:center">
<ul class="contact-info">
<li><i class="fa fa-envelope"></i><a href="mailto:[email protected]">[email protected]</a></li>
<li><i class="fa fa-github"></i><a href="https://github.com/usegalaxy-eu">usegalaxy-eu</a></li>
<li><i class="fa fa-twitter"></i><a href="https://twitter.com/galaxyproject">galaxyproject</a></li>
<li><i class="fa fa-rss"></i>Subscribe <a href="/feed.xml">via RSS (UseGalaxy.eu Feed)</a></li>
</ul>
</div>
</div>
</div>
</footer>
</body>
</html>