CHANGES IN VERSION 2.16.0
-------------------------
o Transition to Kai Wang as maintainer.
CHANGES IN VERSION 2.10.0
-------------------------
NEW FEATURES
o A new test, proxReg(), tests genomic region binding proximity to either
gene transcription start sites or enhancer regions within gene sets. It can be
used as an addendum to any gene set enrichment test, not only those in this package.
IMPROVEMENTS
o Poly-Enrich now uses the likelihood ratio test (LRT) instead of the Wald test,
as the LRT is more robust when using a negative binomial GLM.
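For readers comparing the two tests, here is a minimal sketch of how an LRT p-value is formed from two fitted models' log-likelihoods, assuming nested models that differ by one parameter (1 degree of freedom). The function name and the log-likelihood inputs are hypothetical; this is not Poly-Enrich's code.

```python
import math


def lrt_pvalue_df1(loglik_reduced: float, loglik_full: float) -> float:
    """LRT p-value for nested models differing by one parameter."""
    # LR statistic: twice the gain in log-likelihood from adding the parameter
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from numerical fitting
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods from a reduced and a full negative binomial GLM
p = lrt_pvalue_df1(-1204.70, -1202.78)
```

Unlike the Wald test, which relies only on the full model's coefficient estimate and its standard error, the LRT compares the fitted likelihoods of both models.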
BUG FIXES
o The approximate Poly-Enrich method, which uses the score test, now uses the
correct formula.
CHANGES IN VERSION 2.4.0
-------------------------
NEW FEATURES
o A new function, peaks2genes(), runs the analysis up to, but not including,
the enrichment testing. Useful for checking QC plots, checking the quality of
peak-to-gene assignments, and running custom tests more easily.
SIGNIFICANT USER-LEVEL CHANGES
o The hybridenrich() method now returns the same format as chipenrich() and
polyenrich()
IMPROVEMENTS
o Vignette now describes all available gene sets
BUG FIXES
o Fixed multiAssign weighting method to use the correct weights.
CHANGES IN VERSION 2.2.0
-------------------------
NEW FEATURES
o polyenrich now supports weighting peaks by signal value
o A hybrid test, hybridenrich(), is available for those unsure which test,
chipenrich() or polyenrich(), to use.
o A new function, hybrid.join(), joins two different results files and gives
an adjusted set of p-values and FDR-adjusted p-values based on both.
o A new approximation method using the score test is available for quick
results from chipenrich and polyenrich. It is only recommended for
significantly enriched results, not depleted results, and is ~30x faster.
IMPROVEMENTS
o Several updates to the vignette
CHANGES IN VERSION 2.0.0
-------------------------
NEW FEATURES
o A new method for enrichment, polyenrich() is designed for gene set enrichment
of experiments where the presence of multiple peaks in a gene is accounted
for in the model. Use the polyenrich() function for this method.
o New features resulting from chipenrich.data 2.0.0:
    o New genomes in chipenrich.data: danRer10, dm6, hg38, rn5, and rn6.
    o Reactome for fly in chipenrich.data.
    o Added locus definitions, GO gene sets, and Reactome gene sets for zebrafish.
    o All genomes have the following locus definitions: nearest_tss, nearest_gene,
      exon, intron, 1kb, 5kb, 10kb, 1kb_outside_upstream, 5kb_outside_upstream,
      10kb_outside_upstream, 1kb_outside, 5kb_outside, and 10kb_outside.
IMPROVEMENTS
o The chipenrich method is now significantly faster. Chris Lee figured out
that spline calculations in chipenrich are not required for each gene set.
Now a single spline is calculated as peak ~ s(log10_length) and used for all
gene sets. The correlation between p-values from the old and new approaches
is nearly always 1. Unfortunately, this approach cannot be used for broadenrich().
    o The chipenrich(..., method='chipenrich', ...) function automatically
      uses this faster method.
o Clarified documentation for supported_locusdefs() to explain what each
locus definition is.
o Use sys.call() to report options used in chipenrich() in opts.tab output. We
previously used as.list(environment()) which would also output entire
data.frames if peaks were loaded in as a data.frame.
o Various updates to the vignette to reflect new features.
SIGNIFICANT USER-LEVEL CHANGES
o As a result of updates to chipenrich.data, ENRICHMENT RESULTS MAY DIFFER
between chipenrich 1.Y.Z and chipenrich 2.Y.Z. This is because revised
versions of all genomes have been used to update LocusDefinitions, and
GO and Reactome gene sets have been updated to more recent versions.
o The broadenrich method is now its own function, broadenrich(), instead
of chipenrich(..., method = 'broadenrich', ...).
o The user interface for mappability has been streamlined. The 'mappability'
parameter in broadenrich(), chipenrich(), and polyenrich() replaces the
three parameters previously used: 'use_mappability', 'mappa_file', and
'read_length'. The unified 'mappability' parameter can be 'NULL', a file path,
or a string indicating the read length for mappability, e.g. '24'.
o A formerly hidden API for randomizations to assess Type I Error rates for
data sets is now exposed to the user. Each of the enrich functions has a
'randomization' parameter. See documentation and vignette for details.
o Many functions with the 'genome' parameter had a default of 'hg19', which
was not ideal. Now users must specify a genome and it is checked against
supported_genomes().
o Input files are read according to their file extension. Supported extensions
are bed, gff3, wig, bedGraph, narrowPeak, and broadPeak. Arbitrary extensions
are also supported, but there can be no header, and the first three columns
must be chr, start, and end.
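To illustrate the extension rules, here is a minimal Python sketch of extension-based dispatch. It is for illustration only: chipenrich is an R package that delegates reading to rtracklayer, and the names detect_format, read_generic, and the format labels are hypothetical. Files with an unrecognized extension fall back to a headerless, tab-separated layout whose first three columns are chr, start, and end.

```python
import os

# Extensions with a dedicated reader (labels are illustrative, not a real API)
KNOWN_FORMATS = {
    ".bed": "bed",
    ".gff3": "gff3",
    ".wig": "wig",
    ".bedgraph": "bedGraph",
    ".narrowpeak": "narrowPeak",
    ".broadpeak": "broadPeak",
}


def detect_format(path: str) -> str:
    """Return the format label for a peak file, or 'generic' for other extensions."""
    ext = os.path.splitext(path.lower())[1]  # case-insensitive extension match
    return KNOWN_FORMATS.get(ext, "generic")


def read_generic(lines):
    """Parse a headerless file: the first three columns must be chr, start, end."""
    peaks = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        peaks.append((fields[0], int(fields[1]), int(fields[2])))
    return peaks
```

Because the generic path parses columns 2 and 3 as integers, a header line would raise an error, which is consistent with the rule that such files can have no header.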
SIGNIFICANT BACKEND CHANGES
o Harmonize all code touching LocusDefinition and tss objects to reflect
changes in chipenrich.data 2.0.0.
o Alter setup_ldef() to add a symbol column. If a valid genome is used, use
orgDb to get eg2symbol mappings and fill them in for the user. Users can give
their own symbol column, which overrides orgDb. Finally, if neither a symbol
column nor a valid genome is given, symbols are set to NA.
o Any instance of 'geneid' or 'names' referring to Entrez Gene IDs is now
'gene_id' for consistency.
o Refactor read_bed() as a wrapper for rtracklayer::import().
    o Automatic extension handling of BED3-6, gff3, wig, or bedGraph.
    o With some additional code, automatic extension handling of narrowPeak
      and broadPeak.
    o Backwards compatible with arbitrary extensions: this still assumes that
      the first three columns are chr, start, end.
    o The purpose of this refactor is to enable additional covariates for the
      peaks for possible use in future methods.
o Refactor load_peaks() to use GenomicRanges::makeGRangesFromDataFrame().
o Filtering gene sets is now based on the locus definition, and can be done
from below (min) or above (max). Defaults are 15 and 2000, respectively.
o Randomizations are all done on the LocusDefinition object.
o Added lots of unit tests to increase test coverage.
o Make Travis builds use sartorlab/chipenrich.data version of data package
for faster testing.
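A minimal sketch of size-based gene set filtering, assuming a gene set's size is counted as the number of its genes present in the locus definition and that both bounds are inclusive. The function and argument names are hypothetical; only the defaults of 15 and 2000 come from the entry above.

```python
def filter_gene_sets(gene_sets, ldef_genes, min_size=15, max_size=2000):
    """Keep gene sets whose size, restricted to the locus definition, is in [min_size, max_size]."""
    ldef = set(ldef_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Only genes that have a locus in the locus definition count toward size
        present = set(genes) & ldef
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept
```

Counting size after intersecting with the locus definition means the same gene set can pass the filter under one locus definition and fail it under another.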
DEPRECATED AND DEFUNCT
o Calling the broadenrich method with chipenrich(..., method = 'broadenrich', ...)
is no longer valid. Instead, use broadenrich().
o Various utility functions that were used in the original development have
been removed. Users never saw or used them.
BUG FIXES
o Fixed a bug in randomization with length bins where, artifactually, randomizations
would sort genes on Entrez ID, introducing problems in the Type I error rate.
o Fixed a bug where the dependent variable used in the enrichment model
was used to name the rows of the enrichment results. This could be confusing
for users. Now, rownames are simply integers.
o Fixed a bug that expected the result of read_bed() to be a list of IRanges,
left over from initial development. This gives a big speed-up.
CHANGES IN VERSION 1.12.1
-------------------------
BUG FIXES
o Fixed a bug in the check for proper organism + geneset combinations that
prevented combinations that are actually valid from running.
CHANGES IN VERSION 1.12.0
-------------------------
IMPROVEMENTS
o Improve supported_*() functions to report and check combinations of genome,
organism, genesets, locusdef, and mappability read length.
o Cleanup DESCRIPTION and NAMESPACE to avoid loading entire packages.
o Assign peaks using a GenomicRanges object rather than a list of IRanges.
o Follow data() best practices.
USER-INVISIBLE CHANGES
o Transition documentation to roxygen2 blocks.
o Improve commenting in chipenrich() function.
o Rewrite package vignette in Rmarkdown and render with knitr.
CHANGES IN VERSION 1.4.0
------------------------
NEW FEATURES
o A new method, broadenrich, is available in the chipenrich function and is
designed for gene set enrichment on broad genomic regions, such as peaks resulting
from histone modification based ChIP-seq experiments.
o Methods chipenrich and broadenrich are available in multicore versions (on every
platform except Windows). The user selects the number of cores when calling
the chipenrich function.
o Peaks downloaded from the ENCODE Consortium as .broadPeak or .narrowPeak files
are supported directly.
o Peaks downloaded from the modENCODE Consortium as .bed.gff or .bed.gff3 files are
also supported directly.
o Support for D. melanogaster (dm3) genome and enrichment testing for GO terms
from all three branches (GOBP, GOCC, and GOMF).
o New gene sets from Reactome (http://www.reactome.org) for human, mouse, and rat.
o New example histone data set, peaks_H3K4me3_GM12878, based on hg19.
o New locus definitions including: introns, 10kb within TSS, and 10kb upstream of TSS.
CHANGES IN VERSION 1.0
----------------------
PKG FEATURES
o chipenrich performs gene set enrichment tests on peaks called from
a ChIP-seq experiment
o chipenrich empirically corrects for confounding factors such as
the length of genes and mappability of sequence surrounding genes
o Use multiple definitions of a gene "locus" when testing for enrichment,
or provide your own definition
o Test for enrichment using chipenrich or Fisher's exact test (Fisher's exact
test should only be used for datasets where peaks are close to TSSs; see docs)
o Test multiple sets of genesets (Gene Ontology, KEGG, Biocarta, OMIM, etc.)
o Multiple plots to describe binding distance and likelihood of a peak
as a function of gene length
o Support for human (hg19), mouse (mm9), and rat (rn4) genomes
o Many conveniences such as seeing which peaks were assigned to genes,
their position relative to those genes and their TSS, etc.
o See how many peaks were assigned to each gene along with the length and
mappability of the gene
CHANGES IN VERSION 0.99.2
-------------------------
USER-VISIBLE CHANGES
o Updated examples for various functions to be runnable (removed donttest)
o Updated DESCRIPTION to use Imports: rather than Depends:
o Updated license to GPL-3
o Updated NEWS file for bioconductor guidelines
BUG FIXES
o Added a correction for the case where a small gene set has a peak in
every gene. This makes a very small number of tests slightly conservative,
with the benefit of actually being able to return a p-value for them.
CHANGES IN VERSION 0.99.1
-------------------------
USER-VISIBLE CHANGES
o Minor updates to documentation for Bioconductor
CHANGES IN VERSION 0.99.0
-------------------------
NEW FEATURES
o Initial submission to Bioconductor
CHANGES IN VERSION 0.9.6
------------------------
NEW FEATURES
o Added peaks per gene as a returned object / output file
CHANGES IN VERSION 0.9.5
------------------------
BUG FIXES
o Updated to handle Bioconductor/IRanges' new "functionality" for distanceToNearest and distance
USER-VISIBLE CHANGES
o Changed sorting of results to put enriched terms first (sorted by p-value), then depleted (also sorted by p-value)
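A minimal sketch of that ordering, assuming each result row carries a 'status' of 'enriched' or 'depleted' and a 'p_value'. These field names are hypothetical, not the package's column names.

```python
def sort_results(rows):
    """Enriched terms first, then depleted; each group ordered by ascending p-value."""
    # The first key is False for enriched rows, and False sorts before True
    return sorted(rows, key=lambda r: (r["status"] != "enriched", r["p_value"]))
```

Sorting on a (group, p-value) tuple keeps both groups ordered by p-value in a single pass.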
CHANGES IN VERSION 0.9.4
------------------------
USER-VISIBLE CHANGES
o Minor changes to vignette and documentation
CHANGES IN VERSION 0.9.3
------------------------
NEW FEATURES
o Addition of rat genome
BUG FIXES
o chipenrich() will correctly open both .bed and .bed.gz files now
CHANGES IN VERSION 0.9.2
------------------------
NEW FEATURES
o Added ability for user to input their own locus definition file (pass the full path to a file as the locusdef argument)
o Added a data frame to the results object that gives the arguments/values passed to chipenrich, also written to file *_opts.tab
o For FET and chipenrich methods, the outcome variable can be recoded to be >= 1 peak, 2 peaks, 3 peaks, etc. using the num_peak_threshold parameter
o Added a parameter to set the maximum size of gene set that should be tested (defaults to 2000)
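A minimal sketch of the num_peak_threshold recoding: a gene's binary outcome becomes 1 when it has at least num_peak_threshold peaks. Representing peak counts as a dict keyed by gene ID, and the function name, are assumptions for illustration.

```python
def recode_outcome(peaks_per_gene, num_peak_threshold=1):
    """Binary outcome per gene: 1 if the gene has >= num_peak_threshold peaks, else 0."""
    if num_peak_threshold < 1:
        raise ValueError("num_peak_threshold must be >= 1")
    return {gene: int(count >= num_peak_threshold) for gene, count in peaks_per_gene.items()}
```

With the default threshold of 1, this reduces to the usual "has at least one peak" outcome.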
USER-VISIBLE CHANGES
o Previously only peak midpoints were given in the peak --> gene assignments file, now the original peak start/ends are also given
o Updated help/man with new parameters and more information about the results
BUG FIXES
o Fixed an issue where the status in results was not 'enriched' when the odds ratio was infinite, nor 'depleted' when the odds ratio was exactly zero
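A minimal sketch of the corrected labeling, assuming status is decided from the odds ratio alone (greater than 1 is enriched, otherwise depleted). Under this rule an infinite ratio counts as enriched and a zero ratio as depleted. The function names are hypothetical, and labeling a ratio of exactly 1 as depleted is an assumption the entry does not specify.

```python
import math


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    num, den = a * d, b * c
    if den == 0:
        if num == 0:
            raise ValueError("odds ratio is undefined (0/0)")
        return math.inf  # a zero in the denominator makes the ratio infinite
    return num / den  # zero when a or d is zero


def enrichment_status(or_value):
    """'enriched' when the odds ratio exceeds 1 (including infinity), else 'depleted'."""
    return "enriched" if or_value > 1 else "depleted"
```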
CHANGES IN VERSION 0.9.1
------------------------
NEW FEATURES
o Added a QC plot for expected # of peaks and actual # of peaks vs. gene locus length. This will be automatically created if qc_plots is TRUE, or the plots can be created using the plot_expected_peaks function.
o Distance to TSS is now signed for upstream (-) and downstream (+) of TSS
o Column added to indicate whether the geneset is enriched or depleted
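A minimal sketch of a strand-aware signed distance, assuming it is measured from the peak midpoint and that "upstream" means 5' of the TSS on the gene's strand. The function name and argument layout are hypothetical.

```python
def signed_tss_distance(peak_start, peak_end, tss, strand):
    """Distance from TSS to peak midpoint: negative upstream, positive downstream."""
    midpoint = (peak_start + peak_end) // 2
    if strand == "+":
        return midpoint - tss
    if strand == "-":
        # On the minus strand, upstream lies at larger genomic coordinates
        return tss - midpoint
    raise ValueError("strand must be '+' or '-'")
```

The same peak therefore gets opposite signs relative to a plus-strand gene and a minus-strand gene sharing the same TSS coordinate.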
CHANGES IN VERSION 0.9
----------------------
NEW FEATURES
o Added support for reading BED files natively
BUG FIXES
o Fixed bug where invalid geneset in chipenrich() wasn't detected properly
CHANGES IN VERSION 0.8
----------------------
BUG FIXES
o Fixed crash when mappability contained an NA (will be removed from DB in future version)
CHANGES IN VERSION 0.7
----------------------
USER-VISIBLE CHANGES
o Updated binomial test to sum gene locus lengths to get genome length and remove genes that are not present in the set of genes being tested
o Updated spline fit plot to take into account mappability if requested (log mappable locus length plotted instead of simply log locus length)
o Removed SAMPLEABLE_GENOME* constants since they are no longer needed
o Updated help files to reflect changes to plot_spline_length and chipenrich functions
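A minimal sketch of a binomial test along these lines. It assumes that the null probability of a peak landing in a gene set equals the set's share of total locus length, and that the total ("genome length") is summed only over the genes being tested. Names are hypothetical and this is not the package's implementation. It computes an exact one-sided upper-tail p-value.

```python
from math import comb


def binomial_enrichment_pvalue(set_genes, locus_length, peaks_in_set, total_peaks):
    """Upper-tail binomial p-value for observing >= peaks_in_set peaks in the gene set."""
    # "Genome length" is the sum of locus lengths over the genes being tested
    genome_length = sum(locus_length.values())
    # Drop set members absent from the tested genes before summing their lengths
    set_length = sum(locus_length[g] for g in set_genes if g in locus_length)
    p = set_length / genome_length
    # P(X >= k) for X ~ Binomial(total_peaks, p)
    return sum(
        comb(total_peaks, k) * p**k * (1 - p) ** (total_peaks - k)
        for k in range(peaks_in_set, total_peaks + 1)
    )
```

Restricting both sums to the tested genes keeps the null proportion consistent with the universe of genes actually being compared.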
BUG FIXES
o Fixed bug where results for multiple gene set types (e.g. doing BioCarta and KEGG together) were not sorted by p-value
CHANGES IN VERSION 0.6
----------------------
BUG FIXES
o Fixed bug where 1kb/5kb locusdefs could fail if not all peaks were assigned to a gene
CHANGES IN VERSION 0.5
----------------------
USER-VISIBLE CHANGES
o Updated help to explain new mappability model
o Changed how mappability is handled - now multiplies gene locus length by mappability, rather than adjusting as a spline term
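A minimal sketch of that adjustment, assuming mappability is given per gene as a fraction of mappable bases in [0, 1]. The function name and dict-based inputs are hypothetical.

```python
def mappable_locus_lengths(locus_length, mappability):
    """Multiply each gene's locus length by its mappability (fraction of mappable bases)."""
    adjusted = {}
    for gene, length in locus_length.items():
        m = mappability[gene]
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"mappability for {gene} must be in [0, 1], got {m}")
        adjusted[gene] = length * m
    return adjusted
```

Folding mappability into the length, rather than adding a separate spline term, leaves a single length covariate for the model.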