Commit bf918db

Merge pull request #131 from weecology/juniper_active
LDATS 0.2.0
2 parents 8fe7099 + 8d079b6 commit bf918db

99 files changed

Lines changed: 2483 additions & 1233 deletions

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@
66
^\.Rproj\.user$
77
^CONTRIBUTING\.md$
88
^CODE_OF_CONDUCT\.md$
9+
^_pkgdown\.yml$

DESCRIPTION

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
11
Package: LDATS
22
Title: Latent Dirichlet Allocation coupled with Time Series analyses
3-
Version: 0.1.0
3+
Version: 0.2.0
44
Authors@R: c(
55
person(c("Juniper", "L."), "Simonis",
66
email = "juniper.simonis@weecology.org", role = c("aut", "cre"),
@@ -30,6 +30,7 @@ Imports:
3030
coda,
3131
digest,
3232
dplyr,
33+
extraDistr,
3334
graphics,
3435
grDevices,
3536
here,

NAMESPACE

Lines changed: 16 additions & 5 deletions
@@ -1,7 +1,7 @@
11
# Generated by roxygen2: do not edit by hand
22

3-
S3method(AIC,TS_fit)
43
S3method(logLik,LDA_VEM)
4+
S3method(logLik,TS_fit)
55
S3method(logLik,multinom_TS_fit)
66
S3method(plot,LDA_TS)
77
S3method(plot,LDA_VEM)
@@ -10,15 +10,16 @@ S3method(plot,TS_fit)
1010
S3method(print,LDA_TS)
1111
S3method(print,TS_fit)
1212
S3method(print,TS_on_LDA)
13+
export(AICc)
1314
export(LDA_TS)
14-
export(LDA_TS_controls_list)
15-
export(LDA_controls_list)
15+
export(LDA_TS_control)
1616
export(LDA_msg)
1717
export(LDA_plot_bottom_panel)
1818
export(LDA_plot_top_panel)
1919
export(LDA_set)
20+
export(LDA_set_control)
2021
export(TS)
21-
export(TS_controls_list)
22+
export(TS_control)
2223
export(TS_diagnostics_plot)
2324
export(TS_on_LDA)
2425
export(TS_summary_plot)
@@ -40,6 +41,7 @@ export(check_seeds)
4041
export(check_timename)
4142
export(check_topics)
4243
export(check_weights)
44+
export(conform_LDA_TS_data)
4345
export(count_trips)
4446
export(diagnose_ptMCMC)
4547
export(document_weights)
@@ -49,6 +51,8 @@ export(est_regressors)
4951
export(eta_diagnostics_plots)
5052
export(eval_step)
5153
export(expand_TS)
54+
export(iftrue)
55+
export(logsumexp)
5256
export(measure_eta_vcov)
5357
export(measure_rho_vcov)
5458
export(memoise_fun)
@@ -59,6 +63,7 @@ export(multinom_TS_chunk)
5963
export(normalize)
6064
export(package_LDA_TS)
6165
export(package_LDA_set)
66+
export(package_TS)
6267
export(package_TS_on_LDA)
6368
export(package_chunk_fits)
6469
export(posterior_plot)
@@ -88,8 +93,11 @@ export(set_LDA_plot_colors)
8893
export(set_TS_summary_plot_cols)
8994
export(set_gamma_colors)
9095
export(set_rho_hist_colors)
96+
export(sim_LDA_TS_data)
97+
export(sim_LDA_data)
98+
export(sim_TS_data)
99+
export(softmax)
91100
export(step_chains)
92-
export(summarize_TS)
93101
export(summarize_etas)
94102
export(summarize_rhos)
95103
export(swap_chains)
@@ -106,6 +114,8 @@ importFrom(coda,autocorr)
106114
importFrom(coda,autocorr.diag)
107115
importFrom(coda,effectiveSize)
108116
importFrom(digest,digest)
117+
importFrom(extraDistr,rcat)
118+
importFrom(extraDistr,rdirichlet)
109119
importFrom(grDevices,devAskNewPage)
110120
importFrom(grDevices,rgb)
111121
importFrom(graphics,abline)
@@ -132,6 +142,7 @@ importFrom(stats,ecdf)
132142
importFrom(stats,logLik)
133143
importFrom(stats,median)
134144
importFrom(stats,rgeom)
145+
importFrom(stats,rnorm)
135146
importFrom(stats,runif)
136147
importFrom(stats,sd)
137148
importFrom(stats,terms)
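
The bodies of the newly exported `softmax` and `logsumexp` utilities are not shown in this part of the diff. As a rough illustration of what functions with these names conventionally compute, the base-R sketch below uses hypothetical names (`logsumexp_sketch`, `softmax_sketch`) and the standard max-shift trick for numerical stability. It is an assumption about typical behavior, not the LDATS implementation.

```r
# Hypothetical stand-ins for illustration only; not the LDATS source.

# log(sum(exp(x))) computed without overflow by shifting by max(x)
logsumexp_sketch <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Softmax expressed through logsumexp, so large inputs stay finite
softmax_sketch <- function(x) {
  exp(x - logsumexp_sketch(x))
}

# Inputs this large would overflow a naive exp(x) / sum(exp(x))
x <- c(1000, 1001, 1002)
p <- softmax_sketch(x)
```

Here `p` is a vector of finite probabilities that sums to 1, whereas the naive formula would return `NaN` for the same input.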

NEWS.md

Lines changed: 40 additions & 0 deletions
@@ -1,5 +1,45 @@
11
# LDATS (development version)
22

3+
Version numbers follow [Semantic Versioning](https://semver.org/).
4+
5+
# [LDATS 0.2.0](https://github.com/weecology/ldats/releases/tag/v0.2.0)
6+
*2019-07-09*
7+
8+
## API updates
9+
* At the `LDA_TS` function level, the separate inputs for data tables (`document_term_table` and `document_covariate_table`) have been merged into a single input `data`, which can be just the `document_term_table` or a list including the `document_term_table` and optionally also a `document_covariate_table`. If covariates aren't provided, the function now constructs a covariate table assuming equi-spaced observations. If using a list, the function assumes that one and only one element of the list will have a name containing the letters "term", and at most one element containing the letters "covariate" (regular expressions are used for matching). ([addresses issue 119](https://github.com/weecology/LDATS/issues/119))
10+
* `timename` has been moved from within the `TS_controls_list` to a main argument in all associated functions.
11+
* The control lists have been made easier to interact with. Primarily, the arguments that previously required `LDA_controls_list`, `TS_controls_list`, or `LDA_TS_controls_list` inputs now take general `list` inputs (so `LDA_TS` does not need to have a nested set of control functions). Each control list is passed through a function (`LDA_set_control`, `TS_control`, or `LDA_TS_control`) to set any non-input values to their defaults. This also allows the removal of those controls list class definitions. ([addresses issue 130](https://github.com/weecology/LDATS/issues/130))
12+
13+
## Fixed and updated example code to improve user experience
14+
* Reduced the complexity of the example in the README ([addresses issue 115](https://github.com/weecology/LDATS/issues/115))
15+
* Added `control` input in the `plot` call in the example in the README ([addresses issue 116](https://github.com/weecology/LDATS/issues/116))
16+
* Reduced the number of seeds in the rodent vignette example ([addresses issue 117](https://github.com/weecology/LDATS/issues/117))
17+
18+
## Updated calculation of the number of observations in LDA
19+
* The number of observations for a VEM-fit LDA is now calculated as the number of entries in the document-term matrix (following Hoffman et al. and Buntine; see `?logLik.LDA_VEM` for references).
20+
* Relatedly, we now include a general `AICc` function that also works with this definition of the number of observations ([addresses issue 129](https://github.com/weecology/LDATS/issues/129))
21+
22+
## Fixed bug in plotting across multiple outputs
23+
* A few plotting functions use `devAskNewPage` to help flip through multiple outputs, but were only resetting it with `devAskNewPage(FALSE)` at the end of a clean execution. The code has been updated with `on.exit(devAskNewPage(FALSE))`, which accounts for failed executions. ([addresses issue 118](https://github.com/weecology/LDATS/issues/118))
24+
25+
## Renamed functions
26+
* `summarize_TS` has been renamed `package_TS` to align with the other `package_` functions that package model output.
27+
28+
## Simulate functions
29+
* Basic simulation functionality has been added for help with generating data sets to analyze. ([addresses issue 114](https://github.com/weecology/LDATS/issues/114))
30+
* `sim_LDA_data` simulates an LDA model's document-term-matrix
31+
* `sim_TS_data` simulates a TS model's document-topic distribution matrix
32+
* `sim_LDA_TS_data` simulates an LDA_TS model's document-term-matrix
33+
* `softmax` and `logsumexp` are added as utility functions
34+
35+
## Improved pkgdown site
36+
* Function organization ([addresses issue 122](https://github.com/weecology/LDATS/issues/122)) and navbar formatting.
37+
38+
## Editing of output from `TS`
39+
* Due to a misread of earlier code, the AIC value in the output from `TS` was named "deviance". The output has been updated to return the AIC.
40+
41+
## Replacement of `AIC` method with `logLik` method for `TS_fit`
42+
343
# [LDATS 0.1.0](https://github.com/weecology/LDATS/pull/105)
444
*2019-02-11*
545
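
The plotting fix described in the 0.2.0 notes relies on `on.exit()`, which runs its expression when the enclosing function exits, whether it returns normally or stops with an error. A minimal base-R sketch of the pattern follows. `flip_pages` is a hypothetical function, and an R option (`sketch.ask`) stands in for the graphics device's "ask" setting so the example runs without a graphics device.

```r
# Hypothetical illustration of the on.exit() reset pattern.
flip_pages <- function(pages, fail_at = NULL) {
  options(sketch.ask = TRUE)             # stand-in for devAskNewPage(TRUE)
  on.exit(options(sketch.ask = FALSE))   # runs on normal exit AND on error
  for (i in seq_along(pages)) {
    if (!is.null(fail_at) && i == fail_at) {
      stop("plotting failed on page ", i)
    }
  }
  invisible(length(pages))
}

# Simulate a failure partway through; the setting is still reset afterwards
res <- try(flip_pages(1:3, fail_at = 2), silent = TRUE)
ask_after_failure <- getOption("sketch.ask")
```

Had the reset been a plain call on the function's last line, the simulated failure would have left the setting switched on.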

R/LDA.R

Lines changed: 53 additions & 38 deletions
@@ -21,8 +21,11 @@
2121
#' @param nseeds Number of seeds (replicate starts) to use for each
2222
#' value of \code{topics}. Must be conformable to \code{integer} value.
2323
#'
24-
#' @param control Class \code{LDA_controls} list of control parameters to be
25-
#' used in \code{LDA} (note that \code{seed} will be overwritten).
24+
#' @param control A \code{list} of parameters to control the running and
25+
#' selecting of LDA models. Values not input assume default values set
26+
#' by \code{\link{LDA_set_control}}. Values for running the LDAs replace
27+
#' defaults in \code{LDAcontrol} (see \code{\link[topicmodels]{LDA}}); note that if
28+
#' \code{seed} is given, it will be overwritten, so use \code{iseed} instead.
2629
#'
2730
#' @return List (class: \code{LDA_set}) of LDA models (class: \code{LDA_VEM}).
2831
#'
@@ -46,10 +49,12 @@
4649
#' @export
4750
#'
4851
LDA_set <- function(document_term_table, topics = 2, nseeds = 1,
49-
control = LDA_controls_list()){
52+
control = list()){
5053
check_LDA_set_inputs(document_term_table, topics, nseeds, control)
54+
control <- do.call("LDA_set_control", control)
5155
mod_topics <- rep(topics, each = length(seq(2, nseeds * 2, 2)))
52-
mod_seeds <- rep(seq(2, nseeds * 2, 2), length(topics))
56+
iseed <- control$iseed
57+
mod_seeds <- rep(seq(iseed, iseed + (nseeds - 1)* 2, 2), length(topics))
5358
nmods <- length(mod_topics)
5459
mods <- vector("list", length = nmods)
5560
for (i in 1:nmods){
@@ -63,9 +68,13 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1,
6368

6469
#' @title Calculate the log likelihood of a VEM LDA model fit
6570
#'
66-
#' @description Imported calculations from topicmodels package, as applied to
67-
#' Latent Dirichlet Allocation fit with Variational Expectation Maximization
68-
#' via \code{\link[topicmodels]{LDA}}.
71+
#' @description Imported but updated calculations from topicmodels package, as
72+
#' applied to Latent Dirichlet Allocation fit with Variational Expectation
73+
#' Maximization via \code{\link[topicmodels]{LDA}}.
74+
#'
75+
#' @details The number of degrees of freedom is 1 (for alpha) plus the number
76+
#' of entries in the topic-term matrix (\code{beta}). The number of observations is
77+
#' the number of entries in the document-term matrix.
6978
#'
7079
#' @param object A \code{LDA_VEM}-class object.
7180
#'
@@ -75,17 +84,26 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1,
7584
#' (degrees of freedom) and \code{nobs} (number of observations) values.
7685
#'
7786
#' @references
87+
#' Buntine, W. 2002. Variational extensions to EM and multinomial PCA.
88+
#' \emph{European Conference on Machine Learning, Lecture Notes in Computer
89+
#' Science} \strong{2430}:23-34. \href{https://bit.ly/327sltH}{link}.
90+
#'
7891
#' Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic
7992
#' Models. \emph{Journal of Statistical Software} \strong{40}:13.
8093
#' \href{https://www.jstatsoft.org/article/view/v040i13}{link}.
8194
#'
95+
#' Hoffman, M. D., D. M. Blei, and F. Bach. 2010. Online learning for
96+
#' latent Dirichlet allocation. \emph{Advances in Neural Information
97+
#' Processing Systems} \strong{23}:856-864.
98+
#' \href{https://bit.ly/2LEr5sb}{link}.
99+
#'
82100
#' @export
83101
#'
84102
logLik.LDA_VEM <- function(object, ...){
85103
val <- sum(object@loglikelihood)
86104
df <- as.integer(object@control@estimate.alpha) + length(object@beta)
87105
attr(val, "df") <- df
88-
attr(val, "nobs") <- object@Dim[1]
106+
attr(val, "nobs") <- object@Dim[1] * object@Dim[2]
89107
class(val) <- "logLik"
90108
val
91109
}
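
Because the small-sample correction in AICc depends on the number of observations, the change to `nobs` in this hunk feeds directly into model comparison. The sketch below applies the standard formula AICc = AIC + 2k(k + 1)/(n - k - 1) to hand-built `logLik` objects. `aicc_sketch`, `make_ll`, and all numeric values are hypothetical and chosen only for illustration; the package's own `AICc` function may differ in details.

```r
# Hypothetical AICc helper using the textbook correction term.
aicc_sketch <- function(ll) {
  k <- attr(ll, "df")      # number of estimated parameters
  n <- attr(ll, "nobs")    # number of observations
  aic <- -2 * as.numeric(ll) + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

# Build a logLik object by hand, with the same attributes the method sets
make_ll <- function(value, df, nobs) {
  structure(value, df = df, nobs = nobs, class = "logLik")
}

# Made-up model: 2 topics x 10 terms plus alpha gives df = 21; 30 documents.
ll_old <- make_ll(-500, df = 21, nobs = 30)       # nobs = number of documents
ll_new <- make_ll(-500, df = 21, nobs = 30 * 10)  # nobs = document-term entries

aicc_old <- aicc_sketch(ll_old)
aicc_new <- aicc_sketch(ll_new)
```

With the larger `nobs`, the correction term shrinks, so in this made-up case `aicc_new` is smaller than `aicc_old` for the same log likelihood.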
@@ -104,16 +122,15 @@ check_LDA_set_inputs <- function(document_term_table, topics, nseeds,
104122
check_document_term_table(document_term_table)
105123
check_topics(topics)
106124
check_seeds(nseeds)
107-
check_control(control, "LDA_controls")
125+
check_control(control)
108126
}
109127

110128
#' @title Set the control inputs to include the seed
111129
#'
112130
#' @description Update the control list for the LDA model with the specific
113-
#' seed as indicated.
131+
#' seed as indicated, and remove controls not used within the LDA itself.
114132
#'
115-
#' @param seed \code{number} of seeds (replicate starts) to use for the
116-
#' specific model.
133+
#' @param seed \code{integer} used to set the seed of the specific model.
117134
#'
118135
#' @param control Named list of control parameters to be used in
119136
#' \code{\link[topicmodels]{LDA}}. Note that if \code{control} has an
@@ -124,17 +141,12 @@ check_LDA_set_inputs <- function(document_term_table, topics, nseeds,
124141
#'
125142
#' @export
126143
#'
127-
prep_LDA_control <- function(seed, control = NULL){
128-
if("LDA_controls" %in% class(control)){
129-
class(control) <- "list"
130-
control$quiet <- NULL
131-
control$measurer <- NULL
132-
control$selector <- NULL
133-
control$seed <- seed
134-
}
135-
if(is.null(control)){
136-
control <- list(seed = seed)
137-
}
144+
prep_LDA_control <- function(seed, control = list()){
145+
control$quiet <- NULL
146+
control$measurer <- NULL
147+
control$selector <- NULL
148+
control$iseed <- NULL
149+
control$seed <- seed
138150
control
139151
}
140152

@@ -147,9 +159,11 @@ prep_LDA_control <- function(seed, control = NULL){
147159
#' @param LDA_models An object of class \code{LDA_set} produced by
148160
#' \code{\link{LDA_set}}.
149161
#'
150-
#' @param control Class \code{LDA_controls} list (generated by
151-
#' \code{\link{LDA_controls_list}}) including named elements
152-
#' corresponding to the \code{measurer} and \code{evaluator} functions.
162+
#' @param control A \code{list} of parameters to control the running and
163+
#' selecting of LDA models. Values not input assume default values set
164+
#' by \code{\link{LDA_set_control}}. Values for running the LDAs replace
165+
#' defaults in \code{LDAcontrol} (see \code{\link[topicmodels]{LDA}}); note that if
166+
#' \code{seed} is given, it will be overwritten, so use \code{iseed} instead.
153167
#'
154168
#' @return A reduced version of \code{LDA_models} that only includes the
155169
#' selected LDA model(s). The returned object is still an object of
@@ -165,14 +179,13 @@ prep_LDA_control <- function(seed, control = NULL){
165179
#'
166180
#' @export
167181
#'
168-
select_LDA <- function(LDA_models = NULL, control = LDA_controls_list()){
169-
170-
measurer <- control$measurer
171-
selector <- control$selector
182+
select_LDA <- function(LDA_models = NULL, control = list()){
172183
if("LDA_set" %in% attr(LDA_models, "class") == FALSE){
173184
stop("LDA_models must be of class LDA_set")
174185
}
175-
186+
control <- do.call("LDA_set_control", control)
187+
measurer <- control$measurer
188+
selector <- control$selector
176189
lda_measured <- vapply(LDA_models, measurer, 0) %>%
177190
matrix(ncol = 1)
178191
lda_selected <- apply(lda_measured, 2, selector)
@@ -227,15 +240,16 @@ package_LDA_set <- function(mods, mod_topics, mod_seeds){
227240
#'
228241
#' @export
229242
#'
230-
LDA_msg <- function(mod_topics, mod_seeds, control){
243+
LDA_msg <- function(mod_topics, mod_seeds, control = list()){
244+
control <- do.call("LDA_set_control", control)
231245
check_topics(mod_topics)
232246
check_seeds(mod_seeds)
233247
topic_msg <- paste0("Running LDA with ", mod_topics, " topics ")
234248
seed_msg <- paste0("(seed ", mod_seeds, ")")
235249
qprint(paste0(topic_msg, seed_msg), "", control$quiet)
236250
}
237251

238-
#' @title Create control list for LDA model
252+
#' @title Create control list for set of LDA models
239253
#'
240254
#' @description This function provides a simple creation and definition of
241255
#' the list used to control the set of LDA models. It is set up to be easy
@@ -250,16 +264,17 @@ LDA_msg <- function(mod_topics, mod_seeds, control){
250264
#' and \code{selector} operates on the values to choose the model(s) to
251265
#' pass on.
252266
#'
267+
#' @param iseed \code{integer} initial seed for the model set.
268+
#'
253269
#' @param ... Additional arguments to be passed to
254270
#' \code{\link[topicmodels]{LDA}} as a \code{control} input.
255271
#'
256272
#' @return A \code{list} of controls for the set of LDA models.
257273
#'
258274
#' @export
259275
#'
260-
LDA_controls_list <- function(quiet = FALSE, measurer = AIC, selector = min,
261-
...){
262-
out <- list(quiet = quiet, measurer = measurer, selector = selector, ...)
263-
class(out) <- c("LDA_controls", "list")
264-
out
276+
LDA_set_control <- function(quiet = FALSE, measurer = AIC, selector = min,
277+
iseed = 2, ...){
278+
list(quiet = quiet, measurer = measurer, selector = selector,
279+
iseed = iseed, ...)
265280
}
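
Taken together, the changes in this file mean that a caller passes a plain `list` and each function fills in the remaining defaults with `do.call`. The sketch below reproduces that pattern in base R so it runs without LDATS installed. `set_control_sketch` mirrors the signature of `LDA_set_control` shown in the diff, and `plan_models` is a hypothetical helper that repeats the topic and seed bookkeeping from `LDA_set`.

```r
# Mirrors the signature of LDA_set_control from the diff (illustrative copy)
set_control_sketch <- function(quiet = FALSE, measurer = AIC, selector = min,
                               iseed = 2, ...) {
  list(quiet = quiet, measurer = measurer, selector = selector,
       iseed = iseed, ...)
}

# Hypothetical helper: which (topics, seed) combinations would be run
plan_models <- function(topics = 2, nseeds = 1, control = list()) {
  # Any element the caller left out takes its default value
  control <- do.call(set_control_sketch, control)
  iseed <- control$iseed
  # Seeds step by 2 from the initial seed, as in the updated LDA_set
  seeds <- seq(iseed, iseed + (nseeds - 1) * 2, 2)
  data.frame(topics = rep(topics, each = length(seeds)),
             seed = rep(seeds, length(topics)))
}

# Only iseed is supplied; quiet, measurer, and selector keep their defaults
plan <- plan_models(topics = c(2, 3), nseeds = 2, control = list(iseed = 10))
```

Passing `control = list()` reproduces the previous behavior of starting the seeds at 2, since that is the default `iseed`.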

R/LDATS.R

Lines changed: 10 additions & 6 deletions
@@ -1,5 +1,6 @@
11
#' @importFrom coda as.mcmc autocorr autocorr.diag effectiveSize HPDinterval
22
#' @importFrom digest digest
3+
#' @importFrom extraDistr rcat rdirichlet
34
#' @importFrom graphics abline axis hist mtext par plot points rect text
45
#' @importFrom grDevices devAskNewPage rgb
56
#' @importFrom lubridate is.Date
@@ -9,8 +10,8 @@
910
#' @importFrom mvtnorm rmvnorm
1011
#' @importFrom nnet multinom
1112
#' @importFrom progress progress_bar
12-
#' @importFrom stats acf AIC as.formula coef ecdf logLik median rgeom runif sd
13-
#' terms var vcov
13+
#' @importFrom stats acf AIC as.formula coef ecdf logLik median rgeom rnorm
14+
#' runif sd terms var vcov
1415
#' @importFrom topicmodels LDA
1516
#' @importFrom viridis viridis
1617
#'
@@ -23,12 +24,15 @@
2324
#' 2003) and Bayesian Time Series models (Western and Kleykamp 2004) that we
2425
#' extend for multinomial data using softmax regression (Venables and Ripley
2526
#' 2002) following Christensen \emph{et al.} (2018).
26-
#' \cr \cr
27-
#' \href{https://github.com/weecology/LDATS/blob/master/manuscript/simonis_et_al.pdf}{Technical mathematical manuscript}
27+
#'
28+
#' @section Documentation:
29+
#' \href{https://bit.ly/2Jq73A5}{Technical mathematical manuscript}
2830
#' \cr \cr
29-
#' \href{https://weecology.github.io/LDATS/articles/rodents-example.html}{End-user-focused vignette worked example}
31+
#' \href{https://bit.ly/2Jvj9GS}{End-user-focused vignette worked example}
3032
#' \cr \cr
31-
#' \href{https://weecology.github.io/LDATS/articles/LDATS_codebase.html}{Computational pipeline vignette}
33+
#' \href{https://bit.ly/2xFzJOW}{Computational pipeline vignette}
34+
#' \cr \cr
35+
#' \href{https://bit.ly/2NFTVLh}{Comparison to Christensen \emph{et al.}}
3236
#'
3337
#' @references
3438
#'
