mcmcse: updates, cleanup, and efficiency
The mcmcse package was built to estimate Monte Carlo standard errors for Markov chain Monte Carlo. It has since expanded to multivariate output analysis methods and the reliable calculation of effective sample size. However, the package is structured to only take a single Markov chain as input. Reliable estimation of standard errors from multiple chains can be done via replicated variance calculations. Implementing these replicated variance calculations requires significant updates to the package, and the addition of a few functions specifically for multiple chains.
Most of the heavy coding is written in C++ using Rcpp. A CRAN hosted version of the package is here and a GitHub development version of the package is here.
A few other R packages compute univariate effective sample size (including for multiple chains), the most popular of which is coda. However, coda does not use consistent estimators of the variance, and its variance estimates are known to be liberal. In addition, we know of no other package that does multivariate effective sample size calculations.
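For reference, the two effective sample size quantities discussed above can be written as follows. The notation is an assumption made here for illustration and does not come from this page: n is the number of draws, λ² (or the p × p matrix Λ) is the variance of the quantity of interest under the target distribution, and σ² (or Σ) is the asymptotic variance in the Markov chain central limit theorem.

```latex
\mathrm{ESS} = n \, \frac{\lambda^{2}}{\sigma^{2}},
\qquad
\mathrm{mESS} = n \left( \frac{|\Lambda|}{|\Sigma|} \right)^{1/p}
```

Both depend on an estimate of σ² or Σ, which is why the consistency of the variance estimator matters for the reliability of the reported effective sample size.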
Over the three months, I would expect the student to complete the following tasks:
- Change the current implementation of the function `multiESS` to accept a list of Markov chains as input, and estimate the effective sample size from replicated variance methods, including batch means and spectral variance methods.
- Implement all additional computationally heavy code with Rcpp and benchmark it extensively to find the situations in which the computation becomes too burdensome.
- Test all functions for numerical instabilities.
- The current version of the package requires thorough user testing and code testing. This will require the addition of `testthat`.
- The student will be required to improve the documentation of `multiESS` and make all documentation uniform.
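To make the first task concrete, here is a minimal univariate sketch of one common form of the replicated batch means estimator: each chain is split into batches, and the batch means from all chains are compared against a single grand mean. It is written in Python purely for illustration (the package itself is R with Rcpp), the function name `replicated_batch_means` and the `batch_size` argument are hypothetical, and nothing here reflects the actual or planned `multiESS` interface.

```python
import random
from statistics import fmean


def replicated_batch_means(chains, batch_size):
    """Estimate the asymptotic variance from several chains of equal length.

    chains: list of lists of floats, one list per Markov chain.
    batch_size: number of consecutive draws per batch (illustrative choice).
    """
    n = len(chains[0])
    if any(len(chain) != n for chain in chains):
        raise ValueError("all chains must have the same length")
    a = n // batch_size  # number of batches per chain
    if a * len(chains) < 2:
        raise ValueError("need at least two batches in total")
    start = n - a * batch_size  # drop leftover draws at the start of each chain

    batch_means = []
    for chain in chains:
        for j in range(a):
            lo = start + j * batch_size
            batch_means.append(fmean(chain[lo:lo + batch_size]))

    # Centre every batch mean at the grand mean across all chains,
    # not at the mean of its own chain.
    grand_mean = fmean(batch_means)
    ss = sum((bm - grand_mean) ** 2 for bm in batch_means)
    return batch_size * ss / (len(batch_means) - 1)


# Usage: four independent chains of i.i.d. N(0, 1) draws, for which the
# asymptotic variance of the sample mean is 1.
rng = random.Random(2020)
iid_chains = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
sigma2_hat = replicated_batch_means(iid_chains, batch_size=50)
print(sigma2_hat)
```

Centering at the grand mean is the point of the replicated version: chains that have settled in different regions inflate the estimate, whereas averaging per-chain estimates would hide that disagreement.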
The mcmcse package has been downloaded over 30,000 times and has 71 citations on Google Scholar. It has already proved useful to the broader scientific community, and any improvements to the package will continue to benefit that community.
- EVALUATING MENTOR: Dootika Vats is the author and maintainer of the R package mcmcse and a contributor to the R package stableGR. She was a GSoC student participant in 2015 for this same package and is an expert in MCMC output analysis.
- James Flegal is the founding author of the package and an expert in MCMC output analysis.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: (1) Download the mcmcse package from CRAN and use the function `ess` on a vector `foo` of length 1e4 randomly drawn from a standard normal distribution. (2) Make a random matrix of size 10 x 10 and produce only the eigenvalues of the matrix.
- Medium: Write a function that runs a Gaussian AR(1) model and use `mcmcse` to estimate the effective sample size.
- Hard: Implement the replicated batch means estimator from this paper.
Students, please post a link to your test results here.
- Kushagra Gupta, GitHub profile, Test results
- Sonali Sharma, GitHub profile, Test results
- Prateek Varshney, GitHub profile, Test results