SimTools: Output Analysis for Monte Carlo
Analysis of output from Monte Carlo algorithms, particularly Markov chain Monte Carlo (MCMC), requires a specialized toolkit that accurately accounts for the correlation structure in the process. Summaries of quality assessment must then agree with the theoretical underpinnings of the output analysis tools. This is true for both numerical and visual summaries. The goal of this project is to complete the development of the R package `SimTools`, which is envisioned to contain theoretically founded and practically useful output analysis tools for Monte Carlo simulations.
Many software resources are dedicated to MCMC simulation, such as Stan, Nimble, and JAGS. They are equipped with standard output analysis techniques designed for their specific processes. However, many users either code their own Markov chains or use other MCMC/Monte Carlo techniques that are not compatible with these packages. Further, Stan, for example, inherently assumes reversibility of the Markov chain and is thus not equipped to handle general MCMC algorithms. On the other hand, R packages like `coda` and `bayesplot` serve a large audience of MCMC users but are not up to speed with the latest theoretical developments in the area.
The unique feature of `SimTools` is that it is a one-stop shop for all output analysis of Monte Carlo simulations. Independent and identically distributed simulations are handled by the `Siid` class, and MCMC simulations by the `Smcmc` class. Further, all analysis methods are underpinned by the latest theoretical developments in MCMC output analysis.
There are five major goals that need to be completed for SimTools before it is ready for a CRAN submission:
- Multiple chain compatibility: The current base software only has single process/chain compatibility. However, many simulation users run parallel chains, and thus the infrastructure needs to allow for taking in multiple simulations as input. This will require changing the current `plot.Smcmc`, `plot.Siid`, and `acf` functions as well.
- Summary function for output: One of the main tasks will be to create a `summary` function for both the `Smcmc` and `Siid` classes. The summary functions are expected to contain mean estimates, quantiles, standard errors, effective sample sizes, and suggestions on accuracy.
- Compatibility for discrete state spaces: Simulations run on discrete state spaces require a different visualization for their "density" plots. The current `plot.Siid` and `plot.Smcmc` functions are unsuitable for discrete data points. An additional argument needs to be added to these functions to redirect the visualizations.
- Efficient trace plots: Standard trace plots via `plot.ts` can be computationally demanding for large simulation sizes. An alternative is to present a subset of the trace plot by default.
- Export to C++: The main tools of variance calculation are provided by the R package `mcmcse`. Much of this package is written in `Rcpp`. In `SimTools`, there are a few functions, like the density plot confidence interval calculation, that can benefit from being exported to C++ via `Rcpp`. If time allows, this will be a goal of the project as well.
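To make the summary goal above concrete, here is a minimal base R sketch of the kind of quantities such a function might report for a single univariate chain: a batch means estimate of the Monte Carlo standard error and the resulting effective sample size. The helper `bm_summary`, its arguments, and the square-root batch size are assumptions made for illustration only; they are not the `SimTools` or `mcmcse` API, and the packages may use different estimators and defaults.

```r
# Hypothetical helper (not part of SimTools or mcmcse): batch means
# summary for one univariate chain stored as a numeric vector.
bm_summary <- function(x, probs = c(0.05, 0.5, 0.95)) {
  n <- length(x)
  b <- floor(sqrt(n))            # batch size; square-root rule is an assumption
  a <- floor(n / b)              # number of complete batches
  x_used <- x[seq_len(a * b)]    # drop the incomplete final batch
  # Each column of the b x a matrix is one batch
  batch_means <- colMeans(matrix(x_used, nrow = b, ncol = a))
  # b times the sample variance of the batch means estimates the
  # asymptotic variance in the Markov chain central limit theorem
  sigma2 <- b * var(batch_means)
  list(
    mean      = mean(x),
    se        = sqrt(sigma2 / n),     # Monte Carlo standard error of the mean
    ess       = n * var(x) / sigma2,  # effective sample size
    quantiles = quantile(x, probs)
  )
}

set.seed(1)
n <- 1e4
# Correlated AR(1) series used here as a stand-in for MCMC output
x <- as.numeric(stats::filter(rnorm(n), filter = 0.9, method = "recursive"))
out <- bm_summary(x)
```

For positively correlated output like this, the effective sample size should come out well below `n`, and the batch means standard error should exceed the naive `sd(x) / sqrt(n)`. Extending the idea to several parallel chains would require a choice about how to pool the per-chain estimates, which this sketch does not address.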
In addition to the above tasks, the following needs to be done throughout:
- Documentation: Rich documentation with proper examples and references will be prepared by the contributor.
- Vignette: A vignette for guidance on how to use the package will be made by the mentors and the contributor collaboratively.
- Tests: Given that fundamental knowledge of statistical concepts is essential for this project, a thorough testing system will be set up for each stage of the project.
In both Statistics and Machine Learning, Markov chain Monte Carlo and other Monte Carlo techniques are omnipresent. The lack of a rigorous open-source package for summarizing simulation output implies a lack of cohesiveness in the discussion around the validity of simulations. The package `SimTools` is expected to fill this software gap and to be directly helpful to scientists all over the world.
Contributors, please contact mentors below after completing at least one of the tests below.
- EVALUATING MENTOR: Dootika Vats [email protected] is the author of R package `mcmcse` and has previously mentored two GSoC R projects.
- James Flegal [email protected] is an expert in output analysis for MCMC and a previous GSoC mentor as well.
Contributors, please do one or more of the following tests before contacting the mentors above.
- Easy: Write an efficient simulation of an AR(1) model with Gaussian noise.
- Medium: Write an efficient implementation of a Metropolis-Hastings algorithm to sample from a $p$-variate Normal distribution with any user-specified mean vector and covariance matrix.
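As a rough illustration of the scale of these tasks, the following base R sketch shows one possible approach to each. It is not a reference solution. The function names `sim_ar1` and `rwm_mvn`, the stationary start for the AR(1) series, and the use of a random-walk proposal (a symmetric special case of Metropolis-Hastings) with step size `h` are all assumptions made here for illustration.

```r
# AR(1): x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
# The recursion is delegated to the compiled stats::filter routine
# instead of an explicit R loop.
sim_ar1 <- function(n, phi, sigma = 1) {
  stopifnot(abs(phi) < 1, n >= 1, sigma > 0)
  eps <- rnorm(n, sd = sigma)
  # Draw the value just before the series from the stationary distribution
  x0 <- rnorm(1, sd = sigma / sqrt(1 - phi^2))
  as.numeric(stats::filter(eps, filter = phi, method = "recursive", init = x0))
}

# Random-walk Metropolis targeting N_p(mu, Sigma).
rwm_mvn <- function(n, mu, Sigma, h = 1, start = mu) {
  p <- length(mu)
  Sigma_inv <- solve(Sigma)  # inverted once, outside the loop
  # Log target density up to an additive constant
  log_target <- function(x) {
    d <- x - mu
    -0.5 * sum(d * (Sigma_inv %*% d))
  }
  chain <- matrix(NA_real_, nrow = n, ncol = p)
  current <- start
  lp_current <- log_target(current)
  accepted <- 0L
  for (i in seq_len(n)) {
    proposal <- current + h * rnorm(p)  # symmetric Gaussian proposal
    lp_prop <- log_target(proposal)
    # Accept with probability min(1, target ratio), on the log scale
    if (log(runif(1)) < lp_prop - lp_current) {
      current <- proposal
      lp_current <- lp_prop
      accepted <- accepted + 1L
    }
    chain[i, ] <- current
  }
  list(chain = chain, accept_rate = accepted / n)
}

set.seed(42)
x <- sim_ar1(1e5, phi = 0.5)
mu <- c(1, -1)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
fit <- rwm_mvn(2e4, mu, Sigma, h = 1.5)
```

The step size `h` typically needs tuning to the target, and other proposals may mix better in higher dimensions; the sketch only shows the basic accept/reject structure.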
Contributors, please post a link to your test results here.
- EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
- Name: Siddharth Pathak
- GitHub Profile: https://github.com/SiddharthanilPathak
- Solutions of Tests: https://github.com/SiddharthanilPathak/GSoC-2023-Primary-Tasks-
- Name: Shlok Mishra
- GitHub: https://github.com/shlokmishra
- Code: https://github.com/shlokmishra/simTools-GSoC-Tasks