SimTools: Output Analysis for Monte Carlo

Swapnil sharma edited this page Mar 10, 2023 · 15 revisions

Background

Analysis of output from Monte Carlo algorithms, particularly Markov chain Monte Carlo (MCMC), requires a specialized toolkit that accurately accounts for the correlation structure in the process. Summaries of quality assessment must then agree with the theoretical underpinnings of the output analysis tools. This is true for both numerical and visual summaries. The goal of this project is to complete the development of the R package SimTools, which is envisioned to contain theoretically founded and practically useful output analysis tools for Monte Carlo simulations.

Related work

Many software resources are dedicated to running MCMC simulations, such as Stan, Nimble, and JAGS. They are equipped with standard output analysis techniques designed for their specific processes. However, many users either code their own Markov chains or use other MCMC/Monte Carlo techniques that are not compatible with these packages. Further, Stan, for example, inherently assumes reversibility of the Markov chain, and is thus not equipped to handle general MCMC algorithms. On the other hand, R packages like coda and bayesplot serve a large audience of MCMC users, but are not up to speed with the latest theoretical developments in the area.

The unique feature of SimTools is that it is a one-stop shop for all output analysis of Monte Carlo simulations. Independent and identically distributed simulations are handled by the Siid class, and MCMC simulations are handled by the Smcmc class. Further, all analysis methods are underpinned by the latest theoretical developments in MCMC output analysis.

Details of your coding project

There are five major goals that need to be completed for SimTools before it is ready for a CRAN submission:

  • multiple chain compatibility: The current base software only has single process/chain compatibility. However, many simulation users run parallel chains, and thus the infrastructure needs to allow for taking in multiple simulations as input. This will require changing the current plot.Smcmc, plot.Siid, and acf functions as well.

  • summary function for output: One of the main tasks will be to create a summary function for both Smcmc and Siid classes. The summary functions are expected to contain mean estimates, quantiles, standard errors, effective sample sizes, and suggestions on accuracy.

  • compatibility for discrete state space: Simulations run on discrete state spaces require a different visualization for their "density" plots. The current plot.Siid and plot.Smcmc functions are unsuitable for discrete data points. An additional argument needs to be added to these functions to redirect the visualizations.

  • make efficient trace plots: Standard trace plots via plot.ts can be computationally demanding for large simulation sizes. An alternative is to present a subset of the trace plot by default.

  • export to C++: The main variance calculations are done in the R package mcmcse, much of which is written with Rcpp. In SimTools, there are a few functions, like the density plot confidence interval calculation, that can benefit from being exported to C++ via Rcpp. If time allows, this will be a goal of the project as well.
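
To make the summary goal above concrete, the following is a minimal sketch of the quantities such a summary could report for a single chain. It is written in Python purely for illustration (the package itself is in R), and it is not the SimTools or mcmcse implementation. The function names `batch_means_variance`, `quantile`, and `summarize_chain` are hypothetical, the default batch size of floor(sqrt(n)) is an assumption, and the effective sample size is taken to be n times the sample variance divided by the batch-means estimate of the asymptotic variance.

```python
import math
import statistics


def batch_means_variance(draws, batch_size=None):
    """Estimate the asymptotic variance of the sample mean by batch means."""
    n = len(draws)
    if batch_size is None:
        batch_size = max(1, math.isqrt(n))  # assumed default: floor(sqrt(n))
    n_batches = n // batch_size
    if n_batches < 2:
        raise ValueError("need at least two full batches")
    # Drop the earliest leftover draws so every batch has the same length.
    used = draws[n - n_batches * batch_size:]
    batch_means = [
        statistics.fmean(used[k * batch_size:(k + 1) * batch_size])
        for k in range(n_batches)
    ]
    # batch_size times the sample variance of the batch means.
    return batch_size * statistics.variance(batch_means)


def quantile(sorted_draws, prob):
    """Linearly interpolated sample quantile of already-sorted draws."""
    position = prob * (len(sorted_draws) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    weight = position - lower
    return (1 - weight) * sorted_draws[lower] + weight * sorted_draws[upper]


def summarize_chain(draws, probs=(0.05, 0.5, 0.95)):
    """Return mean, quantiles, standard error, and effective sample size."""
    draws = list(draws)
    n = len(draws)
    asymptotic_var = batch_means_variance(draws)
    sample_var = statistics.variance(draws)
    ordered = sorted(draws)
    return {
        "mean": statistics.fmean(draws),
        "quantiles": {p: quantile(ordered, p) for p in probs},
        # Standard error that accounts for correlation between draws.
        "std_error": math.sqrt(asymptotic_var / n),
        # Assumed definition: n * (sample variance) / (asymptotic variance).
        "ess": n * sample_var / asymptotic_var,
    }
```

Calling `summarize_chain` on a list of draws returns a dictionary with the four entries. For positively correlated chains the reported effective sample size would typically fall well below the number of draws, which is the kind of signal a "suggestions on accuracy" field could build on.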

In addition to the above tasks, the following needs to be done throughout:

  • Documentation: Rich documentation with proper examples and references will be prepared by the contributor.
  • Vignette: A vignette giving guidance on how to use the package will be written collaboratively by the mentors and the contributor.
  • Tests: Given that fundamental knowledge of statistical concepts is essential for this project, a thorough testing system will be set up for each stage of the project.

Expected impact

In both Statistics and Machine Learning, Markov chain Monte Carlo and other Monte Carlo techniques are omnipresent. The lack of a rigorous open-source package for summarizing simulation output leads to a lack of cohesiveness in the discussion around the validity of simulations. The package SimTools is expected to fill this software gap and to be directly helpful to scientists all over the world.

Mentors

Contributors, please contact mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Dootika Vats [email protected] is the author of R package mcmcse and has previously mentored two GSoC R projects.
  • James Flegal [email protected] is an expert in output analysis for MCMC and a previous GSoC mentor as well.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Write an efficient simulation of an AR(1) model with Gaussian noise.
  • Medium: Write an efficient implementation of a Metropolis-Hastings algorithm to sample from a $p$-variate Normal distribution with any user-specified mean vector and covariance matrix.

Solutions of tests

Contributors, please post a link to your test results here.

  • EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.

  • Siddharth Pathak, GitHub profile: https://github.com/SiddharthanilPathak, test results: https://github.com/SiddharthanilPathak/GSoC-2023-Primary-Tasks-