An R package for Bayesian estimation of finite mixture distributions
The package uses heavily C++ code to enable high performance MCMC sampling. To simplify handling for users each distribution comes along with some support functions that create required objects and starting parameters. As a result a user can perform Bayesian parameter estimation in only a few lines. The following mixtures are available:
- Binomial,
- Exponential,
- Normal,
- Multivariate Normal,
- Poisson,
- Condiitonal Poisson
- Student-t, and
- Multivariate Student-t.
The methods used in this package are based on the major literature on the Bayesian estimation of finite mixture distributions, namely
Frühwirth-Schnatter, Sylvia (2006), "Finite Mixture and Markov Switching Models", Springer Series in Statistics (link).
The code of this package is related to the bayesf
package in Matlab
written by Sylvia
Frühwirth-Schnatter herself. Due to the C++ extensions in this package using Rcpp
the
R
implementation is almost 200x times faster than the Matlab
version. A single MCMC run
with 11000 iterations is usually performed within 2-3 seconds.
The package can be installed directly from GitHub by using the function install_github()
in the devtools
package. The package passed all checks from R CMD check on all major
platforms and hence, should be installable on MacOS, Windows, and Linux. Be sure that you
installed appropriate developer tools for your platform as a C++ compiler for the source
code is needed.
For MacOS the XCode Command Line Tools are needed. You should have installed these when
installing R
. See the MacOSX-FAQ
for more information on how to install source packages on MacOS.
For Windows the rtools
package is needed.
Follow the link and install this package, if you have not installed it, yet.
As a quick start choose a Poisson mixture with two components and perform MCMC sampling:
# Load the package.
library(finmix)
# Define the finite mixture model (two components).
f_model <- model(dist="poisson", K=2, par=list(lambda=c(.3, .7)))
# Simulate data from this mixture distribution.
f_data <- simulate(f_model)
# Define the hyperparameters for MCMC sampling.
f_mcmc <- mcmc()
# Define the prior distribution for MCMC sampling.
f_prior <- priordefine(f_data, f_model)
# Set up all parameters for MCMC sampling.
(f_data ~ f_model ~ f_mcmc) %=% mcmcstart(f_data, f_model, f_mcmc)
# Perform MCMC sampling.
f_output <- mixturemcmc(f_data, f_model, f_prior, f_mcmc)
f_output
To estimate the parameters you simply call the function mcmcestimate()
on the output from MCMC sampling:
f_estimates <- mcmcestimate(f_output)
f_estimates
The package comes with many auxiliary functions for plotting (e.g. plotTraces()
to plot MCMC traces from MCMC outputs), for subchaining (subseq()
), for
swapping elements (swapElements()
), or for relabeling components (mcmcpermute()
).
See the documentation for further reading.
This is a package worked on for years and still not fully implemented. As it is still maintained by a single author, please by patient with issues.