1. Introduction
2. Installation
3. Additional Resources
4. References
5. Citation
Multimodal distributions can be modelled as a mixture of components. The model is derived using Pareto Density Estimation (PDE) to estimate the probability density function (pdf) [Ultsch 2005]. PDE has been designed in particular to identify groups/classes in a dataset. The expectation-maximization (EM) algorithm estimates a Gaussian mixture model of density states [Bishop 2006], and the limits between the different states are defined by Bayes decision boundaries [Duda 2001]. The model can be verified with a chi-squared test, a Kolmogorov-Smirnov test, and a QQ plot.
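To make the EM step concrete, here is a minimal base-R sketch of expectation maximization for a one-dimensional Gaussian mixture. The function name `em_gmm_1d` and its arguments are hypothetical and chosen for illustration only; this is not the package's implementation, and it omits convergence checks and safeguards against degenerate components.

```r
# Minimal EM sketch for a 1-D Gaussian mixture (illustrative only,
# not the AdaptGauss implementation; em_gmm_1d is a hypothetical name).
em_gmm_1d <- function(x, means, sds, weights, iterations = 50) {
  for (it in seq_len(iterations)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_along(means),
                   function(k) weights[k] * dnorm(x, means[k], sds[k]))
    resp <- dens / rowSums(dens)  # responsibilities, rows sum to 1

    # M-step: re-estimate weights, means and standard deviations
    nk <- colSums(resp)
    weights <- nk / length(x)
    means <- colSums(resp * x) / nk
    sds <- sqrt(colSums(resp * outer(x, means, "-")^2) / nk)
  }
  list(Means = means, SDs = sds, Weights = weights)
}

set.seed(1)
x <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))
fit <- em_gmm_1d(x, means = c(-2, 2, 7), sds = c(0.5, 1, 4),
                 weights = rep(1 / 3, 3))
print(fit)
```

The result uses the element names `Means`, `SDs` and `Weights` only to mirror the parameter names that appear in this README.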
The AdaptGauss package offers an interactive approach to the adaptation of Gaussian Mixture Models (GMM) and includes:
- Interactive adaptation of Gaussian Mixture Models: Fit a GMM using the EM algorithm, with the option of adapting the GMM interactively with Shiny.
- Evaluation of Gaussian Mixture Models: Validate GMMs statistically and visually through statistical tests and QQ plots.
- Bayes classification: Classify data according to Bayes boundaries.
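As an illustration of Bayes classification for a GMM, the following base-R sketch computes posterior probabilities per component, assigns each value to the most probable component, and locates one decision boundary numerically. The helper names `posterior_gmm` and `classify_bayes` and the parameter values are assumptions made for this example; the package's own functions may work differently.

```r
# Posterior probability of each component for each value (Bayes' theorem).
# posterior_gmm and classify_bayes are hypothetical helper names.
posterior_gmm <- function(x, means, sds, weights) {
  lik <- sapply(seq_along(means),
                function(k) weights[k] * dnorm(x, means[k], sds[k]))
  lik <- matrix(lik, nrow = length(x))  # keep matrix shape for a single x
  lik / rowSums(lik)
}

# Assign each value to the component with the highest posterior.
classify_bayes <- function(x, means, sds, weights) {
  max.col(posterior_gmm(x, means, sds, weights), ties.method = "first")
}

means <- c(-2, 2, 7)
sds <- c(0.5, 1, 3)
weights <- rep(1 / 3, 3)
classes <- classify_bayes(c(-2, 2, 7), means, sds, weights)

# Decision boundary between components 1 and 2:
# the point where their weighted densities are equal.
boundary_12 <- uniroot(
  function(x) weights[1] * dnorm(x, means[1], sds[1]) -
    weights[2] * dnorm(x, means[2], sds[2]),
  interval = c(means[1], means[2])
)$root
```

Searching between the two component means works here because the weighted density difference changes sign exactly once on that interval; other parameter settings may need a different search interval.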
Examples in which the EM algorithm alone is insufficient for fitting the GMM, but a visual modelling approach is appropriate, can be found in [Ultsch 2015].
Interactive adaptation of a GMM with Shiny:
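data = c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))
# Assumption: the original call is cut off after SDs; equal weights are
# supplied here because the three samples are equally large.
gmm = AdaptGauss::AdaptGauss(data, Means = c(-2, 2, 7), SDs = c(0.5, 1, 4),
                             Weights = c(1/3, 1/3, 1/3))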
- Selection of Gaussian component
- Visualization of Model probability density function (pdf), pdf of each component of the GMM, and the data density estimation
- Settings to execute an Expectation-Maximization algorithm
- Adjust the parameters of the current Gaussian component (with sliders or direct numeric input)
- Control the weights of the components (normalize all equally or only the others (excluding the current one) with respect to the weight of the current component)
- Control for visualization (show the Bayesian Boundaries or the components pdf)
- Control of the current setting: Restoring the last best overall value (based on the 'Root Mean Squared Error')
- Control of the current setting: Create a plot or execute a chi square analysis for evaluation of the setting
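The 'Root Mean Squared Error' criterion mentioned above can be sketched as the RMSE between the model pdf and an empirical density estimate. In this sketch, base R's kernel density estimate `density()` stands in for the Pareto Density Estimation that the package uses, and `mixture_pdf` and `rmse_gmm` are hypothetical helper names, so the values will not match those shown in the Shiny interface.

```r
# pdf of the whole mixture: weighted sum of the component pdfs
mixture_pdf <- function(x, means, sds, weights) {
  comp <- sapply(seq_along(means),
                 function(k) weights[k] * dnorm(x, means[k], sds[k]))
  rowSums(matrix(comp, nrow = length(x)))
}

# RMSE between the model pdf and an empirical density estimate.
# Assumption: density() is used as a stand-in for PDE.
rmse_gmm <- function(data, means, sds, weights) {
  est <- density(data)
  sqrt(mean((est$y - mixture_pdf(est$x, means, sds, weights))^2))
}

set.seed(1)
data <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))
rmse_good <- rmse_gmm(data, c(-2, 2, 7), c(0.5, 1, 3), rep(1 / 3, 3))
rmse_bad <- rmse_gmm(data, c(0, 0, 0), c(1, 1, 1), rep(1 / 3, 3))
```

A lower RMSE indicates a closer match between model and data density, which is why a setting close to the generating parameters scores better than a deliberately poor one.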
The GMM can then be checked for statistical significance, for example using a version of the chi-square test, and inspected visually with a QQ plot.
AdaptGauss::Chi2testMixtures(data, gmm$Means, gmm$SDs, gmm$Weights, PlotIt = TRUE)
AdaptGauss::QQplotGMM(data, gmm$Means, gmm$SDs, gmm$Weights)
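The Kolmogorov-Smirnov check mentioned in the introduction can be illustrated in base R by comparing the data against the cumulative distribution function of the mixture. This is a sketch using `stats::ks.test` with a hypothetical helper `mixture_cdf`, not the package's own test function, and the parameters are fixed by hand rather than taken from a fitted `gmm` object.

```r
# CDF of the mixture: weighted sum of the component CDFs
mixture_cdf <- function(q, means, sds, weights) {
  comp <- sapply(seq_along(means),
                 function(k) weights[k] * pnorm(q, means[k], sds[k]))
  rowSums(matrix(comp, nrow = length(q)))
}

set.seed(1)
data <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))

# KS test against a mixture close to the generating parameters
ks_good <- ks.test(data, function(q)
  mixture_cdf(q, c(-2, 2, 7), c(0.5, 1, 3), rep(1 / 3, 3)))

# KS test against a deliberately poor single-Gaussian model
ks_bad <- ks.test(data, function(q) mixture_cdf(q, 0, 1, 1))

ks_good$statistic
ks_bad$statistic
```

A small KS statistic means the empirical and model CDFs stay close together. Note that the p-value is only strictly valid when the model parameters were not estimated from the same data.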
Install automatically with all dependencies via
install.packages("AdaptGauss", dependencies = TRUE)
Please note that dependencies have to be installed manually.
remotes::install_github("Mthrun/AdaptGauss")
Please note that dependencies have to be installed manually.
Tools -> Install Packages -> Repository (CRAN) -> AdaptGauss
- For further examples see Vignette
- Package Documentation
- View package on CRAN
[Ultsch 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, Proc. GfKl 2003, pp. 91-100, Springer, Berlin, 2005.
[Bishop 2006] Bishop, C. M.: Pattern recognition and machine learning, Springer, 2006, p. 435 ff.
[Duda 2001] Duda, R. O., Hart, P. E., & Stork, D. G.: Pattern classification, 2nd edition, New York, 2001, p. 512 ff.
[Ultsch 2015] Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lotsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, Vol. 16(10), pp. 25897-25911, 2015.
Please use the following citation:
Thrun, M. C., & Ultsch, A.: Models of Income Distributions for Knowledge Discovery, Proc. European Conference on Data Analysis (ECDA), DOI: 10.13140/RG.2.1.4463.0244, pp. 136-137, Colchester, 2015.
Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lotsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, Vol. 16(10), pp. 25897-25911, 2015.