em-mixture-mv.Rmd

## Multivariate Mixture Model

The following code is based on algorithms noted in Murphy, 2012 Probabilistic Machine Learning.
Specifically, Chapter 11, section 4.


### Function

This estimating function will be used for both examples.

```{r em_mixture-mv}
em_mixture <- function(
  params,
  X,
  clusters = 2,
  tol = .00001,
  maxits   = 100,
  showits  = TRUE
  ) {
  
  # Arguments are 
  # params: starting parameters (means, covariances, cluster probability)
  # X: data 
  # clusters: number of clusters desired
  # tol: tolerance
  # maxits: maximum iterations
  # showits: whether to show iterations
  
  require(mvtnorm)
  # Starting points
  N     = nrow(X)
  mu    = params$mu
  var   = params$var
  probs = params$probs
  
  # initializations
  
  # cluster 'responsibilities', i.e. probability of cluster membership for each
  # observation i
  ri = matrix(0, ncol=clusters, nrow=N)       
  ll = 0                                        # log likelihood
  it = 0                                        # iteration count
  converged = FALSE                             # convergence
  
  # Show iterations if showits == true
  if (showits)                                  
    cat(paste("Iterations of EM:", "\n"))
  
  while (!converged & it < maxits) { 
    probsOld = probs
    # muOld = mu                # Use direct values or loglike for convergence check
    # varOld = var
    llOld = ll
    riOld = ri
    
    ### E
    # Compute responsibilities
    for (k in 1:clusters){
      ri[,k] = probs[k] * dmvnorm(X, mu[k, ], sigma = var[[k]], log = FALSE)
    }
    
    ri = ri/rowSums(ri)
    
    ### M
    rk = colSums(ri)            # rk is weighted average cluster membership size
    probs = rk/N
    
    for (k in 1:clusters){
      varmat = matrix(0, ncol = ncol(X), nrow = ncol(X))    # initialize to sum matrices
      
      for (i in 1:N){
        varmat = varmat + ri[i,k] * X[i,]%*%t(X[i,])
      }
      
      mu[k,]   = (t(X) %*% ri[,k]) / rk[k]
      var[[k]] =  varmat/rk[k] - mu[k,]%*%t(mu[k,])
      
      ll[k] = -.5*sum( ri[,k] * dmvnorm(X, mu[k,], sigma = var[[k]], log = TRUE) )
    }
    
    ll = sum(ll)
    
    # compare old to current for convergence
    parmlistold =  c(llOld, probsOld)           # c(muOld, unlist(varOld), probsOld)
    parmlistcurrent = c(ll, probs)              # c(mu, unlist(var), probs)
    it = it + 1
    
    # if showits true, & it =1 or modulo of 5 print message
    if (showits & it == 1 | it%%5 == 0)         
      cat(paste(format(it), "...", "\n", sep = ""))
    
    converged = min(abs(parmlistold - parmlistcurrent)) <= tol
  }
  
  clust = which(round(ri) == 1, arr.ind = TRUE)        # create cluster membership
  clust = clust[order(clust[,1]), 2]            # order accoring to row rather than cluster
  
  
  list(
    probs   = probs,
    mu      = mu,
    var     = var,
    resp    = ri,
    cluster = clust,
    ll      = ll
  )
} 
```


### Example 1: Old Faithful


This example uses Old Faithful geyser eruptions as before.  This is can be compared to the univariate code from the [other chapter][Mixture Model]. See also http://www.geyserstudy.org/geyser.aspx?pGeyserNo=OLDFAITHFUL

#### Data Setup

```{r mixmv-setup1}
library(tidyverse)

data("faithful")
```


#### Estimation

Create starting values and estimate.

```{r mixmv-starts1}
mustart  = rbind(c(3, 60), c(3, 60.1))    # must be at least slightly different
covstart = list(cov(faithful), cov(faithful))
probs    = c(.01, .99)

# params is a list of mu, var, and probs 
starts = list(mu = mustart, var = covstart, probs = probs)  
```


```{r mixmv-est1}
mix_faithful = em_mixture(
  params = starts,
  X = as.matrix(faithful),
  clusters = 2,
  tol      = 1e-12,
  maxits   = 1500,
  showits  = TRUE
)

str(mix_faithful)
```


Visualize.

```{r mixmv-vis1, echo=FALSE}
library(ggplot2)

faithful %>%
  mutate(cluster = factor(mix_faithful$cluster)) %>% 
  ggplot(aes(x = eruptions, y = waiting) ) +
  geom_density2d(color = 'gray92') +
  geom_point(aes(color = cluster)) +
  scico::scale_color_scico_d(begin = .25, end = .75, alpha = .5) +
  guides(color = guide_legend('Cluster'))


faithful %>%
  mutate(prob_clus_1 = mix_faithful$resp[, 1]) %>%
  ggplot(aes(x = eruptions, y = waiting)) +
  geom_density2d(color = 'gray92') +
  geom_point(aes(color = prob_clus_1)) +
  scico::scale_color_scico(palette = 'bilbao', begin = .25) +
  guides(color = guide_legend('Prob. Cluster 1'))

# relatively speaking, these are extremely well-separated clusters
worst = apply(mix_faithful$resp, 1, function(x)  max(x) < .99) 

ggplot(aes(x = eruptions, y = waiting), data = faithful) +
  geom_point(aes(color = worst)) +
  scico::scale_color_scico_d(palette = 'bilbao', begin = .25) 
```

#### Comparison

Compare to <span class="pack" style = "">mclust</span> results. Options are set to be more similar to the settings demonstrated.

```{r mixmv-compare1} 
library(mclust)

mix_mclust = mclust::Mclust(
  faithful[, 1:2],
  2,
  modelNames = 'VVV',
  control = emControl(tol = 1e-12)
)

detach(package:mclust)

# str(mix_mclust, 1)
```

Compare means.

```{r mixmv-compare1-means} 
t(mix_faithful$mu)
mix_mclust$parameters$mean
```

Compare variances.

```{r mixmv-compare1-vars} 
mix_faithful$var
mix_mclust$parameters$variance$sigma
```


Compare classifications. Reverse in case arbitrary labeling of one of the clusters is opposite.

```{r mixmv-compare1-class} 
table(mix_faithful$cluster, mix_mclust$classification)

table(ifelse(mix_faithful$cluster == 2, 1, 2),
      mix_mclust$classification)

# compare responsibilities; reverse one if arbitrary numbering of one of them is opposite
# cbind(round(mix_faithful$resp[,1], 2), round(mix_mclust$z[,2], 2)) # cluster '1'
# cbind(round(mix_faithful$resp[,2], 2), round(mix_mclust$z[,1], 2)) # cluster '2'
```


### Example 2: Iris

#### Data Setup

Set up the data.

```{r mixmv-setup2}
iris2 = iris %>% select(-Species)
```


#### Estimation

Run and examine. We add noise to our starting value, and the function is notably sensitive to starts, but we don't want to cheat too badly.


```{r mixmv-starts2}
mustart = iris %>% 
  group_by(Species) %>% 
  summarise(across(.fns = function(x) mean(x) + runif(1, 0, .5))) %>% 
  select(-Species) %>% 
  as.matrix()


# use purrr::map due to mclust::map masking
covstart = iris %>% 
  split(.$Species) %>% 
  purrr::map(select, -Species) %>% 
  purrr::map(function(x) cov(x) + diag(runif(4, 0, .5))) 

probs = c(.1, .2, .7)

starts = list(mu = mustart, var = covstart, probs = probs)
```


```{r mixmv-est2}
mix_mclust_iris = em_mixture(
  params = starts,
  X = as.matrix(iris2),
  clusters = 3,
  tol      = 1e-8,
  maxits   = 1500,
  showits  = T
)

table(mix_mclust_iris$cluster, iris$Species)
```


#### Comparison

Compare to <span class="pack" style = "">mclust</span> results.

```{r mixmv-compare2}
library(mclust)

mclust_iris = mclust::Mclust(iris[,1:4], 3)
table(mclust_iris$classification, iris$Species)

detach(package:mclust)
```


### Source

Original code available at
https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/EM%20Examples/EM%20Mixture%20MV.R