newton-irls.Rmd

# Newton and IRLS


Here we demonstrate Newton's and Iterated Reweighted Least Squares approaches
with a logistic regression model.

For the following, I had Murphy's PML text open and more or less followed the
algorithms in chapter 8.  Note that for Newton's method, this doesn't implement
a line search to find a more optimal stepsize at a given iteration.

## Data Setup

Predict graduate school admission based on gre, gpa, and school rank (higher is
more prestige). See corresponding demo
[here](https://stats.idre.ucla.edu/stata/dae/logistic-regression/). The only
difference is that I treat rank as numeric rather than categorical.  We will be
comparing results to base R <span class="func" style = "">glm</span> function,
so I will use it to create the model data.


```{r newton-setup}
library(tidyverse)

admit = haven::read_dta('https://stats.idre.ucla.edu/stat/stata/dae/binary.dta')

fit_glm = glm(admit ~ gre + gpa + rank, data = admit, family = binomial)

# summary(fit_glm)

X = model.matrix(fit_glm)
y = fit_glm$y
```


## Functions

### Newton's Method

```{r newton}
newton <- function(
  X,
  y,
  tol  = 1e-12,
  iter = 500,
  stepsize = .5
  ) {
  
  # Args: 
  # X: model matrix
  # y: target
  # tol: tolerance
  # iter: maximum number of iterations
  # stepsize: (0, 1)
  
  # intialize
  int     = log(mean(y) / (1 - mean(y)))         # intercept
  beta    = c(int, rep(0, ncol(X) - 1))
  currtol = 1
  it = 0
  ll = 0
  
  while (currtol > tol && it < iter) {
    it = it +1
    ll_old = ll
    
    mu = plogis(X %*% beta)[,1]
    g  = crossprod(X, mu-y)               # gradient
    S  = diag(mu*(1-mu)) 
    H  = t(X) %*% S %*% X                 # hessian
    beta = beta - stepsize * solve(H) %*% g

    ll = sum(dbinom(y, prob = mu, size = 1, log = TRUE))
    currtol = abs(ll - ll_old)
  }
  
  list(
    beta = beta,
    iter = it,
    tol  = currtol,
    loglik = ll
  )
}
```

### IRLS 

Note that <span class="func" style = "">glm</span> is actually using IRLS, so
the results from this should be fairly spot on.

```{r irls}
irls <- function(X, y, tol = 1e-12, iter = 500) {
  
  # intialize
  int  = log(mean(y) / (1 - mean(y)))   # intercept
  beta = c(int, rep(0, ncol(X) - 1))
  currtol = 1
  it = 0
  ll = 0
  
  while (currtol > tol && it < iter) {
    it = it + 1
    ll_old = ll
    
    eta  = X %*% beta
    mu   = plogis(eta)[,1]
    s    = mu * (1 - mu)
    S    = diag(s)
    z    = eta + (y - mu)/s
    beta = solve(t(X) %*% S %*% X) %*% (t(X) %*% (S %*% z))
    
    ll = sum(
      dbinom(
        y,
        prob = plogis(X %*% beta),
        size = 1,
        log  = TRUE
      )
    )
    
    currtol = abs(ll - ll_old)
  }
  
  list(
    beta = beta,
    iter = it,
    tol  = currtol,
    loglik  = ll,
    weights = plogis(X %*% beta) * (1 - plogis(X %*% beta))
  )
}
```


## Estimation

```{r newton-est}
fit_newton = newton(
  X = X,
  y = y,
  stepsize = .9,
  tol = 1e-8      # tol set to 1e-8 as in glm default
) 

fit_newton
# fit_glm
```


`tol` set to 1e-8 as in <span class="func" style = "">glm</span> default.

```{r irls-est}
irls_result = irls(X = X, y = y, tol = 1e-8) 

str(irls_result)
# fit_glm
```
 

## Comparison


Compare all results.

```{r irls-compare, echo=FALSE}
rbind(
  newton = unlist(fit_newton),
  irls   = unlist(irls_result[-length(irls_result)]),
  glm_default = c(
    beta = coef(fit_glm),
    fit_glm$iter,
    tol = NA,
    loglik = logLik(fit_glm)
  )
) %>% 
  kable_df()
```

Compare weights between the <span class="func" style = "">irls</span> and <span class="func" style = "">glm</span> results.

```{r irls-compare-weights}
head(cbind(irls_result$weights, fit_glm$weights))
```


## Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/newton_irls.R