cubic-spline.Rmd

# Cubic Spline Model


See Wood (2017) Generalized Additive Models or my [document](https://m-clark.github.io/generalized-additive-models/) for an introduction to generalized additive models.  


## Data Setup

The data regards engine wear index versus engine capacity for 19 Volvo car engines used.  The idea is that a larger car engine will wear out less quickly than a smaller one (from Wood GAM 2e chapter 4). 

```{r cs-setup}
library(tidyverse)

data(engine, package = 'gamair')

size = engine$size
wear = engine$wear

x = size - min(size)
x = x / max(x)
d = data.frame(wear, x)
```

## Functions

Cubic spline function, `rk` refers to [reproducing kernel][Reproducing Kernel Hilbert Space Regression]. If I recall correctly, the function code is actually based on the first edition of Wood's text.

```{r cs-func}
rk <- function(x, z) {
  ((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12)/4 -
    ((abs(x - z) - 0.5)^4 - (abs(x - z) - 0.5)^2 / 2 + 7/240) / 24
}
```

Generate the model matrix.

```{r cs-modmat}
splX <- function(x, knots) {
  q = length(knots) + 2                # number of parameters
  n = length(x)                        # number of observations
  X = matrix(1, n, q)                  # initialized model matrix
  X[ ,2]   = x                         # set second column to x
  X[ ,3:q] = outer(x, knots, FUN = rk) # remaining to cubic spline basis
  X
}

splS <- function(knots) {
  q = length(knots) + 2
  S = matrix(0, q, q)                         # initialize matrix
  S[3:q, 3:q] = outer(knots, knots, FUN = rk) # fill in non-zero part
  S
}
```


Matrix square root function. Note that there are various packages with their own.

```{r cs-mat-sqrt}
mat_sqrt <- function(S) {
  d  = eigen(S, symmetric = TRUE)
  rS = d$vectors %*% diag(d$values^.5) %*% t(d$vectors)
  rS
}
```

Penalized fitting function.

```{r cs-fit}
prs_fit <- function(y, x, knots, lambda) {
  q  = length(knots) + 2    # dimension of basis
  n  = length(x)            # number of observations
  Xa = rbind(splX(x, knots), mat_sqrt(splS(knots))*sqrt(lambda)) # augmented model matrix
  y[(n + 1):(n+q)] = 0      # augment the data vector
  
  lm(y ~ Xa - 1) # fit and return penalized regression spline
}
```


## Example 1


We start with an unpenalized approach.

```{r cs-unpenal}
knots = 1:4/5
X = splX(x, knots)            # generate model matrix

fit_lm  = lm(wear ~ X - 1)    # fit model

xp = 0:100/100                # x values for prediction
Xp = splX(xp, knots)          # prediction matrix
```

Visualize.

```{r cs-unpenal-vis}
ggplot(aes(x = x, y = wear), data = data.frame(x, wear)) +
  geom_point(color = "#FF5500") +
  geom_line(aes(x = xp, y = Xp %*% coef(fit_lm)),
            data = data.frame(xp, Xp),
            color = "#00AAFF") +
  labs(x = 'Scaled Engine size', y  = 'Wear Index')
```


## Example 2

Now we add the `lambda` penalty and compare fits at different values of `lambda`.

```{r cs-penal}
knots = 1:7/8
d2 = data.frame(x = xp)
lambda = c(.1, .01, .001, .0001, .00001, .000001)
rmse   = vector('numeric', length(lambda))
idx = 0

for (i in lambda) {
  # fit penalized regression
  fit_penalized = prs_fit(
    y = wear,
    x = x,
    knots = knots,
    lambda = i
  ) 
  # spline choosing lambda
  Xp = splX(xp, knots) # matrix to map parameters to fitted values at xp
  LP = Xp %*% coef(fit_penalized)
  d2[, paste0('lambda = ', i)] = LP[, 1]
  
  r = resid(fit_penalized)
  idx = 1 + idx
  
  rmse[idx] = sqrt(mean(r^2))
}
```


Visualize. I add the root mean square error for model comparison.

```{r cs-penal-vis}
d3 = d2 %>%
  pivot_longer(cols = -x,
               names_to  = 'lambda',
               values_to = 'value') %>% 
  mutate(lambda = fct_inorder(lambda),
         rmse   = round(rmse[lambda], 3))
```


```{r cs-penal-vis-show, echo=FALSE}
ggplot(d3) +
  geom_point(aes(x = x, y = wear), col = '#FF5500', data = d) +
  geom_line(aes(x = x, y = value), col = "#00AAFF") +
  geom_text(
    aes(label = glue::glue('rmse: {rmse}')),
    x  = .33,
    y = 4.5,
    size = 2,
    color = 'gray50'
  ) +
  facet_wrap( ~ lambda)
```


## Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/cubicsplines.R