-
Notifications
You must be signed in to change notification settings - Fork 17
/
bayesian-beta-regression.Rmd
189 lines (135 loc) · 4.04 KB
/
bayesian-beta-regression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
# Bayesian Beta Regression
The following provides an example of beta regression using Stan/rstan, with comparison to results with R's <span class="pack" style = "">betareg</span> package.
## Data Setup
Several data sets from are available <span class="pack" style = "">betareg</span> to play with, but as they are a bit problematic in one way or another I instead focus on a simple simulated data set.
```{r bayes-beta-setup}
library(tidyverse)
library(betareg)
# Data for assessing the contribution of non-verbal IQ to children's reading
# skills in dyslexic and non-dyslexic children.
# issue: 30% of data has a value of .99
# data("ReadingSkills")
# ?ReadingSkills
# y = ReadingSkills$accuracy
#
# brmod = betareg(accuracy ~ dyslexia + iq, data = ReadingSkills)
# X = cbind(1, scale(model.matrix(brmod)[,c('dyslexia','iq')], scale=F))
# or this, issue: ignores batch effects
# data("GasolineYield")
# ?GasolineYield
#
# y = GasolineYield$yield
# X = cbind(1, scale(GasolineYield[,c('gravity','pressure','temp')]))
# yet another data option, issue: only two binary predictors
# data(WeatherTask)
# ?WeatherTask
#
# y = WeatherTask$agreement
# brmod = betareg(agreement ~ priming + eliciting, data = WeatherTask)
# X = model.matrix(brmod)
# simulated data; probably a better illustration, or at least better behaved one.
set.seed(1234)
N = 500 # Sample size
x_1 = rnorm(N) # Predictors
x_2 = rnorm(N)
X = cbind(1, x_1, x_2)
beta = c(-1, .2, -.3)
mu = plogis(X %*% beta) # add noise if desired + rnorm(N, sd=.01)
phi = 10
A = mu * phi
B = (1 - mu) * phi
y = rbeta(N, A, B)
d = data.frame(x_1, x_2, y)
qplot(y, geom='density')
```
## Model Code
```{stan bayes-beta-code, output.var='bayes_beta'}
data {
int<lower=1> N; // sample size
int<lower=1> K; // K predictors
vector<lower=0,upper=1>[N] y; // response
matrix[N,K] X; // predictor matrix
}
parameters {
vector[K] theta; // reg coefficients
real<lower=0> phi; // dispersion parameter
}
transformed parameters{
vector[K] beta;
beta = theta * 5; // same as beta ~ normal(0, 5); fairly diffuse
}
model {
// model calculations
vector[N] LP; // linear predictor
vector[N] mu; // transformed linear predictor
vector[N] A; // parameter for beta distn
vector[N] B; // parameter for beta distn
LP = X * beta;
for (i in 1:N) {
mu[i] = inv_logit(LP[i]);
}
A = mu * phi;
B = (1.0 - mu) * phi;
// priors
theta ~ normal(0, 1);
phi ~ cauchy(0, 5); // different options for phi
//phi ~ inv_gamma(.001, .001);
//phi ~ uniform(0, 500); // put upper on phi if using this
// likelihood
y ~ beta(A, B);
}
generated quantities {
vector[N] y_rep;
for (i in 1:N) {
real mu;
real A;
real B;
mu = inv_logit(X[i] * beta);
A = mu * phi;
B = (1.0 - mu) * phi;
y_rep[i] = beta_rng(A, B);
}
}
```
## Estimation
We create a data list for Stan and estimate the model.
```{r bayes-beta-est, results='hide'}
# Stan data list
stan_data = list(N = length(y),
K = ncol(X),
y = y,
X = X)
library(rstan)
fit = sampling(
bayes_beta,
data = stan_data,
thin = 4,
verbose = FALSE
)
# model for later comparison
brmod = betareg(y ~ ., data = d)
```
## Comparison
Estimates are almost idential in this particular case.
```{r bayes-beta-compare}
print(
fit,
pars = c('beta', 'phi'),
digits_summary = 3,
probs = c(.025, .5, .975)
)
summary(brmod)
```
## Visualization
Posterior predictive check.
```{r bayes-beta-ppcheck}
library(bayesplot)
pp_check(
stan_data$y,
rstan::extract(fit, par = 'y_rep')$y_rep[1:10, ],
fun = 'dens_overlay'
)
```
## Source
Original code available at:
https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/Bayesian/rstanBetaRegression.R