Commit 1bc9168

Address reviewer comments
1 parent 2c37352 commit 1bc9168

2 files changed, +19 -18 lines changed

examples/variational_inference/bayesian_neural_network_advi.ipynb

+10-9
@@ -12,8 +12,8 @@
1212
"cell_type": "markdown",
1313
"metadata": {},
1414
"source": [
15-
":::{post} Apr 25, 2022\n",
16-
":tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference\n",
15+
":::{post} May 30, 2022\n",
16+
":tags: neural networks, perceptron, variational inference, minibatch\n",
1717
":category: intermediate\n",
1818
":author: Thomas Wiecki, updated by Chris Fonnesbeck\n",
1919
":::"
@@ -28,7 +28,7 @@
2828
"**Probabilistic Programming**, **Deep Learning** and \"**Big Data**\" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.\n",
2929
"\n",
3030
"### Probabilistic Programming at scale\n",
31-
"**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan. \n",
31+
"**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan. \n",
3232
"\n",
3333
"Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).\n",
3434
"\n",
@@ -234,9 +234,9 @@
234234
"source": [
235235
"### Variational Inference: Scaling model complexity\n",
236236
"\n",
237-
"We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
237+
"We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
238238
"\n",
239-
"Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
239+
"Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
240240
]
241241
},
242242
{
@@ -355,13 +355,14 @@
355355
"cell_type": "markdown",
356356
"metadata": {},
357357
"source": [
358-
"Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
358+
"Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
359359
]
360360
},
361361
{
362362
"cell_type": "code",
363363
"execution_count": 9,
364364
"metadata": {
365+
"collapsed": true,
365366
"jupyter": {
366367
"outputs_hidden": true
367368
}
@@ -429,7 +430,7 @@
429430
"metadata": {},
430431
"outputs": [],
431432
"source": [
432-
"pred = ppc.posterior_predictive[\"out\"].squeeze().mean(axis=0) > 0.5"
433+
"pred = ppc.posterior_predictive[\"out\"].mean((\"chain\", \"draw\")) > 0.5"
433434
]
434435
},
435436
{
@@ -618,7 +619,7 @@
618619
"cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)\n",
619620
"fig, ax = plt.subplots(figsize=(16, 9))\n",
620621
"contour = ax.contourf(\n",
621-
" grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap\n",
622+
" grid[0], grid[1], y_pred.mean((\"chain\", \"draw\")).values.reshape(100, 100), cmap=cmap\n",
622623
")\n",
623624
"ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color=\"C0\")\n",
624625
"ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color=\"C1\")\n",
@@ -903,7 +904,7 @@
903904
"hash": "5429d053af7e221df99a6f00514f0d50433afea7fb367ba3ad570571d9163dca"
904905
},
905906
"kernelspec": {
906-
"display_name": "Python 3.9.10 ('pymc-dev-py39')",
907+
"display_name": "Python 3 (ipykernel)",
907908
"language": "python",
908909
"name": "python3"
909910
},

examples/variational_inference/bayesian_neural_network_advi.myst.md

+9-9
@@ -5,7 +5,7 @@ jupytext:
55
format_name: myst
66
format_version: 0.13
77
kernelspec:
8-
display_name: Python 3.9.10 ('pymc-dev-py39')
8+
display_name: Python 3 (ipykernel)
99
language: python
1010
name: python3
1111
---
@@ -15,8 +15,8 @@ kernelspec:
1515

1616
+++
1717

18-
:::{post} Apr 25, 2022
19-
:tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference
18+
:::{post} May 30, 2022
19+
:tags: neural networks, perceptron, variational inference, minibatch
2020
:category: intermediate
2121
:author: Thomas Wiecki, updated by Chris Fonnesbeck
2222
:::
@@ -28,7 +28,7 @@ kernelspec:
2828
**Probabilistic Programming**, **Deep Learning** and "**Big Data**" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.
2929

3030
### Probabilistic Programming at scale
31-
**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan.
31+
**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan.
3232

3333
Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).
3434

@@ -172,9 +172,9 @@ That's not so bad. The `Normal` priors help regularize the weights. Usually we w
172172

173173
### Variational Inference: Scaling model complexity
174174

175-
We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
175+
We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
176176

177-
Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
177+
Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
178178

179179
```{code-cell} ipython3
180180
%%time
@@ -195,7 +195,7 @@ plt.xlabel("iteration");
195195
trace = approx.sample(draws=5000)
196196
```
197197

198-
Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
198+
Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
199199

200200
```{code-cell} ipython3
201201
---
@@ -211,7 +211,7 @@ with neural_network:
211211
We can average the predictions for each observation to estimate the underlying probability of class 1.
212212

213213
```{code-cell} ipython3
214-
pred = ppc.posterior_predictive["out"].squeeze().mean(axis=0) > 0.5
214+
pred = ppc.posterior_predictive["out"].mean(("chain", "draw")) > 0.5
215215
```
216216

217217
```{code-cell} ipython3
@@ -265,7 +265,7 @@ y_pred = ppc.posterior_predictive["out"]
265265
cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
266266
fig, ax = plt.subplots(figsize=(16, 9))
267267
contour = ax.contourf(
268-
grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap
268+
grid[0], grid[1], y_pred.mean(("chain", "draw")).values.reshape(100, 100), cmap=cmap
269269
)
270270
ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color="C0")
271271
ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color="C1")
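
The switch from `.squeeze().mean(axis=0)` to `.mean(("chain", "draw"))` in both files can be illustrated with a small sketch. This is a plain NumPy analogue, not the notebook's code: the array shapes and the helper `class_predictions` are assumptions made for illustration, and axis positions 0 and 1 stand in for the named `chain` and `draw` dimensions that xarray reduces by name. It suggests why the old pattern only gives per-observation averages when the chain dimension has length 1.

```python
import numpy as np

rng = np.random.default_rng(0)


def class_predictions(draws):
    """Average over the leading (chain, draw) axes, then threshold at 0.5.

    Hypothetical helper: mirrors `.mean(("chain", "draw")) > 0.5` using
    positional axes instead of named dimensions.
    """
    return draws.mean(axis=(0, 1)) > 0.5


# Assumed layout of posterior predictive draws: (chain, draw, observation).
one_chain = rng.integers(0, 2, size=(1, 500, 4))
two_chains = rng.integers(0, 2, size=(2, 500, 4))

# Old pattern: squeeze() drops the chain axis only when it has length 1,
# so mean(axis=0) averages over draws in that case alone.
old_one = one_chain.squeeze().mean(axis=0)
old_two = two_chains.squeeze().mean(axis=0)

print(old_one.shape)  # (4,): one value per observation
print(old_two.shape)  # (500, 4): averaged over chains only, draws remain

# New pattern: reduce over both sampling axes explicitly.
print(class_predictions(two_chains).shape)  # (4,)
```

With a single chain both patterns agree, which is consistent with the old code working on draws taken from one variational approximation; reducing over both sampling dimensions explicitly keeps the result shaped per observation regardless of the number of chains.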
