
Merge pull request #49 from ckoerber/feature-normal-approximation
Performance boost through normal approximations
cjbayesian authored Apr 27, 2020
2 parents 4bd0e2e + 8193550 commit 8364e69
Showing 24 changed files with 3,502 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ __pycache__/
.DS_Store
output/*
!output/
*.egg-info
99 changes: 99 additions & 0 deletions README-normal-extension.md
@@ -0,0 +1,99 @@
# Normal extension of Bayesian analysis

The folder `bayes_chime/normal` contains an extension of the Bayesian analysis of the SEIR model.
In this extension, all data and model parameters are assumed to be normally distributed.
This simplifies the propagation of errors and makes it possible to compute approximate posterior distributions analytically.
In particular, propagating parameter distributions through a 200-day simulation takes `20ms`, while the estimation of the posterior distribution is on the order of `1s`.
Tests indicated that, for a given parameter distribution, general Bayesian forecasts agree with their normal approximations within uncertainties.


## Install the module
You can install this module locally using pip:

```bash
pip install [--user] [-e] .
```

## How to use the module

### How to propagate uncertainties
```python
from gvar import gvar
from bayes_chime.normal.models import SEIRModel

# define fixed model parameters
xx = {"market_share": 0.26, "region_pop": 1200000, ...}

# define model parameter distributions
pp = {"incubation_days": gvar(4.6, 1.5), "recovery_days": gvar(15.3, 3.5)}

# set up the model
seir = SEIRModel()
df = seir.propagate_uncertainties(xx, pp)
```
A `gvar` variable represents a Gaussian random variable characterized by its mean and standard deviation.
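
As a minimal illustration of how `gvar` variables behave (the numbers are made up):
```python
from gvar import gvar

x = gvar(4.6, 1.5)     # Gaussian random variable: mean 4.6, sdev 1.5
print(x.mean, x.sdev)  # -> 4.6 1.5
print(2 * x)           # arithmetic propagates the error: 9.2(3.0)
```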

### How to compute posteriors
```python
... # after code above

from lsqfit import nonlinear_fit

yy = gvar(data_mean, data_sdev)  # arrays of observed values and their standard deviations

fit = nonlinear_fit(data=(xx, yy), prior=pp, fcn=seir.fit_fcn)

print(fit) # Fit statistics
print("Posterior:", fit.p)
```

The notebook `notebooks/How-to-use-normal-approximations-module.ipynb` explains the general usage of the module.

## Technical details

### Uncertainty propagation and posteriors

This module makes use of two libraries developed by [Peter Lepage](https://physics.cornell.edu/peter-lepage)

* [`gvar`](https://gvar.readthedocs.io) and
* [`lsqfit`](https://lsqfit.readthedocs.io)

The random numbers represented by `gvar`s act like `float`s but automatically compute analytic derivatives in successive computations.
Thus, they allow errors to be propagated through the whole computation.
Because normal distributions are self-conjugate (if the likelihood is Gaussian and the prior is Gaussian, so is the posterior), the results of such computations are again described by normal distributions.
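
For instance, a `gvar` can report the analytic derivative of a derived quantity with respect to an input variable (a small sketch with made-up numbers):
```python
from gvar import gvar

x = gvar(2.0, 0.5)
f = x ** 3         # gvar tracks df/dx analytically
print(f.deriv(x))  # -> 12.0, i.e., 3 * x.mean ** 2
print(f.sdev)      # -> 6.0, i.e., |df/dx| * x.sdev
```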

To determine posterior distributions, the `nonlinear_fit` function from `lsqfit` uses a saddle-point approximation for the kernel of the posterior (it expands the residuals up to second order and evaluates them at the point where the first derivative vanishes).
Thus, computing the posterior effectively boils down to finding the parameters for which the first derivative of the `chi**2` vanishes.
For this, it uses regular minimization routines.
More details are given in [appendix A of the linked publication](https://arxiv.org/pdf/1406.2279.pdf).
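
A minimal, self-contained sketch of such a fit (the straight-line model and all numbers are made up for illustration):
```python
import numpy as np
from gvar import gvar
from lsqfit import nonlinear_fit


def fcn(x, p):
    """Straight-line model used only for this illustration."""
    return p["a"] + p["b"] * x


x = np.array([1.0, 2.0, 3.0, 4.0])
y = gvar(["1.1(2)", "1.9(2)", "3.2(2)", "3.9(2)"])  # data with errors
prior = {"a": gvar(0, 5), "b": gvar(0, 5)}  # broad Gaussian priors

fit = nonlinear_fit(data=(x, y), prior=prior, fcn=fcn)
print(fit.p)  # posterior means and standard deviations
```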

### Kernel functions

To utilize the above modules, kernel operations must be written in the syntax of `gvar`s and `lsqfit`.
Model parameters are categorized either as independent (fixed) parameters or as distribution (variable) parameters.
Variable parameters follow normal distributions; fixed parameters are plain numbers.
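
Schematically, reusing the parameter names from the example above:
```python
from gvar import gvar

# fixed parameters: plain numbers
xx = {"market_share": 0.26, "region_pop": 1200000}

# variable parameters: normal distributions
pp = {"incubation_days": gvar(4.6, 1.5), "recovery_days": gvar(15.3, 3.5)}
```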

This module abstracts compartment models like SIR or SEIR such that they can easily be extended by future implementations.
E.g., after inheriting from `bayes_chime.normal.models.base.CompartmentModel`, one only has to provide a `simulation_step` method and can use the existing API to run simulations (see the subclass sketch after the following block):
```python
def simulation_step(data, **pars):
    # compartment values at the current time step
    susceptible = data["susceptible"]
    exposed = data["exposed"]
    infected = data["infected"]
    recovered = data["recovered"]

    # example update: new infections scale with contacts between
    # infected and susceptible individuals
    infected += pars["beta"] * infected * susceptible / pars["total"]
    ...

    return updated_data
```
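
A hypothetical subclass could then look as follows (a sketch; the exact base-class interface is defined in `bayes_chime/normal/models/base.py`, and the names below are illustrative):
```python
from bayes_chime.normal.models.base import CompartmentModel


class MySEIRVariant(CompartmentModel):
    """Illustrative subclass; only the simulation step is provided."""

    def simulation_step(self, data, **pars):
        # update all compartments for one time step, as sketched above
        ...
```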

## Cross checks
This module is tested against the non-Bayesian `penn_chime` module. These tests can be run from the repo root with
```bash
pytest
```
Furthermore, the `How-to-use` notebook fits posterior distributions and compares them, as well as the propagated uncertainties, against results generated with the main module.
Empty file added bayes_chime/__init__.py
Empty file.
Empty file added bayes_chime/normal/__init__.py
Empty file.
9 changes: 9 additions & 0 deletions bayes_chime/normal/doc/.gitignore
@@ -0,0 +1,9 @@
notes.aux
notes.log
xetex_import.sty
notes.bbl
notes.blg
notes.out
notes.synctex.gz
notesNotes.bib
notes.pdf
12 changes: 12 additions & 0 deletions bayes_chime/normal/doc/notes.bib
@@ -0,0 +1,12 @@
@article{Bouchard:2014ypa,
author = "Bouchard, C.M. and Lepage, G. Peter and Monahan, Christopher and Na, Heechang and Shigemitsu, Junko",
archivePrefix = "arXiv",
doi = "10.1103/PhysRevD.90.054506",
eprint = "1406.2279",
journal = "Phys. Rev. D",
pages = "054506",
primaryClass = "hep-lat",
title = "{$B_s \to K \ell \nu$ form factors from lattice QCD}",
volume = "90",
year = "2014"
}
242 changes: 242 additions & 0 deletions bayes_chime/normal/doc/notes.tex
@@ -0,0 +1,242 @@
\batchmode
\documentclass[paper=a4, fontsize=12pt, prl, notitlepage]{revtex4-1}
\usepackage[ssp]{xetex_import}
\linespread{1.4}
\errorstopmode

\begin{document}

\title{Math behind Bayesian uncertainty propagation using normal approximations}
\author{Christopher Körber}
\affiliation{%
Department of Physics,
University of California,
Berkeley, CA 94720, USA\\
Nuclear Science Division, %
LBNL,
Berkeley, CA 94720, USA
}
\date{\today}
\begin{abstract}
This document addresses the math used to propagate errors and to infer posterior distributions (Bayesian statistics) used in \href{https://github.com/pennsignals/chime_sims/pull/49}{the performance boost through normal approximations PR in the \texttt{pennsignals/chime\_sims} repo}.
\end{abstract}

\maketitle

%==============================================================================
% Content
%==============================================================================

\section{Assumptions}
The general theme of this document is: \textit{casting everything to normal distributions simplifies computations}.

In general, the techniques in PR \#49 make two approximations:
\begin{enumerate}
\item All model and data \textit{input} distributions can be sufficiently approximated by multivariate normal distributions,
\item The posterior distribution can be sufficiently approximated by a multivariate normal distribution.
\end{enumerate}

\section{Error propagation}
The \texttt{gvar} module is used to analytically propagate errors.
The equation used to compute the standard deviation $\sigma_f$ of a function $f(\vec p)$ at a given point $\bar {\vec p}$ in parameter space is
\begin{equation}
\sigma_f^2(\bar {\vec p})
=
\left[
\sum_{i,j=1}^N
\frac{\partial f(\vec p)}{\partial p_i}
\left(\Sigma_p\right)_{ij}
\frac{\partial f(\vec p)}{\partial p_j}
\right]_{\vec p = \bar {\vec p}}
\, ,
\end{equation}
where $\Sigma_p$ is the covariance matrix of the normally distributed parameters $\vec p$.
In the uncorrelated case, this simplifies to
\begin{equation}
\sigma_f^2(\bar {\vec p})
=
\left[
\sum_{i=1}^N
\left(\frac{\partial f(\vec p)}{\partial p_i}\right)^2
\sigma_{p_i}^2
\right]_{\vec p = \bar {\vec p}}
\, .
\end{equation}

The \texttt{gvar} module is capable of tracing such derivatives because its implemented functions are aware of their analytic derivatives.

\subsection{Example}
For example, suppose
\begin{align}
f(x, y)
&=
x y^2
&& \Rightarrow &
\sigma_f^2(\bar{\vec p})
&=
y^4 (\Sigma_p)_{xx}
+ 2 x y^3 \left[ (\Sigma_p)_{xy} + (\Sigma_p)_{yx} \right]
+ 4 x^2 y^2 (\Sigma_p)_{yy} \, .
\end{align}
Thus, if $x, y$ follow a multivariate normal distribution of mean $\bar{\vec p}$ and covariance $\Sigma_p$, one finds
\begin{align}
\bar{\vec p}
&=
\begin{pmatrix}
1 \\ -2
\end{pmatrix}
\, , &
\Sigma_p
&=
\begin{pmatrix}
2 & 1 \\ 1 & 1
\end{pmatrix}
\, , &
\Rightarrow \quad
\sigma_f(\bar{\vec p})
&=
\sqrt{32 - 32 + 16} = 4 \, .
\end{align}
The corresponding \texttt{gvar} code returns
\begin{lstlisting}[style=python]
from gvar import gvar

mean = [1, -2]
cov = [[2, 1], [1, 1]]
x, y = gvar(mean, cov)
f = x * y ** 2
print(f)

> 4.0(4.0)
\end{lstlisting}


\section{Computation of the posterior}

This section explains how the \texttt{lsqfit} module approximates the posterior distribution $P(\vec p|D,M)$ given data $D$, an input model $M$, and its corresponding priors $P(\vec p| M)$.

\subsection{Definitions}

The posterior distribution $P(\vec p|D, M)$ is proportional to the prior times the likelihood $P(D|\vec p, M)$, the probability of the data given the model and parameters
\begin{align}
P(\vec p|D, M) &=
\frac{P(D|\vec p, M)P(\vec p | M)}{P(D|M)}
\propto
P(D|\vec p, M)P(\vec p | M)
\, .
\end{align}
The marginal likelihood of the data $D$ given the model $M$ is obtained by integrating over the whole parameter space
\begin{equation}
P(D|M)
=
\int d \vec p P(D|\vec p, M)P(\vec p | M) \, .
\end{equation}
Because $P(D|M)$ merely normalizes the posterior and does not depend on $\vec p$, one can neglect constant factors in the computation.

The likelihood of the data given the model and parameters is a Gaussian distribution whose exponent defines the $\chi^2$ function
\begin{equation}
P(D|\vec p, M)
\sim
\exp\left\{
- \frac{1}{2}
\sum_{i,j=1}^N
\left[y_i - f_M(x_i, \vec p)\right]
\left(\Sigma_y^{-1}\right)_{ij}
\left[y_j - f_M(x_j, \vec p)\right]
\right\}
=
\exp\left\{
- \frac{1}{2}
\chi^2_D(\vec p)
\right\}
\, ,
\end{equation}
where $\Sigma_y$ is the covariance matrix of the data and $f_M(x_j, \vec p)$ is the model function, evaluated at the point $x_j$, which aims to describe the data point $y_j$.

Maximizing the likelihood as a function of $\vec p$ corresponds to minimizing the exponent, which is the standard $\chi^2$-minimization procedure.
Computing the posterior distribution for a given prior $P(\vec p| M)$ is called Bayesian inference.
Normal approximations of the posterior lie between a full Bayesian treatment and a regular $\chi^2$ minimization.

\subsection{Normal approximation of the posterior}
Including the multivariate normal prior distribution with mean $\vec p_0$ and covariance $\Sigma_{p_0}$, the posterior distribution is proportional to
\begin{align}
P(\vec p|D, M)
&\sim
\exp\left\{
- \frac{1}{2}
\chi^2_D(\vec p)
- \frac{1}{2}
\chi^2_M(\vec p)
\right\}
=
\exp\left\{
- \frac{1}{2}
\chi^2_{DM}(\vec p)
\right\}
\, , &
\chi^2_M(\vec p)
&=
\left[\vec p - \vec p_0\right]^T \cdot
\Sigma_{p_0}^{-1}
\left[\vec p - \vec p_0\right]\, .
\end{align}
In short, the \texttt{lsqfit} module approximates the posterior by expanding the function $\chi^2_{DM}(\vec p)$ up to second order in $\vec p$ around the point $\bar{\vec p}$ where the first derivative vanishes (a stationary point, which is almost always a minimum)
\begin{align}
\chi^2_{DM}(\vec p)
& \approx
\chi^2_{DM}(\bar{\vec p})
+
\left[\vec p - \bar{\vec p}\right]^T
\Sigma_{DM}^{-1}(\bar{\vec p})
\left[\vec p - \bar{\vec p}\right]
\, , & &
\left.\frac{\partial \chi^2_{DM}(\vec p)}{\partial p_\alpha}\right|_{\vec p = \bar{\vec p}} = 0 \, \quad \forall_\alpha \, .
\end{align}
In this approximation, the posterior is again a multivariate normal distribution with mean $\bar{\vec p}$ (the maximum of the posterior, which reduces to the maximum-likelihood estimate for flat priors) and covariance $\Sigma_{DM}(\bar{\vec p})$.
The \texttt{nonlinear\_fit} method numerically computes the vector $\bar{\vec p}$ which minimizes $\chi^2_{DM}$ (fitting) and analytically computes and evaluates the covariance matrix $\Sigma_{DM}(\bar{\vec p})$ at this point.
Appendix A of \cite{Bouchard:2014ypa} describes how $\Sigma_{DM}(\bar{\vec p})$ is estimated using derivatives of the residuals with respect to the prior parameters.
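For illustration, the posterior mean and covariance can be read off the fit result (a sketch; \texttt{fit} denotes the output of a \texttt{nonlinear\_fit} call as in the repository's README):
\begin{lstlisting}[style=python]
from gvar import evalcov

p = fit.p         # posterior parameters as gvars (means = pbar)
cov = evalcov(p)  # covariance of the posterior parameters
\end{lstlisting}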
%
% \subsubsection{Example}
%
% Suppose the fit function is a linear function $f(x_i, \vec p) = p^{(0)} + p^{(1)} x_i$.
% In this case, the normal approximation of the posterior is exact
% \begin{align}
% \chi^2_{DM}(\vec p)
% & =
% \sum_{i,j=1}^N
% \left[y_i - p^{(0)} - p^{(1)} x_i \right]
% \left(\Sigma_y^{-1}\right)_{ij}
% \left[y_j - p^{(0)} - p^{(1)} x_j\right]
% +
% \left[\begin{pmatrix} p^{(0)} \\ p^{(1)} \end{pmatrix} - \vec p_0 \right]^T
% \left(\Sigma_p^{-1}\right)_{ij}
% \left[\begin{pmatrix} p^{(0)} \\ p^{(1)} \end{pmatrix} - \vec p_0 \right]
% \\
% & =
% \left[
% \sum_{i,j=1}^N y_i \left(\Sigma_y^{-1}\right)_{ij} y_j
% + \vec p_0^T
% \left(\Sigma_p^{-1}\right)
% \vec p_0
% \right]
% +
% \left[\begin{pmatrix} p^{(0)} \\ p^{(1)} \end{pmatrix} - \bar{\vec p} \right]
% \left(\Sigma_{DM}^{-1}\right)_{ij}
% \left[\begin{pmatrix} p^{(0)} \\ p^{(1)} \end{pmatrix} - \bar{\vec p} \right]^T
% \, ,
% \end{align}
% with
% \begin{align}
% \bar{\vec p} &= \ldots
% \\
% \Sigma_{DM}^{-1} &= \ldots
% \end{align}

%==============================================================================
% End Content
%==============================================================================
\bibliographystyle{plainurl}
\bibliography{notes}

\batchmode
\end{document}