To check whether an observed association between a set of features X and an outcome Y is due to
X causing Y or to an unknown confounder Z, we compare two models in terms of minimum description length (MDL):
- The causal model: a Bayesian linear regression model.
- The confounded model: a Probabilistic PCA (PPCA) model.
The model that explains the data better, i.e., the one that yields the shorter description length, is considered more likely to be the true model. Please see Section 4 in (Wachinger et al., MedIA, https://arxiv.org/abs/2002.05049) for the full details.
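To make the two scores concrete, here is a minimal, simplified sketch of how a causal and a confounded log-likelihood could be computed. It is not the `compare_models` implementation from this repository. The function names (`gaussian_logpdf_sum`, `causal_loglik`, `confounded_loglik`) are hypothetical. The following are also assumptions made for illustration: the fixed hyperparameters `alpha` and `noise_var`, the Gaussian model for X in the causal case, and the closed-form maximum-likelihood PPCA fit on the joint data (X, Y). The parameter-cost terms of a full MDL score are omitted.

```python
import numpy as np


def gaussian_logpdf_sum(D, mean, cov):
    """Sum of log N(d_i; mean, cov) over the rows d_i of D (numpy only)."""
    D = np.atleast_2d(D)
    n, k = D.shape
    diff = D - mean
    _, logdet = np.linalg.slogdet(cov)
    # Row-wise Mahalanobis terms diff_i^T cov^{-1} diff_i
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return float(-0.5 * (n * (k * np.log(2 * np.pi) + logdet) + maha.sum()))


def causal_loglik(X, Y, alpha=1.0, noise_var=0.01):
    """log p(X) + log p(Y | X) for X -> Y; fixed hyperparameters are an assumption."""
    # log p(X): Gaussian with maximum-likelihood mean and covariance
    ll_x = gaussian_logpdf_sum(X, X.mean(axis=0), np.cov(X, rowvar=False, bias=True))
    # log p(Y | X): Bayesian linear regression with prior beta ~ N(0, I / alpha) and
    # noise N(0, noise_var); integrating out beta gives Y ~ N(0, noise_var * I + X X^T / alpha)
    n = X.shape[0]
    cov_y = noise_var * np.eye(n) + X @ X.T / alpha
    ll_y = gaussian_logpdf_sum(Y, np.zeros(n), cov_y)
    return ll_x + ll_y


def confounded_loglik(X, Y, n_latent=1):
    """log p(X, Y) under PPCA with n_latent latent dimensions (closed-form ML fit)."""
    D = np.column_stack([X, Y])
    S = np.cov(D, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort in descending order
    q = n_latent
    sigma2 = eigvals[q:].mean()  # ML noise variance: mean of the discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    C = W @ W.T + sigma2 * np.eye(D.shape[1])  # marginal covariance of (X, Y)
    return gaussian_logpdf_sum(D, D.mean(axis=0), C)


# Causal toy data of the same kind as in the usage example of this README
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta = np.zeros(5)
beta[1:] = rng.uniform(low=-1, high=1, size=4)
Y = X @ beta

ll_causal = causal_loglik(X, Y)
ll_confounded = confounded_loglik(X, Y, n_latent=1)
print(f"causal: {ll_causal:.1f}  confounded: {ll_confounded:.1f}")
```

On this noiseless causal data the causal score should come out higher, since one latent dimension plus isotropic noise cannot capture the exact linear dependence of Y on X. Because the hyperparameters are fixed rather than learned, this sketch should not be read as a fair test on noisy or confounded data.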
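```python
import numpy as np
from compare_models import compare_models

# Simulate 100 samples of 5 features
X = np.random.randn(100, 5)

# Y is a linear function of X (the first feature has no effect),
# so the true relationship is causal and there is no confounder
beta = np.zeros(5)
beta[1:] = np.random.uniform(low=-1, high=1, size=4)
Y = X @ beta

# Compare the causal and the confounded model;
# DZ presumably sets the dimensionality of the latent confounder Z
result = compare_models(X, Y, DZ=1)
print(result)
```

This prints a table that lists the log-likelihoods of
the causal and the confounded model for 10 repetitions.
The higher log-likelihood of the causal model suggests
that X is indeed the cause of Y.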
| iter | ll_causal | ll_confounded | bayes_factor |
|---|---|---|---|
| 1 | -479.039789 | -822.107921 | 9.830969e+148 |
| 2 | -479.040413 | -822.155958 | 1.030832e+149 |
| 3 | -479.041873 | -822.185778 | 1.060485e+149 |
| 4 | -479.038676 | -822.201896 | 1.081167e+149 |
| 5 | -479.040882 | -822.394069 | 1.307358e+149 |
| 6 | -479.040289 | -821.986750 | 8.704735e+148 |
| 7 | -479.042047 | -822.151386 | 1.024454e+149 |
| 8 | -479.041100 | -822.115588 | 9.893665e+148 |
| 9 | -479.041621 | -822.332420 | 1.228286e+149 |
| 10 | -479.039873 | -821.870923 | 7.755917e+148 |
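The values in the `bayes_factor` column are consistent with exp(ll_causal - ll_confounded), which indicates natural-log likelihoods. Such Bayes factors overflow floating point quickly, so a minimal sketch that works in log space may help. It also converts each log-likelihood into the data code length in bits (-log2 p) that MDL reasons about. The helper names `log10_bayes_factor` and `code_length_bits` are hypothetical and not part of this repository.

```python
import math


def log10_bayes_factor(ll_causal, ll_confounded):
    """log10 of exp(ll_causal - ll_confounded), computed without exponentiating."""
    return (ll_causal - ll_confounded) / math.log(10)


def code_length_bits(loglik):
    """Code length in bits implied by a natural-log likelihood: -log2 p = -loglik / ln 2."""
    return -loglik / math.log(2)


# First row of the table above
ll_causal, ll_confounded = -479.039789, -822.107921
bf = math.exp(ll_causal - ll_confounded)  # ~9.83e148, matching the bayes_factor column
log10_bf = log10_bayes_factor(ll_causal, ll_confounded)
bits_saved = code_length_bits(ll_confounded) - code_length_bits(ll_causal)
print(f"Bayes factor: {bf:.6e}, log10: {log10_bf:.4f}, bits saved by causal model: {bits_saved:.1f}")
```

A positive number of bits saved means the causal model encodes the data more compactly, which is the same conclusion as a Bayes factor greater than 1.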
For our experiments, we used R version 3.6.2 with the following packages:
| Package | Version |
|---|---|
| abind | 1.4-5 |
| askpass | 1.1 |
| assertthat | 0.2.1 |
| backports | 1.1.5 |
| base64enc | 0.1-3 |
| bayesplot | 1.7.1 |
| betareg | 3.1-3 |
| BH | 1.72.0-3 |
| bit | 1.1-15.2 |
| bit64 | 0.9-7.1 |
| bridgesampling | 1.0-0 |
| brms | 2.12.0 |
| Brobdingnag | 1.2-6 |
| callr | 3.4.2 |
| checkmate | 2.0.0 |
| cli | 2.0.2 |
| coda | 0.19-3 |
| colorspace | 1.4-1 |
| colourpicker | 1.0 |
| crayon | 1.3.4 |
| crosstalk | 1.0.0 |
| curl | 4.3 |
| desc | 1.2.0 |
| digest | 0.6.25 |
| dplyr | 0.8.4 |
| DT | 0.12 |
| dygraphs | 1.1.1.6 |
| ellipsis | 0.3.0 |
| evaluate | 0.14 |
| fansi | 0.4.1 |
| farver | 2.0.3 |
| fastmap | 1.0.1 |
| filehash | 2.4-2 |
| flexmix | 2.3-15 |
| Formula | 1.2-3 |
| future | 1.16.0 |
| ggplot2 | 3.2.1 |
| ggridges | 0.5.2 |
| globals | 0.12.5 |
| glue | 1.3.1 |
| gridExtra | 2.3 |
| gtable | 0.3.0 |
| gtools | 3.8.1 |
| hdf5r | 1.3.2 |
| htmltools | 0.4.0 |
| htmlwidgets | 1.5.1 |
| httpuv | 1.5.2 |
| igraph | 1.2.4.2 |
| inline | 0.3.15 |
| IRdisplay | 0.7.0 |
| IRkernel | 1.1 |
| jsonlite | 1.6.1 |
| labeling | 0.3 |
| later | 1.0.0 |
| lazyeval | 0.2.2 |
| lifecycle | 0.1.0 |
| listenv | 0.8.0 |
| lme4 | 1.1-21 |
| lmtest | 0.9-37 |
| loo | 2.2.0 |
| magrittr | 1.5 |
| markdown | 1.1 |
| matrixStats | 0.55.0 |
| mime | 0.9 |
| miniUI | 0.1.1.1 |
| minqa | 1.2.4 |
| modeltools | 0.2-22 |
| munsell | 0.5.0 |
| mvtnorm | 1.1-0 |
| nleqslv | 3.3.2 |
| nloptr | 1.2.1 |
| openssl | 1.4.1 |
| packrat | 0.5.0 |
| pbdZMQ | 0.3-3 |
| pillar | 1.4.3 |
| pkgbuild | 1.0.6 |
| pkgconfig | 2.0.3 |
| plogr | 0.2.0 |
| plyr | 1.8.5 |
| png | 0.1-7 |
| prettyunits | 1.1.1 |
| processx | 3.4.2 |
| promises | 1.1.0 |
| ps | 1.3.2 |
| purrr | 0.3.3 |
| R6 | 2.4.1 |
| RColorBrewer | 1.1-2 |
| Rcpp | 1.0.3 |
| RcppEigen | 0.3.3.7.0 |
| RcppParallel | 4.4.4 |
| repr | 1.1.0 |
| reshape2 | 1.4.3 |
| rlang | 0.4.5 |
| rprojroot | 1.3-2 |
| rsconnect | 0.8.16 |
| rstan | 2.19.3 |
| rstanarm | 2.19.3 |
| rstantools | 2.0.0 |
| rstudioapi | 0.11 |
| sandwich | 2.5-1 |
| scales | 1.1.0 |
| shiny | 1.4.0 |
| shinyjs | 1.1 |
| shinystan | 2.5.0 |
| shinythemes | 1.1.2 |
| sourcetools | 0.1.7 |
| StanHeaders | 2.21.0-1 |
| stringi | 1.4.6 |
| stringr | 1.4.0 |
| sys | 3.3 |
| threejs | 0.3.3 |
| tibble | 2.1.3 |
| tidyselect | 1.0.0 |
| tikzDevice | 0.12.3 |
| utf8 | 1.1.4 |
| uuid | 0.1-4 |
| vctrs | 0.2.3 |
| viridisLite | 0.3.0 |
| withr | 2.1.2 |
| xfun | 0.12 |
| xtable | 1.8-4 |
| xts | 0.12-0 |
| yaml | 2.2.1 |
| zeallot | 0.1.0 |
| zoo | 1.8-7 |
Our approach to comparing causal and confounded models via minimum description length (MDL) is based on the work of Kaltenpoth and Vreeken: We are not your real parents: Telling causal from confounded by MDL. In: ICDM, 2019.