
Commit 58e4e48

Improved kernel description and normalizing flows (#228)
1 parent 34c7dc0 commit 58e4e48

3 files changed: +39 −6 lines

dl/flows.ipynb

Lines changed: 4 additions & 2 deletions
@@ -44,7 +44,9 @@
 "\n",
 "A bijector is a function that is [injective](https://en.wikipedia.org/wiki/Injective_function) (1 to 1) and [surjective](https://en.wikipedia.org/wiki/Surjective_function) (onto). An equivalent way to view a bijective function is that it has an inverse. For example, a sum reduction has no inverse and is thus not bijective: $\\sum [1,0] = 1$ and $\\sum [-1, 2] = 1$. Multiplying by a matrix which has an inverse is bijective. $y = x^2$ is not bijective, since $y = 4$ has two solutions. \n",
 "\n",
-"Remember that we must compute the determinant of the bijector Jacobian. If the Jacobian is dense (all output elements depend on all input elements), computing this quantity will be $O\\left(|x|_0^3\\right)$ where $|x|_0$ is the number of dimensions of $x$ because a determinant scales by $O(n^3)$. This would make computing normalizing flows impractical in high-dimensions. However, in practice we restrict ourselves to bijectors that have easy to calculate Jacobians. For example, if the bijector is $x_i = \\cos z_i$ then the Jacobian will be diagonal. Typically, the trick that is done is to make the Jacobian triangular. Then $x_0$ only depends on $z_0$, $z_1$ depends on $z_0, Z_1$, and $x_2$ depends on $z_0, z_1, z_2$, etc. The matrix determinant is then computed in linear time with respect to the number of dimensions.\n",
+"Remember that we must compute the determinant of the bijector Jacobian. If the Jacobian is dense (all output elements depend on all input elements), computing this quantity will be $O\\left(|x|_0^3\\right)$, where $|x|_0$ is the number of dimensions of $x$, because a determinant scales as $O(n^3)$. This would make computing normalizing flows impractical in high dimensions. However, in practice we restrict ourselves to bijectors with easy-to-calculate Jacobians. For example, if the bijector is $x_i = \\cos z_i$ then the Jacobian will be diagonal. Such a diagonal Jacobian means that each dimension is independent of the others, though.\n",
+"\n",
+"One way to get faster determinants without making each dimension independent is to use a triangular Jacobian. Then $x_0$ only depends on $z_0$, $x_1$ depends on $z_0, z_1$, and $x_2$ depends on $z_0, z_1, z_2$, etc. This enables fitting high-dimensional correlations for some of the dimensions (like $x_n$). The determinant of a triangular matrix is computed in linear time with respect to the number of dimensions, because it is just the product of the matrix diagonal.\n",
 "\n",
 "### Bijector Chains\n",
 "\n",
@@ -518,7 +520,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.5"
+"version": "3.8.12"
 }
 },
 "nbformat": 4,

ml/kernel.ipynb

Lines changed: 15 additions & 4 deletions
@@ -117,9 +117,18 @@
 "feature_names = soldata.columns[features_start_at:]\n",
 "np.random.seed(0)\n",
 "\n",
-"# standardize the features\n",
-"soldata[feature_names] -= soldata[feature_names].mean()\n",
-"soldata[feature_names] /= soldata[feature_names].std()"
+"# Split into train (80%) and test (20%)\n",
+"N = len(soldata)\n",
+"split = int(N * 0.8)\n",
+"shuffled = soldata.sample(N, replace=False)\n",
+"train = shuffled[:split]\n",
+"test = shuffled[split:]\n",
+"\n",
+"# standardize the features using only train statistics\n",
+"test[feature_names] -= train[feature_names].mean()\n",
+"test[feature_names] /= train[feature_names].std()\n",
+"train[feature_names] -= train[feature_names].mean()\n",
+"train[feature_names] /= train[feature_names].std()"
 ]
 },
 {
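The same leakage-free pattern, fitting the standardization statistics on the training split only, can also be written with scikit-learn (a sketch under the assumption that scikit-learn is available; the notebook itself uses plain pandas):

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only, then apply it to both
# splits, so no test-set information leaks into the features.
scaler = StandardScaler().fit(train[feature_names])
train[feature_names] = scaler.transform(train[feature_names])
test[feature_names] = scaler.transform(test[feature_names])
```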
@@ -128,7 +137,9 @@
 "source": [
 "## Kernel Definition\n",
 "\n",
-"We'll start by creating our kernel function. *Our kernel function does not need to be differentiable*. In contrast to the functions we see in deep learning, we can use sophisticated and non-differentiable functions in kernel learning. For example, you could use a two-component molecular dynamics simulation to compute the kernel between two molecules. We'll still implement our kernel functions in JAX for this example because it is efficient and consistent. Remember our kernel should take two feature vectors and return a scalar. In our example, we will simply use the $l_2$ norm. I will add one small twist though: dividing by the dimension. This makes our kernel output magnitude is independent of the number of dimensions of $x$. "
+"We'll start by creating our kernel function. *Our kernel function does not need to be differentiable*. In contrast to the functions we see in deep learning, we can use sophisticated and non-differentiable functions in kernel learning. For example, you could use a two-component molecular dynamics simulation to compute the kernel between two molecules. We'll still implement our kernel functions in JAX for this example because it is efficient and consistent. Remember, our kernel should take two feature vectors and return a scalar. In our example, we will simply use the $l_2$ norm, with one small change: dividing by the dimension. This makes our kernel's output magnitude independent of the number of dimensions of $x$. Other options for kernels on dense vectors are $1 -$ cosine similarity (dot product) or the [Mahalanobis distance](https://en.wikipedia.org/wiki/Mahalanobis_distance).\n",
+"\n",
+"Choosing a kernel function is an open question for molecules. You can work with the molecular structure directly with a variety of ideas. You can use {doc}`../dl/gnn` to convert molecules into vectors and use any of the vector kernels above. You can also work with fingerprints (vectors of bits) from cheminformatics libraries like RDKit and compare such vectors with [Tanimoto similarity](https://en.wikipedia.org/wiki/Jaccard_index) {cite}`rankovic_griffiths_moss_schwaller_2022, yang2022classifying`. "
 ]
 },
 {
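A minimal sketch of the kernels named above (the function names are illustrative and the notebook's actual cell may differ):

```python
import jax.numpy as jnp

def l2_kernel(x1, x2):
    # l2 distance divided by the dimension, so the output magnitude
    # is independent of the feature count
    return jnp.linalg.norm(x1 - x2) / x1.shape[0]

def cosine_kernel(x1, x2):
    # 1 - cosine similarity, another option for dense vectors
    return 1 - jnp.dot(x1, x2) / (jnp.linalg.norm(x1) * jnp.linalg.norm(x2))

def tanimoto(a, b):
    # Tanimoto (Jaccard) similarity for binary fingerprint vectors
    both = jnp.sum(a * b)
    return both / (jnp.sum(a) + jnp.sum(b) - both)
```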

references.bib

Lines changed: 20 additions & 0 deletions
@@ -1796,3 +1796,23 @@ @article{blumenthal2022ai
 year = {2022},
 publisher = {Taylor \& Francis}
 }
+
+@article{rankovic_griffiths_moss_schwaller_2022,
+  place = {Cambridge},
+  title = {Bayesian optimisation for additive screening and yield improvements in chemical reactions -- beyond one-hot encodings},
+  doi = {10.26434/chemrxiv-2022-nll2j},
+  journal = {ChemRxiv},
+  publisher = {Cambridge Open Engage},
+  author = {Ranković, Bojana and Griffiths, Ryan-Rhys and Moss, Henry B. and Schwaller, Philippe},
+  year = {2022}
+}
+
+@article{yang2022classifying,
+  title = {Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels},
+  author = {Yang, Ping and Henle, E. Adrian and Fern, Xiaoli Z. and Simon, Cory M.},
+  journal = {The Journal of Chemical Physics},
+  year = {2022},
+  publisher = {AIP Publishing LLC}
+}
