|
1053 | 1053 | "A common piece of wisdom is if you want to solve a real problem with deep learning, you should read the most recent popular paper in an area and use the baseline they compare against instead of their proposed model. The reason is that a baseline model usually must be easy, fast, and well-tested, which is generally more important than being the most accurate\n",
|
1054 | 1054 | "```\n",
|
1055 | 1055 | "\n",
|
1056 |
| - "SchNet is for atoms represented as xyz coordinates (points) -- not as a molecular graph. All our previous examples used the underlying molecular graph as the input. In SchNet we will convert our xyz coodinates into a graph, so that we can apply a GNNN. SchNet was developed for predicting energies and forces from atom configurations without bond information. Thus, we need to first see how a set of atoms and their positions is converted into a graph. To get the nodes, we do a similar process as above and the atomic number is passed through an embedding layer, which is just means we assign a trainable vector to each atomic number (See {doc}`layers` for a review of embeddings). \n", |
| 1056 | + "SchNet is for atoms represented as xyz coordinates (points) -- not as a molecular graph. All our previous examples used the underlying molecular graph as the input. In SchNet we will convert our xyz coodinates into a graph, so that we can apply a GNN. SchNet was developed for predicting energies and forces from atom configurations without bond information. Thus, we need to first see how a set of atoms and their positions is converted into a graph. To get the nodes, we do a similar process as above and the atomic number is passed through an embedding layer, which just means we assign a trainable vector to each atomic number (See {doc}`layers` for a review of embeddings). \n", |
1057 | 1057 | "\n",
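A minimal sketch of this node featurization step, assuming a TensorFlow embedding layer and made-up sizes (the chapter's actual SchNet code may differ):

```python
import tensorflow as tf

# hypothetical sizes; the real model's choices may differ
MAX_Z = 100        # largest atomic number we expect to see
FEATURE_DIM = 64   # length of each trainable node vector

# one trainable vector per atomic number
embedding = tf.keras.layers.Embedding(input_dim=MAX_Z, output_dim=FEATURE_DIM)

# methanol-like example: atomic numbers for C, O, H, H, H, H
atomic_numbers = tf.constant([6, 8, 1, 1, 1, 1])
node_features = embedding(atomic_numbers)  # shape (6, 64)
```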
|
1058 | 1058 | "Getting the adjacency matrix is simple too: we just make every atom be connected to every atom. It might seem confusing what the point of using a GNN is, if we're just connecting everything. *It is because GNNs are permutation equivariant.* If we tried to do learning on the atoms as xyz coordinates, we would have weights depending on the ordering of atoms and probably fail to handle different numbers of atoms.\n",
|
1059 | 1059 | "\n",
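A rough sketch of building that fully connected graph from coordinates, using numpy and a made-up set of atoms (SchNet-style models additionally use the interatomic distances as edge features, so those are computed here too):

```python
import numpy as np

# toy coordinates: 6 atoms with xyz positions (values are made up)
xyz = np.random.default_rng(0).random((6, 3))
N = xyz.shape[0]

# fully connected adjacency matrix (no self-loops)
adj = np.ones((N, N)) - np.eye(N)

# pairwise distances, which SchNet-style models use to weight the edges
r = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)  # shape (N, N)
```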
|
|
1220 | 1220 | "\n",
|
1221 | 1221 | "label_str = list(set([k.split(\"-\")[0] for k in trajs]))\n",
|
1222 | 1222 | "\n",
|
| 1223 | + "\n", |
1223 | 1224 | "# now build dataset\n",
|
1224 | 1225 | "def generator():\n",
|
1225 | 1226 | " for k, v in trajs.items():\n",
|
|
1553 | 1554 | "\n",
|
1554 | 1555 | "---\n",
|
1555 | 1556 | "\n",
|
1556 |
| - "Let's give now use the model on some data." |
| 1557 | + "Let's now use the model on some data." |
1557 | 1558 | ]
|
1558 | 1559 | },
|
1559 | 1560 | {
|
|
1680 | 1681 | "\n",
|
1681 | 1682 | "### Common Architecture Motifs and Comparisons\n",
|
1682 | 1683 | "\n",
|
1683 |
| - "We've now seen message passing layer GNNs, GCNs, GGNs, and the generalized Battaglia equations. You'll find common motifs in the architectures, like gating, {doc}`attention`, and pooling strategies. For example, Gated GNNS (GGNs) can be combined with attention pooling to create Gated Attention GNNs (GAANs){cite}`zhang2018gaan`. GraphSAGE is a similar to a GCN but it samples when pooling, making the neighbor-updates of fixed dimension{cite}`hamilton2017inductive`. So you'll see the suffix \"sage\" when you sample over neighbors while pooling. These can all be represented in the Battaglia equations, but you should be aware of these names. \n", |
| 1684 | + "We've now seen message passing layer GNNs, GCNs, GGNs, and the generalized Battaglia equations. You'll find common motifs in the architectures, like gating, {doc}`attention`, and pooling strategies. For example, Gated GNNS (GGNs) can be combined with attention pooling to create Gated Attention GNNs (GAANs){cite}`zhang2018gaan`. GraphSAGE is similar to a GCN but it samples when pooling, making the neighbor-updates of fixed dimension{cite}`hamilton2017inductive`. So you'll see the suffix \"sage\" when you sample over neighbors while pooling. These can all be represented in the Battaglia equations, but you should be aware of these names. \n", |
1684 | 1685 | "\n",
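As a rough illustration of the "sage" idea, here is a hypothetical sketch of sampling a fixed number of neighbors before pooling (names and sizes are assumptions, not GraphSAGE's reference implementation):

```python
import numpy as np

def sample_neighbors(adj, n_samples=5, seed=0):
    """Sample a fixed number of neighbors per node so the pooled
    neighbor-update has fixed dimension (assumes every node has >= 1 neighbor)."""
    rng = np.random.default_rng(seed)
    sampled = []
    for i in range(adj.shape[0]):
        neighbors = np.flatnonzero(adj[i])
        # sample with replacement so nodes with few neighbors still work
        sampled.append(rng.choice(neighbors, size=n_samples, replace=True))
    return np.stack(sampled)  # shape (N, n_samples)
```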
|
1685 | 1686 | "The enormous variety of architectures has led to work on identifying the \"best\" or most general GNN architecture {cite}`dwivedi2020benchmarking,errica2019fair,shchur2018pitfalls`. Unfortunately, the question of which GNN architecture is best is as difficult as \"what benchmark problems are best?\" Thus there are no agreed-upon conclusions on the best architecture. However, those papers are great resources on training, hyperparameters, and reasonable starting guesses and I highly recommend reading them before designing your own GNN. There has been some theoretical work to show that simple architectures, like GCNs, cannot distinguish between certain simple graphs {cite}`xu2018powerful`. How much this practically matters depends on your data. Ultimately, there is so much variety in hyperparameters, data equivariances, and training decisions that you should think carefully about how much the GNN architecture matters before exploring it with too much depth. "
|
1686 | 1687 | ]
|
|