17 | 17 | "source": [ |
18 | 18 | "## A short history of dims in PyMC\n", |
19 | 19 | "\n", |
20 | | - "PyMC introduced the ability to specify model variable `dims` in version 3.9 in June 2020 (5 years as of the time of writing). In the release notes, it was mentioned only after [14 other new features](https://github.com/pymc-devs/pymc/blob/1d00f3eb81723523968f3610e81a0c42fd96326f/RELEASE-NOTES.md?plain=1#L236), but over time it became a foundation of the library.\n", |
| 20 | + "PyMC introduced the ability to specify model variable [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) in version 3.9 in June 2020 (5 years as of the time of writing). In the release notes, it was mentioned only after [14 other new features](https://github.com/pymc-devs/pymc/blob/1d00f3eb81723523968f3610e81a0c42fd96326f/RELEASE-NOTES.md?plain=1#L236), but over time it became a foundation of the library.\n", |
21 | 21 | "\n", |
22 | | - "It allows users to more naturally specify the dimensions of model variables with string names, and provides a \"seamless\" conversion to arviz {doc}`InferenceData <arviz:xarray_for_arviz>` objects, which have become the standard for storing and investigating results from probabilistic programming languages.\n", |
| 22 | + "It allows users to more naturally specify the dimensions of model variables with string names, and provides a \"seamless\" conversion to arviz [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) objects, which have become the standard for storing and investigating results from probabilistic programming languages.\n", |
23 | 23 | "\n", |
24 | | - "However, the behavior of dims is rather limited. It can only be used to specify the shape of new random variables and label existing dimensions (e.g., in {func}`~pymc.Deterministic`). Otherwise it has no effect on the computation, unlike operations done with {class}`~arviz.InferenceData` variables, which are based on {lib}`xarray` and where dims inform array selection, alignment, and broadcasting behavior.\n", |
| 24 | + "However, the behavior of [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) is rather limited. It can only be used to specify the shape of new random variables and label existing dimensions (e.g., in {func}`~pymc.Deterministic`). Otherwise it has no effect on the computation, unlike operations done with {class}`~arviz.InferenceData` variables, which are based on [XArray](https://docs.xarray.dev/) and where dims inform array selection, alignment, and broadcasting behavior.\n", |
25 | 25 | "\n", |
26 | 26 | "As a result, in PyMC models users have to write computations that follow NumPy semantics, which often requires transpositions, reshapes, new axes (`None`) and numerical axis arguments sprinkled everywhere. It can be hard to get these right, and in the end the written model is often hard to make sense of.\n", |
27 | 27 | "\n", |
|
735 | 735 | "id": "90d65f41-2279-4205-bc2e-c5446a176c61", |
736 | 736 | "metadata": {}, |
737 | 737 | "source": [ |
738 | | - ":::{tip} Note that there are no coordinates anywhere in the graph. {class}`~pytensor.xtensor.type.XTensorVariable`s behave like xarray DataArrays **without** coords. Dims determine the dimension meaning and alignment, but no extra work can be done to reason within a dim. We discuss this limitation in more detail at the end.:::" |
| 738 | + ":::{tip} Note that there are no coordinates anywhere in the graph. {class}`~pytensor.xtensor.type.XTensorVariable`s behave like [xarray DataArrays](https://docs.xarray.dev/) **without** coords. Dims determine the dimension meaning and alignment, but no extra work can be done to reason within a dim. We discuss this limitation in more detail at the end.:::" |
739 | 739 | ] |
740 | 740 | }, |
741 | 741 | { |
|
2633 | 2633 | "id": "626c004fa7abdccf", |
2634 | 2634 | "metadata": {}, |
2635 | 2635 | "source": [ |
2636 | | - "When you provide coords to a PyMC model, they are attached to any functions that returns Xarray or InferenceData objects.\n", |
| 2636 | + "When you provide [coords](https://docs.pymc.io/en/stable/api/model.html#coordinate-values) to a PyMC model, they are attached to any functions that return [XArray](https://docs.xarray.dev/) or [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) objects.\n", |
2637 | 2637 | "\n", |
2638 | 2638 | "This creates a potential problem.\n", |
2639 | 2639 | "\n", |
2640 | 2640 | "Suppose we have multiple arrays with the same dims but different shapes.\n", |
2641 | | - "This is legal in PyMC, as in Xarray, and some operations, like indexing or concatenating, can handle it.\n", |
| 2641 | + "This is legal in PyMC, as in XArray, and some operations, like indexing or concatenating, can handle it.\n", |
2642 | 2642 | "\n", |
2643 | 2643 | "However, after sampling, PyMC tries to reattach the coordinates to any computed variables, and these might not have the right shape, or they might not be correctly aligned.\n", |
2644 | 2644 | "\n", |
2645 | | - "When PyMC tries to convert the results of sampling to InferenceData, it will issue a warning and refuse to propagate the original coordinates.\n", |
| 2645 | + "When PyMC tries to convert the results of sampling to [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html), it will issue a warning and refuse to propagate the original coordinates.\n", |
2646 | 2646 | "\n", |
2647 | 2647 | "Here's an example where we have two variables with the `a` dim but different shapes, and only one matches the shape of the coordinates specified in the model." |
2648 | 2648 | ] |
|
2842 | 2842 | "id": "1fd902d9-33fd-40d6-a66d-fc15f1064b5c", |
2843 | 2843 | "metadata": {}, |
2844 | 2844 | "source": [ |
2845 | | - "In Xarray the results would be correct because it is aware of the coordinates, not just the shape." |
| 2845 | + "In [XArray](https://docs.xarray.dev/) the results would be correct because it is aware of the coordinates, not just the shape." |
2846 | 2846 | ] |
2847 | 2847 | }, |
2848 | 2848 | { |
|
2881 | 2881 | "\n", |
2882 | 2882 | "We remind users that {func}`pymc.dims.Deterministic` variables are never required in a model; they are just a way to calculate and store the results of intermediate operations. If you use them, pay extra attention as to whether the model coordinates are appropriate for the variable stored in the {func}`pymc.dims.Deterministic` (and not just their length but ordering as well).\n", |
2883 | 2883 | "\n", |
2884 | | - "Alternatively, you can use the regular {func}`pymc.Deterministic` without specifying `dims`, which will not propagate the coordinates to the model. Keep in mind that the respective dimensions will be considered unique after sampling, and operations between variables that had shared dims in the original model will broadcast orthogonally in the returned InferenceData variables.\n", |
| 2884 | + "Alternatively, you can use the regular {func}`pymc.Deterministic` without specifying `dims`, which will not attach the model coordinates to that variable. Keep in mind that the respective dimensions will be considered unique after sampling, and operations between variables that had shared dims in the original model will broadcast orthogonally in the returned [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) variables.\n", |
2885 | 2885 | "\n", |
2886 | 2886 | "If you need variables to have the same dims but different coords, you can always fix them manually." |
2887 | 2887 | ] |
|