Update PyMC tutorial notebooks with improvements and fixes

atheendre130505 · atheendre130505 · commit 9d8c92a2fdd0 · 2025-11-02T12:53:43.000+05:30
diff --git a/docs/source/learn/core_notebooks/dimensionality.ipynb b/docs/source/learn/core_notebooks/dimensionality.ipynb
@@ -22,7 +22,7 @@
     "+ *Explicit dimensions* → Dimensions that are explicitly defined by one of the following arguments:\n",
     "    + *Shape* → Number of draws from a distribution\n",
     "    + *Dims* → An array of dimension names\n",
-    "+ *Coords* → A dictionary mapping dimension names to coordinate values"
+    "+ *[Coords](https://docs.pymc.io/en/stable/api/model.html#coordinate-values)* → A dictionary mapping dimension names to coordinate values"
    ]
   },
   {
@@ -1513,7 +1513,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "PyMC supports the concept of `dims`. With many random variables it can become confusing which dimensionality corresponds to which \"real world\" idea, e.g. number of observations, number of treated units etc. The `dims` argument is an additional human-readable label that can convey this meaning. When used alone, `dims` must be combined with explicit `shape` information."
+    "PyMC supports the concept of [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html). With many random variables it can become confusing which dimensionality corresponds to which \"real world\" idea, e.g. number of observations, number of treated units etc. The `dims` argument is an additional human-readable label that can convey this meaning. When used alone, `dims` must be combined with explicit `shape` information."
    ]
   },
   {
@@ -1578,7 +1578,7 @@
     }
    },
    "source": [
-    "Where `dims` can become increasingly powerful is with the use of `coords` specified in the model itself. This gives a unique label to each `dim` entry, rendering it much more meaningful."
+    "Where [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) can become increasingly powerful is with the use of [coords](https://docs.pymc.io/en/stable/api/model.html#coordinate-values) specified in the model itself. This gives a unique label to each `dim` entry, rendering it much more meaningful."
    ]
   },
   {
@@ -1745,7 +1745,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    ":::{tip} For final model publication we suggest dims and coords as the labels will be passed to {class}`arviz.InferenceData`. This is both best practice transparency and readability for others. It also is useful in single developer workflows, for example, in cases where there is a 3 dimensional or higher distribution it'll help indicate which dimension corresponds to which model concept.\n",
+    ":::{tip} For final model publication we suggest [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) and [coords](https://docs.pymc.io/en/stable/api/model.html#coordinate-values) as the labels will be passed to [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html). This is both best practice transparency and readability for others. It also is useful in single developer workflows, for example, in cases where there is a 3 dimensional or higher distribution it'll help indicate which dimension corresponds to which model concept.\n",
     ":::"
    ]
   },
@@ -1772,7 +1772,7 @@
     "\n",
     "* Using `model_to_graphviz` to visualize your model before sampling\n",
     "* Using `draw` or `sample_prior predictive` to catch errors early\n",
-    "* Inspecting the returned `az.InferenceData` object to ensure all array sizes are as intended\n",
+    "* Inspecting the returned [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) object to ensure all array sizes are as intended\n",
     "* Defining shapes with prime numbers when tracking down errors."
    ]
   },
diff --git a/docs/source/learn/core_notebooks/dims_module.ipynb b/docs/source/learn/core_notebooks/dims_module.ipynb
@@ -17,11 +17,11 @@
    "source": [
     "## A short history of dims in PyMC\n",
     "\n",
-    "PyMC introduced the ability to specify model variable `dims` in version 3.9 in June 2020 (5 years as of the time of writing). In the release notes, it was mentioned only after [14 other new features](https://github.com/pymc-devs/pymc/blob/1d00f3eb81723523968f3610e81a0c42fd96326f/RELEASE-NOTES.md?plain=1#L236), but over time it became a foundation of the library.\n",
+    "PyMC introduced the ability to specify model variable [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) in version 3.9 in June 2020 (5 years as of the time of writing). In the release notes, it was mentioned only after [14 other new features](https://github.com/pymc-devs/pymc/blob/1d00f3eb81723523968f3610e81a0c42fd96326f/RELEASE-NOTES.md?plain=1#L236), but over time it became a foundation of the library.\n",
     "\n",
-    "It allows users to more naturally specify the dimensions of model variables with string names, and provides a \"seamless\" conversion to arviz {doc}`InferenceData <arviz:xarray_for_arviz>` objects, which have become the standard for storing and investigating results from probabilistic programming languages.\n",
+    "It allows users to more naturally specify the dimensions of model variables with string names, and provides a \"seamless\" conversion to arviz [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) objects, which have become the standard for storing and investigating results from probabilistic programming languages.\n",
     "\n",
-    "However, the behavior of dims is rather limited. It can only be used to specify the shape of new random variables and label existing dimensions (e.g., in {func}`~pymc.Deterministic`). Otherwise it has no effect on the computation, unlike operations done with {class}`~arviz.InferenceData` variables, which are based on {lib}`xarray` and where dims inform array selection, alignment, and broadcasting behavior.\n",
+    "However, the behavior of [dims](https://docs.pymc.io/en/stable/learn/core_notebooks/dims_module.html) is rather limited. It can only be used to specify the shape of new random variables and label existing dimensions (e.g., in {func}`~pymc.Deterministic`). Otherwise it has no effect on the computation, unlike operations done with {class}`~arviz.InferenceData` variables, which are based on [XArray](https://docs.xarray.dev/) and where dims inform array selection, alignment, and broadcasting behavior.\n",
     "\n",
     "As a result, in PyMC models users have to write computations that follow NumPy semantics, which often requires transpositions, reshapes, new axis (`None`) and numerical axis arguments sprinkled everywhere. It can be hard to get these right and in the end it's often hard to make sense of the written model.\n",
     "\n",
@@ -735,7 +735,7 @@
    "id": "90d65f41-2279-4205-bc2e-c5446a176c61",
    "metadata": {},
    "source": [
-    ":::{tip} Note that there are no coordinates anywhere in the graph. {class}`~pytensor.xtensor.type.XTensorVariable`s behave like xarray DataArrays **without** coords. Dims determine the dimension meaning and alignment, but no extra work can be done to reason within a dim. We discuss this limitation in more detail at the end.:::"
+    ":::{tip} Note that there are no coordinates anywhere in the graph. {class}`~pytensor.xtensor.type.XTensorVariable`s behave like [xarray DataArrays](https://docs.xarray.dev/) **without** coords. Dims determine the dimension meaning and alignment, but no extra work can be done to reason within a dim. We discuss this limitation in more detail at the end.:::"
    ]
   },
   {
@@ -2633,16 +2633,16 @@
    "id": "626c004fa7abdccf",
    "metadata": {},
    "source": [
-    "When you provide coords to a PyMC model, they are attached to any functions that returns Xarray or InferenceData objects.\n",
+    "When you provide [coords](https://docs.pymc.io/en/stable/api/model.html#coordinate-values) to a PyMC model, they are attached to any functions that returns [XArray](https://docs.xarray.dev/) or [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) objects.\n",
     "\n",
     "This creates a potential problem.\n",
     "\n",
     "Suppose we have multiple arrays with the same dims but different shapes.\n",
-    "This is legal in PyMC, as in Xarray, and some operations, like indexing or concatenating, can handle it.\n",
+    "This is legal in PyMC, as in XArray, and some operations, like indexing or concatenating, can handle it.\n",
     "\n",
     "However, after sampling, PyMC tries to reattach the coordinates to any computed variables, and these might not have the right shape, or they might not be correctly aligned.\n",
     "\n",
-    "When PyMC tries to convert the results of sampling to InferenceData, it will issue a warning and refuse to propagate the original coordinates.\n",
+    "When PyMC tries to convert the results of sampling to [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html), it will issue a warning and refuse to propagate the original coordinates.\n",
     "\n",
     "Here's an example where we have two variables with the `a` dim but different shapes, and only one matches the shape of the coordinates specified in the model."
    ]
@@ -2842,7 +2842,7 @@
    "id": "1fd902d9-33fd-40d6-a66d-fc15f1064b5c",
    "metadata": {},
    "source": [
-    "In Xarray the results would be correct because it is aware of the coordinates, not just the shape."
+    "In [XArray](https://docs.xarray.dev/) the results would be correct because it is aware of the coordinates, not just the shape."
    ]
   },
   {
@@ -2881,7 +2881,7 @@
     "\n",
     "We remind users that {func}`pymc.dims.Deterministic` variables are never required in a model; they are just a way to calculate and store the results of intermediate operations. If you use them, pay extra attention as to whether the model coordinates are appropriate for the variable stored in the  {func}`pymc.dims.Deterministic` (and not just their length but ordering as well).\n",
     "\n",
-    "Alternatively, you can use the regular {func}`pymc.Deterministic` without specifying `dims`, which will not propagate the coordinates to the model. Keep in mind that the respective dimensions will be considered unique after sampling, and operations between variables that had shared dims in the original model will broadcast orthogonally in the returned InferenceData variables.\n",
+    "Alternatively, you can use the regular {func}`pymc.Deterministic` without specifying `dims`, which will not propagate the coordinates to the model. Keep in mind that the respective dimensions will be considered unique after sampling, and operations between variables that had shared dims in the original model will broadcast orthogonally in the returned [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) variables.\n",
     "\n",
     "If you need variables to have the same dims but different coords, you can always fix them manually."
    ]
diff --git a/docs/source/learn/core_notebooks/pymc_overview.ipynb b/docs/source/learn/core_notebooks/pymc_overview.ipynb
@@ -385,7 +385,7 @@
     "id": "yXxn2AL2a71S"
    },
    "source": [
-    "The {mod}`~pymc.sample` function runs the step method(s) assigned (or passed) to it for the given number of iterations and returns an {class}`~arviz.InferenceData` object containing the samples collected, along with other useful attributes like statistics of the sampling run and a copy of the observed data. Notice that `sample` generated a set of parallel chains, depending on how many compute cores are on your machine."
+    "The {mod}`~pymc.sample` function runs the step method(s) assigned (or passed) to it for the given number of iterations and returns an [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) object containing the samples collected, along with other useful attributes like statistics of the sampling run and a copy of the observed data. Notice that `sample` generated a set of parallel chains, depending on how many compute cores are on your machine."
    ]
   },
   {
@@ -2124,7 +2124,7 @@
     "id": "wU4sK5x9a71S"
    },
    "source": [
-    "The various attributes of the `InferenceData` object can be queried in a similar way to a `dict` containing a map from variable names to `numpy.array`s. For example, we can retrieve the sampling trace from the `alpha` latent variable by using the variable name as an index to the `idata.posterior` attribute. The first dimension of the returned array is the chain index, the second dimension is the sampling index, while the later dimensions match the shape of the variable. We can see the first 5 values for the `alpha` variable in each chain as follows:"
+    "The various attributes of the [InferenceData](https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.html) object can be queried in a similar way to a `dict` containing a map from variable names to `numpy.array`s. For example, we can retrieve the sampling trace from the `alpha` latent variable by using the variable name as an index to the `idata.posterior` attribute. The first dimension of the returned array is the chain index, the second dimension is the sampling index, while the later dimensions match the shape of the variable. We can see the first 5 values for the `alpha` variable in each chain as follows:"
    ]
   },
   {
@@ -3098,7 +3098,7 @@
     "\n",
     "In PyMC, variables with purely positive priors like {class}`~pymc.InverseGamma` are transformed with a log transform. This makes sampling more robust. Behind the scenes, a variable in the unconstrained space (named `<variable-name>_log`) is added to the model for sampling. Variables with priors that constrain them on two sides, like {class}`~pymc.Beta` or {class}`~pymc.Uniform`, are also transformed to be unconstrained but with a log odds transform.\n",
     "\n",
-    "We are also going to take advantage of named dimensions in PyMC and ArviZ by passing the input variable names into the model as coordinates called \"predictors\". This will allow us to pass this vector of names as a replacement for the `shape` integer argument in the vector-valued parameters. The model will then associate the appropriate name with each latent parameter that it is estimating. This is a little more work to set up, but will pay dividends later when we are working with our model output.\n",
+    "We are also going to take advantage of named dimensions in PyMC and ArviZ by passing the input variable names into the model as [coordinates](https://docs.pymc.io/en/stable/api/model.html#coordinate-values) called \"predictors\". This will allow us to pass this vector of names as a replacement for the `shape` integer argument in the vector-valued parameters. The model will then associate the appropriate name with each latent parameter that it is estimating. This is a little more work to set up, but will pay dividends later when we are working with our model output.\n",
     "\n",
     "Let's encode this model in PyMC:"
    ]
