-
Notifications
You must be signed in to change notification settings - Fork 28
pint-xarray blog post #251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
43 commits
Select commit
Hold shift + click to select a range
6d26cc3
import the current state of the blog post
keewis d9596c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 772c427
trigger build
keewis 7ca85ce
Update src/posts/introducing-pint-xarray/index.md
keewis d5d3017
Update src/posts/introducing-pint-xarray/index.md
keewis f35ff96
add the image
keewis 829ae25
whitespace
keewis c47a615
Update src/posts/introducing-pint-xarray/index.md
keewis e035caa
try using the ipython format
keewis d953fb9
Merge branch 'main' into introducing-pint-xarray
andersy005 684b465
Merge branch 'main' into introducing-pint-xarray
andersy005 5baef32
Merge branch 'main' into introducing-pint-xarray
andersy005 64cd10b
Merge branch 'main' into introducing-pint-xarray
andersy005 baf2948
Merge branch 'main' into introducing-pint-xarray
andersy005 ba9bcf2
Apply suggestions from code review
keewis 1e32d37
Apply suggestions from code review
keewis 844096b
Apply suggestions from code review
keewis 2aedbac
Merge branch 'main' into introducing-pint-xarray
andersy005 bfeb716
Chunk in open_dataset example
TomNicholas 3d02b83
Merge branch 'main' into introducing-pint-xarray
andersy005 a1cde77
Apply suggestions from code review
keewis 8d37219
Merge branch 'main' into introducing-pint-xarray
andersy005 3f358b9
use new author format
andersy005 0758dca
remove the remaining ipython formatting
keewis 8df4609
more formatting fixes
keewis 7c52770
remove the obsolete import order comment
keewis 3f0e319
also add pint to the list
keewis c048904
Clarify purpose of upcoming feature
TomNicholas 7b4cfeb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] bf63bc8
Remove extra word
TomNicholas 7179b6c
also specify the version of pint-xarray
keewis 37cd971
Merge branch 'main' into introducing-pint-xarray
andersy005 99c11cc
Merge branch 'main' into introducing-pint-xarray
andersy005 3aa009c
Merge branch 'main' into introducing-pint-xarray
andersy005 f9bb62c
Merge branch 'main' into introducing-pint-xarray
andersy005 1b2b832
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 061aa8b
update date
andersy005 0283833
rewrite 4th pint feature as an example
TomNicholas 7144eb0
Merge branch 'introducing-pint-xarray' of https://github.com/xarray-c…
TomNicholas 94dbe36
Merge branch 'main' into introducing-pint-xarray
andersy005 69f6035
Add social share card
andersy005 587e16c
Merge branch 'main' into introducing-pint-xarray
andersy005 d8d6578
update social share card
andersy005 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,295 @@ | ||
--- | ||
title: 'Unit-aware arithmetic in Xarray, via pint' | ||
date: '2022-08-30' | ||
authors: | ||
- name: Tom Nicholas | ||
github: TomNicholas | ||
- name: Justus Magin | ||
github: keewis | ||
summary: 'Xarray now supports unit-aware operations by wrapping pint arrays' | ||
--- | ||
|
||
_TLDR: Pint-Xarray supports unit-aware operations by wrapping [pint arrays](https://pint.readthedocs.io/en/stable/), so your code can automatically track the physical units that your data represents:_ | ||
|
||
```python | ||
distance = xr.DataArray(10).pint.quantify("metres") | ||
time = xr.DataArray(4).pint.quantify("seconds") | ||
|
||
distance / time | ||
``` | ||
|
||
``` | ||
Out: | ||
<xarray.DataArray ()> | ||
<Quantity(2.5, 'meter / second')> | ||
``` | ||
|
||
## Units are integral to science | ||
|
||
All quantities in science have units, whether explicitly or implicitly. (And even dimensionless quantities like ratios still technically have units.) | ||
|
||
Getting our units right is finicky, and can very easily go unnoticed in our code. | ||
|
||
Even worse, the consequences of getting units wrong can be huge! | ||
|
||
The most famous example of a units error has to be NASA's $125 million [Mars Climate Orbiter](https://www.simscale.com/blog/2017/12/nasa-mars-climate-orbiter-metric/), which in 1999 burned up in the Martian atmosphere instead of successfully entering orbit around Mars. | ||
A trajectory course correction had gone wrong, and the error was eventually traced back to a units mismatch: the engineers at Lockheed Martin expressed impulse in [pound-force](<https://en.wikipedia.org/wiki/Pound_(force)>) seconds, whereas the engineers at JPL assumed the impulse value their part of the software received was in SI newton seconds. | ||
|
||
<p align='center'> | ||
<img src='https://clqtg10snjb14i85u49wifbv-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/Customers.jpg' /> | ||
</p> | ||
|
||
<p align='center'> | ||
Newspaper cartoon depicting the incongruence in the units used by NASA and | ||
Lockheed Martin scientists that led to the Mars Climate Orbiter disaster. | ||
</p> | ||
|
||
We should take stories like this seriously: If we can automatically track units we can potentially eliminate a whole class of possible errors in our scientific work... | ||
andersy005 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Pint tracks units | ||
|
||
There are a few packages for handling units in python (notably [unyt](https://github.com/yt-project/unyt) and [astropy.units](https://docs.astropy.org/en/stable/units/)), but for technical reasons we began units integration in Xarray with [pint](https://pint.readthedocs.io/en/stable/). | ||
These various packages work by providing a numerical array type that acts similarly to a NumPy array, and is intended to plug in and replace the raw NumPy array (a so-called "duck array type"). | ||
|
||
Pint provides the `Quantity` object, which is a normal numpy array combined with a `pint.Unit`: | ||
|
||
```python | ||
q = np.array([6, 7]) * pint.Unit('metres') | ||
print(repr(q)) | ||
``` | ||
|
||
``` | ||
Out: | ||
<Quantity([6 7], 'meter')> | ||
``` | ||
|
||
Pint Quantities act like NumPy arrays, except that the units are carried around with the arrays, propagated through operations, and checked during operations involving multiple quantities. | ||
|
||
## Xarray now wraps Pint | ||
|
||
Thanks to the [tireless work](https://github.com/pydata/xarray/issues/3594) of Xarray core developer [Justus Magin](https://github.com/keewis), you can now enjoy this automatic unit-handling in Xarray! | ||
|
||
Once you create a unit-aware Xarray object (see below for how) you can see the units of the data variables displayed as part of the printable representation. | ||
You also immediately get the key benefits of Pint: | ||
|
||
1. Units are propagated through arithmetic, and new quantities are built using the units of the inputs: | ||
|
||
```python | ||
distance = xr.DataArray(10).pint.quantify("metres") | ||
time = xr.DataArray(4).pint.quantify("seconds") | ||
|
||
distance / time | ||
``` | ||
|
||
``` | ||
Out: | ||
<xarray.DataArray ()> | ||
<Quantity(2.5, 'meter / second')> | ||
``` | ||
|
||
2. Dimensionally inconsistent units are caught automatically: | ||
|
||
```python | ||
apples = xr.DataArray(10).pint.quantify("kg") | ||
oranges = xr.DataArray(200).pint.quantify("cm^3") | ||
|
||
apples + oranges | ||
``` | ||
|
||
``` | ||
Out: | ||
DimensionalityError: Cannot convert from 'kilogram' ([mass]) to 'centimeter ** 3' ([length] ** 3) | ||
``` | ||
|
||
3. Unit conversions become simple: | ||
|
||
```python | ||
walk = xr.DataArray(500).pint.quantify('miles') | ||
|
||
walk.pint.to('parsecs') | ||
``` | ||
|
||
``` | ||
Out: | ||
<xarray.DataArray ()> | ||
<Quantity(2.6077643524162074e-11, 'parsec')> | ||
``` | ||
|
||
With these features, you can build code that automatically propagates units and converts them where necessary to stay consistent. | ||
For example, the problem of the NASA orbiter could have been prevented by explicitly converting to the correct units at the start | ||
|
||
```python | ||
def jpl_trajectory_code(impulse): | ||
|
||
# Defensively check units first | ||
impulse = impulse.pint.to("Newton * seconds") | ||
|
||
# This function we called here will only compute the correct result if supplied input in units of Newton-seconds, | ||
# but that's fine because we already converted the values to be in the correct units! | ||
propagated_position = some_rocket_science(impulse) | ||
|
||
return propagated_position | ||
``` | ||
|
||
Note: We are adding [new features](https://github.com/xarray-contrib/pint-xarray/pull/143) to make specifying the units of parameters of existing library functions more slick. | ||
|
||
In the abstract, tracking units like this is useful in the same way that labelling dimensions with Xarray is useful: it helps us avoid errors by relieving us of the burden of remembering arbitrary information about our data. | ||
|
||
## Quantifying with pint-xarray | ||
|
||
The easiest way to create a unit-aware Xarray object is to use the helper package we made: [pint-xarray](https://github.com/xarray-contrib/pint-xarray). | ||
Once you `import pint_xarray` you can access unit-related functionality via `.pint` on any `DataArray` or `Dataset` (this works via [Xarray's accessor interface](https://xarray.pydata.org/en/stable/internals/extending-xarray.html)). | ||
|
||
Above we have seen examples of quantifying explicitly, where we specify the units in the call to `.quantify()`. | ||
We can do this for multiple variables too, and we can also pass `pint.Unit` instances: | ||
|
||
```python | ||
ds = xr.Dataset({'a': 2, 'b': 10}) | ||
|
||
ds.pint.quantify({'a': 'kg', | ||
'b': pint.Unit('moles')}) | ||
``` | ||
|
||
``` | ||
Out: | ||
<xarray.Dataset> | ||
Dimensions: () | ||
Data variables: | ||
a int64 [kg] 2 | ||
b int64 [mol] 10 | ||
``` | ||
|
||
Alternatively, we can quantify from the object's `.attrs`, automatically reading the metadata which xarray objects carry around. | ||
If nothing is passed to `.quantify()`, it will attempt to parse the `.attrs['units']` entry for each data variable. | ||
|
||
This means that for scientific datasets which are stored as files with units in their attributes (which netCDF and Zarr can do for example), using Pint with Xarray becomes as simple as: | ||
|
||
```python | ||
import pint_xarray | ||
|
||
ds = open_dataset(filepath).pint.quantify() | ||
``` | ||
|
||
## Dequantifying | ||
|
||
To convert our pint arrays back into NumPy arrays, we can use `.dequantify`. | ||
This will strip the units from the arrays and replace them into the `.attrs['units']` of each variable. | ||
This is useful when we want to save our data back to a file, as it means that the current units will be preserved in the attributes of a netCDF file (or Zarr store etc.), as long as we just do `ds.pint.dequantify().to_netcdf(...)`. | ||
|
||
## Dask integration | ||
|
||
So Xarray can wrap Dask arrays, and now it can wrap Pint quantities… Can we use both together? Yes! | ||
|
||
You can get a unit-aware, Dask-backed array either by `.pint.quantify()`-ing a chunked array, or you can `.pint.chunk()` a quantified array. | ||
(If you have Dask installed, then `open_dataset(f, chunks={}).pint.quantify()` will already give you a Dask-backed, quantified array.) | ||
From there you can `.compute()` the Dask-backed objects as normal, and the units will be retained. | ||
|
||
(Under the hood we now have an `xarray.DataArray` wrapping a `pint.Quantity`, which wraps a `dask.array.Array`, which wraps a `numpy.ndarray`. | ||
This "multi-nested duck array" approach can be generalised to include other array libraries (e.g. `scipy.sparse`), but requires [coordination](https://github.com/pydata/duck-array-discussion) between the maintainers of the libraries involved.) | ||
|
||
## Unit-aware indexes | ||
|
||
We would love to be able to promote Xarray indexes to Pint Quantities, as that would allow you to select data subsets in a unit-aware manner like: | ||
|
||
```python | ||
da = xr.DataArray(name='a', data=[0, 1, 2], dims='x', coords={'x': [1000, 2000, 3000]}) | ||
da = da.pint.quantify({'a': 'Pa', 'x': 'm'}) | ||
|
||
da.pint.sel(x=2 * 'km') | ||
``` | ||
|
||
Unfortunately this will not possible until the ongoing work to extend Xarray to support [explicit indexes](https://github.com/pydata/xarray/issues/1603) is complete. | ||
|
||
In the meantime pint-xarray offers a workaround. If you tell `.quantify` the units you wish an index to have, it will store those in `.attrs["units"]` instead. | ||
|
||
```python | ||
time = xr.DataArray([0.1, 0.2, 0.3], dims='time') | ||
distance = xr.DataArray(name='distance', | ||
data=[10, 20, 25], | ||
dims=['time'], | ||
coords={'time': time}) | ||
distance = distance.pint.quantify({'distance': 'metres', | ||
'time': 'seconds'}) | ||
print(distance.coords['time'].attrs) | ||
``` | ||
|
||
``` | ||
Out: | ||
{'units': <Unit('second')>} | ||
``` | ||
|
||
This allows us to provide conveniently wrapped versions of common xarray methods like `.sel`, so that you can still select subsets of data in a unit-aware fashion like this: | ||
|
||
```python | ||
distance.pint.sel(time=200 * pint.Unit('milliseconds')) | ||
``` | ||
|
||
``` | ||
Out: | ||
<xarray.DataArray 'distance' ()> | ||
<Quantity(20, 'meter')> | ||
Coordinates: | ||
time float64 200.0 | ||
``` | ||
|
||
Observe how the `.pint.sel` operation has first converted 200 milliseconds to 0.2 seconds, before finding the distance value that occurs at a time position of 0.2 seconds. | ||
|
||
[This wrapping is currently necessary](https://xarray.pydata.org/en/stable/user-guide/duckarrays.html#missing-features) for any operation which needs to be aware of the units of a dimension coordinate of the dataarray, or any xarray operation which relies on an external library (such as calling `scipy` in `.integrate`). | ||
|
||
## CF-compliant units for geosciences with cf-xarray | ||
|
||
Different fields tend to have different niche conventions about how certain units are defined. | ||
By default, Pint doesn't understand all the unusual units and conventions we use in geosciences. | ||
But [Pint is customisable](https://pint.readthedocs.io/en/stable/defining.html), and with the help of [cf-xarray](https://github.com/xarray-contrib/cf-xarray) we can teach it about these geoscience-specific units. | ||
|
||
If we `import cf_xarray.units` (before `import pint_xarray`) then we can `quantify` example climate data from the [Pangeo Project's CMIP6 catalog](https://pangeo-data.github.io/pangeo-cmip6-cloud/): | ||
|
||
```python | ||
import xarray as xr | ||
import cf_xarray.units | ||
import pint_xarray | ||
|
||
ds = xr.open_dataset('gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/sfcWind/gn/v20200226/', engine='zarr') | ||
ds = ds.pint.quantify() | ||
|
||
squared_wind = ds['sfcWind'] ** 2 | ||
squared_wind.pint.units | ||
``` | ||
|
||
``` | ||
Out: | ||
<Unit('meter ** 2 / second ** 2')> | ||
``` | ||
|
||
Here (thanks to `cf_xarray`) pint has successfully interpreted the CF-style units `'m s-1'`, then automatically changed them when we squared the wind speed. | ||
|
||
## Plotting | ||
|
||
We can complete our real-world example by plotting the data in its new units: | ||
|
||
```python | ||
import cartopy.crs as ccrs | ||
import matplotlib.pyplot as plt | ||
|
||
p = squared_wind.isel(time="2014-01").plot( | ||
subplot_kws=dict(projection=ccrs.Orthographic(-80, 35), facecolor="gray"), | ||
transform=ccrs.PlateCarree(), | ||
) | ||
p.axes.set_global() | ||
p.axes.coastlines() | ||
plt.show() | ||
``` | ||
|
||
 | ||
|
||
where `xarray.plot` has detected the Pint units automatically. | ||
|
||
## Conclusion | ||
|
||
Please have a go! You will need xarray (v2022.03.0+), pint (0.18+), and pint-xarray (0.3+). | ||
|
||
Please also tell us about any bugs you find, or documentation suggestions you have on the [Xarray](https://github.com/pydata/xarray/issues) or [pint-xarray issue trackers](https://github.com/xarray-contrib/pint-xarray/issues). | ||
If you have usage questions you can raise them there, on the [Xarray discussions page](https://github.com/pydata/xarray/discussions), or on the [Pangeo Discourse forum](https://discourse.pangeo.io/). | ||
|
||
The work here to allow Xarray to wrap Pint objects is part of a [broader effort to generalise Xarray](http://xarray.pydata.org/en/stable/roadmap.html#flexible-arrays) to handle a wide variety of data types (so-called "duck array wrapping"). | ||
Along with the incoming [support for flexible indexes](http://xarray.pydata.org/en/stable/roadmap.html#flexible-indexes), we are excited for all the new features that this will enable for Xarray users! |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.