Defining Pairwise Interactions #3

Open
@jthielen

As summarized in #1, the interactions between duck array libraries cannot be sufficiently described by a (linked) list of priorities (as can arise from __array_priority__); they are instead best described as a directed graph. For dispatch between types to work out consistently and unambiguously, this graph needs to be acyclic, which in turn requires agreement and coordination between duck array libraries.

See also: dask/dask#6635

Current State

Presently, this coordination has been informally done through independent/ad-hoc implementations in each duck array library. Two main approaches have arisen:

  • Have an "allow list" of types that this array type can handle/wrap, and defer to any other (e.g., Dask)
  • Have a "deny list" of types to which this array type defers, and assume any other "sufficiently array-like" type can be handled/wrapped (e.g., Pint, but also xarray if the "deny list" is effectively empty)
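To make the difference between the two approaches concrete, here is a minimal sketch. All class names are hypothetical stand-ins, not the actual Dask, Pint, or xarray code; the only point is how each approach treats a type it has never heard of.

```python
class PlainArray:
    """Stand-in for a low-level array type both wrappers know about."""


class UpcastArray:
    """Stand-in for a type higher in the DAG that should do the wrapping."""


class NewArray:
    """Stand-in for a newly released type neither wrapper has heard of."""


class AllowListWrapper:
    # Types this wrapper has explicitly agreed to handle.
    _handled_types = (PlainArray,)

    @classmethod
    def can_wrap(cls, other):
        # Anything not on the allow list is deferred to.
        return isinstance(other, cls._handled_types)


class DenyListWrapper:
    # Types this wrapper knows it must defer to.
    _upcast_types = (UpcastArray,)

    @classmethod
    def can_wrap(cls, other):
        # Anything not on the deny list is assumed to be wrappable.
        return not isinstance(other, cls._upcast_types)


print(AllowListWrapper.can_wrap(NewArray()))  # allow list defers
print(DenyListWrapper.can_wrap(NewArray()))   # deny list tries to wrap
```

If two deny-list wrappers that do not know about each other meet, both will try to wrap the other, which is exactly the kind of ambiguity an agreed-upon DAG is meant to rule out.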

For a limited set of commonly-used array types in the pydata stack, this has often worked out in practice so far. However, as the number of duck array libraries increases, maintaining agreement between libraries through the existing independent approaches becomes difficult.

As an example of what this type casting hierarchy looks like in practice, Pint has summarized the consensus DAG between several common array types (as of 2020) as follows:

[Figure: Pint's diagram of the type casting hierarchy DAG]

Furthermore, these interactions often play out implicitly via protocols like __array_ufunc__ and __array_function__. In contrast, an explicit strategy like NEP 37 may be a preferable way to define these pairwise interactions.

Specific Goals

  • The directed graph of array type interactions is agreed upon across the community and remains acyclic
  • Introduction of new types to the accepted DAG is easy and can be done safely without introducing cycles
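As a rough illustration of what "remains acyclic" means operationally, the sketch below checks a set of wrapping relationships with the standard library's graphlib (Python 3.9+). The `wraps` mapping and the function names are assumptions made for illustration, and the edges are just the xarray -> pint -> dask ordering discussed in this issue, not an agreed-upon specification.

```python
from graphlib import CycleError, TopologicalSorter


def resolution_order(wraps):
    """Return type names ordered from lowest-level to top-level.

    `wraps` maps a type name to the set of type names it may wrap.
    TopologicalSorter expects node -> predecessors, so treating the
    wrapped types as predecessors puts them first in the result.
    """
    return list(TopologicalSorter(wraps).static_order())


def is_acyclic(wraps):
    """Return True if the wrapping relationships form a DAG."""
    try:
        resolution_order(wraps)
    except CycleError:
        return False
    return True


wraps = {
    "xarray": {"pint", "dask"},
    "pint": {"dask"},
    "dask": set(),
}
print(is_acyclic(wraps))
print(resolution_order(wraps))

# A library that decides on its own to wrap a type above it breaks the DAG.
broken = dict(wraps, dask={"xarray"})
print(is_acyclic(broken))
```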

Key Points Raised at Coordination Meeting

  1. While an xarray -> pint -> dask -> others casting order for the "top" of the type DAG has been used in practice up to this point, several issues/PRs left full agreement on this order unresolved. There was consensus that this ordering can be formalized.
    • A noted clarification is that xarray is not the only top of the full DAG: this consensus does not prohibit another type that xarray cannot wrap but that still wraps or handles other types lower in the DAG.
  2. Duck arrays shouldn't be expected to define all interactions with everything; instead, they should only define operations on similar types and raise otherwise.
    • This favors the "allow list" over the "deny list" approach.
  3. A new library defining (or at least verifying/providing utilities for) a shared type resolution DAG among participating duck array libraries has been suggested and received general support. However, several implementation decisions need to be discussed.
    • Particularly, where should these definitions of interactions lie?
      • Interpreted from NEP 37 __array_module__?
      • A new slot for "handled types"?
      • Some kind of registry in this new library?

Suggested Paths Forward

Duck Array DAG Library

Discussion (in this issue hopefully) on working out the details of a shared type resolution DAG library is needed! To get the conversation started, here are the points I'm aware of that need resolution:

  • Are enough of the key duck array libraries (e.g., xarray, pint, Dask, sparse, CuPy) willing to participate in and use this shared library to make the effort worthwhile?
  • pydata is the most likely home for this library, but what should it be named, and who should lead its maintenance?
  • (Mentioned above) Where should the definitions of pairwise interactions lie?
    • Interpreted from NEP 37 __array_module__?
    • A new slot for "handled types"?
    • Some kind of registry in this new library?
  • What role should this library have?
    • Optional checking/verification that the DAG works out
    • Enforcement of acyclicity of the directed graph of interactions (which is basically the previous option but with utils to raise errors where relevant)
    • Provide full utilities that participating libraries can (or must?) use in their implementations of wrapping/binop/__array_ufunc__/__array_function__/array function modules
    • Something else?
  • How to consistently handle otherwise-unknown array types and be welcoming to any new array-like libraries that try to enter the ecosystem?
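To give the "registry" and "enforcement" options something concrete to react to, here is one possible shape, purely as a sketch. `InteractionRegistry`, `register`, and `handles` are hypothetical names invented for this example; nothing like this exists yet, and the other options (NEP 37 `__array_module__` or a new slot) could look quite different.

```python
from graphlib import CycleError, TopologicalSorter


class InteractionRegistry:
    """Hypothetical registry of which array types may wrap which."""

    def __init__(self):
        self._wraps = {}  # type name -> set of type names it may wrap

    def register(self, name, wraps=()):
        """Add `name`, rejecting the change if it would create a cycle."""
        # Work on a copy so a rejected registration leaves no trace.
        candidate = {key: set(value) for key, value in self._wraps.items()}
        candidate.setdefault(name, set()).update(wraps)
        for wrapped in wraps:
            candidate.setdefault(wrapped, set())
        try:
            tuple(TopologicalSorter(candidate).static_order())
        except CycleError as exc:
            raise ValueError(
                f"registering {name!r} would create a cycle"
            ) from exc
        self._wraps = candidate

    def handles(self, outer, inner):
        """Return True if `outer` may wrap `inner`, directly or transitively."""
        stack = list(self._wraps.get(outer, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node == inner:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self._wraps.get(node, ()))
        return False


registry = InteractionRegistry()
registry.register("dask")
registry.register("pint", wraps=["dask"])
registry.register("xarray", wraps=["pint"])

print(registry.handles("xarray", "dask"))  # transitive, via pint
print(registry.handles("dask", "xarray"))
```

A string-keyed registry like this sidesteps import-order problems between libraries, but whether names, classes, or some protocol attribute should be the key is one of the open questions above.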

Once these are resolved, then more detailed discussions (such as API creation) can presumably take place on this new library's repo.

Changes to Participating Libraries

Libraries currently using a "deny list"/"accept all" approach (namely, xarray and pint) may need to change to an "allow list" approach to meet the community consensus, which brings with it backwards compatibility concerns. However, it makes the most sense (to me at least) to make any such changes only once the aforementioned DAG library is also adopted and, until then, to at most issue warnings for unknown types that are still handled.
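The interim "warn but still handle" behavior could look roughly like the sketch below. The function name, the choice of `FutureWarning`, and the message are all assumptions for illustration, not what xarray or pint currently do.

```python
import warnings


def check_wrappable(obj, known_types):
    """Accept `obj` for wrapping, warning if its type is not known.

    Hypothetical transition helper: unknown types are still handled,
    but the caller is told this may stop working in the future.
    """
    if not isinstance(obj, known_types):
        warnings.warn(
            f"{type(obj).__name__} is not a known array type; it is "
            "handled for now but may be rejected in the future",
            FutureWarning,
            stacklevel=2,
        )
    return obj


class NewArray:
    """Stand-in for an array type the wrapping library does not know."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_wrappable([1, 2, 3], known_types=(list, tuple))  # known, silent
    check_wrappable(NewArray(), known_types=(list, tuple))  # unknown, warns

print(len(caught))
```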
