Improve discoverability of backend engine options #8447

Open
@benbovy

Is your feature request related to a problem?

Backend engine options are not easily discoverable: we have to know them, or figure them out, before passing them as kwargs to xr.open_dataset().

Describe the solution you'd like

The solution is similar to the one proposed in #8002 for setting a new index.

The API could look like this:

import xarray as xr

ds = xr.open_dataset(
    file_or_obj,
    engine=xr.backends.engine("myengine").with_options(
        option1=True,
        option2=100,
    ),
)

where xr.backends.engine("myengine") returns the MyEngineBackendEntrypoint subclass.
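To make the lookup step concrete, here is a minimal sketch of what a name-to-class resolver could look like. Everything in it is an assumption for illustration: `engine()` is the function proposed in this issue (it does not exist in xarray today), `_ENGINES` is a hypothetical in-memory registry, and the two classes are empty stand-ins. A real implementation would presumably resolve names through xarray's existing plugin discovery (the `xarray.backends` entry-point group) rather than a hard-coded dict.

```python
from __future__ import annotations


class BackendEntrypoint:
    """Empty stand-in for xarray's BackendEntrypoint (not the real class)."""


class MyEngineBackendEntrypoint(BackendEntrypoint):
    """Empty stand-in for a third-party backend."""


# Hypothetical registry; real xarray discovers backends via entry points.
_ENGINES: dict[str, type[BackendEntrypoint]] = {
    "myengine": MyEngineBackendEntrypoint,
}


def engine(name: str) -> type[BackendEntrypoint]:
    """Return the backend class (not an instance) registered under `name`."""
    try:
        return _ENGINES[name]
    except KeyError:
        raise ValueError(
            f"unrecognized engine {name!r}; available: {sorted(_ENGINES)}"
        ) from None
```

Returning the class rather than an instance is what lets `.with_options(...)` be called on the result as a classmethod, with the subclass-specific signature visible to tooling.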

We would need to extend the API for BackendEntrypoint with a .with_options() factory method:

from typing import Any


class BackendEntrypoint:
    # same attribute name as the one set in with_options() and read in open_dataset()
    _options: dict[str, Any]

    @classmethod
    def with_options(cls):
        """This backend does not implement `with_options`."""
        raise NotImplementedError()

Such that

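import os
from collections.abc import Iterable
from io import BufferedIOBase
from typing import Any

from xarray.backends.common import AbstractDataStore


class MyEngineBackendEntrypoint(BackendEntrypoint):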
    open_dataset_parameters = ("option1", "option2")

    @classmethod
    def with_options(
        cls,
        option1: bool = False,
        option2: int | None = None,
    ):
        """Get the backend with user-defined options.

        Parameters
        ----------
        option1 : bool, optional
            This is option1.
        option2 : int, optional
            This is option2.

        """
        obj = cls()

        # maybe validate the given input options
        if option2 is None:
            option2 = 1

        obj._options = {"option1": option1, "option2": option2}

        return obj

    def open_dataset(
        self,
        filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
        *,
        drop_variables: str | Iterable[str] | None = None,
        **kwargs,    # no static checker error (liskov substitution principle)
    ):
        # kwargs passed directly to open_dataset take precedence over the
        # stored options (or, alternatively, raise an error on conflict?)
        option1 = kwargs.get("option1", self._options.get("option1", False))

        ...
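
The precedence rule in the comment above (explicit kwargs win over stored options) can be exercised end to end with a self-contained sketch. This does not import xarray: `BackendEntrypoint` is a simplified stand-in, the module-level `open_dataset()` is a hypothetical dispatcher that only forwards to the engine instance, and the returned dict stands in for a dataset so the resolved options are visible.

```python
from __future__ import annotations

from typing import Any


class BackendEntrypoint:
    """Simplified stand-in for xarray's BackendEntrypoint (not the real class)."""

    def __init__(self) -> None:
        self._options: dict[str, Any] = {}


class MyEngineBackendEntrypoint(BackendEntrypoint):
    @classmethod
    def with_options(
        cls, option1: bool = False, option2: int | None = None
    ) -> MyEngineBackendEntrypoint:
        obj = cls()
        # Same defaulting as in the proposal: option2 falls back to 1.
        obj._options = {
            "option1": option1,
            "option2": 1 if option2 is None else option2,
        }
        return obj

    def open_dataset(self, filename_or_obj: str, **kwargs: Any) -> dict[str, Any]:
        # Later keys win, so explicit kwargs override the stored options.
        resolved = {**self._options, **kwargs}
        return {"source": filename_or_obj, **resolved}


def open_dataset(
    filename_or_obj: str, engine: MyEngineBackendEntrypoint, **kwargs: Any
) -> dict[str, Any]:
    """Hypothetical dispatcher: forwards to an already-configured engine instance."""
    return engine.open_dataset(filename_or_obj, **kwargs)


configured = MyEngineBackendEntrypoint.with_options(option1=True)
result = open_dataset("data.nc", configured, option2=5)
```

Here `result` keeps `option1=True` from `with_options()` while `option2=5` from the call replaces the stored default of 1. Raising on conflict instead would only require checking `self._options.keys() & kwargs.keys()` before merging.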

Pros:

  • Using .with_options(...) would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry-points), documentation, etc.
  • There is no breaking change (xr.open_dataset(obj, engine=...) currently accepts either a string or a BackendEntrypoint subclass, but not yet a BackendEntrypoint instance), and this feature could be adopted progressively by existing third-party backends.

Cons:

  • The possible duplicated declaration of options among open_dataset_parameters, .with_options() and .open_dataset() does not look super nice but I don't really know how to avoid that.
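
One possible way to reduce that duplication, offered only as a sketch and not as part of the proposal, is to derive the option names from the `with_options()` signature instead of repeating them in `open_dataset_parameters`. The helper name `option_names` and the `DemoBackend` class are hypothetical; only the standard-library `inspect` module is used.

```python
from __future__ import annotations

import inspect


def option_names(backend_cls: type) -> tuple[str, ...]:
    """List the keyword-capable parameter names of `backend_cls.with_options`."""
    # Accessing the classmethod through the class binds `cls`,
    # so it does not appear among the signature's parameters.
    sig = inspect.signature(backend_cls.with_options)
    allowed = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return tuple(
        name for name, param in sig.parameters.items() if param.kind in allowed
    )


class DemoBackend:
    """Hypothetical backend used only to exercise the helper."""

    @classmethod
    def with_options(cls, option1: bool = False, option2: int | None = None):
        return cls()


names = option_names(DemoBackend)
```

With this, `names` is `("option1", "option2")`, so a backend could compute `open_dataset_parameters` once from the factory signature. The duplication between `with_options()` and `open_dataset()` itself would remain.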

Describe alternatives you've considered

A BackendEntrypoint.with_options() factory is not strictly needed: we could just use BackendEntrypoint.__init__() instead. Perhaps with_options looks a bit clearer and leaves room for more flexibility in __init__, though?
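
For comparison, the `__init__`-based alternative might look like the sketch below. The class is a stand-in that does not subclass the real xarray class, and the option names are the illustrative ones from the example above. `with_options()` is kept here as a thin alias purely to show that the two spellings can coexist.

```python
from __future__ import annotations

from typing import Any


class MyEngineBackendEntrypoint:
    """Stand-in backend; it does not subclass the real xarray class."""

    def __init__(self, option1: bool = False, option2: int | None = None) -> None:
        # Same defaulting as the with_options() example: option2 falls back to 1.
        self._options: dict[str, Any] = {
            "option1": option1,
            "option2": 1 if option2 is None else option2,
        }

    @classmethod
    def with_options(
        cls, option1: bool = False, option2: int | None = None
    ) -> MyEngineBackendEntrypoint:
        # Thin alias over the constructor.
        return cls(option1=option1, option2=option2)


via_init = MyEngineBackendEntrypoint(option1=True)
via_factory = MyEngineBackendEntrypoint.with_options(option1=True)
```

Both objects end up with identical stored options. The trade-off is that putting options in `__init__` ties the constructor signature to user-facing options, whereas a separate factory leaves `__init__` free for other uses.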

Additional context

cc @jsignell stac-utils/pystac#846 (comment)
