Is your feature request related to a problem?
Backend engine options are not easily discoverable: we need to know them, or figure them out, before passing them as kwargs to xr.open_dataset().
Describe the solution you'd like
The solution is similar to the one proposed in #8002 for setting a new index.
The API could look like this:
import xarray as xr

ds = xr.open_dataset(
    file_or_obj,
    engine=xr.backends.engine("myengine").with_options(
        option1=True,
        option2=100,
    ),
)
where xr.backends.engine("myengine") returns the MyEngineBackendEntrypoint subclass.
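To make the lookup step concrete, here is a rough sketch of what an engine() helper could do. Everything in it is an assumption for illustration: the _ENGINES dict and the stand-in classes are hypothetical, and xarray itself discovers backends through package entry points rather than a hard-coded mapping.

```python
from __future__ import annotations


class BackendEntrypoint:
    """Stand-in for xarray's BackendEntrypoint (not the real class)."""


class MyEngineBackendEntrypoint(BackendEntrypoint):
    """Hypothetical third-party backend."""


# Hypothetical registry used only for this sketch.
_ENGINES: dict[str, type[BackendEntrypoint]] = {
    "myengine": MyEngineBackendEntrypoint,
}


def engine(name: str) -> type[BackendEntrypoint]:
    """Return the backend class (not an instance) registered under ``name``."""
    try:
        return _ENGINES[name]
    except KeyError:
        raise ValueError(
            f"unrecognized engine {name!r}; available: {sorted(_ENGINES)}"
        ) from None
```

Returning the class rather than an instance is what lets the caller chain .with_options(...) on the result.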
We would need to extend the API for BackendEntrypoint with a .with_options() factory method:
from typing import Any


class BackendEntrypoint:
    # options stored by with_options() and read back in open_dataset()
    _options: dict[str, Any]

    @classmethod
    def with_options(cls):
        """This backend does not implement `with_options`."""
        raise NotImplementedError()
A backend would then implement it as follows:
import os
from collections.abc import Iterable
from io import BufferedIOBase
from typing import Any

from xarray.backends import AbstractDataStore


class MyEngineBackendEntrypoint(BackendEntrypoint):
    open_dataset_parameters = ("option1", "option2")

    @classmethod
    def with_options(
        cls,
        option1: bool = False,
        option2: int | None = None,
    ):
        """Get the backend with user-defined options.

        Parameters
        ----------
        option1 : bool, optional
            This is option1.
        option2 : int, optional
            This is option2.
        """
        obj = cls()
        # maybe validate the given input options
        if option2 is None:
            option2 = 1
        obj._options = {"option1": option1, "option2": option2}
        return obj

    def open_dataset(
        self,
        filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
        *,
        drop_variables: str | Iterable[str] | None = None,
        **kwargs,  # no static checker error (Liskov substitution principle)
    ):
        # kwargs passed directly to open_dataset take precedence over options
        # (or alternatively raise an error?)
        option1 = kwargs.get("option1", self._options.get("option1", False))
        ...
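To illustrate the precedence rule in the comment above, here is a minimal self-contained sketch. It uses a stand-in BackendEntrypoint rather than the real xarray class, and open_dataset() only returns the resolved options instead of reading anything; both are assumptions made so the example runs on its own.

```python
from __future__ import annotations

from typing import Any


class BackendEntrypoint:
    """Stand-in for xarray's BackendEntrypoint (not the real class)."""

    def __init__(self) -> None:
        # Start empty so open_dataset() also works without with_options().
        self._options: dict[str, Any] = {}


class MyEngineBackendEntrypoint(BackendEntrypoint):
    open_dataset_parameters = ("option1", "option2")

    @classmethod
    def with_options(cls, option1: bool = False, option2: int | None = None):
        obj = cls()
        obj._options = {
            "option1": option1,
            "option2": 1 if option2 is None else option2,
        }
        return obj

    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        # Call-time kwargs win over the options stored by with_options().
        resolved = {**self._options, **kwargs}
        # A real backend would read the file here; this sketch only
        # returns the resolved options to make the precedence visible.
        return resolved


backend = MyEngineBackendEntrypoint.with_options(option1=True, option2=100)
stored = backend.open_dataset("data.bin")
overridden = backend.open_dataset("data.bin", option2=5)
```

Here stored keeps option2=100 from with_options(), while overridden ends up with option2=5 because the keyword passed at call time takes precedence. Raising an error on such a conflict would be the stricter alternative.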
Pros:
- Using .with_options(...) would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry-points), documentation, etc.
- There is no breaking change (xr.open_dataset(obj, engine=...) accepts either a string or a BackendEntrypoint subtype but not yet a BackendEntrypoint object) and this feature could be adopted progressively by existing 3rd-party backends.
Cons:
- The possible duplicated declaration of options among open_dataset_parameters, .with_options() and .open_dataset() does not look super nice but I don't really know how to avoid that.
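One possible way to reduce part of that duplication, sketched here purely as an idea and not as something xarray provides, would be to derive open_dataset_parameters from the signature of with_options(). The option_names() helper is hypothetical, and the class is a stand-in that does not subclass the real BackendEntrypoint.

```python
from __future__ import annotations

import inspect


class MyEngineBackendEntrypoint:
    """Stand-in backend; does not subclass the real xarray class."""

    @classmethod
    def with_options(cls, option1: bool = False, option2: int | None = None):
        obj = cls()
        obj._options = {"option1": option1, "option2": option2}
        return obj


def option_names(backend_cls) -> tuple[str, ...]:
    """Hypothetical helper: list the option names of with_options()."""
    # Accessing a classmethod through the class yields a bound method,
    # so inspect.signature() already leaves out ``cls``.
    return tuple(inspect.signature(backend_cls.with_options).parameters)


# Declare the options once, in with_options(), and derive the tuple.
MyEngineBackendEntrypoint.open_dataset_parameters = option_names(
    MyEngineBackendEntrypoint
)
```

This would still leave the reads inside .open_dataset() to be written by hand, so it only removes one of the three declarations.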
Describe alternatives you've considered
A BackendEntrypoint.with_options() factory is not really needed and we could just go with BackendEntrypoint.__init__() instead. Perhaps with_options looks a bit clearer and leaves room for more flexibility in __init__, though?
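For comparison, the __init__-based alternative might look roughly like this. It is a sketch with a stand-in class, and it assumes every option has a default so that the backend can still be instantiated without arguments.

```python
from __future__ import annotations

from typing import Any


class MyEngineBackendEntrypoint:
    """Stand-in backend configured through __init__ (hypothetical)."""

    def __init__(self, option1: bool = False, option2: int | None = None) -> None:
        # Same defaulting as in the with_options() variant.
        self._options: dict[str, Any] = {
            "option1": option1,
            "option2": 1 if option2 is None else option2,
        }


default_backend = MyEngineBackendEntrypoint()
custom_backend = MyEngineBackendEntrypoint(option1=True, option2=100)
```

With this variant the call site would read engine=MyEngineBackendEntrypoint(option1=True, option2=100), at the cost of tying the option signature to the constructor.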