Description
Currently the `ChunkManifest` is hardcoded to use numpy arrays underneath to store the paths/offsets/byte ranges. However, there are a few cases where we might want to use another format:
- sparse grids of chunks
- "algorithmically inflatable" grids of chunks (see Add hypergrib as as a grib reader #238)
- using even more efficient reference storage formats (Rewrite manifest logic in Rust? #23)
We could definitely imagine making this pluggable. The main question I have is whether the other manifest implementations should:
- implement their own `ChunkManifest` class (e.g. `SparseChunkManifest`), i.e. `ManifestArray` becomes a `Generic` in the type of the `.manifest` attribute; or
- use virtualizarr's `ChunkManifest` class but wrap a different array type, i.e. `ChunkManifest` becomes a `Generic` in the type of the `.paths`/`.offsets`/`.lengths` attributes.
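A rough sketch of what the two options could look like in type terms. All names here (`ManifestLike`, `ArrayLike`, `GenericManifestArray`, `GenericChunkManifest`, and the `shape_chunk_grid` property) are hypothetical illustrations, not virtualizarr's actual API; the point is only where the type parameter lives in each option.

```python
from typing import Generic, Protocol, TypeVar

import numpy as np


class ManifestLike(Protocol):
    """Hypothetical minimal interface every manifest implementation would satisfy."""

    @property
    def shape_chunk_grid(self) -> tuple[int, ...]: ...


class ArrayLike(Protocol):
    """Hypothetical minimal interface for the arrays that store references."""

    @property
    def shape(self) -> tuple[int, ...]: ...


# Option 1: ManifestArray is generic in the type of its .manifest attribute,
# so e.g. a SparseChunkManifest could store references however it likes.
M = TypeVar("M", bound=ManifestLike)


class GenericManifestArray(Generic[M]):
    def __init__(self, manifest: M) -> None:
        self.manifest = manifest


# Option 2: there is a single ChunkManifest class, but it is generic in the
# array type used for .paths/.offsets/.lengths.
A = TypeVar("A", bound=ArrayLike)


class GenericChunkManifest(Generic[A]):
    def __init__(self, paths: A, offsets: A, lengths: A) -> None:
        self.paths = paths
        self.offsets = offsets
        self.lengths = lengths

    @property
    def shape_chunk_grid(self) -> tuple[int, ...]:
        # Only relies on the wrapped array exposing .shape.
        return tuple(self.paths.shape)


# Usage: an option-2 manifest backed by numpy, wrapped by an option-1 array.
manifest = GenericChunkManifest(
    np.array([["a.nc", "b.nc"]], dtype=object),
    np.array([[0, 100]], dtype=np.uint64),
    np.array([[100, 100]], dtype=np.uint64),
)
marr: GenericManifestArray[GenericChunkManifest[np.ndarray]] = GenericManifestArray(manifest)
print(marr.manifest.shape_chunk_grid)  # (1, 2)
```

As the usage shows, the options are not mutually exclusive: an option-2 class can itself serve as the `M` of option 1.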
Right now the latter should be pretty straightforward, but the former would require some refactoring because the `ChunkManifest` abstraction is leaky: the implementation of `concatenate` for `ManifestArray`s accesses private internals of the wrapped `ChunkManifest` (i.e. the wrapped numpy arrays).
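One possible way to plug that leak, sketched under the assumption that each manifest class can be asked to concatenate instances of itself: move the array-level logic into a `concat` classmethod on the manifest, so a `ManifestArray`-level `concatenate` only dispatches on the manifest type. `DenseChunkManifest`, `concat`, and `concatenate_manifests` are made-up names for illustration.

```python
from __future__ import annotations

from collections.abc import Sequence

import numpy as np


class DenseChunkManifest:
    """Hypothetical numpy-backed manifest that owns its concatenation logic."""

    def __init__(self, paths: np.ndarray, offsets: np.ndarray, lengths: np.ndarray) -> None:
        self._paths = paths
        self._offsets = offsets
        self._lengths = lengths

    @property
    def shape_chunk_grid(self) -> tuple[int, ...]:
        return tuple(self._paths.shape)

    @classmethod
    def concat(cls, manifests: Sequence[DenseChunkManifest], axis: int) -> DenseChunkManifest:
        # Private arrays are only touched inside the class that owns them.
        return cls(
            np.concatenate([m._paths for m in manifests], axis=axis),
            np.concatenate([m._offsets for m in manifests], axis=axis),
            np.concatenate([m._lengths for m in manifests], axis=axis),
        )


def concatenate_manifests(manifests: Sequence, axis: int):
    """What ManifestArray's concatenate could call instead of reading ._paths etc."""
    manifest_type = type(manifests[0])
    if any(type(m) is not manifest_type for m in manifests):
        # Mixing storage formats would need an explicit conversion step.
        raise TypeError("cannot concatenate manifests of different types")
    return manifest_type.concat(manifests, axis=axis)


a = DenseChunkManifest(np.array([["a.nc"]], dtype=object), np.array([[0]]), np.array([[10]]))
b = DenseChunkManifest(np.array([["b.nc"]], dtype=object), np.array([[0]]), np.array([[20]]))
print(concatenate_manifests([a, b], axis=1).shape_chunk_grid)  # (1, 2)
```

With this split, a sparse or Rust-backed manifest would only need to provide its own `concat`; the `ManifestArray` side would not change.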
A related consideration is providing some kind of interface for iterating over all the references in the Manifest that doesn't make assumptions about how the references are actually stored under the hood. That's currently another place where the abstraction is a bit leaky, e.g.
```python
renamed_paths = vectorized_rename_fn(self._paths)
```
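A storage-agnostic alternative might look like the following sketch, where every manifest exposes a hypothetical `iter_refs()` method yielding `(chunk_index, ChunkRef)` pairs. `ChunkRef`, `IterableManifest`, `DenseManifest`, `SparseManifest`, and `rename_paths` are illustrative names, not existing virtualizarr API. The sparse variant stores only the chunks that exist, which covers the sparse-grid use case listed above.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import NamedTuple, Protocol

import numpy as np


class ChunkRef(NamedTuple):
    """One reference: where the bytes for one chunk live."""

    path: str
    offset: int
    length: int


class IterableManifest(Protocol):
    def iter_refs(self) -> Iterator[tuple[tuple[int, ...], ChunkRef]]: ...


class DenseManifest:
    """Hypothetical numpy-backed manifest, like today's ChunkManifest."""

    def __init__(self, paths: np.ndarray, offsets: np.ndarray, lengths: np.ndarray) -> None:
        self._paths, self._offsets, self._lengths = paths, offsets, lengths

    def iter_refs(self) -> Iterator[tuple[tuple[int, ...], ChunkRef]]:
        # np.ndindex walks every grid position in C (row-major) order.
        for idx in np.ndindex(self._paths.shape):
            yield idx, ChunkRef(str(self._paths[idx]), int(self._offsets[idx]), int(self._lengths[idx]))


class SparseManifest:
    """Hypothetical manifest that only stores chunks which actually exist."""

    def __init__(self, refs: dict[tuple[int, ...], ChunkRef]) -> None:
        self._refs = refs

    def iter_refs(self) -> Iterator[tuple[tuple[int, ...], ChunkRef]]:
        # Sorting the index tuples gives the same row-major order as the dense case.
        yield from sorted(self._refs.items())


def rename_paths(manifest: IterableManifest, fn: Callable[[str], str]) -> dict[tuple[int, ...], ChunkRef]:
    """Storage-agnostic rename that only uses iter_refs, never private arrays."""
    return {idx: ref._replace(path=fn(ref.path)) for idx, ref in manifest.iter_refs()}


sparse = SparseManifest({(0, 3): ChunkRef("c.nc", 0, 7)})
print(rename_paths(sparse, str.upper))  # {(0, 3): ChunkRef(path='C.NC', offset=0, length=7)}
```

The trade-off is speed: looping over references in Python is much slower than one vectorized call over the wrapped numpy array, so a real interface might keep a vectorized fast path for array-backed manifests and fall back to iteration otherwise.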