Skip to content

Make Manifests' in-memory reference structure pluggable #246

@TomNicholas

Description

@TomNicholas

Currently the ChunkManifest is hardcoded to use numpy arrays underneath to store the paths/offsets/byte ranges. However there are a few cases where we might want to use another format:

We could definitely imagine making this pluggable. The main question I have is whether the other manifest implementations should implement their own ChunkManifest class (e.g. SparseChunkManifest), i.e. ManifestArray becomes a Generic in the type of the .manifest attribute; or use virtualizarr's ChunkManifest class but wrap a different array type, i.e. ChunkManifest becomes a Generic in the type of the .paths/.offset/.lengths attributes.

Right now the latter should be pretty straightforward, but the former would require some refactoring because the ChunkManifest abstraction is leaky in that the implementation of concatenate for ManifestArrays accesses private internals of the wrapped ChunkManifest (i.e. the wrapped numpy arrays).

A related consideration is providing some kind of interface for iterating over all the references in the Manifest that doesn't make assumptions about how the references are actually stored under the hood. That's currently another place where the abstraction is a bit leaky, e.g.

renamed_paths = vectorized_rename_fn(self._paths)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions