Skip to content

Conversation

@mslw
Copy link

@mslw mslw commented Feb 5, 2025

This is an attempt to address #269 - more of an exploration really.

It appears that while Docker 27 has no problem loading images saved with older versions, it generates the ID based on the "new style" (OCI-compliant) manifest that it would save starting with v25, and not the config file stored in the dataset. This causes DataLad to error out due to expected / observed ID mismatch.

Here, an attempt is made to generate a "new style" manifest from the old image, and use its hash as the expected ID. It is not complete (in particular, there can be different combination of save vs load versions), it is fragile (if my assumptions about the format are wrong or the format changes, the expected ID will not be correct either), but it works for at least one dataset (paper-remodnav).

With that in mind, I am also trying to think of an alternative approach. Is there an existing tool for deriving new IDs from old manifests? Can we expect older containers to be annotated with new IDs as their maintenance?

I am leaving the PR as a draft, since I do not expect to follow up immediately. However, it may be useful as at least one way of dealing with the problem.

mslw added 2 commits February 5, 2025 17:48
It appears that while Docker 27 has no problem loading images saved with
older versions, it generates the ID based on the "new style"
(OCI-compliant) manifest that it would save starting with v25, and not
the config file stored in the dataset. This causes DataLad to error out
due to ID mismatch, although the ID is most likely equivalent; see datalad#269

This commit is the first attempt to solve this issue. Since the manifest
is a structured file, an attempt is made to generate a "new" style
manifest based on the contents of the saved image, and derive the ID
from that.

The manifest needs file types, sizes, and checksums. While we could copy
checksums from the previous manifest / config, we do not seem to have
the sizes. To solve that problem, we get both through ls_file_collection
from datalad-next. This is convenient and quick, but introduces a new
dependency.

The generated structure and content are a guesswork based on reading the
OCI spec and seeing docker save output from a single container - it sure
works from that container and tries to be applicable more broadly, but
most likely won't cover more complicated cases, or those where I'm not
even sure what behavior to expect (e.g. multi-arch manifest?). Layers
are assumed to always be rootfs_diff (I currently don't know if there
are other types possible).

This commit focuses on reading older images with new Docker, and does
not address reading new images (reading images saved with Docker 26
would still fail, because it already uses the new save format which our
adapter does not expect). So the combinatorics around that will need to
be addressed later.

The new code would only trigger for Docker 27. It introduces one small
regression, where get_image_id raises a NotImplementedError for two
arguments which can be given to the old get_image.
@mslw mslw marked this pull request as draft February 5, 2025 18:43
@qlty-cloud-legacy
Copy link

Code Climate has analyzed commit 3534fbf and detected 6 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 2
Security 4

View more on Code Climate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant