Add skopeo-based adapter for working with OCI images #277
base: master
Conversation
These parts will be useful for the upcoming skopeo adapter as well.
"get" is probably clearer than "list", and tacking on "_ids" makes it clearer what the return value is. Also, drop the leading underscore, which is a holdover from the function being in the adapters.docker module.
The minimum Python version of DataLad is new enough that we can assume subprocess.run() is available. It's recommended by the docs, and I like it more, so switch to it. Note that we might want to eventually switch to using WitlessRunner here. The original idea with using the subprocess module directly was that it'd be nice for the docker adapter to be standalone, as nothing in the adapter depended on datalad at the time. That's not the case anymore after the adapters.utils split and the use of datalad.utils within it. (And the upcoming skopeo adapter will make heavier use of datalad for adding URLs to the layers.)
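For illustration, a call in the style this commit describes (a minimal sketch; the helper name and arguments are made up here, not the adapter's actual code):

```python
import subprocess

def save_image(image_id, path):
    # Hypothetical helper: subprocess.run() with check=True raises
    # CalledProcessError on failure, and the captured stderr can later
    # be surfaced to the caller (see the stderr-related commit below).
    return subprocess.run(
        ["docker", "save", "-o", path, image_id],
        check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```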
This logic will get a bit more involved in the next commit, and it will be needed by the skopeo adapter too.
When the adapter is called from the command line (as containers-run does) and datalad gets imported, the level set via the --verbose argument doesn't have an effect and logging happens twice, once through datalad's handler and once through the adapter's. Before 313c4f0 (WIN/Workaround: don't pass gid and uid to docker run call, 2020-11-10), the above was the case when docker.main() was triggered with the documented `python -m datalad_container.adapters ...` invocation, but not when the script path was passed to python. Following that commit, the adapter imports datalad, so datalad's logger is always configured. Adjust setup_logger() to set the log level of loggers under the datalad.containers.adapters namespace so that the adapter's logging level is in effect for command line calls to the adapter. As mentioned above, datalad is now loaded in all cases, so a handler is always configured, but, in case this changes in the future, add a simpler handler if one isn't already configured.
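A rough sketch of the adjusted setup, assuming the datalad.containers.adapters namespace mentioned above (not the actual implementation):

```python
import logging

def setup_logger(level):
    # Set the level on the adapters' namespace so it takes effect even
    # when datalad's own handler is already configured ...
    lgr = logging.getLogger("datalad.containers.adapters")
    lgr.setLevel(level)
    # ... and only attach a simple fallback handler if none exists yet.
    if not logging.getLogger("datalad").hasHandlers():
        lgr.addHandler(logging.StreamHandler())
    return lgr
```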
The same handling will be needed in the skopeo adapter. Avoid repeating it.
Some of the subprocess calls capture stderr. Show it to the caller on failure.
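Roughly, the idea is (illustrative sketch, not the adapter's code):

```python
import subprocess
import sys

def run_command(cmd):
    try:
        return subprocess.run(cmd, check=True,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as exc:
        # Re-surface the captured stderr instead of swallowing it.
        if exc.stderr:
            sys.stderr.write(exc.stderr.decode())
        raise
```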
In order to be able to track Docker containers in a dataset, we introduced the docker-save-based docker adapter in 68a1462 (Add prototype of a Docker adapter, 2018-05-18). It's not clear how much this has been used, but at least conceptually it seems to be viable. One problem, however, is that ideally we'd be able to assign Docker registry URLs to the image files stored in the dataset (particularly the large non-configuration files). There doesn't seem to be a way to do this with the docker-save archives.

Another option for storing the image in a dataset is the Open Container Initiative image format. Skopeo can be used to copy images from Docker registries (and some other sources) to an OCI-compliant directory. When Docker Hub is used as the source, the resulting layer blobs can be re-obtained via GET /v2/NAME/blobs/ID. Using skopeo/OCI also has the advantage of making it easier to execute via podman in the future.

Add an initial skopeo-based OCI adapter. At this point, it has the same functionality as the docker adapter.
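As an illustration of the registry URL pattern mentioned above (the helper name is an assumption; registry-1.docker.io is Docker Hub's registry API host):

```python
def layer_blob_url(name, digest, registry="registry-1.docker.io"):
    # A layer blob stored in the OCI directory could be re-obtained
    # from the registry via GET /v2/NAME/blobs/ID.
    return f"https://{registry}/v2/{name}/blobs/{digest}"

# layer_blob_url("library/busybox", "sha256:abc123...")
# -> "https://registry-1.docker.io/v2/library/busybox/blobs/sha256:abc123..."
```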
After running `skopeo copy docker://docker.io/... oci:<dir>`, we can
link up the layer to the Docker registry. However, other digests
aren't preserved. One notable mismatch is between the image ID if you
run
docker pull x
versus
skopeo copy docker://x oci:x && skopeo copy oci:x docker-daemon:x
I haven't really wrapped my head around all the different digests and
when they can change. However, skopeo's issue tracker has a good deal
of discussion about this, and it looks complicated (e.g., issues 11,
469, 949, 1046, and 1097).
The adapter docstring should probably note this, though at this point
I'm not sure I could say something coherent. Anyway, add a to-do
note...
I _think_ containers-storage: is what we'd use for podman-run, but I haven't attempted it.
Prevent skopeo-copy output from being shown, since it's probably confusing to see output under run's "Command start (output follows)" tag for a command that the user didn't explicitly call. However, for large images, this has the downside that the user might want some signs of life, so this may need to be revisited.
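A minimal sketch of suppressing that output (the exact redirection the adapter uses may differ):

```python
import subprocess

def copy_quietly(src, dest):
    # Keep skopeo's progress output away from the
    # "Command start (output follows)" section of a run record.
    subprocess.run(["skopeo", "copy", src, dest],
                   check=True,
                   stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
```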
We'll need this information in order to add a tag to the oci: destination and to make the entry copied to docker-daemon more informative. I've tried to base the rules on containers/image implementation, which is what skopeo uses underneath.
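A rough sketch of such splitting rules, loosely following containers/image conventions (the function name and exact behavior are assumptions, not the adapter's parser):

```python
def parse_docker_reference(ref):
    # Strip the transport prefix, then split off an optional @digest or
    # :tag from the repository name.  The real rules (default registry,
    # "library/" namespace, ports in the registry host) are more involved.
    if ref.startswith("docker://"):
        ref = ref[len("docker://"):]
    digest = tag = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    elif ":" in ref.rsplit("/", 1)[-1]:
        ref, tag = ref.rsplit(":", 1)
    return {"name": ref, "tag": tag, "digest": digest}

# parse_docker_reference("docker://docker.io/library/debian:buster-slim")
# -> {"name": "docker.io/library/debian", "tag": "buster-slim", "digest": None}
```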
An image stored as an OCI directory can have a tag. If the source has a tag specified, copy it over to the destination. Note that upcoming commits will store the full source specification as an image annotation, so we won't rely on this when copying the image to docker-daemon:, but it still seems nice to have (e.g., when looking at the directory with skopeo-inspect).
These will be used to store the value of the skopeo-copy source and then retrieve it at load time to make the docker-daemon: entry more informative.
The OCI format allows annotations. Add one with the source value (which will be determined by what the caller gives to containers-add) so that we can use this information when copying the information to a docker-daemon: destination.
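In an OCI image layout, annotations live on the manifest descriptors in index.json; here is a sketch of adding one (the annotation key is a placeholder, not necessarily the one the adapter uses):

```python
import json
from pathlib import Path

def annotate_source(oci_dir, source):
    index = Path(oci_dir, "index.json")
    layout = json.loads(index.read_text())
    # Attach the containers-add source to the (single) manifest entry.
    manifest = layout["manifests"][0]
    manifest.setdefault("annotations", {})["org.example.source"] = source
    index.write_text(json.dumps(layout, indent=2))
```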
The images copied to the daemon look like this
$ docker images
REPOSITORY             TAG              IMAGE ID       CREATED       SIZE
datalad-container/bb   sha256-98345e4   98345e418eb7   3 weeks ago   69.2MB
That tag isn't useful because it just repeats the image ID. And the
name after "datalad-container/" is the name of the directory, so with
the default containers-add location it would be an uninformative
"image".
With the last commit, we store the source specification as an
annotation in the OCI directory. Parse it and reuse the original
repository name and tag.
REPOSITORY                 TAG           IMAGE ID       CREATED       SIZE
datalad-container/debian   buster-slim   98345e418eb7   3 weeks ago   69.2MB
If the source has a digest instead of the tag, construct the daemon
tag from that.
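Something along these lines (a sketch only; the actual naming rule may differ):

```python
def daemon_name(repository, tag=None, digest=None):
    # Prefer the source tag; otherwise derive one from the digest,
    # since ":" is not allowed inside a tag.
    suffix = tag if tag else digest.replace(":", "-")
    return f"{repository}:{suffix}"

# daemon_name("debian", tag="buster-slim")          -> "debian:buster-slim"
# daemon_name("debian", digest="sha256:98345e...")  -> "debian:sha256-98345e..."
```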
Add a new oci: scheme. The stacking of the schemes isn't ideal (oci:docker://, oci:docker-daemon:), but it allows for any skopeo transport to be used. Note: I'm not avoiding appending "//" for a conceptual reason (although there might be a valid one), but because I find "oci://docker://" to be ugly. Perhaps the consistency with "shub://" and "dhub://" outweighs that though.
The next commit will use this logic in the oci adapter as well, and it'd be nice (though not strictly necessary) to avoid oci and containers_add importing each other.
TODO: Finalize approach in Datalad for Docker Registry URLs.
* origin/master: (217 commits)
  [DATALAD RUNCMD] Run pre-commit to harmonize code throughout
  Update __version__ to 1.2.6
  [skip ci] Update CHANGELOG
  BF: use setuptools.errors.OptionError instead of now removed import of distutils.DistutilsOptionError
  BF: docbuild - use python 3.9 (not 3.8) and upgrade setuptools
  [DATALAD RUNCMD] Run pre-commit to harmonize code throughout
  rm duplicate .codespellrc and move some of its skips into pyproject.toml
  progress codespell in pre-commit
  Add precommit configuration as in datalad ATM
  [release-action] Autogenerate changelog snippet for PR 268
  MNT: Account for a number of deprecations in core
  Revert linting a target return value for a container
  Fix lint errors other than line length
  upper case CWD acronym
  CI/tools: Add fuse2fs dependency for singularity installation
  Improving documentation for --url parameter
  Update __version__ to 1.2.5
  [skip ci] Update CHANGELOG
  Add changelog entry for isort PR
  [DATALAD RUNCMD] isort all files for consistency
  ...

Conflicts - some were tricky:
  datalad_container/adapters/docker.py
  datalad_container/containers_add.py
  datalad_container/utils.py - both added but merge looked funny
otherwise even singularity does not install
=== Do not change lines below ===
{
"chain": [],
"cmd": "sed -i -e 's,from distutils.spawn import find_executable,from shutil import which,g' -e 's,find_executable(,which(,g' datalad_container/adapters/tests/test_oci_more.py",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Codecov Report

@@            Coverage Diff             @@
##           master     #277       +/-   ##
===========================================
- Coverage   94.60%   83.65%   -10.95%
===========================================
  Files          24       28        +4
  Lines        1112     1444      +332
===========================================
+ Hits         1052     1208      +156
- Misses         60      236      +176
I guess it is too much of a crippled system. I will move those commits into a separate PR; no point in occluding things here.
Added comprehensive documentation for Claude Code to work effectively with this codebase, including architecture overview, development commands, and key implementation details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Extended the OCI adapter to support any container registry without
hardcoding endpoints. The link() function now dynamically constructs
registry API endpoints using the pattern https://{registry}/v2/, with
Docker Hub as the only special case (registry-1.docker.io).
This enables automatic support for registries like:
- quay.io (Quay.io registry)
- gcr.io (Google Container Registry)
- ghcr.io (GitHub Container Registry)
- Any other V2-compatible registry
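A minimal sketch of that construction, assuming the Docker Hub special-casing described above (not necessarily the exact code in link()):

```python
def registry_endpoint(registry):
    # Docker Hub's API lives on registry-1.docker.io; every other
    # V2-compatible registry is addressed directly.
    if registry == "docker.io":
        registry = "registry-1.docker.io"
    return f"https://{registry}/v2/"

# registry_endpoint("ghcr.io")   -> "https://ghcr.io/v2/"
# registry_endpoint("docker.io") -> "https://registry-1.docker.io/v2/"
```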
Changes:
- Removed hardcoded _ENDPOINTS dictionary
- Added dynamic endpoint construction in link() function
- Added unit tests for parsing references from alternative registries
- Added integration tests using real images:
- ghcr.io/astral-sh/uv:latest for ghcr.io testing
- quay.io/linuxserver.io/baseimage-alpine:3.18 for quay.io testing
The link() function will add registry URLs to annexed layer images for
any registry when proper provider configuration is available, enabling
efficient retrieval through git-annex.
All new tests are marked with @pytest.mark.ai_generated as per project
standards.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Enhanced the parametrized registry test to include:

1. Docker Hub (docker.io) with busybox:1.30 for consistency
2. Verification that annexed blobs exist in the OCI image
3. Check that all annexed files have URLs registered in either the datalad or web remote for efficient retrieval

The test now verifies that `git annex find --not --in datalad --and --not --in web` returns empty, ensuring all blobs are accessible through git-annex remotes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Enhanced the parametrized registry test to verify the complete drop/get cycle for the entire dataset:

1. Drops all annexed content in the dataset
2. Verifies that files were actually dropped (non-empty results)
3. Gets everything back from remotes
4. Verifies that files were retrieved (non-empty results)

This ensures that the registered URLs in datalad/web remotes are functional and files can be successfully retrieved from the registry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
This fixture ensures that sys.executable's directory is first in PATH for the duration of tests. This is needed when tests spawn subprocesses that need to import modules from the same Python environment that's running pytest, preventing "No module named X" errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
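A sketch of such a fixture (the name and autouse behavior are assumptions):

```python
import os
import sys
import pytest

@pytest.fixture(autouse=True)
def python_bin_first_in_path(monkeypatch):
    # Put sys.executable's directory at the front of PATH so spawned
    # subprocesses resolve the same Python environment running pytest.
    bindir = os.path.dirname(sys.executable)
    monkeypatch.setenv("PATH",
                       bindir + os.pathsep + os.environ.get("PATH", ""))
```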
…er handling

- Add parametrized integration test covering docker.io, gcr.io, and quay.io
- Test container addition, execution, and annexed blob verification
- Add drop/get cycle testing to verify remote retrieval works
- Fix link() to create datalad remote even without provider configuration
- Issue warning instead of skipping when provider not found
- Allows URLs to be registered and files to be retrieved from any registry
- Use pytest tmp_path fixture instead of @with_tempfile decorator

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
@asmacdo, try to use it with containers which might be of interest to you.
Reincarnated Kyle's
since I want (almost need) to store OCI images and then generate SIF files from them.
Observations
- tested, and it seems to work just fine with our added warning here, although we might even want to get rid of it

TODOs:
- `+_ENDPOINTS = {"docker.io": "https://registry-1.docker.io/v2/"}` but should be generic and skopeo should know...
- `datalad` special remote. For now enabled the datalad special remote for all
The following Python code shows how to check whether a registry requires an auth token:
```python
import requests
import json
from urllib.parse import urlparse, parse_qs
from typing import Optional, Dict


class AuthenticatedBlobDownloader:
    """Download OCI blobs with proper registry authentication"""

    def __init__(self):
        self.session = requests.Session()
        self.token_cache = {}  # Cache tokens per registry/repo

    def get_auth_token(self, registry: str, repository: str,
                       username: Optional[str] = None,
                       password: Optional[str] = None) -> Optional[str]:
        """
        Get authentication token for a registry using the OAuth2 flow.

        Returns token for anonymous access to public repos, or
        authenticated access.
        """
        cache_key = f"{registry}/{repository}"
        if cache_key in self.token_cache:
            return self.token_cache[cache_key]

        # Step 1: Probe the /v2/ endpoint to get WWW-Authenticate header
        probe_url = f"https://{registry}/v2/"
        response = self.session.get(probe_url)

        if response.status_code == 200:
            # No auth required
            return None

        if response.status_code != 401:
            raise ValueError(f"Unexpected response from registry: {response.status_code}")

        # Step 2: Parse WWW-Authenticate header
        www_auth = response.headers.get('WWW-Authenticate', '')
        if not www_auth.startswith('Bearer'):
            raise ValueError(f"Unsupported auth scheme: {www_auth}")

        # Parse realm, service, scope from header
        # Example: Bearer realm="https://ghcr.io/token",service="ghcr.io",scope="repository:user/repo:pull"
        auth_params = {}
        for part in www_auth.replace('Bearer ', '').split(','):
            if '=' in part:
                key, value = part.split('=', 1)
                auth_params[key.strip()] = value.strip('"')

        realm = auth_params.get('realm')
        service = auth_params.get('service')

        if not realm:
            raise ValueError(f"No realm in WWW-Authenticate: {www_auth}")

        # Step 3: Request token from auth endpoint
        token_params = {
            'service': service,
            'scope': f'repository:{repository}:pull'
        }

        if username and password:
            # Authenticated request
            token_response = self.session.get(
                realm,
                params=token_params,
                auth=(username, password)
            )
        else:
            # Anonymous request (works for public repos)
            token_response = self.session.get(realm, params=token_params)

        token_response.raise_for_status()
        token_data = token_response.json()
        token = token_data.get('token') or token_data.get('access_token')

        if not token:
            raise ValueError(f"No token in response: {token_data}")

        # Cache the token
        self.token_cache[cache_key] = token
        return token

    def download_blob(self, blob_url: str, output_path: str,
                      username: Optional[str] = None,
                      password: Optional[str] = None) -> str:
        """
        Download a blob from a registry with proper authentication.

        Args:
            blob_url: Full URL like https://ghcr.io/v2/repo/blobs/sha256:...
            output_path: Where to save the downloaded blob
            username: Optional username for private repos
            password: Optional password/token for private repos

        Returns:
            Path to downloaded file
        """
        # Parse the blob URL
        parsed = urlparse(blob_url)
        registry = parsed.netloc

        # Extract repository from path: /v2/REPO/blobs/DIGEST
        path_parts = parsed.path.split('/')
        if len(path_parts) < 5 or path_parts[1] != 'v2' or path_parts[-2] != 'blobs':
            raise ValueError(f"Invalid blob URL format: {blob_url}")

        # Repository is everything between /v2/ and /blobs/
        repository = '/'.join(path_parts[2:-2])
        digest = path_parts[-1]

        print(f"Downloading from {registry}/{repository}")
        print(f"  Digest: {digest}")

        # Get authentication token
        token = self.get_auth_token(registry, repository, username, password)

        # Download the blob
        headers = {}
        if token:
            headers['Authorization'] = f'Bearer {token}'
            print(f"  Using auth token: {token[:30]}...")
        else:
            print("  No authentication required")

        response = self.session.get(blob_url, headers=headers, stream=True)
        response.raise_for_status()

        # Save to file
        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)

        print(f"  ✓ Downloaded to {output_path}")
        return output_path


# Usage example
if __name__ == "__main__":
    downloader = AuthenticatedBlobDownloader()

    # Example 1: Download from ghcr.io (public repo)
    blob_url = "https://ghcr.io/v2/con/nwb2bids/blobs/sha256:8c7716127147648c1751940b9709b6325f2256290d3201662eca2701cadb2cdf"
    downloader.download_blob(blob_url, "layer.tar.gz")

    # Example 2: Download from a private repo (with credentials)
    # downloader.download_blob(
    #     "https://ghcr.io/v2/myorg/private-repo/blobs/sha256:...",
    #     "private-layer.tar.gz",
    #     username="myuser",
    #     password="ghp_myPersonalAccessToken"
    # )
```

There is also Python code to potentially get endpoints programmatically, although it still needs an ad-hoc fix for Docker Hub.