32 commits
b3aee66
MV: docker adapter: Move some bits to adapters.utils
kyleam Nov 10, 2020
4274759
RF: adapters.utils: Rename _list_images to get_docker_image_ids
kyleam Nov 10, 2020
3656d42
RF: adapters: Prefer subprocess.run over check_* methods
kyleam Nov 10, 2020
b0af4b1
MV: adapters: Move logging configuration to utils
kyleam Nov 10, 2020
780247e
ENH: adapters.utils: Account for datalad logging
kyleam Nov 10, 2020
326ac7b
ENH: adapters.utils: Add helper for main() handling
kyleam Nov 10, 2020
9dbb3c5
ENH: adapters.utils: Display captured stderr on exit
kyleam Nov 10, 2020
ac5d970
NF: oci: Add adapter for working with OCI images
kyleam Nov 3, 2020
c21c480
DOC: oci: Add todo comment about image ID mismatch
kyleam Nov 3, 2020
0817007
DOC: oci: Add comment about alternative destinations
kyleam Nov 3, 2020
c91b92e
ENH: oci: Silence 'skopeo copy' output when loading image
kyleam Nov 5, 2020
104e211
ENH: oci: Add utility for parsing Docker reference
kyleam Nov 5, 2020
5445874
ENH: oci: Copy over tag when saving OCI directory
kyleam Nov 3, 2020
9e9024c
ENH: oci: Add utilities for storing and reading annotation field
kyleam Nov 5, 2020
5dcce39
ENH: oci: Record source for skopeo-copy in image's annotation
kyleam Nov 5, 2020
6634736
ENH: oci: Register a more informative name with docker-daemon
kyleam Nov 5, 2020
9e2a060
ENH: containers-add: Wire up OCI adapter
kyleam Nov 4, 2020
2309712
MV: Add utils module with containers-add's _ensure_datalad_remote
kyleam Nov 9, 2020
fc7b847
ENH: containers-add: Try to link layers in OCI directory
kyleam Nov 4, 2020
7043320
Merge remote-tracking branch 'origin/master' into skopeo
yarikoptic Sep 27, 2025
f13bcdd
BF(TST): minimally account for our migration to pytest
yarikoptic Sep 27, 2025
a6bfb55
chore: appveyor -- progress Ubuntu to 2204
yarikoptic Sep 27, 2025
038a3d1
install libffi7 since otherwise git-annex install fails
yarikoptic Sep 27, 2025
ed733e3
[DATALAD RUNCMD] chore: drop use of find_executable (use shutil.which)
yarikoptic Sep 27, 2025
afc1ea7
[release-action] Autogenerate changelog snippet for PR 277
Sep 27, 2025
ad1e343
Just a minor syntax fix spotted
yarikoptic Oct 7, 2025
f0a5cd3
Add CLAUDE.md for AI assistant guidance
yarikoptic Oct 15, 2025
33e4927
Add generic registry support to OCI adapter
yarikoptic Oct 15, 2025
0e3ee4b
Add docker.io to registry tests and verify annex URL availability
yarikoptic Oct 16, 2025
4e68a56
Add drop/get cycle test to verify remote retrieval
yarikoptic Oct 16, 2025
1124be1
Add session-wide PATH fixture to ensure sys.executable is first
yarikoptic Oct 20, 2025
11bbf45
Add comprehensive tests for alternative OCI registries and fix provid…
yarikoptic Oct 20, 2025
4 changes: 2 additions & 2 deletions .appveyor.yml
@@ -59,8 +59,8 @@ environment:

   # All of these are common to all matrix runs ATM, so pre-defined here and to be overloaded if needed
   DTS: datalad_container
-  APPVEYOR_BUILD_WORKER_IMAGE: Ubuntu2004
-  INSTALL_SYSPKGS: python3-venv xz-utils jq
+  APPVEYOR_BUILD_WORKER_IMAGE: Ubuntu2204
+  INSTALL_SYSPKGS: python3-venv xz-utils jq libffi7
   # system git-annex is way too old, use better one
   INSTALL_GITANNEX: git-annex -m deb-url --url http://snapshot.debian.org/archive/debian/20210906T204127Z/pool/main/g/git-annex/git-annex_8.20210903-1_amd64.deb
   CODECOV_BINARY: https://uploader.codecov.io/latest/linux/codecov
178 changes: 178 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,178 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

datalad-container is a DataLad extension for working with containerized computational environments. It enables tracking, versioning, and execution of containerized workflows within DataLad datasets using Singularity/Apptainer, Docker, and OCI-compliant images.

## Core Architecture

### Command Suite Structure

The extension registers a command suite with DataLad through setuptools entry points (see `setup.cfg`). The main commands are:

- **containers-add** (`containers_add.py`) - Add/update container images to a dataset
- **containers-list** (`containers_list.py`) - List configured containers
- **containers-remove** (`containers_remove.py`) - Remove containers from configuration
- **containers-run** (`containers_run.py`) - Execute commands within containers

All commands are registered in `datalad_container/__init__.py` via the `command_suite` tuple.

### Container Adapters

The `adapters/` directory contains transport-specific handlers:

- **docker.py** - Docker Hub images (`dhub://` scheme)
- **oci.py** - OCI-compliant images using Skopeo (`oci:` scheme)
  - Saves images as trackable directory structures
  - Supports loading images to Docker daemon on-demand
  - Uses Skopeo for image manipulation

Each adapter implements `save()` and `run()` functions for its respective container format.

### Container Discovery

`find_container.py` implements the logic for locating containers:
- Searches current dataset and subdatasets
- Supports hierarchical container names (e.g., `subds/container-name`)
- Falls back to path-based and name-based lookups
- Automatically installs subdatasets if needed to access containers
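
The hierarchical naming mentioned above can be pictured with a small sketch. This is only an illustration of splitting a name such as `subds/container-name` into a subdataset part and a container part; `split_container_name` is a hypothetical helper, and the actual lookup in `find_container.py` may proceed differently.

```python
def split_container_name(name):
    """Split a hierarchical name into (subdataset path or None, container name).

    Hypothetical helper for illustration; not part of datalad-container.
    """
    subds, sep, container = name.rpartition("/")
    # Without a separator the container is looked up in the current dataset.
    return (subds if sep else None), container


print(split_container_name("subds/container-name"))
print(split_container_name("container-name"))
```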

### Configuration Storage

Container metadata is stored in `.datalad/config` with the pattern:
```
datalad.containers.<name>.image = <relative-path>
datalad.containers.<name>.cmdexec = <execution-format-string>
datalad.containers.<name>.updateurl = <original-url>
datalad.containers.<name>.extra-input = <additional-dependencies>
```

Default container location: `.datalad/environments/<name>/image`
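
As a rough illustration of the layout above, the following sketch builds the configuration variable names and the default image path for a container. The helper names `container_config_keys` and `default_image_path` are assumptions made for this example and are not functions provided by datalad-container.

```python
from pathlib import PurePosixPath

# Field names taken from the configuration pattern listed above.
CONFIG_FIELDS = ("image", "cmdexec", "updateurl", "extra-input")


def container_config_keys(name):
    """Return the config variable name for each field (hypothetical helper)."""
    return {field: f"datalad.containers.{name}.{field}" for field in CONFIG_FIELDS}


def default_image_path(name):
    """Return the default image location relative to the dataset root."""
    return PurePosixPath(".datalad", "environments", name, "image")


keys = container_config_keys("mycontainer")
print(keys["image"])
print(default_image_path("mycontainer"))
```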

## Development Commands

### Setup Development Environment

```bash
# Using uv (preferred)
uv venv
source .venv/bin/activate
uv pip install -e .[devel]

# Or traditional method
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[devel]
```

### Running Tests

```bash
# Run all tests
pytest -s -v datalad_container

# Run specific test file
pytest -s -v datalad_container/tests/test_containers.py

# Run specific test function
pytest -s -v datalad_container/tests/test_containers.py::test_add_noop

# Run with coverage
pytest -s -v --cov=datalad_container datalad_container

# Skip slow tests (marked with 'turtle')
pytest -s -v -m "not turtle" datalad_container
```

### Code Quality Tools

Pre-commit hooks are configured in `.pre-commit-config.yaml`:

```bash
# Install pre-commit hooks
pre-commit install

# Run manually on all files
pre-commit run --all-files

# Individual tools
isort datalad_container/ # Sort imports
codespell # Spell checking
```

### Building Documentation

```bash
cd docs
make html
# Output in docs/build/html/
```

### Important Testing Notes

- Tests use pytest fixtures defined in `datalad_container/conftest.py` and `tests/fixtures/`
- The project uses `@with_tempfile` and `@with_tree` decorators from DataLad's test utilities
- Docker tests may require Docker to be running
- Singularity/Apptainer tests require the container runtime to be installed
- Some tests are marked with `@pytest.mark.turtle` for slow-running tests

## Key Implementation Details

### URL Scheme Handling

Container sources are identified by URL schemes:
- `shub://` - Singularity Hub (legacy, uses requests library)
- `docker://` - Direct Singularity pull from Docker Hub
- `dhub://` - Docker images stored locally via docker pull/save
- `oci:` - OCI images stored as directories via Skopeo

The scheme determines both storage format and execution template.
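
A minimal sketch of prefix-based scheme detection, assuming a plain `startswith` check is sufficient. `identify_scheme` and the `SCHEMES` table are hypothetical names for this example; the real dispatch in `containers_add.py` may be organized differently.

```python
# Scheme prefixes and descriptions taken from the list above.
SCHEMES = {
    "shub://": "Singularity Hub (legacy)",
    "docker://": "Singularity pull from Docker Hub",
    "dhub://": "Docker image stored locally via docker pull/save",
    "oci:": "OCI image stored as a directory via Skopeo",
}


def identify_scheme(url):
    """Return the matching scheme prefix, or None for plain paths/URLs."""
    for prefix in SCHEMES:
        if url.startswith(prefix):
            return prefix
    return None


print(identify_scheme("oci:docker://busybox"))
print(identify_scheme("dhub://busybox:latest"))
```

Note that `oci:docker://...` is matched by the `oci:` prefix, because the check looks only at the start of the string.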

### Execution Format Strings

Call format strings support placeholders:
- `{img}` - Path to container image
- `{cmd}` - Command to execute
- `{img_dspath}` - Relative path to dataset containing image
- `{img_dirpath}` - Directory containing the image
- `{python}` - Path to current Python executable

Example: `singularity exec {img} {cmd}`
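
To make the placeholder mechanics concrete, here is a sketch that fills the example template with Python's `str.format`. The image path and command are made-up values, and the real substitution in `containers-run` may handle quoting and the remaining placeholders differently.

```python
# Template from the example above; the values below are illustrative only.
call_fmt = "singularity exec {img} {cmd}"

rendered = call_fmt.format(
    img=".datalad/environments/mycontainer/image",
    cmd="python --version",
)
print(rendered)
```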

### Git-annex Integration

- Large container images are managed by git-annex
- For `shub://` URLs, uses DataLad's special remote if available
- The `ensure_datalad_remote()` function (in `utils.py`) initializes the special remote when needed
- For `oci:docker://` images, registry URLs are added to annexed layers for efficient retrieval

### Path Normalization

`utils.py` contains `_normalize_image_path()` to handle cross-platform path issues:
- Config historically stored platform-specific paths
- Now standardizes to POSIX paths in config
- Maintains backward compatibility with Windows paths
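
The idea can be sketched with `pathlib`, assuming the stored value is a relative path that may use either separator. This is not the implementation of `_normalize_image_path()`; `to_posix_image_path` is a hypothetical stand-in.

```python
from pathlib import PureWindowsPath


def to_posix_image_path(path):
    """Return `path` with POSIX separators (illustrative only).

    PureWindowsPath accepts both "\\" and "/" as separators, so values
    already stored in POSIX form pass through unchanged.
    """
    return PureWindowsPath(path).as_posix()


print(to_posix_image_path(".datalad\\environments\\mycontainer\\image"))
print(to_posix_image_path(".datalad/environments/mycontainer/image"))
```

One caveat of this shortcut is that a POSIX file name containing a literal backslash would be reinterpreted as a separator.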

## Testing Considerations

- Mark AI-generated tests with `@pytest.mark.ai_generated`
- Tests should not `chdir()` the entire process; use `cwd` parameter instead
- Use `common_kwargs = {'result_renderer': 'disabled'}` in tests to suppress output
- Many tests use DataLad's `with_tempfile` decorator for temporary test directories
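
The `cwd` guideline can be illustrated with the standard library alone. In this sketch a child process runs in a temporary directory while the parent's working directory stays untouched; `run_in_directory` is a hypothetical helper, not something defined in the test suite.

```python
import os
import subprocess
import sys
import tempfile


def run_in_directory(path):
    """Run a child process in `path` and return the directory it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=path,  # only the child process changes directory
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmpdir:
    reported = run_in_directory(tmpdir)
    same_dir = os.path.realpath(reported) == os.path.realpath(tmpdir)
unchanged = os.getcwd() == before
print(same_dir, unchanged)
```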

## Dependencies

Core dependencies:
- datalad >= 0.18.0
- requests >= 1.2 (for Singularity Hub communication)

Container runtimes (at least one required):
- Singularity or Apptainer for Singularity images
- Docker for Docker and OCI image execution
- Skopeo for OCI image manipulation

## Version Management

This project uses `versioneer.py` for automatic version management from git tags. Version info is in `datalad_container/_version.py` (auto-generated, excluded from coverage).
6 changes: 6 additions & 0 deletions changelog.d/pr-277.md
@@ -0,0 +1,6 @@
### 🚀 Enhancements and New Features

- Add skopeo-based adapter for working with OCI images.
[PR #277](https://github.com/datalad/datalad-container/pull/277) (by [@yarikoptic](https://github.com/yarikoptic))
continued an old/never finalized/closed
[PR #136](https://github.com/datalad/datalad-container/pull/136) (by [@kyleam](https://github.com/kyleam)).
55 changes: 15 additions & 40 deletions datalad_container/adapters/docker.py
@@ -19,7 +19,15 @@
 import tarfile
 import tempfile

-from datalad.utils import on_windows
+import logging
+
+from datalad_container.adapters.utils import (
+    docker_run,
+    get_docker_image_ids,
+    log_and_exit,
+    on_windows,
+    setup_logger,
+)

 lgr = logging.getLogger("datalad.containers.adapters.docker")

@@ -49,7 +57,7 @@ def save(image, path):
     with tempfile.NamedTemporaryFile() as stream:
         # Windows can't write to an already opened file
         stream.close()
-        sp.check_call(["docker", "save", "-o", stream.name, image])
+        sp.run(["docker", "save", "-o", stream.name, image], check=True)
         with tarfile.open(stream.name, mode="r:") as tar:
             if not op.exists(path):
                 lgr.debug("Creating new directory at %s", path)
@@ -79,12 +87,6 @@ def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
     lgr.info("Saved %s to %s", image, path)


-def _list_images():
-    out = sp.check_output(
-        ["docker", "images", "--all", "--quiet", "--no-trunc"])
-    return out.decode().splitlines()
-
-
 def get_image(path, repo_tag=None, config=None):
     """Return the image ID of the image extracted at `path`.
     """
@@ -130,7 +132,7 @@ def load(path, repo_tag, config):
     # things, loading the image from the dataset will tag the old neurodebian
     # image as the latest.
     image_id = "sha256:" + get_image(path, repo_tag, config)
-    if image_id not in _list_images():
+    if image_id not in get_docker_image_ids():
         lgr.debug("Loading %s", image_id)
         cmd = ["docker", "load"]
         p = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
@@ -144,7 +146,7 @@ def load(path, repo_tag, config):
     else:
         lgr.debug("Image %s is already present", image_id)

-    if image_id not in _list_images():
+    if image_id not in get_docker_image_ids():
         raise RuntimeError(
             "docker image {} was not successfully loaded".format(image_id))
     return image_id
@@ -159,25 +161,7 @@ def cli_save(namespace):

 def cli_run(namespace):
     image_id = load(namespace.path, namespace.repo_tag, namespace.config)
-    prefix = ["docker", "run",
-              # FIXME: The -v/-w settings are convenient for testing, but they
-              # should be configurable.
-              "-v", "{}:/tmp".format(os.getcwd()),
-              "-w", "/tmp",
-              "--rm",
-              "--interactive"]
-    if not on_windows:
-        # Make it possible for the output files to be added to the
-        # dataset without the user needing to manually adjust the
-        # permissions.
-        prefix.extend(["-u", "{}:{}".format(os.getuid(), os.getgid())])
-
-    if sys.stdin.isatty():
-        prefix.append("--tty")
-    prefix.append(image_id)
-    cmd = prefix + namespace.cmd
-    lgr.debug("Running %r", cmd)
-    sp.check_call(cmd)
+    docker_run(image_id, namespace.cmd)


 def main(args):
@@ -228,20 +212,11 @@ def main(args):

     namespace = parser.parse_args(args[1:])

-    logging.basicConfig(
-        level=logging.DEBUG if namespace.verbose else logging.INFO,
-        format="%(message)s")
+    setup_logger(logging.DEBUG if namespace.verbose else logging.INFO)

     namespace.func(namespace)


 if __name__ == "__main__":
-    try:
+    with log_and_exit(lgr):
         main(sys.argv)
-    except Exception as exc:
-        lgr.exception("Failed to execute %s", sys.argv)
-        if isinstance(exc, sp.CalledProcessError):
-            excode = exc.returncode
-        else:
-            excode = 1
-        sys.exit(excode)
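
The last hunk replaces an explicit `try`/`except` block with `with log_and_exit(lgr):`. The body of `log_and_exit` in `adapters/utils.py` is not shown in this part of the diff, so the following is only a sketch of how a context manager could reproduce the removed logic. `log_and_exit_sketch` and `exit_code_of` are hypothetical names, and the real helper may do more (the commit list mentions displaying captured stderr on exit).

```python
import contextlib
import logging
import subprocess as sp
import sys

lgr = logging.getLogger("sketch.adapters")


@contextlib.contextmanager
def log_and_exit_sketch(logger):
    """Log any exception from the block and exit with a matching code."""
    try:
        yield
    except Exception as exc:
        logger.exception("Failed to execute %s", sys.argv)
        # Mirror the removed code: reuse the child's return code if available.
        if isinstance(exc, sp.CalledProcessError):
            excode = exc.returncode
        else:
            excode = 1
        sys.exit(excode)


def exit_code_of(func):
    """Run `func` under the sketch and report the exit code it would produce."""
    try:
        with log_and_exit_sketch(lgr):
            func()
    except SystemExit as exc:
        return exc.code
    return 0


def failing_command():
    raise sp.CalledProcessError(3, ["docker", "run"])


print(exit_code_of(failing_command))
```

Keeping this logic in one helper lets several adapters share the same exit-code handling in their `__main__` blocks.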