Provide support or recommendation for how to interact with conda-lock lockfiles #1312

Open
matthewfeickert opened this issue Oct 12, 2023 · 5 comments

@matthewfeickert
Contributor

Proposed change

At the moment, repo2docker supports conda/mamba/micromamba environment.yml environment files as Binder config files. This is great, but even if you pin packages with ==, their dependencies can still float, so reproducibility can break in the future. For long-term reproducible builds (e.g. launching into Binder from a Zenodo DOI) you would also want repo2docker to work with lock files. As the conda ecosystem is already supported, a natural extension would be to use conda-lock, and with mamba/micromamba you can interact with conda-lock lock files on nearly equal footing with an environment.yml.

However, at the moment, if you place a conda-lock lock file named environment.yml under a binder/ directory in a repo, repo2docker will fail to build from it and error with

EnvironmentSectionNotValid: The following sections on '/home/jovyan/binder/environment.yml' are invalid and will be ignored:
 - version
 - metadata
 - package

(c.f. matthewfeickert-talks/talk-pyhep-2023#5)

It would be super nice if conda-lock lock files could have support added for them as a valid repo2docker config file.

Alternative options

Though if that is too big of a feature request, it would be nice if there was a method to allow users to interact with a conda-lock lock file that works with postBuild. At the moment, if you try to have a postBuild config file that has

conda env update --file binder/conda-lock.yml --prune

this will again fail with

EnvironmentSectionNotValid: The following sections on '/home/jovyan/binder/environment.yml' are invalid and will be ignored:
 - version
 - metadata
 - package

While micromamba is able to handle a command like

micromamba install --file binder/conda-lock.yml

it seems that conda cannot, so similarly having a postBuild file with

conda install --file binder/conda-lock.yml

will fail with

CondaValueError: could not parse 'version: 1' in: binder/conda-lock.yml

If installing an environment from a conda-lock lock file could be supported without repo2docker having to support conda-lock itself, and instructions on how to work with conda-lock lock files were also added, this could resolve things as well.

Who would use this feature?

People that want to ensure that a Binder link will run far into the future (so maybe the same people that put things on Zenodo).

How much effort will adding it take?

I'm not sure. I would hope not much, but I haven't taken the time to look at how repo2docker currently supports all the config files it already does.

Who can do this work?

Someone with familiarity with conda-lock.

@manics
Member

manics commented Oct 13, 2023

We already have lock files for pinning the base requirements, though these aren't yaml files:
https://github.com/jupyterhub/repo2docker/tree/main/repo2docker/buildpacks/conda
Is this a different type of lock file?

@matthewfeickert
Contributor Author

matthewfeickert commented Oct 13, 2023

@manics This might be a conda-lock version issue. The conda-lock format was unified in conda-lock v1.0.0 (c.f. conda/conda-lock#124)

https://github.com/conda/conda-lock/blob/425b384ffd010461d9a4f3c61d286e31a21f14f3/README.md?plain=1#L68-L76

By default, conda-lock stores its output in conda-lock.yml in the current working directory. This file will also be used by default for render, install, and update operations. You can supply a different filename with e.g.

conda-lock --lockfile superspecial.conda-lock.yml

It seems though, that yes, the format of what you have is different. Example:

# AUTO GENERATED FROM environment.py-3.11.yml, DO NOT MANUALLY MODIFY
# Frozen on 2023-06-08 09:35:47 UTC
# Generated by conda-lock.
# platform: linux-64
# input_hash: 3896c7e12b9461937f193ac022a4426948268f5b348da74e57eeacea703149a4
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.5.7-hbcca054_0.conda#f5c65075fc34438d5b456c7f3f5ab695
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda#7aca3059a1729aa76c597603f10b0dd3
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-13.1.0-hfd8a6a1_0.conda#067bcc23164642f4c226da631f2a2e1d

compared to something like https://iris-hep.org/analysis-systems-env-nightlies/iris-hep-rc/3.11/conda-lock.yml

# This lock file was generated by conda-lock (https://github.com/conda/conda-lock). DO NOT EDIT!
#
# A "lock file" contains a concrete list of package versions (with checksums) to be installed. Unlike
# e.g. `conda env create`, the resulting environment will not change as new package versions become
# available, unless you explicitly update the lock file.
#
# Install this environment as "YOURENV" with:
#     conda-lock install -n YOURENV --file conda-lock.yml
# To update a single package to the latest version compatible with the version constraints in the source:
#     conda-lock lock  --lockfile conda-lock.yml --update PACKAGE
# To re-solve the entire environment, e.g. after changing a version constraint in the source file:
#     conda-lock -f iris-hep-rc/3.11/environment.yml --lockfile conda-lock.yml
version: 1
metadata:
  content_hash:
    linux-64: e002febb8b04300e80dded8f2b7dabb269ace11a83f98db20719007774f0f52c
  channels:
  - url: conda-forge
    used_env_vars: []
  platforms:
  - linux-64
  sources:
  - iris-hep-rc/3.11/environment.yml
package:
- name: _libgcc_mutex
  version: '0.1'
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
  hash:
    md5: d7c89558ba9fa0495403155b64376d81
    sha256: fe51de6107f9edc7aa4f786a70f4a883943bc9d39b3bb7307c04c41410990726
  category: main
  optional: false
- name: ca-certificates
  version: 2023.7.22
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.7.22-hbcca054_0.conda
  hash:
    md5: a73ecd2988327ad4c8f2c331482917f2
    sha256: 525b7b6b5135b952ec1808de84e5eca57c7c7ff144e29ef3e96ae4040ff432c1
  category: main
  optional: false
...

(edit)

Ah yes, here we go:

https://github.com/conda/conda-lock/blob/425b384ffd010461d9a4f3c61d286e31a21f14f3/README.md?plain=1#L57-L64

Pre 1.0 compatible usage (explicit per platform locks)

If you were making use of conda-lock before the 1.0 release that added unified lockfiles you can still get that behaviour by making use of the explicit output kind.

conda-lock --kind explicit -f environment.yml

So it seems that you're using the pre-v1.0 explicit lock file format over the v1.0+ unified lockfile.
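To make the distinction between the file kinds concrete, here is a minimal sketch of how they could be told apart from their contents alone. It assumes the explicit format is marked by an `@EXPLICIT` line and the unified format by the top-level keys `version`, `metadata`, and `package` (the same sections conda complains about in the error above). The function name `classify_conda_file` is hypothetical and not part of repo2docker or conda-lock, and the line-based scan is a rough heuristic rather than a real YAML parse.

```python
def classify_conda_file(text: str) -> str:
    """Guess which kind of conda file `text` is from its top-level structure.

    Returns "explicit-lock", "unified-lock", "environment", or "unknown".
    """
    top_level_keys = set()
    for line in text.splitlines():
        stripped = line.strip()
        # Pre-1.0 style explicit lock files carry an @EXPLICIT marker line.
        if stripped == "@EXPLICIT":
            return "explicit-lock"
        if not stripped or stripped.startswith("#"):
            continue
        # An unindented "key:" line is treated as a top-level YAML mapping key.
        if not line[0].isspace() and not line.startswith("-") and ":" in line:
            top_level_keys.add(line.split(":", 1)[0].strip())
    # Unified conda-lock files have these three top-level sections.
    if {"version", "metadata", "package"} <= top_level_keys:
        return "unified-lock"
    # A plain environment.yml lists its requirements under "dependencies".
    if "dependencies" in top_level_keys:
        return "environment"
    return "unknown"
```

A check like this could run before handing a file to conda, so that a unified lock file named environment.yml is reported as such instead of failing with EnvironmentSectionNotValid.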

@bollwyvl
Contributor

Supporting conda-lock outputs would be... nice, but would likely require some guardrails, and blessing some "r2d knows best" conventions.

conda-lock itself hauls in... a lot of dependencies, so may not be a good candidate for the "base coat" environment. micromamba, already present, is certainly up to the task of consuming both formats... though pixi very well might end up "winning" for this use case.

Since it would ideally replace (not change) the notebook environment, support for the raw lock (in either format) should be able to preflight before doing a still-expensive download by:

  • checking some marker that is compatible with the runtime (e.g. linux-64) in some non-cosmetic place
    • comments and filenames can't count
    • but really, the presence of e.g. /linux-64/ in any URL would probably be enough...
      • i don't think a fully /noarch/ environment is even creatable at this time for anything except a "dataset" environment
  • contain e.g. jupyterhub[singleuser] at a version that would be at least compatible with the hosting hub...
    • running micromamba install jupyterhub-singleuser could upgrade something in an undesirable way, even with a bunch of flags on it

The yml format directly supports dependencies from other package managers, like pip (and even other package manager managers like poetry and pipenv), while the @EXPLICIT format kinda half-supports them, but behind #s, so probably needs to be ignored entirely.
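As a rough illustration of how the two formats relate, the sketch below turns already-parsed `package` entries from a unified lock file into `@EXPLICIT` lines of the form `url#md5`, skipping anything not managed by conda. This is not how conda-lock itself renders explicit files (it has its own render operation, and this sketch keeps the input order and ignores the commented pip lines entirely). The function name `render_explicit` is made up for this example, and the pip entry in the sample data is a hypothetical placeholder. The conda entry is copied from the lock file quoted earlier in this thread.

```python
from typing import Any, Dict, Iterable


def render_explicit(packages: Iterable[Dict[str, Any]], platform: str = "linux-64") -> str:
    """Render conda-managed entries for one platform as an @EXPLICIT file body."""
    lines = ["@EXPLICIT"]
    for pkg in packages:
        # Entries from other managers (e.g. pip) have no place in this format.
        if pkg.get("manager") != "conda" or pkg.get("platform") != platform:
            continue
        lines.append(f"{pkg['url']}#{pkg['hash']['md5']}")
    return "\n".join(lines) + "\n"


packages = [
    {
        "name": "_libgcc_mutex",
        "manager": "conda",
        "platform": "linux-64",
        "url": "https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2",
        "hash": {"md5": "d7c89558ba9fa0495403155b64376d81"},
    },
    {
        # Hypothetical pip-managed entry, present only to show it being skipped.
        "name": "example-pip-package",
        "manager": "pip",
        "platform": "linux-64",
        "url": "https://example.invalid/example_pip_package-1.0-py3-none-any.whl",
        "hash": {"sha256": "0" * 64},
    },
]

print(render_explicit(packages), end="")
```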

Thus far, there is no specific naming convention for A Well-Known Conda Lock File in a repo, as a number of "first-party" tools within the conda org don't agree on what the extension should even be:

  • the reference implementation (conda itself) does not (and might not ever) support the .yml format
    • conda env won't support an @EXPLICIT file, only conda create
  • by default, conda-lock generates lockfiles like conda-lock.yml or conda-{platform}.lock
    • both can be overridden, though the YAML format must end in .yml
  • constructor expects a .txt file for an @EXPLICIT (but has no guidance on the prefix)
    • it doesn't support the .yml format, even if micromamba is used as the solver, and would try to use it like an environment.yml

@bollwyvl
Contributor

So, to tighten up the above as a recommendation:

  • while a file has not been selected, for each of the below opinions (in basically this order, assuming linux-64):

    .binder/conda-lock.yml
    .binder/conda-linux-64.lock
    .binder/conda-linux-64.lock.txt
    binder/conda-lock.yml
    binder/conda-linux-64.lock
    binder/conda-linux-64.lock.txt
    conda-lock.yml
    conda-linux-64.lock
    conda-linux-64.lock.txt
    
    • if the file ends in .yml
      • and linux-64 is found in #/metadata/platforms/
        • and a member of #/package/ contains name: jupyterhub-singleuser
          • select this file
    • otherwise, if the file contains /linux-64/
      • and the file contains jupyterhub-singleuser
        • select this file
  • if no file is selected, fail

  • COPY {the file} /tmp/

  • micromamba env create --prefix {wherever/it/goes} --file /tmp/{the-file} && micromamba clean -yaf
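The file-selection part of the recommendation above could be sketched roughly as follows. This is only one reading of it: the "otherwise" branch is taken to apply to non-.yml files, and the names `select_lockfile` and `is_usable` are hypothetical, not existing repo2docker functions. The YAML parser is passed in as `load_yaml` so the sketch has no third-party imports. With PyYAML installed, `yaml.safe_load` would be the natural choice, which is an assumption about the environment rather than something the thread specifies.

```python
from pathlib import Path
from typing import Any, Callable, Optional

PLATFORM = "linux-64"
REQUIRED = "jupyterhub-singleuser"

# Candidate paths in the order listed above: .binder/, then binder/, then repo root.
CANDIDATES = [
    f"{directory}{name}"
    for directory in (".binder/", "binder/", "")
    for name in ("conda-lock.yml", f"conda-{PLATFORM}.lock", f"conda-{PLATFORM}.lock.txt")
]


def is_usable(name: str, text: str, load_yaml: Callable[[str], Any]) -> bool:
    """Preflight one candidate without downloading any packages."""
    if name.endswith(".yml"):
        doc = load_yaml(text) or {}
        platforms = (doc.get("metadata") or {}).get("platforms") or []
        names = {pkg.get("name") for pkg in doc.get("package") or []}
        return PLATFORM in platforms and REQUIRED in names
    # Explicit format: plain substring checks on the package URLs.
    return f"/{PLATFORM}/" in text and REQUIRED in text


def select_lockfile(repo: Path, load_yaml: Callable[[str], Any]) -> Optional[Path]:
    """Return the first candidate that passes the preflight, or None."""
    for rel in CANDIDATES:
        path = repo / rel
        if path.is_file() and is_usable(path.name, path.read_text(), load_yaml):
            return path
    return None
```

A caller would treat a `None` result as the "fail" case, and otherwise copy the returned file into the image and pass it to the micromamba command above.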

@itcarroll

itcarroll commented Jul 24, 2024

Chiming in here with a user experience, leading to a question about the above recommendation. My goals are to keep only my project's dependencies in an environment.yml with minimal pinning, have some lockfile for protection against untested updates, and to not conflict with packages added by the conda buildpack. I do not understand how I can create or use a lockfile that is aware of the package constraints introduced in the conda buildpack. Wouldn't the recommendation, which uses create rather than update, require me to include jupyterhub-singleuser and friends with all the repo2docker constraints? If it were update though, how/where/when could I invoke conda-lock on my environment.yml and repo2docker's environment.yml?

Aside: If not for some few packages that seem to need the notebook kernel to be the same as the environment running jupyterhub-singleuser, I would have used a separate environment for my project's kernel.
