
Migrate cuda based images out of rocker-versioned2 #903

Closed
eitsupi opened this issue Jan 25, 2025 · 16 comments · Fixed by #905
Labels
CI, pre-built images

Comments

@eitsupi
Member

eitsupi commented Jan 25, 2025

I am fed up with the number of questions about cuda and Python setup and the maintenance burden, and strongly believe that users should install whatever version of Python they want using uv on the version of the cuda image they want to use (and then use rig to install and use any version of R).

The situation has changed dramatically from a few years ago when there was no rig or uv, and I think the significance of the old kind of pre-built image is declining.

@cboettig @noamross Thoughts?
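For concreteness, the pattern described above could look roughly like the following Dockerfile sketch. The base tag, versions, and install commands are recalled from the uv and rig docs and should be verified; this is not a tested recipe.

```dockerfile
# Sketch: bring-your-own Python (uv) and R (rig) on a plain CUDA base.
# Tags and versions are illustrative.
FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# uv: install it, then pick whichever Python version the project needs
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv python install 3.12

# rig: install it, then pick whichever R version the project needs
RUN curl -Ls https://github.com/r-lib/rig/releases/download/latest/rig-linux-latest.tar.gz \
    | tar xz -C /usr/local \
    && rig add release
```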

@eitsupi added the CI and pre-built images labels Jan 25, 2025
@eitsupi pinned this issue Jan 25, 2025
@cboettig
Member

makes sense to me -- that's what I've been doing for my needs, e.g. building on top of the jupyterhub cuda images. (e.g. https://github.com/boettiger-lab/k8s/blob/main/images/Dockerfile.gpu#L1 is my current gpu setup)

@cboettig
Member

@eitsupi I'm thinking I'll drop a JupyterHub-based image into the old https://github.com/rocker-org/ml repo.

@eitsupi
Member Author

eitsupi commented Jan 29, 2025

Thanks, that might make sense.

However, looking here, there are multiple images for ML use. Which one should we agree on as the base image?
https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

Since it is not practical to cover all of these, I imagine it would probably be easiest to provide documentation and sample Dockerfiles explaining how to install R and RStudio on these images.
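A minimal sample Dockerfile in that spirit might look like the following. The base tag and package choice are illustrative (Ubuntu's r-base rather than CRAN binaries, just to show the shape), not an official recipe.

```dockerfile
# Sketch: add R on top of a Jupyter Docker Stacks image.
FROM quay.io/jupyter/pytorch-notebook:cuda12-ubuntu-24.04

USER root
# Ubuntu's R here for brevity; CRAN's apt repo (or rig) gives newer builds,
# and RStudio Server would need its own .deb install step.
RUN apt-get update \
    && apt-get install -y --no-install-recommends r-base \
    && rm -rf /var/lib/apt/lists/*
USER ${NB_UID}
```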

@cboettig
Member

cboettig commented Jan 29, 2025

Yes, great points, thanks for raising these issues! I'll document these things, and I won't attempt to cover all of those images. You've probably noticed that actually very few of those images include the NVIDIA CUDA libraries.

I do intend to provide a pre-built image with my recommended configuration as well, which will use the CUDA image on the latest Ubuntu, as I indicate above. Jupyter's tensorflow image only provides the latest cuda, while tagged versions exist only for their pytorch base image. In my experience, and in surveys I have seen from colleagues at computing centers, pytorch is far more widely used at this time. So while I agree with you that the full set of images discussed there looks intimidating to users, I think the choice I indicated above, quay.io/jupyter/pytorch-notebook:cuda12-ubuntu-24.04, makes sense.

I completely agree that we want to document how to customize this. Given the recent introduction of JupyterHub's Fancy Profiles that can build directly from a Dockerfile, it is easier than ever to bring-your-own Dockerfile (which is a natural pattern for codespaces and gitlab use as well).

There are obviously a lot of ways to set these things up, and, just as Rocker has always done, the rocker/ml repo will show just one opinionated way to go about it rather than something comprehensive or overly flexible; experienced users will always be able to adapt. E.g. I will go with Dirk's r2u approach, since for users writing their own Dockerfiles for a binder/jupyter experience, having apt dependencies solved automatically is a significant win.

I know you've grown weary of all the python and cuda issues over here, so it sounds like addressing these in a different repo would be helpful too. For simplicity, the ml/ cuda image will not attempt the strong versioning promises we try to make here.

@benz0li
Contributor

benz0li commented Feb 3, 2025

The situation has changed dramatically from a few years ago when there was no rig or uv, and I think the significance of the old kind of pre-built image is declining.

As a user, pre-built images are easier to work with than using a base image + a virtual environment manager.

IMHO containers [like the ones here] + rig/uv/other [virtual environment manager] are not meant for each other.

makes sense to me -- that's what I've been doing for my needs, e.g. building on top of the jupyterhub cuda images. (e.g. https://github.com/boettiger-lab/k8s/blob/main/images/Dockerfile.gpu#L1 is my current gpu setup)

You could also use b-data's/my CUDA-based JupyterLab R docker stack.

[...] that can build directly from a Dockerfile, it is easier than ever to bring-your-own Dockerfile (which is a natural pattern for codespaces and gitlab use as well).

Most people are simply building on existing Rocker or Jupyter images.
ℹ Like almost all of the few GPU-accelerated [Jupyter-based] images available.

@cboettig
Member

cboettig commented Feb 3, 2025

Thanks @benz0li ! Your work is excellent as well. And yes, I totally get where you're coming from on containers vs virtual envs. I think that's definitely true for 'production containers', but perhaps a bit different for these 'dev containers' in which the goal is to support an end user customizing the environment further using patterns with which they are already familiar.

e.g. conda can be pretty cumbersome, especially when it comes to packages that require conda's 'activation' mechanism of shell shims and global env vars (e.g. as in rasterio and other gdal-binding conda packages).

However, as you already know, the official jupyter stacks are conda based, the python geospatial pangeo community is deeply conda based, and users know and expect conda. Hence the design I proposed above. This provides a concise Dockerfile that transparently extends the base Jupyter cuda image. Python installs are handled by conda. Meanwhile, R installs are handled by Dirk's excellent r2u / bspm approach -- again based on user considerations. None of us think conda is a nice solution for installing R packages, but bspm handles the binary dependencies nicely at container build time (at runtime I switch to binary installs from r-universe). In this way, a user can extend the environment with environment.yaml and install.r scripts without manually resolving lib deps.
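As a sketch, the user-facing extension pattern described here might look like the following. The base tag and file names are illustrative; this is not the actual rocker/ml recipe.

```dockerfile
# Sketch: user extends the image; conda and r2u/bspm resolve dependencies.
FROM rocker/ml:latest

# Python side: conda resolves the user's environment.yaml
COPY environment.yaml /tmp/environment.yaml
RUN conda env update -n base -f /tmp/environment.yaml

# R side: install.r runs with bspm enabled, so apt system deps
# are resolved automatically during package installation
COPY install.r /tmp/install.r
RUN Rscript /tmp/install.r
```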

As I noted above, this is certainly an opinionated setup, a bit different from existing setups but closely aligned with the official Jupyter images. I've tested it in a range of classroom and research settings over the past year or so, alongside the other images discussed above. Moreover, I think this provides a good way forward to maintain some cuda options in a separate repo in the rocker project while avoiding the headaches @eitsupi noted at the top. Big thanks to you both!

@benz0li
Contributor

benz0li commented Feb 4, 2025

However, as you already know, the official jupyter stacks are conda based

Yes. That was one reason I created my own docker stacks.

Other reasons: Rocker images' use of s6-overlay and Jupyter images' handling of the user's home directory1.

the python geospatial pangeo community is deeply conda based, and users know and expect conda.

People may install Conda / Mamba at user level.


Both the Version-stable Rocker images and Jupyter Docker Stacks are very popular and @eitsupi as well as @mathbunnyru do a great job improving and maintaining them.

Footnotes

  1. b-data's/my docker stacks allow for a persistent home directory that may be shared among all JupyterLab R/Python/Mojo/MAX/Julia docker images.

@benz0li
Contributor

benz0li commented Feb 4, 2025

Regarding dev containers: (CUDA-based) Data Science dev containers
ℹ Available for R, Python, Mojo/MAX and Julia

(I am trying to serve a larger community with a unified setup)

@hute37

hute37 commented Feb 7, 2025

I built my working project as a custom extension of the base ml-verse rocker image, with these features:

  • pyenv-based source Python setup, from the version in .python-version
  • pipx-installed, poetry-based project definition, with an automated lock/install phase and pyproject.toml support
  • rustup Rust/Cargo availability
  • fnm Node.js setup, used for a full JupyterLab installation from the poetry virtual env, with IRkernel support
  • renv-based automated R package installation, based on a DESCRIPTION project specification

All running in a "rootless" Podman container, with "NVIDIA Container Toolkit" support.

References:


But ...

What I need now is an image based on the current NVIDIA/CUDA images:

FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04

or

FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04

Instead of the ml-verse version

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

In the past, because I was using an obsolete GPU, I had to downgrade the CUDA version in the base image with a patching script that ran an apt-based reinstallation of the whole CUDA stack.

But now that I run an NVIDIA V100 (560 driver) on Ubuntu 24 with CUDA 12 installed (I also use python natively, without containers), what is the best approach?

I would strongly like to avoid forking the rocker project just to patch one line, but it is a "very heavy line of code".

(I fear getting lost in cuDNN version compatibility with the tensorflow/pytorch python libs under the R keras lib.)
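One way to avoid a fork for a single line is Docker's support for build arguments declared before FROM, which lets the base image be overridden at build time without touching the Dockerfile:

```dockerfile
# Default matches the current ml-verse base; override at build time with e.g.
#   docker build --build-arg BASE_IMAGE=nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 .
ARG BASE_IMAGE=nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
FROM ${BASE_IMAGE}
# ...rest of the recipe unchanged...
```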

@eitsupi
Member Author

eitsupi commented Feb 8, 2025

Thank you.

Anyway, I would like to remove all files associated with cuda from this repository, as it appears that @cboettig has started work on https://github.com/rocker-org/ml and pushed the new rocker/ml image to Docker Hub.
https://hub.docker.com/layers/rocker/ml/latest/images/sha256-1bfc8ec2179054ffc7d1ed87d2af1990067b85bdfbb846d13775b316f2499967

@eitsupi
Member Author

eitsupi commented Feb 8, 2025

The situation has changed dramatically from a few years ago when there was no rig or uv, and I think the significance of the old kind of pre-built image is declining.

As a user, pre-built images are easier to work with than using a base image + a virtual environment manager.

IMHO containers [like the ones here] + rig/uv/other [virtual environment manager] are not meant for each other.

I did not say to use uv as a virtual environment manager.
I simply recommended installing whatever version of Python you want.
I just don't want to guarantee that the version of Python you want is installed on the pre-built image.

@cboettig
Member

Thanks everyone for the discussion!

@hute37 let me know if you test out our setup in rocker-org/ml; the rocker/cuda image there builds on the jupyterhub cuda12 / ubuntu 24.04 image. The recipe can easily be swapped out for one of the other official Jupyter base images (note their pytorch series includes versioned tags for different cuda and python versions in the base images).

And of course if you want a solution outside of rocker-org @eitsupi & @benz0li have great suggestions above too.

@cboettig changed the title Remove cuda based images → Migrate cuda based images out of rocker-versioned2 Feb 10, 2025
@cboettig
Member

I think we should close this thread to redirect further discussion of python and cuda issues over to the rocker-org/ml repo, as @eitsupi has requested. @benz0li I'll be sure to link to your stack and these other options there; I'm still fleshing out the readme, and always appreciate your contributions!

eitsupi added a commit that referenced this issue Feb 10, 2025
@hute37

hute37 commented Feb 10, 2025

@cboettig

In my current environment I think I'll consider another path.
For my needs, the major value of rocker images is in their complete, full-featured R images.
Not only base+tidyverse, but also full rstudio, knitr (LaTeX+pandoc) and geospatial support (very tricky to set up because of system deps).

I need to treat everything related to python, jupyter and CUDA support as a transversal mixin.

It is too fragile to put these stacks in a "base" image.

My job is to assist researchers in a statistics faculty with R and python project setup. Every project has its own needs in python (ML: tensorflow, pytorch, RL, etc.) or R (stan, sparklyr, etc.). In this context, full dependency control and long-term reproducibility are strong requirements.
Project definition based on poetry/renv, with pyproject/DESCRIPTION specifications, is a good solution, including for unofficial packages on GitHub/GitLab, outside the standard PyPI/CRAN distribution.

In my current setup, I've already included scripts for pyenv/pipx based python bootstrap and virtualenv based "jupyter lab" setup (with node-js front-end packages).

The missing point here is NVIDIA GPU support.

But this is very problematic one, if considered as a "base" image.

Some projects are CPU-only, others require GPUs of different architectures and different GPU capabilities. Older projects (tensorflow/RL) need to link against an older CUDA version (11), while new projects (pytorch) require CUDA 12.

I think I'll standardize my containers on the latest (and greatest) geospatial image, and I'll try to add a (configurable) script for an (apt-based?) CUDA setup.
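For reference, an apt-based CUDA setup on top of a non-CUDA image usually follows NVIDIA's network-repo instructions. A rough sketch follows; the repo path must match the base image's Ubuntu release, and the keyring and toolkit versions shown are illustrative, not verified.

```dockerfile
# Sketch: add the CUDA toolkit via NVIDIA's apt repo on a geospatial base.
FROM rocker/geospatial:latest

RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates wget \
    # cuda-keyring registers NVIDIA's repo and signing key; the path must
    # match the base Ubuntu release (e.g. ubuntu2204 vs ubuntu2404)
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
    && dpkg -i cuda-keyring_1.1-1_all.deb \
    && apt-get update \
    && apt-get install -y --no-install-recommends cuda-toolkit-12-4 \
    && rm -rf /var/lib/apt/lists/* cuda-keyring_1.1-1_all.deb
```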

@cboettig
Member

Thanks @hute37, this maps closely to my own use cases. I think the approach I have put in rocker/ml can address this quite well (though of course there are other ways). Rather than adding the CUDA support 'on top' of geospatial, I think it's now easier to swap the base image of the 'recipe' for the desired configuration (e.g. cuda-12, cuda-11, tensorflow, cpu-only, etc.).

Are your users accessing JupyterLab through a hosted jupyterhub system or downloading the docker images to their laptops? I'd love to hear more about your setup.

Want to continue the discussion over in rocker/ml ?

@hute37

hute37 commented Feb 12, 2025

@cboettig

I'll follow up on this discussion here:

I'll write some notes related to this different layout.
