
Alternative containers hierarchy layout for R and Python integration in CUDA enabled projects #34

hute37 opened this issue Feb 12, 2025 · 10 comments


hute37 commented Feb 12, 2025

This note presents an alternative approach that can be used to integrate R and Python stacks with a different container layout.
In this alternative layout, Python features are introduced after the R stack.
This choice has the important implication of enabling standard Rocker images (r-base, verse, geospatial) as base images, avoiding replication of the R setup.
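To make that layering concrete, here is a minimal sketch (the rocker/verse tag, package list, and file names are illustrative assumptions, not the template projects' actual build files):

```bash
# Sketch: layer the Python/CUDA toolchain on top of an unmodified Rocker image,
# so the R setup is inherited from upstream rather than replicated.
# (image tag and package list are illustrative only)
cat > Containerfile.py <<'EOF'
FROM rocker/verse:4.4.2

# Build-time prerequisites for pyenv-compiled Pythons; the actual
# pyenv/pipx/poetry installation is handled by separate setup scripts.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      build-essential curl git \
      libbz2-dev libffi-dev liblzma-dev libreadline-dev \
      libsqlite3-dev libssl-dev zlib1g-dev \
 && rm -rf /var/lib/apt/lists/*
EOF

podman build -t local/verse-py:demo -f Containerfile.py .
```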

This discussion is a follow-up to some comments in this issue:

related to the deprecation of CUDA-enabled images:

For a previous related discussion on CUDA support versioning, see also:

A brief presentation of this alternative layout follows in the comments below.


hute37 commented Feb 12, 2025

The solution described is implemented in these two "template" projects, used for quick setup of R and Python research projects in a heterogeneous environment such as a statistics department.

In this context, some features are important:

  • full control over Python/R dependency inclusion. It is often necessary to include packages from outside the standard distributions (PyPI or CRAN). Sometimes a package is still in a development state on a public source repository; in other cases packages have been removed from the official distribution for lack of maintainers
  • project reproducibility on different machines, at different times, with a stable set of resolved dependencies
  • variable GPU architecture support. During its lifetime, a project can have different requirements, and GPU availability evolves over time.
  • unprivileged access to computational machines. This requirement is satisfied by the "rootless" execution mode provided by the Podman runtime.

The reference template projects are:

Both projects support container execution and are based on standard Rocker images.

The R dependencies are managed by renv, with the full dependency specification in the DESCRIPTION file.
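As a rough illustration of that flow (a generic renv sketch, not the templates' actual bootstrap code), the dependencies declared in DESCRIPTION can be installed and snapshotted like this:

```bash
# Sketch: restore/snapshot R dependencies from DESCRIPTION with renv.
# "explicit" snapshots make renv track only the packages listed in DESCRIPTION.
Rscript -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")'
Rscript -e 'renv::init(bare = TRUE)'                   # create the project library
Rscript -e 'renv::settings$snapshot.type("explicit")'  # use DESCRIPTION as the source of truth
Rscript -e 'renv::install()'                           # install DESCRIPTION dependencies
Rscript -e 'renv::snapshot()'                          # write renv.lock for reproducibility
```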

The Python part is not inherited from the Rocker containers but is managed in two stages:

  • "build setup phase": the Python toolchain is built into the container image with the pyenv, pipx, and poetry setup scripts

  • "runtime setup phase": a Python virtual environment is created and loaded by poetry, following the project definition.

This is automated by this script: /docker/r-images/scripts/setup/setup_ubs-all.sh

After installing the virtualenv packages, this script prepares the JupyterLab environment by pulling the Node.js modules required for front-end language support. It also registers the virtualenv for reticulate use and IRkernel as a Jupyter kernel, and triggers the renv installation.
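The gist of those registration steps might look roughly like the following (a simplified sketch with assumed paths and environment variables, not the actual setup_ubs-all.sh):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch: runtime-phase environment preparation (paths are assumptions).
PROJECT_DIR="${PROJECT_DIR:-$PWD}"
cd "$PROJECT_DIR"

# 1. Create/refresh the poetry-managed virtualenv from pyproject.toml
poetry install

# 2. Resolve the virtualenv path and expose it to reticulate
VENV_PATH="$(poetry env info --path)"
echo "RETICULATE_PYTHON=${VENV_PATH}/bin/python" >> "${HOME}/.Renviron"

# 3. Register the virtualenv as a Jupyter kernel
poetry run python -m ipykernel install --user --name project-venv

# 4. Register IRkernel so R is available in JupyterLab
Rscript -e 'IRkernel::installspec(user = TRUE)'

# 5. Restore the R dependencies into the writable home volume
Rscript -e 'renv::restore(prompt = FALSE)'
```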

An important note: splitting the setup procedure into two phases (build, runtime) is required by the different kinds of storage visible in each phase:

  • the "build phase" can interact only with the container image, which becomes immutable after the initial build
  • the "runtime phase" cannot change the immutable image storage, but it sees a mounted, writable, persistent directory where all the dynamic parts are stored. This directory plays the role of the user home directory and contains the Python virtualenv and the renv/RStudio installation paths.

The details of this volume mapping are defined in this Makefile, which holds all the podman interactions.
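For illustration, a rootless podman invocation along these lines would provide the writable home volume and GPU access (the image name, mount paths, and CDI device flag are assumptions, not the Makefile's actual targets):

```bash
# Sketch: rootless podman run with a persistent "home" volume and GPU access.
# --userns=keep-id maps the host user into the container so the mounted
# directory stays writable without root privileges.
podman run --rm -it \
  --userns=keep-id \
  --device nvidia.com/gpu=all \
  -v "$PWD/work/home:/home/rstudio:Z" \
  -v "$PWD:/project:Z" \
  -p 8787:8787 \
  local/verse-py:demo
```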

The next comment contains a diagram of the project architecture.


hute37 commented Feb 12, 2025

[Image: project layout diagram (drawio source)]

cboettig (Member) commented:

@hute37 Very cool, thanks for sharing. It seems like you have a nice workflow for adding your CUDA and Python needs on top of the existing rocker/versioned2 stack, which is lovely.

In the thread from #903 you mentioned that you were looking for a solution that supported CUDA 12, right? Above you mention the need to support GPU architectures, but from a quick skim it's not clear how you handle that. I did see your cuda dir with scripts of considerable complexity there already, so maybe you already have a good solution here?

Just want to make sure I understand: is there a question in here somewhere, or are you just sharing a different approach that is ready and working?


benz0li commented Feb 12, 2025

Cross reference regarding Python installation in the Rocker images:


benz0li commented Feb 12, 2025

With pre-built CUDA-based R + Python images, one must pin versions to prevent breaking changes. For example:

  • R v4.3 images: Pin CUDA to v11 and Python to v3.11
  • R v4.4 images: Pin CUDA to v12 and Python to v3.12

One also has to consider when the base image (Ubuntu) is supposed to be updated to a new LTS release.


benz0li commented Feb 12, 2025

@hute37 Regarding rocker-org/rocker-versioned2#903 (comment), i.e. new projects (PyTorch) requiring CUDA 12:

PyTorch installs its own CUDA binaries/libraries by default. It does not depend on the CUDA version of the image. It only depends on the NVIDIA driver version of the host.

Originally posted by @benz0li in iot-salzburg/gpu-jupyter#153 (comment)

See also:


hute37 commented Feb 12, 2025

@cboettig

CUDA-12 support is not ready yet. I think I'll follow the same path I took when I had to downgrade the ml-verse CUDA support in order to support our old (Azure-based) NVIDIA K80 GPU with the legacy 470 NVIDIA driver.

In this version, the project templates are still using the CUDA-11 support from the latest ml-verse container.

While this seems to work in the R (keras) container tests, it is a problem for our current Python environment.

Python-based projects (derived from dve-sample-py) also support native ("un-containerized") pyenv/poetry execution.

In this case, I prepare the virtual machines with a manual CUDA setup.
We are now standardized on a CUDA-12 (nvidia-560) setup on Ubuntu 24.04 LTS server for Azure-based NVIDIA V100 virtual machines.
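For context, a host-level setup of that kind usually follows NVIDIA's documented network-repo procedure for Ubuntu 24.04; the sketch below is not the org-mode script mentioned next, and the driver branch and toolkit minor version are hypothetical pins:

```bash
# Sketch: host-level CUDA 12 setup on Ubuntu 24.04 via NVIDIA's apt repository.
# Exact package names (driver branch, toolkit minor version) may need adjusting.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Driver from the 560 branch and a CUDA 12.x toolkit (hypothetical pins).
sudo apt-get install -y nvidia-driver-560 cuda-toolkit-12-6

# Container GPU access (CDI) additionally needs the NVIDIA Container Toolkit,
# which ships from its own repository; see NVIDIA's install docs for the repo setup.
# sudo apt-get install -y nvidia-container-toolkit
# sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```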

The manual installation script is in this file (emacs org-mode)

I plan to port this manual setup to the cuda-560 image build script as soon as possible:


A couple of notes about (rootless) Podman:


benz0li commented Feb 12, 2025

We are now standardized on a CUDA-12 (nvidia-560) setup on Ubuntu 24.04 LTS server for Azure-based NVIDIA V100 virtual machines.

Use driver version 535 (Long Term Support Branch) with NVIDIA Data Center GPUs or select NGC-Ready NVIDIA RTX boards to ensure forward compatibility until June 2026.
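For reference, the installed driver branch and the maximum CUDA version it supports can be checked directly on the host, e.g.:

```bash
# Check the host driver branch and the maximum CUDA version it supports.
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
nvidia-smi | head -n 4   # the banner also reports the driver's "CUDA Version"
```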


hute37 commented Feb 12, 2025

@benz0li

PyTorch installs its own CUDA binaries/libraries by default. It does not depend on the CUDA version of the image. It only depends on the NVIDIA driver version of the host.

Coming from TensorFlow projects, we introduced PyTorch only recently.

I noticed that the PyTorch poetry installation may require an alternative distribution source, depending on the version:

It seems to work in a couple of active projects.
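For anyone hitting the same issue, a generic way to point poetry at one of the PyTorch wheel indexes looks roughly like this (the index URL, source name, and CUDA tag are illustrative assumptions, not these projects' actual pyproject settings):

```bash
# Sketch: add a dedicated PyTorch wheel index and install torch from it.
# The cu121 index is an example; pick the tag matching the intended CUDA build.
poetry source add --priority=explicit pytorch-cu121 https://download.pytorch.org/whl/cu121
poetry add --source pytorch-cu121 torch torchvision

# Quick sanity check: torch ships its own CUDA runtime, so only the host
# driver needs to be new enough for the reported CUDA version.
poetry run python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```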

See:


benz0li commented Feb 12, 2025

@hute37 Regarding TensorFlow (versions ≥ 2.18):
