
Alternative containers hierarchy layout for R and Python integration in CUDA enabled projects #34

hute37 opened this issue Feb 12, 2025 · 10 comments


hute37 commented Feb 12, 2025

This note presents an alternative approach that can be used to integrate R and Python stacks with a different container layout.
In this alternative layout, Python features are introduced after the R stack.
This choice has the important implication of enabling standard Rocker images (r-base, verse, geospatial) as base images, avoiding replication of the R setup.
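To make that layering concrete, here is a minimal sketch (the rocker/verse tag, package list, and file names are illustrative assumptions, not the template projects' actual build files):

```bash
# Sketch: layer the Python/CUDA toolchain on top of an unmodified Rocker image,
# so the R setup is inherited from upstream rather than replicated.
# (image tag and package list are illustrative only)
cat > Containerfile.py <<'EOF'
FROM rocker/verse:4.4.2

# Build-time prerequisites for pyenv-compiled Pythons; the actual
# pyenv/pipx/poetry installation is handled by separate setup scripts.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      build-essential curl git \
      libbz2-dev libffi-dev liblzma-dev libreadline-dev \
      libsqlite3-dev libssl-dev zlib1g-dev \
 && rm -rf /var/lib/apt/lists/*
EOF

podman build -t local/verse-py:demo -f Containerfile.py .
```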

This discussion is a follow-up to some comments in this issue:

related to the deprecation of CUDA-enabled images:

For a previous related discussion on CUDA support versioning, see also:

A brief presentation of this alternative layout follows in the comments below.


hute37 commented Feb 12, 2025

The solution described is implemented in these two "template" projects, used for quick setup of R and Python research projects in a heterogeneous environment such as a statistics department.

In this context, some features are important:

  • full control over Python/R dependency inclusion. It is often necessary to include packages from outside the standard distributions (PyPI or CRAN). Sometimes a package is still in a development state on a public source repository; in other cases packages have been removed from the official distribution for lack of maintainers
  • project reproducibility on different machines, at different times, with a stable set of resolved dependencies
  • variable GPU architecture support. During its lifetime, a project can have different requirements, and GPU availability evolves over time.
  • unprivileged access to computational machines. This requirement is satisfied by the "rootless" execution mode provided by the Podman runtime.

The reference template projects are:

Both projects support container execution and are based on standard Rocker images.

The R dependencies are managed by renv, with the full dependency specification in the DESCRIPTION file.
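As a rough illustration of that flow (a generic renv sketch, not the templates' actual bootstrap code), the dependencies declared in DESCRIPTION can be installed and snapshotted like this:

```bash
# Sketch: restore/snapshot R dependencies from DESCRIPTION with renv.
# "explicit" snapshots make renv track only the packages listed in DESCRIPTION.
Rscript -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")'
Rscript -e 'renv::init(bare = TRUE)'                   # create the project library
Rscript -e 'renv::settings$snapshot.type("explicit")'  # use DESCRIPTION as the source of truth
Rscript -e 'renv::install()'                           # install DESCRIPTION dependencies
Rscript -e 'renv::snapshot()'                          # write renv.lock for reproducibility
```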

The Python part is not inherited from the Rocker containers but is managed in two stages:

  • "build setup phase": the Python toolchain is built into the container image with the pyenv, pipx, and poetry setup scripts

  • "runtime setup phase": a Python virtual environment is created and loaded by poetry, following the project definition.

This is automated by this script: /docker/r-images/scripts/setup/setup_ubs-all.sh

After installing the virtualenv packages, this script prepares the JupyterLab environment by pulling the Node.js modules required for front-end language support. It also registers the virtualenv for reticulate use and IRkernel as a Jupyter kernel, and triggers the renv installation.
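The gist of those registration steps might look roughly like the following (a simplified sketch with assumed paths and environment variables, not the actual setup_ubs-all.sh):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch: runtime-phase environment preparation (paths are assumptions).
PROJECT_DIR="${PROJECT_DIR:-$PWD}"
cd "$PROJECT_DIR"

# 1. Create/refresh the poetry-managed virtualenv from pyproject.toml
poetry install

# 2. Resolve the virtualenv path and expose it to reticulate
VENV_PATH="$(poetry env info --path)"
echo "RETICULATE_PYTHON=${VENV_PATH}/bin/python" >> "${HOME}/.Renviron"

# 3. Register the virtualenv as a Jupyter kernel
poetry run python -m ipykernel install --user --name project-venv

# 4. Register IRkernel so R is available in JupyterLab
Rscript -e 'IRkernel::installspec(user = TRUE)'

# 5. Restore the R dependencies into the writable home volume
Rscript -e 'renv::restore(prompt = FALSE)'
```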

An important note: splitting the setup procedure into two phases (build, runtime) is required by the different kinds of storage visible in each phase:

  • the "build phase" can interact only with the container image, which becomes immutable after the initial build
  • the "runtime phase" cannot change the immutable image storage, but it sees a mounted, writable, persistent directory where all the dynamic parts are stored. This directory plays the role of the user home directory and contains the Python virtualenv and the renv/RStudio installation paths.

The details of this volume mapping are defined in this Makefile, which holds all the podman interactions.
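For illustration, a rootless podman invocation along these lines would provide the writable home volume and GPU access (the image name, mount paths, and CDI device flag are assumptions, not the Makefile's actual targets):

```bash
# Sketch: rootless podman run with a persistent "home" volume and GPU access.
# --userns=keep-id maps the host user into the container so the mounted
# directory stays writable without root privileges.
podman run --rm -it \
  --userns=keep-id \
  --device nvidia.com/gpu=all \
  -v "$PWD/work/home:/home/rstudio:Z" \
  -v "$PWD:/project:Z" \
  -p 8787:8787 \
  local/verse-py:demo
```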

The next comment contains a diagram of the project architecture.


hute37 commented Feb 12, 2025

[Image: project layout diagram (drawio source)]

cboettig (Member) commented:

@hute37 Very cool, thanks for sharing. It seems like you have a nice workflow for adding your CUDA and Python needs on top of the existing rocker/versioned2 stack, which is lovely.

In the thread from #903 you mentioned that you were looking for a solution that supported CUDA 12, right? Above you mention the need to support GPU architectures, but from a quick skim it's not clear how you handle that. I did see your cuda dir with scripts of considerable complexity there already, so maybe you already have a good solution here?

Just want to make sure I understand: is there a question in here somewhere, or are you just sharing a different approach that is ready and working?


benz0li commented Feb 12, 2025

Cross reference regarding Python installation in the Rocker images:


benz0li commented Feb 12, 2025

With pre-built CUDA-based R + Python images, one must pin versions to prevent breaking changes. For example:

  • R v4.3 images: Pin CUDA to v11 and Python to v3.11
  • R v4.4 images: Pin CUDA to v12 and Python to v3.12

One also has to consider when the base image (Ubuntu) is supposed to be updated to a new LTS release.


benz0li commented Feb 12, 2025

@hute37 Regarding rocker-org/rocker-versioned2#903 (comment), i.e. new projects (PyTorch) requiring CUDA 12:

PyTorch installs its own CUDA binaries/libraries by default. It does not depend on the CUDA version of the image. It only depends on the NVIDIA driver version of the host.

Originally posted by @benz0li in iot-salzburg/gpu-jupyter#153 (comment)

See also:


hute37 commented Feb 12, 2025

@cboettig

CUDA-12 support is not ready yet. I think I'll follow the same path I took when I had to downgrade the ml-verse CUDA support in order to support our old (Azure-based) NVIDIA K80 GPU with the legacy 470 NVIDIA driver.

In this version, the project templates are still using the CUDA-11 support from the latest ml-verse container.

While this seems to work in the R (keras) container tests, it is a problem for our current Python environment.

Python-based projects (derived from dve-sample-py) also support native ("un-containerized") pyenv/poetry execution.

In this case, I prepare the virtual machines with a manual CUDA setup.
We are now standardized on a CUDA-12 (nvidia-560) setup on Ubuntu 24.04 LTS server for Azure-based NVIDIA V100 virtual machines.
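For context, a host-level setup of that kind usually follows NVIDIA's documented network-repo procedure for Ubuntu 24.04; the sketch below is not the org-mode script mentioned next, and the driver branch and toolkit minor version are hypothetical pins:

```bash
# Sketch: host-level CUDA 12 setup on Ubuntu 24.04 via NVIDIA's apt repository.
# Exact package names (driver branch, toolkit minor version) may need adjusting.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Driver from the 560 branch and a CUDA 12.x toolkit (hypothetical pins).
sudo apt-get install -y nvidia-driver-560 cuda-toolkit-12-6

# Container GPU access (CDI) additionally needs the NVIDIA Container Toolkit,
# which ships from its own repository; see NVIDIA's install docs for the repo setup.
# sudo apt-get install -y nvidia-container-toolkit
# sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```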

The manual installation script is in this file (emacs org-mode)

I plan to port this manual setup to the cuda-560 image build script as soon as possible:


A couple of notes about (rootless) Podman:


benz0li commented Feb 12, 2025

We are now standardized on a CUDA-12 (nvidia-560) setup on Ubuntu 24.04 LTS server for Azure-based NVIDIA V100 virtual machines.

Use driver version 535 (Long Term Support Branch) with NVIDIA Data Center GPUs or select NGC-Ready NVIDIA RTX boards to ensure forward compatibility until June 2026.
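For reference, the installed driver branch and the maximum CUDA version it supports can be checked directly on the host, e.g.:

```bash
# Check the host driver branch and the maximum CUDA version it supports.
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
nvidia-smi | head -n 4   # the banner also reports the driver's "CUDA Version"
```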


hute37 commented Feb 12, 2025

@benz0li

PyTorch installs its own CUDA binaries/libraries by default. It does not depend on the CUDA version of the image. It only depends on the NVIDIA driver version of the host.

Coming from TensorFlow projects, we introduced PyTorch only recently.

I noticed that the PyTorch poetry installation may require an alternative distribution source, depending on the version:

It seems to work in a couple of active projects.
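For anyone hitting the same issue, a generic way to point poetry at one of the PyTorch wheel indexes looks roughly like this (the index URL, source name, and CUDA tag are illustrative assumptions, not these projects' actual pyproject settings):

```bash
# Sketch: add a dedicated PyTorch wheel index and install torch from it.
# The cu121 index is an example; pick the tag matching the intended CUDA build.
poetry source add --priority=explicit pytorch-cu121 https://download.pytorch.org/whl/cu121
poetry add --source pytorch-cu121 torch torchvision

# Quick sanity check: torch ships its own CUDA runtime, so only the host
# driver needs to be new enough for the reported CUDA version.
poetry run python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```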

See:


benz0li commented Feb 12, 2025

@hute37 Regarding TensorFlow (versions ≥ 2.18):
