
image for tidymodels, tidyverse, tensorflow, keras with GPU passthrough #561

Closed
fahadshery opened this issue Nov 4, 2022 · 24 comments

@fahadshery

Container image name

No response

Container image digest

No response

What operating system is this question related to?

Linux

System information

No response

Question

Hi,

I am looking to install RStudio in Docker using Portainer. What's the best image that bundles the following packages:

  1. Tidyverse
  2. Tidymodels
  3. Tensorflow (GPU enabled)
  4. Keras
  5. DBI (for Postgres)

Thanks

@cboettig
Member

cboettig commented Nov 4, 2022

Thanks for your question! The best way to install these packages with GPU support is to use the rocker/ml series of images.

Because Python projects typically version their Python modules at the project level, often requiring different and sometimes conflicting versions, we do not pre-install a globally shared version of these. Instead, these images provide the CUDA environment for GPU use with these packages, which tends to be the more finicky aspect to set up. You can easily install any of the R packages you mention in the usual way, and use their built-in installers to select your chosen versions of the Python modules.

Some further documentation is here: https://rocker-project.org/images/versioned/cuda.html. Please let us know if you have other questions; this helps us flesh out our documentation in this relatively new area.
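
For reference, a minimal sketch of starting one of these images with GPU passthrough (it assumes the NVIDIA Container Toolkit is installed on the host; the password and port are placeholders):

# Start rocker/ml with the host GPU exposed to the container
docker run --rm -ti --gpus all \
    -e PASSWORD=yourpassword \
    -p 8787:8787 \
    rocker/ml
# Then open http://localhost:8787 and log in as user "rstudio"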

@fahadshery
Author

Hi,

Thanks. I am going to give rocker/ml a try. I wish to install a few libraries such as:

tensorflow
keras
forecast
tidymodels

What's the best way to do it?

@cboettig
Member

The base R method, install.packages(), should be fine (most other standard methods, like install_github(), should also work). Note that for many of these packages that wrap Python modules, you will then want to use their own built-in installer, e.g.

library(tensorflow)
install_tensorflow()

as shown in https://github.com/rstudio/tensorflow#tensorflow-for-r .
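
Once install_tensorflow() finishes, a quick sanity check along these lines (a minimal sketch, run in a fresh R session) can confirm that the R package finds its Python module and whether a GPU is visible:

library(tensorflow)
# Evaluate a trivial tensor to confirm the Python module loads
tf$constant("Hello TensorFlow")
# List the GPUs TensorFlow can see (empty if none are configured)
tf$config$list_physical_devices("GPU")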

@fahadshery
Author

Hi,

In case anyone else comes knocking, here is the Dockerfile I used to re-create/extend the image:

FROM rocker/ml:latest

# important for Cairo package installation
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    libxt-dev \
    && rm -rf /var/lib/apt/lists/*

RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3 \
    tidymodels \
    forecast \
    tensorflow \
    keras \
    tidyquant \
    xgboost \
    randomForest \
    data.table \
    plotly \
    mlr3 \
    caret \
    ggraph \
    e1071 \
    xts \
    prophet \
    tidytext \
    lubridate

# Clean up
RUN rm -rf /tmp/downloaded_packages
RUN rm -rf /var/lib/apt/lists/*

## Strip binary installed libraries from RSPM
## https://github.com/rocker-org/rocker-versioned2/issues/340
RUN strip /usr/local/lib/R/site-library/*/libs/*.so
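
For completeness, building and running this Dockerfile might look like the following sketch (the image tag is a placeholder, and the GPU flag assumes the NVIDIA Container Toolkit is installed on the host):

# Build the extended image from the directory containing the Dockerfile
docker build -t my/rocker-ml-extended .
# Run it with the GPU passed through and RStudio on port 8787
docker run --rm -d --gpus all -e PASSWORD=yourpassword -p 8787:8787 my/rocker-ml-extended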

@fahadshery
Author

The only problem I still have is that tensorflow can't see my GPU, even though the nvidia-smi tool is available within the container's shell.

I am also running jupyter/datascience-notebook and my GPU is easily accessible there within tensorflow etc.

@benz0li
Contributor

benz0li commented Nov 17, 2022

I am also running jupyter/datascience-notebook and my GPU is easily accessible there within tensorflow etc.

AFAIK jupyter/datascience-notebook (core stack) is not GPU enabled.

@fahadshery Are you talking about one of the GPU enabled notebooks?

@cboettig
Member

@fahadshery you may need to make sure that the version of the CUDA drivers on the host machine is at least as new as the version of the CUDA libraries in the container. When tensorflow tries and fails to access the GPU, it should print out a pretty detailed log. GPU interactions are complex; we really need a lot more detail to understand what you are seeing.

Can you show us the output of nvidia-smi from within the container bash shell? Can you also show us the output of tensorflow::tf_gpu_configured() from the R console?

@markk-fmi

Hello @cboettig - Thanks for your kind advice here. I am trying to make my way to a similar destination: a container that runs R-keras ML analysis. One topic I keep bumping into and haven't found documented (sorry if I missed it) is how to satisfy the Python dependency exactly. You mention above that the rocker/ml image doesn't try to do that, for good reasons. But to actually make this work we need to satisfy that dependency, and it seems that the R side wants Python on a particular path.

This Stack Overflow question, python-in-r-error-could-not-find-a-python-environment-for-usr-bin-python, provides a solution that seems to work, but it involves taking these steps in the running container every time:

  • run install_tensorflow()
  • accept the default prompt to install miniconda

It is awkward to ask each user to take these steps every time they want to run the container. Is there a means to accomplish these steps as part of the Dockerfile so that the image environments are aligned and so that the users don't have to further configure the image?

Thanks for any advice you might provide. mark

@fahadshery
Author

@fahadshery Are you talking about one of the GPU enabled notebooks?

Yes.
rocker/ml:latest, to be precise!

@markk-fmi

Hi @fahadshery - I patterned my Dockerfile off of yours. Here it is:

FROM rocker/ml:4.1.1

# important for Cairo package installation
# RUN apt-get update \
#     && apt-get install -y --no-install-recommends \
#     libxt-dev \
#     && rm -rf /var/lib/apt/lists/*

RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3 \
    tensorflow \
    keras \
    mlr3

# Clean up
RUN rm -rf /tmp/downloaded_packages
RUN rm -rf /var/lib/apt/lists/*

## Strip binary installed libraries from RSPM
## https://github.com/rocker-org/rocker-versioned2/issues/340
RUN strip /usr/local/lib/R/site-library/*/libs/*.so

COPY ./code /home/rstudio/code

So yes, I'm starting with the same base image as you were, but I'm specifying an R version.

@cboettig
Member

Hi @markk-fmi, good question; it depends on what you mean by 'every time'. There are at least two quite different usage patterns here: users who deploy a persistent container, and users who start each session in a freshly pulled container.

From the description it sounds like you are referring to the latter pattern. In that case, I think the simplest thing to do is define your own derivative Dockerfile that performs these two steps. (It can also handle the installation of any particular packages you like to use.) Does that make sense? E.g. why not just add

RUN Rscript -e "tensorflow::install_tensorflow()"

to your Dockerfile?

I agree it would be nice to have an 'out-of-the-box' solution, but that is difficult for users of a persistent container who may want different Python environments configured for different projects. Perhaps we can introduce a second user that has a pre-configured user-wide Python env....

@fahadshery
Author

fahadshery commented Nov 23, 2022

Can you show us the output of nvidia-smi from within the container bash shell?

root@2119f7c7df7f:/# nvidia-smi
Wed Nov 23 22:19:53 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:10.0 Off |                  N/A |
|  0%   27C    P0    15W / 120W |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Can you also show us the output of tensorflow::tf_gpu_configured() from the R console?

> library(tensorflow)
> tf$config$list_physical_devices()
2022-11-23 22:21:22.976474: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW
2022-11-23 22:21:22.976797: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 2119f7c7df7f
2022-11-23 22:21:22.976897: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 2119f7c7df7f
2022-11-23 22:21:22.979510: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
2022-11-23 22:21:22.979766: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.76.0
2022-11-23 22:21:22.979825: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 515.76.0 does not match DSO version 520.61.5 -- cannot find working devices in this configuration
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
> tensorflow::tf_gpu_configured()
TensorFlow built with CUDA:  TRUE 
GPU device name:  [1] FALSE

However, if I run the same using the tensorflow/tensorflow:latest-gpu-jupyter image, I get the following:

(Screenshot from 2022-11-23 at 22:29:46 attached in the original issue.)
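
The log above points to a driver/library mismatch: the kernel module reports 515.76 while libcuda reports 520.61.5. A rough way to compare the two (a sketch, not part of the original report) is:

# On the host: the kernel driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Inside the container: which libcuda the dynamic loader resolves
ldconfig -p | grep libcuda
# If the container resolves a newer "compat" libcuda than the host driver supports,
# and the GPU does not support forward compatibility, TensorFlow fails with
# CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE, as in the log above.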

@markk-fmi

Thanks @cboettig. Perhaps I should have just tried what you suggested. I assumed that the interactive prompt to install Miniconda (the user needs to type 'Y') would be a problem and prevent the Miniconda installation. Maybe that's not true? I looked in the code for install_tensorflow.sh and didn't see anything that would default the Miniconda installation to proceed. I'll try what you suggest.

@markk-fmi

Hi @fahadshery - the machine I'm currently testing on does not have any GPUs.

@markk-fmi

@cboettig - Adding the step you suggested seems to run into the issue of the missing python environment:

Sending build context to Docker daemon  5.12 kB
Step 1/7 : FROM rocker/ml:4.1.1
 ---> 8d4f16920869
Step 2/7 : RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3     tensorflow     keras     mlr3
 ---> Using cache
 ---> 53a3ae667b7d
Step 3/7 : RUN Rscript -e "tensorflow::install_tensorflow()"
 ---> Running in 195b19a94641
Error: could not find a Python environment for /usr/bin/python3
Execution halted
The command '/bin/sh -c Rscript -e "tensorflow::install_tensorflow()"' returned a non-zero code: 1
(base)

@markk-fmi

markk-fmi commented Nov 23, 2022

@cboettig - it looks like it is all very close ... here's the python situation inside the container:

rstudio@8d7272530bed:~$ which python
/usr/local/bin/python
rstudio@8d7272530bed:~$ python
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
rstudio@8d7272530bed:~$ ls /usr/bin | grep python
dh_python2
python2
python2.7
python3
python3.8
python3.8-config
python3-config
x86_64-linux-gnu-python3.8-config
x86_64-linux-gnu-python3-config
rstudio@8d7272530bed:~$ ls -al /usr/bin | grep python
-rwxr-xr-x   7 root root      1057 Mar 13  2020 dh_python2
lrwxrwxrwx   1 root root        23 Jul  1 12:27 pdb2.7 -> ../lib/python2.7/pdb.py
lrwxrwxrwx   1 root root        23 Jun 22 20:18 pdb3.8 -> ../lib/python3.8/pdb.py
lrwxrwxrwx   1 root root        31 Mar 13  2020 py3versions -> ../share/python3/py3versions.py
lrwxrwxrwx   1 root root         9 Mar 13  2020 python2 -> python2.7
-rwxr-xr-x   7 root root   3662032 Jul  1 12:27 python2.7
lrwxrwxrwx   1 root root         9 Mar 13  2020 python3 -> python3.8
-rwxr-xr-x  23 root root   5502744 Jun 22 20:18 python3.8
lrwxrwxrwx   1 root root        33 Jun 22 20:18 python3.8-config -> x86_64-linux-gnu-python3.8-config
lrwxrwxrwx   1 root root        16 Mar 13  2020 python3-config -> python3.8-config
lrwxrwxrwx   1 root root        29 Mar 13  2020 pyversions -> ../share/python/pyversions.py
-rwxr-xr-x   8 root root      3241 Jun 22 20:18 x86_64-linux-gnu-python3.8-config
lrwxrwxrwx   1 root root        33 Mar 13  2020 x86_64-linux-gnu-python3-config -> x86_64-linux-gnu-python3.8-config
rstudio@8d7272530bed:~$ ls -al /usr/local/bin/ | grep python
lrwxrwxrwx  1 root root    16 Oct 31 13:15 python -> /usr/bin/python3
rstudio@8d7272530bed:~$ 

Do you understand the install_tensorflow() error, "Error: could not find a Python environment for /usr/bin/python3"?

There is a python at that path, right?

@cboettig
Member

Yeah, that's weird -- it seems to work fine on rocker/ml:4.2.2 (latest), and also on 4.2.1. Is there a reason you need 4.1.1 specifically?

@eitsupi any idea why this would happen on 4.1.1? Maybe the older version of reticulate pinned in that version simply did not have as good a system for locating Python?

@fahadshery
Author

Any idea why I am unable to access my GPU when using rocker/ml:latest as a base image?
Thanks

@eitsupi
Member

eitsupi commented Nov 24, 2022

@eitsupi any idea why this would happen on 4.1.1 ? maybe the older version of reticulate pinned in that version simply did not have as good a system for locating python?

I don't know because I don't use those images or packages, but you may need to check to see if the image was built after #494.

@markk-fmi

We were working with 4.1.1 simply because that is the version our other applications use. It is not a hard requirement. I will try 4.2.2 and let you know what happens.

@markk-fmi

markk-fmi commented Nov 29, 2022

Hello @cboettig - Thank you for your suggestion. I've tried 4.2.2 and 4.2.1 as you mentioned and they don't seem to work in my hands.

4.2.2 seems to fail to build with the RUN install2.r step even without the install_tensorflow() step:

(base) more ./rocker_ml_plus-tensorflow_image/Dockerfile
FROM rocker/ml:4.2.2

# important for Cairo package installation
# RUN apt-get update \
#     && apt-get install -y --no-install-recommends \
#     libxt-dev \
#     && rm -rf /var/lib/apt/lists/*

RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3 \
    tensorflow \
    keras \
    mlr3

# install tensorflow Python dependency and Miniconda Python
# RUN Rscript -e "tensorflow::install_tensorflow()"

# Clean up
RUN rm -rf /tmp/downloaded_packages
RUN rm -rf /var/lib/apt/lists/*

## Strip binary installed libraries from RSPM
## https://github.com/rocker-org/rocker-versioned2/issues/340
RUN strip /usr/local/lib/R/site-library/*/libs/*.so

COPY ./code /home/rstudio/code
(base) docker build -t XXXX/rocker_ml_plus-tensorflow ./rocker_ml_plus-tensorflow_image
Sending build context to Docker daemon  5.12 kB
Step 1/6 : FROM rocker/ml:4.2.2
 ---> 1813427ba242
Step 2/6 : RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3     tensorflow     keras     mlr3
 ---> Running in f347babe2ff5
Error: .onLoad failed in loadNamespace() for 'utils', details:
  call: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)
  error: cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'
Error: .onLoad failed in loadNamespace() for 'utils', details:
  call: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)
  error: cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'
The command '/bin/sh -c install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3     tensorflow     keras     mlr3' returned a non-zero code: 1
(base) 

4.2.1 builds with the install_tensorflow() step, but still doesn't find the correct python in the container:

(base) more rocker_ml_plus-tensorflow_image/Dockerfile
FROM rocker/ml:4.2.1

# important for Cairo package installation
# RUN apt-get update \
#     && apt-get install -y --no-install-recommends \
#     libxt-dev \
#     && rm -rf /var/lib/apt/lists/*

RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3 \
    tensorflow \
    keras \
    mlr3

# install tensorflow Python dependency and Miniconda Python
RUN Rscript -e "tensorflow::install_tensorflow()"

# Clean up
RUN rm -rf /tmp/downloaded_packages
RUN rm -rf /var/lib/apt/lists/*

## Strip binary installed libraries from RSPM
## https://github.com/rocker-org/rocker-versioned2/issues/340
RUN strip /usr/local/lib/R/site-library/*/libs/*.so

COPY ./code /home/rstudio/code
(base)

But in RStudio in the running container, we still fail to find the Python needed:

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(tensorflow)
> library(keras)
> model <- keras_model_sequential(name = "my_sequential")
No non-system installation of Python could be found.
Would you like to download and install Miniconda?
Miniconda is an open source environment management system for Python.
See https://docs.conda.io/en/latest/miniconda.html for more details.

Would you like to install Miniconda? [Y/n]: 

Does build and execution of these steps give a different result in your hands? Thanks!

@markk-fmi

Hi @cboettig - With 4.2.1, the key hint in the error message above is "No non-system installation of Python". This is because the Dockerfile step RUN Rscript -e "tensorflow::install_tensorflow()" was running as user root, not rstudio. I changed that in the Dockerfile and the build-time installation of Python now works. Here's the updated Dockerfile:

(base) more rocker_ml_plus-tensorflow_image/Dockerfile
FROM rocker/ml:4.2.1

# important for Cairo package installation
# RUN apt-get update \
#     && apt-get install -y --no-install-recommends \
#     libxt-dev \
#     && rm -rf /var/lib/apt/lists/*

RUN install2.r --error --skipmissing --deps TRUE --skipinstalled -n 3 \
    tensorflow \
    keras \
    mlr3

# install tensorflow Python dependency and Miniconda Python
USER rstudio
RUN Rscript -e "tensorflow::install_tensorflow()"
USER root

# Clean up
RUN rm -rf /tmp/downloaded_packages
RUN rm -rf /var/lib/apt/lists/*

## Strip binary installed libraries from RSPM
## https://github.com/rocker-org/rocker-versioned2/issues/340
RUN strip /usr/local/lib/R/site-library/*/libs/*.so

COPY ./code /home/rstudio/code
(base)

which works fine in RStudio in the container WITHOUT installing Miniconda Python in the container.
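
To verify, a short check in the container's RStudio session (a sketch, assuming the virtualenv created by install_tensorflow() during the build is still present under the rstudio user's home):

library(keras)
# Should build the model without any prompt to install Miniconda
model <- keras_model_sequential(name = "my_sequential")
# Shows which Python interpreter/virtualenv reticulate picked up
reticulate::py_config()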

@eitsupi
Member

eitsupi commented Feb 8, 2025

We have decided to stop supporting anything like rocker/ml here (#903), so I am closing this.
Thank you for your understanding.

@eitsupi closed this as not planned on Feb 8, 2025
@cboettig
Member

Just for clarity -- by "here" I believe @eitsupi means "rocker-versioned2" and not "rocker-org". As per the #903 thread, we have a new setup in rocker/ml seeking to support these use cases. The discussion in #903 also mentions several other approaches outside the rocker project that might also be suitable.
