This repository was archived by the owner on Jan 17, 2024. It is now read-only.

[BUG] "Define which GPU(s) will be visible in container" setting fails #38

Open
mippos opened this issue Sep 18, 2020 · 1 comment
Labels
bug Something isn't working

Comments

mippos commented Sep 18, 2020

Thanks for the good plugin to handle GPU resources!

There is a small bug:
The "custom - Define which GPU(s) will be visible in container" setting fails to reserve multiple GPUs.
This happens at least when using pipelines.

Pipeline configuration used:

withRemoteDocker(debug: true,
    main: image(
        configItemList: [
            runtime(dockerRuntime: 'nvidia'),
            gpus(nvidiaDevices: 'custom', nvidiaDevicesCustom: "0,1,2,3")],
        forcePull: false, image: 'tensorflow/tensorflow:latest-gpu', volumes: []),
    removeContainers: false, sideContainers: [], workspaceOverride: '/ws') {
    sh("""nvidia-smi""")
}

At the Docker level, the generated command is:
docker run -t -d --network bridge --entrypoint /bin/sh --workdir /ws -v /data/ws:/ws -v /tmp:/tmp -v /data/ws@tmp:/data/ws@tmp --gpus device=0,1,2,3 tensorflow/tensorflow:latest-gpu

Error:
docker: Error response from daemon: cannot set both Count and DeviceIDs on device request.

The root cause is probably just the missing extra quoting around the device list:
NVIDIA/nvidia-docker#1026 (--gpus '"device=0,1,2,3"')
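
To illustrate the quoting hypothesis above, here is a minimal Python sketch. It assumes the Docker CLI splits the `--gpus` value as a single CSV record, which would explain why the unquoted form keeps only device 0 and turns the remaining numbers into separate fields. That parsing behaviour is an assumption and is not confirmed in this issue. The helper names `split_gpus_value` and `gpus_args` are hypothetical and are not part of the plugin.

```python
from __future__ import annotations

import csv


def split_gpus_value(value: str) -> list[str]:
    """Split a --gpus value as one CSV record (assumed to mirror the Docker CLI)."""
    return next(csv.reader([value]))


def gpus_args(device_ids: list[str]) -> list[str]:
    """Hypothetical helper: build --gpus arguments with literal double quotes
    around the device list, so the commas stay inside a single CSV field."""
    return ["--gpus", '"device=' + ",".join(device_ids) + '"']


if __name__ == "__main__":
    # Unquoted: the commas split the device list into four separate fields.
    print(split_gpus_value("device=0,1,2,3"))    # ['device=0', '1', '2', '3']
    # Quoted: the whole device list survives as one field.
    print(split_gpus_value('"device=0,1,2,3"'))  # ['device=0,1,2,3']
    print(gpus_args(["0", "1", "2", "3"]))       # ['--gpus', '"device=0,1,2,3"']
```

When the command is launched from an argument list rather than through a shell, the double quotes have to be part of the argument itself. In a shell, the outer single quotes in `--gpus '"device=0,1,2,3"'` serve the same purpose.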

raydouglass added the bug label Sep 18, 2020
raydouglass (Member) commented

Hi @mippos, I experimented a lot trying to get this to work, without success.

I think it might need to fall back from --gpus to --runtime=nvidia combined with the NVIDIA_VISIBLE_DEVICES environment variable. This is what already happens for older Docker versions, and it should work even with the latest Docker version. I finally have a dual-GPU machine to test with, so I'll try to find some time for testing and get this fixed.
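
The fallback described above could look roughly like the following Python sketch, which builds the `docker run` arguments with `--runtime=nvidia` and `NVIDIA_VISIBLE_DEVICES` instead of `--gpus`. The function name `fallback_gpu_args` is hypothetical, and this is an illustration under those assumptions rather than the plugin's actual implementation.

```python
from __future__ import annotations

import shlex


def fallback_gpu_args(device_ids: list[str]) -> list[str]:
    """Hypothetical helper: select GPUs through the nvidia runtime and the
    NVIDIA_VISIBLE_DEVICES environment variable rather than --gpus."""
    visible = ",".join(device_ids)
    return ["--runtime=nvidia", "-e", f"NVIDIA_VISIBLE_DEVICES={visible}"]


if __name__ == "__main__":
    # Image name taken from the pipeline configuration in this issue.
    command = (
        ["docker", "run"]
        + fallback_gpu_args(["0", "1", "2", "3"])
        + ["tensorflow/tensorflow:latest-gpu", "nvidia-smi"]
    )
    # Print the command in a shell-safe form for inspection.
    print(shlex.join(command))
```

Because the device list travels in an environment variable, it never passes through the `--gpus` option, so the comma quoting problem would not arise on this path.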
