This repository was archived by the owner on Jan 17, 2024. It is now read-only.

[FEA] Queuing of GPUs and reserving selected GPUs fully #39

Open
mippos opened this issue Sep 18, 2020 · 2 comments
Labels
feature request New feature or request

Comments


mippos commented Sep 18, 2020

Feature request: queuing for GPUs and fully reserving selected GPUs.

We have some machine learning use cases where processing requires GPU(s) to be reserved fully, mainly to avoid possible issues with sharing GPU RAM, since GPU RAM can't be allocated per container (NVIDIA/nvidia-docker#1297).

Currently I can e.g. set up one Jenkins node per GPU and handle single-GPU reservation that way. We have machines with multiple GPUs. We have machine learning training use cases where one GPU is enough, but there are also cases where all GPUs on the machine are needed.

If I run withRemoteDocker giving e.g. one GPU per container, there is no direct queuing for Jenkins jobs that require all GPUs of the machine to be reserved fully for the run.

It would be handy to have this feature implemented at the remote-docker container level, so that it handles queuing for a single GPU or for all GPUs and does not allow any other job to take those GPUs in use in parallel.

If there are good workarounds to implement this, please propose them. My best idea at the moment is to keep separate bookkeeping on the machine (used-GPU info stored in some file) and to implement a waiting mechanism in the Jenkins pipelines.
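The bookkeeping idea above could be sketched roughly like this in a shared pipeline library. This is only an illustration of the approach, not an actual implementation: the lock file path and the `claimGpus.sh` helper (which would update the bookkeeping file and print the reserved GPU indices) are hypothetical names.

```groovy
// Hypothetical sketch: file-based GPU bookkeeping with a polling wait.
// Assumes a helper script claimGpus.sh on the machine that, under an
// exclusive flock, marks <wanted> free GPUs as used in the bookkeeping
// file and prints their indices (exiting non-zero if not enough are free).

def reserveGpus(int wanted) {
    // Poll until the claim succeeds; flock serializes access to the
    // bookkeeping file across all jobs on the machine.
    waitUntil {
        def rc = sh(returnStatus: true,
                    script: "flock /var/lock/gpus.lock ./claimGpus.sh ${wanted} > reserved.txt")
        return rc == 0
    }
    return readFile('reserved.txt').trim()
}

def releaseGpus(String gpus) {
    // Corresponding hypothetical release helper, also under the same lock.
    sh "flock /var/lock/gpus.lock ./releaseGpus.sh ${gpus}"
}
```

A job needing the whole machine would call `reserveGpus()` with the total GPU count and so wait until every GPU is free; the downside is polling and the need to guarantee release (e.g. in a `finally` block) even on aborted builds.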

@raydouglass raydouglass added the feature request New feature or request label Sep 18, 2020
@raydouglass (Member)

This is a pretty difficult problem given the limitations of Jenkins. I think there may be something in the Lockable Resources plugin that could be useful, though.
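For illustration, the Lockable Resources plugin's `lock` step can express both patterns from the request. This is a sketch only: it assumes resources (e.g. `gpu-0`, `gpu-1`) have been created in Manage Jenkins → Lockable Resources and tagged with the label `gpu`; the stage names and scripts are placeholders.

```groovy
// Hypothetical Jenkinsfile sketch using the Lockable Resources plugin.
pipeline {
    agent { label 'gpu-machine' }
    stages {
        stage('Single-GPU job') {
            steps {
                // Take any one free GPU; the build queues here until one is available.
                lock(label: 'gpu', quantity: 1, variable: 'GPU') {
                    echo "Reserved ${env.GPU}"
                    sh './train_single_gpu.sh'
                }
            }
        }
        stage('All-GPU job') {
            steps {
                // Omitting quantity locks ALL resources with the label,
                // so this blocks until every GPU on the machine is free.
                lock(label: 'gpu') {
                    sh './train_multi_gpu.sh'
                }
            }
        }
    }
}
```

Because the plugin queues lock requests globally, a job that locks the whole `gpu` label cannot start while any single-GPU job still holds one of the resources, which is exactly the exclusivity asked for here.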


mippos commented Aug 30, 2021

To work around the problem of multi-GPU reservation (a single GPU or all GPUs of the machine), I wrote a couple of pipelines to handle it. They are not that elegant, but they work.

Pipeline 1:
run_on_single_or_multigpu_groovy.txt

Pipeline 2:
gpu_reserver_for_multi_gpu_groovy.txt
