[FEA] Queuing of GPUs and reserving selected GPUs fully #39
Labels: feature request
Feature request: queuing of GPUs and the ability to reserve selected GPUs fully.
We have some machine learning use cases where processing requires GPU(s) to be reserved fully, mainly to avoid possible issues with sharing GPU RAM, since GPU RAM can't be allocated per container (NVIDIA/nvidia-docker#1297).
Currently I can, e.g., set up one Jenkins node per GPU and handle single-GPU reservation that way. However, we have machines with multiple GPUs: for some machine learning training use cases one GPU is enough, but there are also cases where all GPUs on the machine are needed.
If I run withRemoteDocker giving e.g. one GPU per container, there is no direct way to queue Jenkins jobs that need all GPUs of the machine reserved fully for the run.
It would be handy to have this feature implemented at the remote-docker-container level, so that it would handle queuing for a single GPU or all GPUs and not allow any other job to take those GPUs in use in parallel.
If there are good workarounds to implement this, please propose them. My best idea at the moment is to keep separate bookkeeping on the machine (used-GPU info stored in a file) and to implement a waiting mechanism in the Jenkins pipelines; a sketch of what I mean is below.
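For reference, this is roughly the kind of file-based bookkeeping I had in mind. It is only a sketch: the state-file path, the GPU indices, and the helper names are made up, and the pipeline would shell out to these helpers before and after the docker run.

```python
import fcntl
import json
import time

STATE_FILE = "/var/lock/gpu-reservations.json"  # hypothetical state file on the build machine
ALL_GPUS = {0, 1, 2, 3}                          # assumed GPU indices for the host


def reserve_gpus(count, poll_seconds=30):
    """Block until `count` GPUs are free, mark them as used, and return their indices."""
    while True:
        with open(STATE_FILE, "a+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)        # serialize access across concurrent jobs
            f.seek(0)
            used = set(json.loads(f.read() or "[]"))
            free = sorted(ALL_GPUS - used)
            if len(free) >= count:
                taken = free[:count]
                f.seek(0)
                f.truncate()
                json.dump(sorted(used | set(taken)), f)
                return taken
            # not enough free GPUs: the lock is released when the file closes, then retry
        time.sleep(poll_seconds)


def release_gpus(indices):
    """Return the given GPU indices to the free pool."""
    with open(STATE_FILE, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        used = set(json.loads(f.read() or "[]"))
        f.seek(0)
        f.truncate()
        json.dump(sorted(used - set(indices)), f)
```

A pipeline wanting the whole machine would call `reserve_gpus(len(ALL_GPUS))` before starting the container, pass the returned indices to the container's GPU settings, and call `release_gpus` in a post step. Having the plugin itself do this queuing would of course be much cleaner.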