Add support for autodetection of gres resources #181
base: feat/nodegroups
Conversation
Have some concerns
From the PR comment:

> You can only use one auto-detection mechanism per node, otherwise Slurm will complain (hence why it is a per-partition option and not a per-gres option).
Can you explain why, with it "per-gres", you end up with multiple methods per node?
Nodes can be - and often are - in multiple partitions, so specifying it per-partition is not sufficient to guarantee this anyway, I think, unless there's some other subtlety in the logic.
I think what we need to support is something like this, and only like this:
```yaml
openhpc_slurm_partitions:
  - name: gpu
    groups:
      - name: a100
      - name: h100
  - name: a100
    gres:
      - conf: "gpu:nvidia_a100_80gb_hbm3:2"
  - name: h100
    gres_autodetect: nvml
    gres:
      - conf: "gpu:nvidia_h100_80gb_hbm3:2"
```
i.e. no complicated fallbacks or overridden defaults etc. Maybe this just needs documenting, and we just let an error occur if someone does something wrong. #174 rolled up the slurm.conf `NodeName=` templating to allow defining nodes in multiple partitions; do we need something similar here? Or maybe not.
I see what you are saying: essentially you can't mix methods for a particular node. I'll dig out the error message. It seems like a host var/group var would be more natural:

```yaml
gres_autodetect: nvml
```

i.e. outside of the `openhpc_slurm_partitions` definition. But will that be complicated by the host list expression?
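For illustration, that could live in inventory group vars along these lines (the group name and file path here are hypothetical, not from this PR):

```yaml
# inventory/group_vars/h100/overrides.yml (hypothetical path)
# Applies NVML-based autodetection to every host in the 'h100' inventory group
gres_autodetect: nvml
```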
So something like:

```
[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3 File=/dev/nvidia0
```

produces:

```
Apr 24 10:49:49 io-io-gpu-02.io.internal slurmd[14141]: slurmd-io-io-gpu-02: fatal: gres.conf for gpu, some records have "File" specification while others do not
Apr 24 10:49:49 io-io-gpu-02.io.internal slurmd-io-io-gpu-02[14141]: fatal: gres.conf for gpu, some records have "File" specification while others do not
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: slurmd.service: Failed with result 'exit-code'.
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: Failed to start Slurm node daemon.
```
Just to confirm, it does start without issue with something like this:

```
[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3 AutoDetect=nvml
```
Another clarification:

```
[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml
```

will essentially just do autodetection for everything (not just the `1g.10gb` instances).
Hmm, I think I'm missing some context here! If you have autodetection, is there ever a case where you'd want to specify a resource manually? (i.e. can't we just say "don't do that"?) I can imagine there are NVIDIA nodes where you have autodetection and other nodes where you don't, so you have to support specifying both, but never for the same nodes.
Sorry, those comments were more for my reference. I was just clarifying how it behaved when specified multiple times for the same node. I think you are right when you say that we should make sure each host only appears once if autodetection is enabled.
I've made it work as a host/group var. This means you can't set conflicting values on different partitions. Let me know what you think.
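For a node group with `gres_autodetect: nvml`, the templated gres.conf should then contain only `AutoDetect` records, along the lines of the working example above (node and type names illustrative):

```
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3 AutoDetect=nvml
```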
README.md
Outdated

```diff
  - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
- - `file`: A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
+ - `file`: Omit if `gres_autodetect` is set, A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
```
```diff
- - `file`: Omit if `gres_autodetect` is set, A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
+ - `file`: Omit if `gres_autodetect` is set. A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
```
or move the addition to the end of the item 🤷 ?
Done. I've left it at the beginning, as I felt it was the most important bit of information.
Ready for review, but merge #183 first (this PR targets that branch to avoid noise in the diff).
Adds support for setting the `AutoDetect` property on gres resources. This removes the need to manually specify `File` in the gres dictionary. You can only use one auto-detection mechanism per node, otherwise Slurm will complain (hence why it is a per-partition option and not a per-gres option).
Example:
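A minimal sketch consistent with the description above, where `gres_autodetect` is set per partition (names and counts are illustrative, not taken from this PR; the review discussion later moves this to a host/group var):

```yaml
openhpc_slurm_partitions:
  - name: gpu
    gres_autodetect: nvml
    # no 'file:' needed here, since devices are autodetected via NVML
    gres:
      - conf: "gpu:nvidia_h100_80gb_hbm3:2"
```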