Add support for autodetection of gres resources #181

Open
wants to merge 44 commits into
base: master
Commits
4137b2b
Add support for autodetection of gres resources
jovial Apr 23, 2025
464d952
Re-add whitespace
jovial Apr 23, 2025
610a8ed
Fix jinja
jovial Apr 23, 2025
b8e1400
Move to separate field as it not possible to autodetect for a subset …
jovial Apr 23, 2025
65957bb
...
jovial Apr 23, 2025
681d9a0
Apply suggestions from code review
jovial Apr 24, 2025
290744b
Update README.md
jovial Apr 24, 2025
4b45244
Support hosts in multiple partitions when templating gres.conf
jovial Apr 25, 2025
0682178
Merge remote-tracking branch 'origin/feature/gres-autodetect' into HEAD
jovial Apr 25, 2025
ec07266
Fix templating
jovial Apr 25, 2025
863bea5
..
jovial Apr 25, 2025
f793689
...
jovial Apr 25, 2025
32fbe3c
Suggestion from code review
jovial Apr 28, 2025
1c281c4
Move donehosts out of loop
jovial May 6, 2025
ba8a38a
nodegroups using nodesets - doesn't handle empty nodegroups
sjpb May 7, 2025
8f9436f
cope with empty nodegroups/partitions
sjpb May 7, 2025
0abbf76
make gres work again
sjpb May 7, 2025
b8c64dc
make node/partition parameters more greppable
sjpb May 8, 2025
6dabb2f
use features to simplify nodeset configuration
sjpb May 8, 2025
4238c70
Merge remote-tracking branch 'origin/feat/nodegroups' into HEAD
jovial May 8, 2025
ea7902a
add nodegroup.features
sjpb May 8, 2025
d16b6ba
add validation
sjpb May 8, 2025
4f3bbc8
document nodegroup.features to README
sjpb May 8, 2025
f126bba
add better examples in README
sjpb May 8, 2025
e993a54
tidy up README
sjpb May 8, 2025
e41cc84
fix validate task path
sjpb May 8, 2025
3440050
fix lint error
sjpb May 8, 2025
319ddf3
default partitions to nodegroups to make CI easier
sjpb May 8, 2025
c8e73ee
update molecule tests for openhpc_nodegroups
sjpb May 8, 2025
f5d0698
remove checks from runtime now validation defined
sjpb May 8, 2025
51001ed
Update readme
jovial May 8, 2025
a551b52
Merge remote-tracking branch 'origin/feat/nodegroups' into HEAD
jovial May 8, 2025
196716f
...
jovial May 8, 2025
9f7b19d
fix NodeName= lines missing newlines between them when multiple hostl…
sjpb May 8, 2025
02ba27c
remove tests for extra_nodes
sjpb May 8, 2025
26000f4
Update gres.conf.j2
jovial May 8, 2025
10a8ace
allow missing inventory groups (as per docs) when validating nodegroups
sjpb May 8, 2025
3c706d7
only run validation once
sjpb May 8, 2025
03dea2e
remove test14 from CI - extra_nodes feature removed
sjpb May 8, 2025
175a1c0
update complex test for new group/partition variables
sjpb May 8, 2025
2b22867
Merge remote-tracking branch 'origin/feat/nodegroups' into HEAD
jovial May 8, 2025
4930029
Merge remote-tracking branch 'origin/feature/gres-autodetect' into HEAD
jovial May 8, 2025
64e61e8
Merge remote-tracking branch 'origin/master' into HEAD
jovial May 13, 2025
eacee19
Fix README
jovial May 13, 2025
5 changes: 3 additions & 2 deletions README.md
Collaborator

From the PR comment:

You can only use one auto-detection mechanism per node, otherwise slurm will complain (hence why it is a per partition option and not a per gres option).

Can you explain why making it "per-gres" means you end up with multiple methods per node?

Nodes can be, and often are, in multiple partitions. So specifying it per-partition is not sufficient to guarantee this anyway, I think, unless there's some other subtlety in the logic.

I think what we need to support is something like this, and only like this:

openhpc_slurm_partitions:
    - name: gpu
      groups:
        - name: a100
        - name: h100
    - name: a100
      gres:
        - conf: "gpu:nvidia_a100_80gb_hbm3:2"
    - name: h100
      gres_autodetect: nvml
      gres:
        - conf: "gpu:nvidia_h100_80gb_hbm3:2"

i.e. no complicated fallbacks or overridden defaults etc. Maybe this just needs documenting, and we just let an error occur if someone does something wrong. #174 rolled up the slurm.conf NodeName= templating to allow defining nodes in multiple partitions; do we need something similar here? Or maybe not.

Contributor Author

I see what you are saying: essentially, you can't mix methods for a particular node. I'll dig out the error message. It seems like a host var/group var would be more natural:

gres_autodetect: nvml

That is, set outside of the openhpc_slurm_partitions definition. But will that be complicated by the host list expression?

Contributor Author

So something like:

[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3 File=/dev/nvidia0

produces:

Apr 24 10:49:49 io-io-gpu-02.io.internal slurmd[14141]: slurmd-io-io-gpu-02: fatal: gres.conf for gpu, some records have "File" specification while others do not
Apr 24 10:49:49 io-io-gpu-02.io.internal slurmd-io-io-gpu-02[14141]: fatal: gres.conf for gpu, some records have "File" specification while others do not
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: slurmd.service: Failed with result 'exit-code'.
Apr 24 10:49:49 io-io-gpu-02.io.internal systemd[1]: Failed to start Slurm node daemon.
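
The condition slurmd reports above can be approximated before deploying. The following is a minimal sketch, not Slurm's actual check: the function name `names_with_mixed_file_spec` is hypothetical, and it assumes keys are written exactly as `Name=` and `File=` (Slurm itself is more lenient about case). It groups records by `Name` and reports any name where some records carry `File=` and others do not.

```python
def names_with_mixed_file_spec(gres_conf_text):
    """Return GRES names whose records mix 'File=' with no 'File='.

    Hypothetical helper: approximates the slurmd fatal error shown above,
    it is not a reimplementation of Slurm's parser.
    """
    file_flags = {}  # GRES name -> set of booleans (record has File=?)
    for raw_line in gres_conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Split "Key=Value" tokens; tokens without '=' are ignored
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        name = fields.get("Name")
        if name is None:
            # e.g. the global "AutoDetect=off" line has no Name
            continue
        file_flags.setdefault(name, set()).add("File" in fields)
    # A name is inconsistent if both True and False were recorded for it
    return sorted(name for name, flags in file_flags.items() if len(flags) == 2)
```

Run against the gres.conf above, this sketch would flag `gpu`, because two records omit `File=` and one sets it.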

Contributor Author

Just to confirm, it does start without issue with something like this:

[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb AutoDetect=nvml
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3 AutoDetect=nvml

Contributor Author

Another clarification is that:

[rocky@io-io-gpu-02 ~]$ sudo cat /var/spool/slurm/conf-cache/gres.conf
AutoDetect=off
NodeName=io-io-gpu-[01-02] Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb AutoDetect=nvml

will essentially just do autodetection for everything (not just the 1g.10gb instances).

Collaborator

Hmm, I think I'm missing some context here! If you have autodetection, is there ever a case where you'd want to specify it manually (i.e. can't we just say "don't do that")? I can imagine there are NVIDIA nodes where you have autodetection and other nodes where you don't, so you have to specify both, but never for the same nodes.

Contributor Author

Sorry, those comments were more for my reference. I was just clarifying how it behaved when specified multiple times for the same node. I think you are right when you say that we should make sure each host only appears once if autodetection is enabled.
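
A minimal sketch of such a check, under stated assumptions: the helper name `find_conflicting_hosts` is hypothetical, node groups are plain dicts with optional `gres` and `gres_autodetect` keys, and inventory groups are named `<cluster_name>_<nodegroup name>`. It reports hosts that would be configured with more than one method.

```python
from collections import defaultdict


def find_conflicting_hosts(nodegroups, groups, cluster_name):
    """Map each host to its GRES methods if it has more than one.

    Hypothetical helper. 'off' stands for manually specified gres
    (File=), any other value for an AutoDetect mechanism such as 'nvml'.
    """
    methods = defaultdict(set)
    for nodegroup in nodegroups:
        method = nodegroup.get("gres_autodetect", "off")
        if method == "off" and not nodegroup.get("gres"):
            # No GRES configured at all for this node group
            continue
        inventory_group = f"{cluster_name}_{nodegroup['name']}"
        for host in groups.get(inventory_group, []):
            methods[host].add(method)
    return {host: sorted(m) for host, m in methods.items() if len(m) > 1}
```

An empty result means no host mixes autodetection with manually specified devices.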

Contributor Author

I've made it work as a host/group var. This means you can't set conflicting values on different partitions. Let me know what you think.

@@ -59,9 +59,10 @@ unique set of homogenous nodes:
  `free --mebi` total * `openhpc_ram_multiplier`.
  * `ram_multiplier`: Optional. An override for the top-level definition
  `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
- * `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict must define:
+ * `gres_autodetect`: Optional. The [auto detection mechanism](https://slurm.schedmd.com/gres.conf.html#OPT_AutoDetect) to use for the generic resources. Note: you must still define the `gres` dictionary (see below) but you only need to define the `conf` key.
+ * `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict should define:
  - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
- - `file`: A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
+ - `file`: Omit if `gres_autodetect` is set. A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
  Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.
  * `features`: Optional. List of [Features](https://slurm.schedmd.com/slurm.conf.html#OPT_Features) strings.
  * `node_params`: Optional. Mapping of additional parameters and values for
20 changes: 14 additions & 6 deletions templates/gres.conf.j2
@@ -1,11 +1,19 @@
  AutoDetect=off
  {% for nodegroup in openhpc_nodegroups %}
- {% for gres in nodegroup.gres | default([]) %}
- {% set gres_name, gres_type, _ = gres.conf.split(':') %}
- {% set inventory_group_name = openhpc_cluster_name ~ '_' ~ nodegroup.name %}
- {% set inventory_group_hosts = groups.get(inventory_group_name, []) %}
+ {% set gres_list = nodegroup.gres | default([]) %}
+ {% set gres_autodetect = nodegroup.gres_autodetect | default('off') %}
+ {% set inventory_group_name = openhpc_cluster_name ~ '_' ~ nodegroup.name %}
+ {% set inventory_group_hosts = groups.get(inventory_group_name, []) %}
+ {% if gres_autodetect | default('off') != 'off' %}
  {% for hostlist in (inventory_group_hosts | hostlist_expression) %}
- NodeName={{ hostlist }} Name={{ gres_name }} Type={{ gres_type }} File={{ gres.file }}
+ NodeName={{ hostlist }} AutoDetect={{ gres_autodetect }}
  {% endfor %}{# hostlists #}
- {% endfor %}{# gres #}
+ {% else %}
+ {% for gres in gres_list %}
+ {% set gres_name, gres_type, _ = gres.conf.split(':') %}
+ {% for hostlist in (inventory_group_hosts | hostlist_expression) %}
+ NodeName={{ hostlist }} Name={{ gres_name }} Type={{ gres_type }} File={{ gres.file | mandatory('The gres configuration dictionary: ' ~ gres ~ ' is missing the file key, but gres_autodetect is set to off. The error occurred on node group: ' ~ nodegroup.name ~ '. Please add the file key or set gres_autodetect.') }}
+ {% endfor %}{# hostlists #}
+ {% endfor %}{# gres #}
+ {% endif %}{# autodetect #}
  {% endfor %}{# nodegroup #}
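
To make the branching in the new gres.conf.j2 easier to follow, here is a rough Python equivalent. It is a sketch under assumptions, not the role's code: `render_gres_conf` is a hypothetical name, node groups are plain dicts, and the role's `hostlist_expression` filter is replaced by a simple comma-joined host list.

```python
def render_gres_conf(nodegroups, groups, cluster_name):
    """Approximate the output of templates/gres.conf.j2 (sketch only)."""
    lines = ["AutoDetect=off"]
    for nodegroup in nodegroups:
        autodetect = nodegroup.get("gres_autodetect", "off")
        hosts = groups.get(f"{cluster_name}_{nodegroup['name']}", [])
        if not hosts:
            # Missing or empty inventory group: nothing to emit
            continue
        # Assumption: stand-in for hostlist_expression, which would
        # produce compact expressions such as gpu-[01-02]
        nodename = ",".join(hosts)
        if autodetect != "off":
            # One line per node group; individual gres entries are not listed
            lines.append(f"NodeName={nodename} AutoDetect={autodetect}")
            continue
        for gres in nodegroup.get("gres", []):
            gres_name, gres_type, _count = gres["conf"].split(":")
            if "file" not in gres:
                # Mirrors the template's mandatory() guard
                raise ValueError(
                    f"gres {gres!r} in node group {nodegroup['name']!r} "
                    "needs a 'file' key unless gres_autodetect is set"
                )
            lines.append(
                f"NodeName={nodename} Name={gres_name} "
                f"Type={gres_type} File={gres['file']}"
            )
    return "\n".join(lines)
```

The point of the design is that a node group contributes either a single `AutoDetect=` line or explicit `File=` records, never both, so the two kinds of record cannot be mixed within one node group.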