Only support configless mode #192

Open
wants to merge 5 commits into
base: feat/b64-mungekey
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -51,7 +51,6 @@ jobs:
  - test2
  - test3
  - test4
- - test5
  - test6
  - test8
  - test9
5 changes: 3 additions & 2 deletions README.md
@@ -40,8 +40,6 @@ each list element:

  `openhpc_slurmdbd_host`: Optional. Where to deploy slurmdbd if are using this role to deploy slurmdbd, otherwise where an existing slurmdbd is running. This should be the name of a host in your inventory. Set this to `none` to prevent the role from managing slurmdbd. Defaults to `openhpc_slurm_control_host`.

- `openhpc_slurm_configless`: Optional, default false. If true then slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is used.
-
  `openhpc_munge_key_b64`: Optional. A base-64 encoded munge key. If not provided then the one generated on package install is used, but the `openhpc_slurm_control_host` must be in the play.

  `openhpc_login_only_nodes`: Optional. If using "configless" mode specify the name of an ansible group containing nodes which are login-only nodes (i.e. not also control nodes), if required. These nodes will run `slurmd` to contact the control node for config.
@@ -50,6 +48,9 @@ each list element:

  ### slurm.conf

+ Note this role always operates in Slurm's [configless mode](https://slurm.schedmd.com/configless_slurm.html)
+ where the `slurm.conf` configuration file is only present on the control node.
+
  `openhpc_nodegroups`: Optional, default `[]`. List of mappings, each defining a
  unique set of homogenous nodes:
  * `name`: Required. Name of node group.
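As a sketch of what the README change means for callers: a play that previously set `openhpc_slurm_configless: true` can drop that variable, as the `converge.yml` changes in this PR do. The variable names below come from the diff; the `hosts` pattern and the role name `ansible-role-openhpc` are assumptions for illustration, and other variables the role may require are omitted.

```yaml
# Hypothetical play: the host pattern and role name are assumptions, not part of this PR.
- hosts: all
  vars:
    openhpc_cluster_name: testohpc
    openhpc_nodegroups:
      - name: "compute"
    # Optional group of login-only nodes, as used in test8 and test9.
    openhpc_login_only_nodes: 'testohpc_login'
    # openhpc_slurm_configless is intentionally absent: the diffed defaults,
    # tasks and template no longer reference it.
  tasks:
    - name: "Include ansible-role-openhpc"
      include_role:
        name: ansible-role-openhpc  # assumed role name
```

Leaving a stale `openhpc_slurm_configless` in existing inventories should be harmless as far as this diff shows, since nothing reads it any more, but removing it avoids suggesting that a non-configless mode still exists.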
1 change: 0 additions & 1 deletion defaults/main.yml
@@ -43,7 +43,6 @@ openhpc_default_config:

  openhpc_config: {}
  openhpc_gres_template: gres.conf.j2
- openhpc_slurm_configless: "{{ 'enable_configless' in openhpc_config.get('SlurmctldParameters', []) }}"

  openhpc_state_save_location: /var/spool/slurm

21 changes: 10 additions & 11 deletions molecule/README.md
@@ -6,24 +6,23 @@ Test options in "Other" column flow down through table unless changed.

  Test | # Partitions | Groups in partitions? | Other
  --- | --- | --- | ---
- test1 | 1 | N | 2x compute node, sequential names (default test), config on all nodes
+ test1 | 1 | N | 2x compute node, sequential names (default test)
  test1b | 1 | N | 1x compute node
  test1c | 1 | N | 2x compute nodes, nonsequential names
  test2 | 2 | N | 4x compute node, sequential names
  test3 | 1 | Y | 4x compute nodes in 2x groups, single partition
  test4 | 1 | N | 2x compute node, accounting enabled
- test5 | 1 | N | As for #1 but configless
- test6 | 1 | N | 0x compute nodes, configless
+ test5 | - | - | [removed, now always configless]
+ test6 | 1 | N | 0x compute nodes
  test7 | 1 | N | [removed, image build should just run install.yml task, this is not expected to work]
- test8 | 1 | N | 2x compute node, 2x login-only nodes, configless
+ test8 | 1 | N | 2x compute node, 2x login-only nodes
  test9 | 1 | N | As test8 but uses `--limit=testohpc-control,testohpc-compute-0` and checks login nodes still end up in slurm.conf
- test10 | 1 | N | As for #5 but then tries to add an additional node
- test11 | 1 | N | As for #5 but then deletes a node (actually changes the partition due to molecule/ansible limitations)
- test12 | 1 | N | As for #5 but enabling job completion and testing `sacct -c`
- test13 | 1 | N | As for #5 but tests `openhpc_config` variable.
- test14 | 1 | N | [removed, extra_nodes removed]
- test15 | 1 | Y | As for #5 but also tests `partitions with different name but with the same NodeName`.
-
+ test10 | 1 | N | As for #1 but then tries to add an additional node
+ test11 | 1 | N | As for #1 but then deletes a node (actually changes the partition due to molecule/ansible limitations)
+ test12 | 1 | N | As for #1 but enabling job completion and testing `sacct -c`
+ test13 | 1 | N | As for #1 but tests `openhpc_config` variable.
+ test14 | - | - | [removed, extra_nodes removed]
+ test15 | 1 | Y | As for #1 but also tests partitions with different name but with the same NodeName.

  # Local Installation & Running

1 change: 0 additions & 1 deletion molecule/test10/converge.yml
@@ -10,7 +10,6 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  tasks:
  - name: "Include ansible-role-openhpc"
  include_role:
1 change: 0 additions & 1 deletion molecule/test10/verify.yml
@@ -32,7 +32,6 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true

  - name: Check modified cluster has 3x nodes
  hosts: testohpc_login
1 change: 0 additions & 1 deletion molecule/test11/converge.yml
@@ -14,4 +14,3 @@
  openhpc_nodegroups:
  - name: "compute_orig"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
1 change: 0 additions & 1 deletion molecule/test11/verify.yml
@@ -29,7 +29,6 @@
  openhpc_nodegroups:
  - name: "compute_new"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true

  - name: Check modified cluster has 1x nodes
  hosts: testohpc_login
1 change: 0 additions & 1 deletion molecule/test12/converge.yml
@@ -14,5 +14,4 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  openhpc_slurm_job_comp_type: jobcomp/filetxt
1 change: 0 additions & 1 deletion molecule/test13/converge.yml
@@ -10,7 +10,6 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  openhpc_login_only_nodes: 'testohpc_login'
  openhpc_config:
  FirstJobId: 13
1 change: 0 additions & 1 deletion molecule/test15/converge.yml
@@ -20,7 +20,6 @@
  Default: false
  AllowAccounts: Group_own_thePartition
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  tasks:
  - name: "Include ansible-role-openhpc"
  include_role:
17 changes: 0 additions & 17 deletions molecule/test5/converge.yml

This file was deleted.

44 changes: 0 additions & 44 deletions molecule/test5/molecule.yml

This file was deleted.

12 changes: 0 additions & 12 deletions molecule/test5/verify.yml

This file was deleted.

1 change: 0 additions & 1 deletion molecule/test6/converge.yml
@@ -9,7 +9,6 @@
  openhpc_nodegroups:
  - name: "n/a"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  tasks:
  - name: "Include ansible-role-openhpc"
  include_role:
1 change: 0 additions & 1 deletion molecule/test8/converge.yml
@@ -10,7 +10,6 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  openhpc_login_only_nodes: 'testohpc_login'
  tasks:
  - name: "Include ansible-role-openhpc"
1 change: 0 additions & 1 deletion molecule/test9/converge.yml
@@ -10,7 +10,6 @@
  openhpc_nodegroups:
  - name: "compute"
  openhpc_cluster_name: testohpc
- openhpc_slurm_configless: true
  openhpc_login_only_nodes: 'testohpc_login'
  tasks:
  - name: "Include ansible-role-openhpc"
4 changes: 2 additions & 2 deletions tasks/pre.yml
@@ -1,6 +1,6 @@
- - name: Enable batch on configless login-only nodes
+ - name: Enable batch on login-only nodes
+ # TODO: why can't we remove this by just setting openhpc_enable.batch: true for appliance login nodes??
  set_fact:
  openhpc_enable: "{{ openhpc_enable | combine({'batch': true}) }}"
  when:
- - openhpc_slurm_configless
  - openhpc_login_only_nodes in group_names
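For readers less familiar with `combine`, a minimal sketch of what this `set_fact` does on a host that is in the login-only group. The starting values of `openhpc_enable` are assumed for illustration; only the `batch` key is overridden and other keys are kept.

```yaml
# Assumed value of openhpc_enable on a login-only node before the task runs.
openhpc_enable:
  control: false
  batch: false

# Value after `openhpc_enable | combine({'batch': true})`:
# openhpc_enable:
#   control: false
#   batch: true
```

With the `openhpc_slurm_configless` condition removed, the only remaining gate is membership of the group named by `openhpc_login_only_nodes`.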
9 changes: 4 additions & 5 deletions tasks/runtime.yml
@@ -63,7 +63,7 @@
  owner: root
  group: root
  mode: 0644
- when: openhpc_enable.control | default(false) or not openhpc_slurm_configless | bool
+ when: openhpc_enable.control | default(false)
  notify:
  - Restart slurmctld service
  register: ohpc_slurm_conf
@@ -76,7 +76,7 @@
  mode: "0600"
  owner: slurm
  group: slurm
- when: openhpc_enable.control | default(false) or not openhpc_slurm_configless | bool
+ when: openhpc_enable.control | default(false)
  notify:
  - Restart slurmctld service
  register: ohpc_gres_conf
@@ -90,7 +90,7 @@
  mode: "0644" # perms/ownership based off src from ohpc package
  owner: root
  group: root
- when: openhpc_enable.control | default(false) or not openhpc_slurm_configless | bool
+ when: openhpc_enable.control | default(false)

  - name: Remove local tempfile for slurm.conf templating
  ansible.builtin.file:
@@ -132,10 +132,9 @@
  - name: Configure slurmd command line options
  vars:
  slurmd_options_configless: "--conf-server {{ openhpc_slurm_control_host_address | default(openhpc_slurm_control_host) }}"
- slurmd_options: ""
  lineinfile:
  path: /etc/sysconfig/slurmd
- line: "SLURMD_OPTIONS='{{ slurmd_options_configless if openhpc_slurm_configless | bool else slurmd_options }}'"
+ line: "SLURMD_OPTIONS='{{ slurmd_options_configless }}'"
  regexp: "^SLURMD_OPTIONS="
  create: yes
  owner: root
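To make the effect of the last hunk concrete, here is a sketch of what the `lineinfile` task would write. It assumes `openhpc_slurm_control_host_address` is undefined, so the `default()` filter falls back to `openhpc_slurm_control_host`. The host name `testohpc-control` is borrowed from the test9 description and is illustrative only.

```yaml
# Illustrative value; testohpc-control is an assumed control host name.
openhpc_slurm_control_host: testohpc-control
# openhpc_slurm_control_host_address is deliberately left undefined in this sketch.

# Line the task should then leave in /etc/sysconfig/slurmd:
#   SLURMD_OPTIONS='--conf-server testohpc-control'
```

Because the empty `slurmd_options` fallback is gone, every host on which this task runs now starts `slurmd` with `--conf-server`.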
6 changes: 2 additions & 4 deletions templates/slurm.conf.j2
@@ -3,16 +3,14 @@ ClusterName={{ openhpc_cluster_name }}
  # PARAMETERS
  {% for k, v in openhpc_default_config | combine(openhpc_config) | items %}
  {% if v != "omit" %}{# allow removing items using setting key: null #}
- {% if k != 'SlurmctldParameters' %}{# handled separately due to openhpc_slurm_configless #}
+ {% if k != 'SlurmctldParameters' %}{# handled separately due to configless mode #}
  {{ k }}={{ v | join(',') if (v is sequence and v is not string) else v }}
  {% endif %}
  {% endif %}
  {% endfor %}

- {% set slurmctldparameters = ((openhpc_config.get('SlurmctldParameters', []) + (['enable_configless'] if openhpc_slurm_configless | bool else [])) | unique) %}
- {% if slurmctldparameters | length > 0 %}
+ {% set slurmctldparameters = ((openhpc_config.get('SlurmctldParameters', []) + ['enable_configless']) | unique) %}
  SlurmctldParameters={{ slurmctldparameters | join(',') }}
- {% endif %}

  # LOGIN-ONLY NODES
  # Define slurmd nodes not in partitions for login-only nodes in "configless" mode:
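A sketch of how the new `SlurmctldParameters` line should render for three inputs. It assumes `openhpc_config['SlurmctldParameters']` is given as a list (the template concatenates it with a list) and that the `unique` filter keeps first-occurrence order. `idle_on_node_suspend` is only an example value and is not used anywhere in this PR.

```yaml
# Case 1: the user sets no SlurmctldParameters.
openhpc_config: {}
# expected slurm.conf line:
#   SlurmctldParameters=enable_configless
---
# Case 2: the user supplies one parameter (example value only).
openhpc_config:
  SlurmctldParameters:
    - idle_on_node_suspend
# expected slurm.conf line:
#   SlurmctldParameters=idle_on_node_suspend,enable_configless
---
# Case 3: the user already lists enable_configless; unique drops the duplicate.
openhpc_config:
  SlurmctldParameters:
    - enable_configless
# expected slurm.conf line:
#   SlurmctldParameters=enable_configless
```

Since the list always contains at least `enable_configless`, the removed `length > 0` guard is no longer needed.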