
Commit

Merge pull request galaxyproject#19 from galaxyproject/community-updates-bugfixes

Numerous features and bugfixes from #1
natefoo authored Nov 11, 2021
2 parents 0d347ce + 20b685d commit 69b9ede
Showing 8 changed files with 162 additions and 28 deletions.
101 changes: 93 additions & 8 deletions README.md
@@ -1,44 +1,129 @@
Slurm
=====

Install and configure Slurm
Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers

Role Variables
--------------

All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and create a `slurm.conf` with a single `localhost` node and `debug` partition. See the [defaults](defaults/main.yml) and [example playbook](#example-playbook) for examples.
All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and
create a `slurm.conf` with a single `localhost` node and `debug` partition.
See the [defaults](defaults/main.yml) and [example playbooks](#example-playbooks) for examples.
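
In that default case, the generated `slurm.conf` would contain a single node line like `NodeName=localhost` and a
partition line like `PartitionName=debug Nodes=localhost Default=YES` (a sketch only; the exact attributes come from
the role defaults).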

For the various roles a Slurm node can play, you can either set group names or add values to a list, `slurm_roles`; an example inventory sketch follows the list below.

- group slurmservers or `slurm_roles: ['controller']`
- group slurmexechosts or `slurm_roles: ['exec']`
- group slurmdbdservers or `slurm_roles: ['dbd']`
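
For instance, a minimal YAML inventory using these group names might look like the following sketch (the host names
are hypothetical):

```yaml
all:
  children:
    slurmservers:
      hosts:
        head01:
    slurmexechosts:
      hosts:
        exec01:
        exec02:
    slurmdbdservers:
      hosts:
        head01:
```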

General config options for slurm.conf go in `slurm_config`, a hash. Keys are slurm config option names.
General config options for slurm.conf go in `slurm_config`, a hash. Keys are Slurm config option names.
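
For example, setting `ClusterName: cluster` under `slurm_config` yields the line `ClusterName=cluster` in
`slurm.conf` (a sketch; Slurm config files use the `Key=Value` format).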

Partitions and nodes go in `slurm_partitions` and `slurm_nodes`, lists of hashes. The only required key in the hash is
`name`, which becomes the `PartitionName` or `NodeName` for that line. All other keys/values are placed onto the line
of that partition or node.
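
As a sketch (the node and partition names here are invented), the variables:

```yaml
slurm_nodes:
  - name: compute01
    CPUs: 16
slurm_partitions:
  - name: main
    Default: YES
    Nodes: compute01
```

would render as `NodeName=compute01 CPUs=16` and `PartitionName=main Default=YES Nodes=compute01` in `slurm.conf`.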

Set `slurm_upgrade` true to upgrade.
Options for the additional configuration files [acct_gather.conf](https://slurm.schedmd.com/acct_gather.conf.html),
[cgroup.conf](https://slurm.schedmd.com/cgroup.conf.html) and [gres.conf](https://slurm.schedmd.com/gres.conf.html)
may be specified in the `slurm_acct_gather_config`, `slurm_cgroup_config` (both of them hashes) and
`slurm_gres_config` (list of hashes) respectively.
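
For example, a hypothetical `acct_gather.conf` configuration (the parameter names come from the linked Slurm
documentation; the values are invented):

```yaml
slurm_acct_gather_config:
  EnergyIPMIFrequency: 10
  ProfileHDF5Dir: "/var/log/slurm/profile"
```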

You can use `slurm_user` (a hash) and `slurm_create_user` (a bool) to pre-create a Slurm user (so that uids match). See
Set `slurm_upgrade` to true to upgrade the installed Slurm packages.

You can use `slurm_user` (a hash) and `slurm_create_user` (a bool) to pre-create a Slurm user so that uids match.

Note that this role requires root access, so enable ``become`` either globally in your playbook, on the command line, or
just for the role as [shown below](#example-playbooks).

Dependencies
------------

None.

Example Playbook
----------------
Example Playbooks
-----------------

Minimal setup, all services on one node:

```yaml
- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - galaxyproject.slurm
    - role: galaxyproject.slurm
      become: True
```

More extensive example:

```yaml
- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
```
License
4 changes: 4 additions & 0 deletions handlers/main.yml
@@ -1,4 +1,8 @@
---
- name: restart munge
  service:
    name: munge
    state: restarted

- name: reload slurmd
  service:
23 changes: 23 additions & 0 deletions tasks/_inc_extra_configs.yml
@@ -0,0 +1,23 @@
---

- name: Install extra execution host configs
  template:
    src: "{{ item.template }}"
    dest: "{{ slurm_config_dir }}/{{ item.name }}"
    backup: yes
  with_items:
    - name: acct_gather.conf
      config: slurm_acct_gather_config
      template: generic.conf.j2
    - name: cgroup.conf
      config: slurm_cgroup_config
      template: generic.conf.j2
    - name: gres.conf
      config: slurm_gres_config
      template: gres.conf.j2
  loop_control:
    label: "{{ item.name }}"
  when: item.config in vars
  notify:
    - reload slurmctld
    - reload slurmd
2 changes: 2 additions & 0 deletions tasks/munge.yml
@@ -16,6 +16,8 @@
    group: munge
    mode: 0400
  when: slurm_munge_key is defined
  notify:
    - restart munge

- name: Ensure Munge is enabled and running
  service:
7 changes: 7 additions & 0 deletions tasks/slurmctld.yml
@@ -24,3 +24,10 @@
    mode: 0755
    state: directory
  when: slurm_create_dirs and __slurm_config_merged.SlurmctldLogFile != omit

- name: Include config dir creation tasks
  include_tasks: _inc_create_config_dir.yml
  when: slurm_create_dirs

- name: Include extra config creation tasks
  include_tasks: _inc_extra_configs.yml
11 changes: 2 additions & 9 deletions tasks/slurmd.yml
@@ -29,12 +29,5 @@
  include_tasks: _inc_create_config_dir.yml
  when: slurm_create_dirs

- name: Install extra execution host configs
  template:
    src: generic.conf.j2
    dest: "{{ slurm_config_dir }}/{{ item.name }}"
    backup: yes
  with_items:
    - name: cgroup.conf
      config: slurm_cgroup_config
  when: item.config in vars

- name: Include extra config creation tasks
  include_tasks: _inc_extra_configs.yml
31 changes: 20 additions & 11 deletions tasks/slurmdbd.yml
@@ -5,21 +5,30 @@
    name: "{{ __slurm_packages.slurmdbd }}"
    state: "{{ 'latest' if slurm_upgrade else 'present' }}"

- name: Install slurmdbd.conf
  template:
    src: generic.conf.j2
    dest: "{{ slurm_config_dir }}/slurmdbd.conf"
    owner: "{{ __slurm_user_name }}"
    group: root
    mode: 0400
  notify:
    - reload slurmdbd

- name: Create slurm log directory
  file:
    path: "{{ __slurmdbd_config_merged.LogFile | dirname }}"
    owner: "{{ __slurm_user_name }}"
    group: "{{ __slurm_group_name }}"
    mode: 0755
    state: directory
  when: slurm_create_dirs and __slurmdbd_config_merged.LogFile
  when: slurm_create_dirs and __slurmdbd_config_merged.LogFile != omit

- name: Include config dir creation tasks
  include_tasks: _inc_create_config_dir.yml
  when: slurm_create_dirs

- name: Install slurmdbd.conf
  template:
    src: generic.conf.j2
    dest: "{{ slurm_config_dir }}/{{ item.name }}"
    owner: "{{ __slurm_user_name }}"
    group: root
    mode: 0600
  with_items:
    - name: slurmdbd.conf
      config: __slurmdbd_config_merged
  loop_control:
    label: "{{ item.name }}"
  notify:
    - reload slurmdbd
11 changes: 11 additions & 0 deletions templates/gres.conf.j2
@@ -0,0 +1,11 @@
##
## This file is maintained by Ansible - ALL MODIFICATIONS WILL BE REVERTED
##

{% set conf = lookup('vars', item.config) %}
{% for gres in conf %}
{% if gres['NodeName'] is not none %}
NodeName={{ gres['NodeName'] }}{% for key in gres | sort %}{% if key != 'NodeName' %} {{ key }}={{ gres[key] }}{% endif %}{% endfor %}

{% endif %}
{% endfor %}
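
With the `slurm_gres_config` list from the README example above, this template would render the single line
`NodeName=gpu[01-10] File=/dev/nvidia[0-3] Name=gpu Type=tesla` (keys other than `NodeName` are emitted in sorted
order).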
