Slurm
=====

Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers

Role Variables
--------------

All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and
create a `slurm.conf` with a single `localhost` node and `debug` partition.
See the [defaults](defaults/main.yml) and [example playbooks](#example-playbooks) for examples.

For the various roles a Slurm node can play, you can either set group names, or add values to a list, `slurm_roles`
(an example inventory using the group names follows the list):

- group slurmservers or `slurm_roles: ['controller']`
- group slurmexechosts or `slurm_roles: ['exec']`
- group slurmdbdservers or `slurm_roles: ['dbd']`
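
A minimal inventory sketch using these group names might look like the following; the host names (`slurmctl`,
`gpu01`, `slurmdb`) are placeholders for your own machines:

```yaml
# Hypothetical inventory in YAML format; only the group names are significant.
all:
  children:
    slurmservers:
      hosts:
        slurmctl:
    slurmexechosts:
      hosts:
        gpu01:
    slurmdbdservers:
      hosts:
        slurmdb:
```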

General config options for slurm.conf go in `slurm_config`, a hash. Keys are Slurm config option names.

Partitions and nodes go in `slurm_partitions` and `slurm_nodes`, lists of hashes. The only required key in the hash is
`name`, which becomes the `PartitionName` or `NodeName` for that line. All other keys/values are placed onto the line
of that partition or node.
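
As an illustration of how these lists are rendered (the values are only examples, building on the role's default
`localhost` node and `debug` partition):

```yaml
# Illustrative values only; any slurm.conf node/partition option can be used as a key.
slurm_nodes:
  - name: localhost
    CPUs: 4
slurm_partitions:
  - name: debug
    Default: YES
    MaxTime: UNLIMITED
    Nodes: localhost
```

With these values, the generated `slurm.conf` would contain something like `NodeName=localhost CPUs=4` and
`PartitionName=debug Default=YES MaxTime=UNLIMITED Nodes=localhost`.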

Options for the additional configuration files [acct_gather.conf](https://slurm.schedmd.com/acct_gather.conf.html),
[cgroup.conf](https://slurm.schedmd.com/cgroup.conf.html), and [gres.conf](https://slurm.schedmd.com/gres.conf.html)
may be specified in `slurm_acct_gather_config`, `slurm_cgroup_config` (both of them hashes), and
`slurm_gres_config` (a list of hashes), respectively.
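
As a quick sketch of the shapes involved (hashes of options versus a list of hashes), reusing option names that also
appear in the extensive example below:

```yaml
# Keys are passed through as option names from the linked Slurm documentation pages.
slurm_cgroup_config:
  ConstrainCores: yes
  ConstrainRAMSpace: yes
slurm_gres_config:
  - Name: gpu
    Type: tesla
    File: /dev/nvidia[0-3]
    NodeName: gpu[01-10]
```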

Set `slurm_upgrade` to true to upgrade the installed Slurm packages.

You can use `slurm_user` (a hash) and `slurm_create_user` (a bool) to pre-create a Slurm user so that uids match.
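
For instance (the uid/gid values here are arbitrary examples and only need to be consistent across your nodes):

```yaml
# uid/gid 888 are example values taken from the extensive playbook below.
slurm_create_user: yes
slurm_user:
  name: slurm
  group: slurm
  uid: 888
  gid: 888
  home: "/var/lib/slurm"
  shell: "/usr/sbin/nologin"
  comment: "Slurm Workload Manager"
```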

Note that this role requires root access, so enable `become` either globally in your playbook, on the command line, or
just for the role as [shown below](#example-playbooks).

Dependencies
------------

None.

Example Playbooks
-----------------

Minimal setup, all services on one node:

```yaml
- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True
```
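
To apply it, assuming the play above is saved as `playbook.yml` and your inventory is in a file named `hosts` (both
names are placeholders), something like `ansible-playbook -i hosts playbook.yml` should work.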

More extensive example:

```yaml
- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
```

License