Take a configuration such as:
openhpc_nodegroups:
  - name: cpu
    node_params:
      CPUSpecList: 92-95
The initial deployment works fine. However, if CPUSpecList is later changed (e.g. to 90-95), deploying the new configuration puts the affected nodes into an invalid state with Reason=CoreSpec differ. This happens as soon as slurmctld is restarted, probably because slurmctld has already loaded the new value while the slurmd daemons are still running with the old one.
This can be fixed by forcing another configuration update elsewhere in the Slurm configuration.
Is there a safe way to roll out this change? One option might be to stop the slurmd services first, then restart slurmctld, and finally start all slurmd services again.
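
To make the proposed ordering concrete, here is a minimal, untested sketch of it as an Ansible playbook. The inventory group names `compute` and `control` are hypothetical, and the sketch assumes the updated slurm.conf has already been written to all hosts before it runs. Whether this sequence actually avoids the Reason=CoreSpec differ state has not been verified.

```yaml
# Hypothetical playbook: the group names "compute" and "control" are
# assumptions and are not taken from the role.

- name: Stop slurmd on the affected compute nodes
  hosts: compute
  become: true
  tasks:
    - name: Stop slurmd so it cannot register with the old CPUSpecList
      ansible.builtin.service:
        name: slurmd
        state: stopped

- name: Restart slurmctld so it loads the new configuration
  hosts: control
  become: true
  tasks:
    - name: Restart slurmctld
      ansible.builtin.service:
        name: slurmctld
        state: restarted

- name: Start slurmd again with the new configuration
  hosts: compute
  become: true
  tasks:
    - name: Start slurmd
      ansible.builtin.service:
        name: slurmd
        state: started
```

The three plays run in order, so no slurmd should be running at the moment slurmctld comes back up with the new value.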