Skip to content

Separate node and partition configuration #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Conversation

sjpb
Copy link
Collaborator

@sjpb sjpb commented May 7, 2025

Replaces openhpc_slurm_partitions with openhpc_nodegroups and openhpc_partitions, to make complex partition configurations correct. Previously it would have been possible to to defined nodes twice, with different parameters, and only the first occurance would have been used.

As part of these changes this:

  • Adds nodegroup.node_params to allow specifying additional Node-level parameters
  • Adds nodegroup.features to allow specifying additional Node features
  • Removes openhpc_slurm_partitions.cluster_name
  • Removes openhpc_slurm_partitions.extra_nodes

@sjpb sjpb requested a review from a team as a code owner May 7, 2025 13:17
@sjpb
Copy link
Collaborator Author

sjpb commented May 7, 2025

In the slurm appliance, it needs the default openhpc config and the tf-templated one to be something like:

openhpc_nodegroups:
  - name: compute
openhpc_partitions:
  - name: compute

but, maybe, the latter should actually be in common env as openhpc_partitions: "{{ openhpc_nodegroups }}" to give a partition per nodegroup by default, with no further config.

@jovial
Copy link
Contributor

jovial commented May 7, 2025

Does it resolve the issues with autoconfiguration?

This is definitely workable and seems better than describing properties of nodes on the partition (which could be configured inconsistently when nodes were members of multiple partitions.

Should openhpc_nodegroups actually be called openhpc_nodesets

+1 for consistency

Do we want to extend this with an alternative way of overriding the "top-level" parameters, to make all this breaking change visible in one PR (or, we just fixup release notes after - that's probably better).`

Could you give an example? Just trying to figure out exactly what you mean here.

Do we want to add a validate task which checks no nodegroups overlap hosts? Shouldn't be too hard, I think.

Always good to fail early

@sjpb
Copy link
Collaborator Author

sjpb commented May 8, 2025

Actually @jovial if we're using Features/NodeSets internally we can't let people do openhpc_nodegroups[*].params.Features b/c that'll break our templating so I will have to add an openhpc_nodegroups[*].features which is additive to the nodegroup features.

edit: now done

@sjpb sjpb force-pushed the feat/nodegroups branch from f00d351 to 8d0b617 Compare May 8, 2025 12:56
@sjpb sjpb force-pushed the feat/nodegroups branch from 8d0b617 to 6dabb2f Compare May 8, 2025 13:01
@sjpb sjpb requested a review from jovial May 8, 2025 16:42
@priteau priteau self-requested a review May 9, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants