Allow slurm to enforce limits on resources#107

Open
enasca wants to merge 1 commit into main from feat/slurm-mem-enforce

enasca commented Feb 8, 2023

Can only be merged after #130

enasca commented Feb 14, 2023

slurmd fails to start after this change. Investigating.

It should now work with CPU cores, memory, and GPUs.
enasca force-pushed the feat/slurm-mem-enforce branch from 61490ba to db035e3 on February 14, 2023 at 11:10
enasca commented Feb 14, 2023

With CgroupAutomount=yes in cgroup.conf, I get

[2023-02-14T13:57:51.852] debug2: _file_read_content: unable to open '/sys/fs/cgroup/memory//release_agent' for reading : No such file or directory
[2023-02-14T13:57:51.852] debug2: xcgroup_get_param: unable to get parameter 'release_agent' for '/sys/fs/cgroup/memory/'
[2023-02-14T13:57:51.887] error: unable to mount memory cgroup namespace: Device or resource busy

in the slurmd log. With CgroupAutomount=no, it becomes

[2023-02-14T13:59:21.441] debug2: _file_read_content: unable to open '/sys/fs/cgroup/memory//release_agent' for reading : No such file or directory
[2023-02-14T13:59:21.441] debug2: xcgroup_get_param: unable to get parameter 'release_agent' for '/sys/fs/cgroup/memory/'
[2023-02-14T13:59:21.441] error: cgroup namespace 'memory' not mounted. aborting

A search of the Slurm repository returns no results for release_agent, so I'm inclined to think that later versions of Slurm have changed this section of the code, which may also have fixed our issue.
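
To tell which of the two cases above applies on a node, it helps to see how the cgroup hierarchies are actually mounted. The following is a minimal Python sketch, not part of this PR: it parses text in /proc/mounts format and reports where cgroup v1 controllers and any cgroup v2 hierarchy are mounted. The function name cgroup_mounts and the controller set V1_CONTROLLERS are hypothetical choices made for illustration.

```python
# Hypothetical helper, not taken from this PR or from Slurm.
# Assumption: these are the cgroup v1 controllers of interest here.
V1_CONTROLLERS = {"memory", "cpuset", "devices", "freezer"}


def cgroup_mounts(mounts_text):
    """Parse /proc/mounts-style text.

    Returns (v1, v2) where v1 maps a controller name to its mount point
    and v2 is a list of cgroup2 mount points.
    """
    v1, v2 = {}, []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype == "cgroup2":
            v2.append(mount_point)
        elif fstype == "cgroup":
            # v1 mounts list their controllers among the mount options
            for opt in options.split(","):
                if opt in V1_CONTROLLERS:
                    v1[opt] = mount_point
    return v1, v2
```

On a compute node, passing the contents of /proc/mounts to cgroup_mounts would show whether memory has a v1 mount point at all. If it is missing while a cgroup2 mount exists, that may be worth checking before changing CgroupAutomount, though this is an assumption about the cause rather than a confirmed diagnosis.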
