
Conversation

wishstudio (Contributor):

It is useful to set CPU affinity on CPUs with heterogeneous cores to force the use of performance cores, in order to eliminate CPU stalls caused by the speed mismatch between the different core types.

In llama.cpp there is a series of options, such as `--cpu-mask` and `--cpu-range`, for this purpose. However, these options only seem to affect the (older?) internal ggml thread pool implementation; they have no effect when OpenMP is used for threading, which is now enabled by default.

This pull request adds back support for the related options when using OpenMP, making it easier to set CPU affinity via CLI arguments.

Side note: I also think we should pick a better default CPU affinity assignment. The related code is currently a bit messy: we have two functions, cpu_get_num_physical_cores() and cpu_get_num_math(), for almost the same purpose, while the latter is only implemented for x86 Linux. Since we do not set affinity, we rely on the OS task scheduler to do the right thing. Unfortunately, at least on my machine (13700K on Windows 11), the OS persistently schedules work onto E-cores unless affinity is set explicitly. So I do think we should set affinity by default, at least on heterogeneous CPUs. This PR can serve as a basis towards that goal by allowing manual settings first.

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 22, 2025
@ggerganov ggerganov merged commit 4e29084 into ggml-org:master Sep 23, 2025
60 of 68 checks passed
@wishstudio wishstudio deleted the cpumask branch September 23, 2025 09:13
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 23, 2025
* origin/master: (39 commits)
ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
ci : enable Vulkan workflow on Mac (ggml-org#16194)
ggml-cpu: Respect cpumask settings (ggml-org#16164)
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
zdnn: refactor codebase + add docs (ggml-org#16178)
codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
devops: add s390x containers (ggml-org#15915)
ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
clang-tidy : disable warning about performance enum size (ggml-org#16127)
ggml : implement set_rows with i32 index (ggml-org#16159)
codeowners : update + cleanup (ggml-org#16174)
common : enable `--offline` mode without curl support (ggml-org#16137)
webui : fix handling incomplete chunks (ggml-org#16107)
embedding : fix typos in README (ggml-org#16171)
common : remove unused local variables (ggml-org#16140)
ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
ggml : add ggml_op_is_empty (ggml-org#16122)
codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
Vulkan: add conv_transpose_2d operation (ggml-org#16022)
...