
Conversation

wishstudio (Contributor):

It is useful to set CPU affinity on CPUs with heterogeneous cores to force the use of performance cores, in order to eliminate CPU stalls caused by the speed mismatch between the different core types.

In llama.cpp there is a series of options, such as `--cpu-mask` and `--cpu-range`, for this purpose. However, these options only seem to affect the (older?) internal ggml thread pool implementation; they have no effect when OpenMP is used for threading, which is now enabled by default.

This pull request adds back support for the related options when using OpenMP, making it easier to set CPU affinity via CLI arguments.

Side note: I also think we should pick a better default CPU affinity assignment. The related code is currently a bit messy: we have two functions, cpu_get_num_physical_cores() and cpu_get_num_math(), for almost the same purpose, while the latter is only implemented for x86 Linux. Since we do not set affinity, we rely on the OS task scheduler to do the right thing. Unfortunately, at least on my machine (13700K on Windows 11), the OS persistently schedules work onto E-cores unless affinity is set explicitly. So I do think we should set affinity by default, at least on heterogeneous CPUs. This PR can serve as a basis towards that goal by allowing manual settings first.

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 22, 2025
@ggerganov ggerganov merged commit 4e29084 into ggml-org:master Sep 23, 2025
60 of 68 checks passed
@wishstudio wishstudio deleted the cpumask branch September 23, 2025 09:13
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 23, 2025
* origin/master: (39 commits)
ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
ci : enable Vulkan workflow on Mac (ggml-org#16194)
ggml-cpu: Respect cpumask settings (ggml-org#16164)
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
zdnn: refactor codebase + add docs (ggml-org#16178)
codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
devops: add s390x containers (ggml-org#15915)
ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
clang-tidy : disable warning about performance enum size (ggml-org#16127)
ggml : implement set_rows with i32 index (ggml-org#16159)
codeowners : update + cleanup (ggml-org#16174)
common : enable `--offline` mode without curl support (ggml-org#16137)
webui : fix handling incomplete chunks (ggml-org#16107)
embedding : fix typos in README (ggml-org#16171)
common : remove unused local variables (ggml-org#16140)
ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
ggml : add ggml_op_is_empty (ggml-org#16122)
codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
Vulkan: add conv_transpose_2d operation (ggml-org#16022)
...