ggml-cpu: Respect cpumask settings with OpenMP #16164
Merged
On CPUs with heterogeneous cores, it is useful to set CPU affinity so that work is pinned to the performance cores. This avoids the stalls caused by the speed mismatch between different types of cores.
llama.cpp has a series of options for this purpose, such as "--cpu-mask" and "--cpu-range". However, these options seem to affect only the (old?) ggml internal thread pool implementation and have no effect when OpenMP is used for threading, which is now enabled by default.
This pull request restores support for these options when using OpenMP, making it easier to set CPU affinity through CLI arguments.
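To illustrate what a mask option like "--cpu-mask" conveys, here is a minimal sketch of expanding a hex mask string into one boolean flag per logical CPU, which is the kind of per-CPU array an affinity call would then consume. The names `sketch_parse_cpu_mask`, `sketch_count_cpus`, and `SKETCH_MAX_CPUS` are hypothetical and exist only for this example; this is not the parsing code in llama.cpp, and the bit order (least significant bit is CPU 0) is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CPUS 512 /* hypothetical upper bound chosen for this sketch */

/* Expand a hex mask string (optional "0x" prefix) into one flag per CPU.
 * Assumption: the least significant bit maps to logical CPU 0.
 * Returns false on empty, oversized, or non-hex input; in that case the
 * contents of mask are unspecified. */
static bool sketch_parse_cpu_mask(const char *hex, bool mask[SKETCH_MAX_CPUS]) {
    if (hex == NULL) {
        return false;
    }
    if (hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X')) {
        hex += 2;
    }

    size_t len = strlen(hex);
    if (len == 0 || len * 4 > SKETCH_MAX_CPUS) {
        return false;
    }

    memset(mask, 0, SKETCH_MAX_CPUS * sizeof(bool));
    for (size_t i = 0; i < len; i++) {
        char c = hex[len - 1 - i]; /* walk from the least significant digit */
        unsigned v;
        if (c >= '0' && c <= '9') {
            v = (unsigned)(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            v = (unsigned)(c - 'a') + 10u;
        } else if (c >= 'A' && c <= 'F') {
            v = (unsigned)(c - 'A') + 10u;
        } else {
            return false;
        }
        /* each hex digit covers four consecutive logical CPUs */
        for (unsigned b = 0; b < 4; b++) {
            mask[i * 4 + b] = ((v >> b) & 1u) != 0;
        }
    }
    return true;
}

/* Count how many CPUs are selected, e.g. to cross-check a thread count. */
static int sketch_count_cpus(const bool mask[SKETCH_MAX_CPUS]) {
    int n = 0;
    for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
        n += mask[i] ? 1 : 0;
    }
    return n;
}
```

With this sketch, a mask of "0xFFFF" sets the flags for logical CPUs 0 through 15 and leaves the rest cleared. Whether those indices correspond to performance cores depends on how the OS enumerates the cores of the machine in question.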
Side note: I also think we should choose a better default CPU affinity assignment. Right now the related code seems to be a bit messy. We have two functions, cpu_get_num_physical_cores() and cpu_get_num_math(), that serve almost the same purpose, while the latter is only implemented on x86 Linux. Since we do not set affinity, we rely on the OS task scheduler to do the right thing. Unfortunately, at least on my machine (13700K on Windows 11), the OS persistently uses E-cores when no explicit affinity is set. So I do think we should set affinity by default, at least on heterogeneous CPUs. This PR can serve as a basis toward that goal by allowing manual settings first.