[Benchmarks] Pin benchmarks to small set of cores #20403

PatKamin · 2025-10-20T17:39:27Z

To make results more stable, pin benchmark binaries to four cores with the maximum available frequency.

PatKamin · 2025-10-21T10:30:20Z

Test run on the PVC perf machine: https://github.com/intel/llvm/actions/runs/18679758239/job/53258464661

devops/scripts/benchmarks/benches/compute.py

lslusarczyk

some suggestions

lslusarczyk · 2025-10-22T05:37:52Z

devops/scripts/benchmarks/benches/compute.py

+                core_frequencies.append((core, freq))
+        core_frequencies.sort(key=lambda x: x[1], reverse=True)
+        available_cores = [core for core, _ in core_frequencies[:4]]
+        cores_list = ",".join([str(core) for core in available_cores])


cores_list = ",".join([str(core) for core, _ in core_frequencies[:4]])
unless you think two lines are more readable

That's cleaner, done.
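For readers without the full diff, here is a minimal sketch of the selection discussed above. It assumes the frequencies come from the standard Linux cpufreq sysfs file `cpuinfo_max_freq`; the diff excerpt does not show which file the PR actually reads, and the names `read_core_frequencies` and `fastest_cores_list` are hypothetical.

```python
from pathlib import Path

# Assumption: standard Linux cpufreq sysfs layout. The path the PR reads
# is not visible in the quoted diff.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_core_frequencies(cores, root=SYSFS_CPU_ROOT):
    """Return (core, max_freq_khz) pairs for cores that expose cpufreq data."""
    core_frequencies = []
    for core in cores:
        freq_file = Path(root) / f"cpu{core}" / "cpufreq" / "cpuinfo_max_freq"
        try:
            freq = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue  # core offline, or cpufreq data not available
        core_frequencies.append((core, freq))
    return core_frequencies


def fastest_cores_list(core_frequencies, count=4):
    """Comma-separated IDs of the `count` highest-frequency cores."""
    # sorted() is stable, so cores with equal frequency keep their input order.
    ranked = sorted(core_frequencies, key=lambda cf: cf[1], reverse=True)
    return ",".join(str(core) for core, _ in ranked[:count])
```

The `root` parameter exists only so the function can be pointed at a fake directory tree in tests.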

lslusarczyk · 2025-10-22T05:48:59Z

devops/scripts/benchmarks/benches/compute.py

+                freq = int(f.read().strip())
+                core_frequencies.append((core, freq))
+        core_frequencies.sort(key=lambda x: x[1], reverse=True)
+        available_cores = [core for core, _ in core_frequencies[:4]]


Where does the number 4 come from?

Shouldn't we take all cores that have the same frequency as the fastest one, and maybe no fewer than 4?

available_cores = [cf[0] for idx, cf in enumerate(core_frequencies) if cf[1] == core_frequencies[0][1] or idx < 4]

Plus, maybe warn if the first 4 cores do not all have the same frequency?

Issues with the last P-core on one setup led me to leave exactly 4 cores for benchmarks. That seems to be enough for all single-threaded scenarios and for the multi-threaded Memcpy scenarios in Compute Benchmarks, which use 4 threads.

I've triggered tests for llama scenarios where the benchmark binary runs 8 threads to see if limiting to 4 cores has any impact on the results: https://github.com/intel/llvm/actions/runs/18712800917.

All CI setups satisfy the requirement of 4 cores at maximum frequency, so I did not add a warning.
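The alternative raised in this thread (every core tied with the fastest one, but never fewer than 4) could look roughly like the sketch below. It was not adopted in the PR; the function name `cores_at_top_frequency` and the `minimum` parameter are hypothetical.

```python
def cores_at_top_frequency(core_frequencies, minimum=4):
    """Cores tied with the fastest core, padded up to at least `minimum` cores.

    `core_frequencies` is a list of (core, frequency) pairs in any order.
    """
    ranked = sorted(core_frequencies, key=lambda cf: cf[1], reverse=True)
    if not ranked:
        return []
    top_freq = ranked[0][1]
    return [
        core
        for idx, (core, freq) in enumerate(ranked)
        # Keep every core at the top frequency; the first `minimum`
        # ranked cores are kept regardless of their frequency.
        if freq == top_freq or idx < minimum
    ]
```

With two fast cores and three slower ones, this returns the two fast cores plus the next two in rank; with six equally fast cores it returns all six.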

lslusarczyk · 2025-10-22T05:51:30Z

devops/actions/run-tests/benchmark/action.yml

    run: |
      # Compute the core range for the first NUMA node; second node is used by
-      # UMF. Skip the first 4 cores as the kernel is likely to schedule more
+      # UMF. Skip the first 3 cores as the kernel is likely to schedule more


Where does number 3 come from?

On one of the machines I had issues with the last P-core (the 7th). To guarantee 4 P-cores for benchmarks on all CI setups, I also use core number 3. Leaving the first 4 cores free was only a rule of thumb anyway.
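To make the arithmetic concrete, here is a rough sketch of the core-range computation. The action itself is a shell step whose full body is not quoted here, so this is only an illustration: it assumes the NUMA node's CPUs are given in the kernel's cpulist format (for example `0-7`), and `parse_cpulist` and `benchmark_cores` are hypothetical names.

```python
def parse_cpulist(cpulist):
    """Expand a kernel-style CPU list such as "0-3,8,10-11" into core IDs."""
    cores = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cores.extend(range(int(start), int(end) + 1))
        else:
            cores.append(int(part))
    return cores


def benchmark_cores(cpulist, skip=3):
    """Cores of the node left for benchmarks after skipping the first `skip`."""
    return parse_cpulist(cpulist)[skip:]
```

For a node listed as `0-7`, skipping 3 cores leaves 3 through 7, so the first four usable cores are 3, 4, 5 and 6 and the problematic core 7 is not needed.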

lslusarczyk · 2025-10-22T11:40:57Z

devops/scripts/benchmarks/benches/base.py

+            ) as f:
+                freq = int(f.read().strip())
+                core_frequencies.append((core, freq))
+        core_frequencies.sort(key=lambda x: x[1], reverse=True)


As we discussed, sorting is not needed: we already assume that we take the first 4 cores rather than any others, because of the faulty 7th core on one of the BMGs.

Please remove the sorting and add a warning if the selected cores differ in frequency from one another.
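A minimal sketch of what this request amounts to, assuming `core_frequencies` is already in core order. The function name `first_cores` and the use of the `warnings` module are my assumptions; the PR may report the mismatch through its own logging.

```python
import warnings


def first_cores(core_frequencies, count=4):
    """Take the first `count` cores in their given order, without sorting.

    Emits a warning if the selected cores do not share one frequency.
    """
    selected = core_frequencies[:count]
    if len({freq for _, freq in selected}) > 1:
        warnings.warn(
            f"Selected cores differ in frequency: {selected}",
            RuntimeWarning,
            stacklevel=2,
        )
    return [core for core, _ in selected]
```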

pbalcer · 2025-10-23T06:39:28Z

devops/scripts/benchmarks/benches/base.py

            )
        }

+    def taskset_cmd(self) -> list[str]:


This means we will nest tasksets: one to run the Python script and a second to run the actual benchmark. I don't know how these interact, but it seems odd. I'd rather we do this sort of calculation in one place.

Yes, I want to leave the maximum possible compute resources for building all the projects, but limit benchmark scenario runs to the cores with the maximum frequency. Limiting builds to just 4 cores would increase total build times significantly.

However, if we are to drop one of these tasksets, I would drop the "outer" one, since builds can use all the cores of a socket, and move the logic for skipping the first cores and picking the ones with the maximum frequency into Python.
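As an illustration of the "inner" taskset only, a command prefix could be built as below. This is a sketch, not the PR's `taskset_cmd` method (whose body is not quoted here): it is written as a free function taking the core list as an argument, and the benchmark binary and its flag are placeholders.

```python
def taskset_cmd(cores):
    """Command prefix that pins a child process to `cores`; empty if none given."""
    if not cores:
        return []
    # `taskset -c` accepts a comma-separated CPU list.
    return ["taskset", "-c", ",".join(str(core) for core in cores)]


# Hypothetical benchmark invocation: only this command is pinned, while
# build steps launched elsewhere stay free to use every core.
command = taskset_cmd([3, 4, 5, 6]) + ["./benchmark_binary", "--iterations=10"]
```

Returning an empty list when no cores are selected lets callers prepend the prefix unconditionally.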

lslusarczyk · 2025-10-23T09:23:08Z

devops/scripts/benchmarks/requirements.txt

 dataclasses-json==0.6.7
 PyYAML==6.0.1
 Mako==1.3.0
+psutil>=7.0.0


Why? Was it missing for some reason, or is this a mistake?

I'm adding psutil.Process().cpu_affinity() usage. This is a third-party package, not a part of the standard library AFAIK.
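For context, a minimal sketch of querying the current CPU affinity. `psutil.Process().cpu_affinity()` is the call named above; the fallback to the standard library's `os.sched_getaffinity` (Linux only) is my addition for environments without psutil and is not part of the PR, and `current_affinity` is a hypothetical name.

```python
import os


def current_affinity():
    """Sorted list of CPU IDs the current process is allowed to run on."""
    try:
        import psutil  # third-party package, hence the requirements.txt entry

        return sorted(psutil.Process().cpu_affinity())
    except (ImportError, AttributeError):
        # psutil is not installed, or cpu_affinity() is unavailable on this platform.
        pass
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))  # 0 means the calling process
    # Last resort: assume every CPU reported by the OS is usable.
    return list(range(os.cpu_count() or 1))
```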

PatKamin requested review from a team as code owners October 20, 2025 17:39

PatKamin force-pushed the pin-4-cores-for-benches branch from 10804b3 to 8855934 Compare October 20, 2025 18:54

lukaszstolarczuk reviewed Oct 21, 2025

PatKamin requested review from a team and vinser52 October 21, 2025 12:06

[Benchmarks] Pin benchmarks to small set of cores

21fa7fb

For better results stability, pin benchmark binaries to four cores with the maximum available frequency.

vinser52 approved these changes Oct 21, 2025

lslusarczyk reviewed Oct 22, 2025

PatKamin force-pushed the pin-4-cores-for-benches branch from 8855934 to e5b651a Compare October 22, 2025 10:08

lslusarczyk reviewed Oct 22, 2025

lukaszstolarczuk reviewed Oct 22, 2025

PatKamin force-pushed the pin-4-cores-for-benches branch from e5b651a to 670adfc Compare October 22, 2025 14:50

PatKamin requested review from lslusarczyk and lukaszstolarczuk October 22, 2025 14:51

Pin benchmarks to 4 cores in all suites

2b495fa

Review updates

670adfc

pbalcer reviewed Oct 23, 2025

lslusarczyk reviewed Oct 23, 2025

5 participants