Align benchmark::State to a cacheline. #1230

Merged: 7 commits, Aug 16, 2024

Conversation

ckennelly
Contributor

This can avoid interference with neighboring objects and stabilize
benchmark results.
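
For readers unfamiliar with the technique, here is a minimal sketch of what cacheline alignment buys. It is not the code from this PR: `AlignedCounter`, `kAssumedCacheLineSize`, and `SameCacheLine` are hypothetical names, and the 64-byte line size is an assumption that holds on common x86-64 parts but not everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86-64, not universal).
constexpr std::size_t kAssumedCacheLineSize = 64;

// Hypothetical stand-in for a hot object such as benchmark::State.
// alignas forces both the start address and the size to be multiples
// of the cache line size, so no neighbor can share its line.
struct alignas(kAssumedCacheLineSize) AlignedCounter {
  std::uint64_t iterations = 0;
};

static_assert(alignof(AlignedCounter) == kAssumedCacheLineSize,
              "type must be aligned to the assumed cache line size");
static_assert(sizeof(AlignedCounter) % kAssumedCacheLineSize == 0,
              "size is padded up to a whole number of cache lines");

// Returns true when both addresses fall in the same assumed cache line.
inline bool SameCacheLine(const void* a, const void* b) {
  const auto line_a = reinterpret_cast<std::uintptr_t>(a) / kAssumedCacheLineSize;
  const auto line_b = reinterpret_cast<std::uintptr_t>(b) / kAssumedCacheLineSize;
  return line_a == line_b;
}
```

With this layout, two adjacent `AlignedCounter` objects in an array never report `SameCacheLine(...) == true`, which is the property that keeps a neighboring object's writes from disturbing the one being measured.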

@google-cla google-cla bot added the cla: yes label Sep 17, 2021
@oontvoo
Member

oontvoo commented Sep 17, 2021

@dominichamon Another reason to have absl dependency :) Would make this PR so much simpler.

@dmah42
Member

dmah42 commented Sep 17, 2021

do you have the results from a run with and without this change?

@ckennelly
Contributor Author

@dominichamon I only have an internal microbenchmark, which demonstrated higher variability across runs (especially when combined with a change to TCMalloc) without the alignment attribute.

@LebedevRI
Collaborator

This seems plausible (although hard to tell without a test), but is this only about the State?
What about ThreadTimer, ThreadManager, PerfCountersMeasurement?

Contributor

@chfast chfast left a comment

Can you use std::hardware_destructive_interference_size from C++17 if available?
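
A sketch of what that suggestion could look like, assuming a C++17 standard library that defines the `__cpp_lib_hardware_interference_size` feature-test macro. The names `kExampleInterferenceSize` and `ExamplePaddedSlot` and the 64-byte fallback are illustrative assumptions, not part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>  // std::hardware_destructive_interference_size (C++17)

#if defined(__cpp_lib_hardware_interference_size)
// The library reports a size; use it directly.
constexpr std::size_t kExampleInterferenceSize =
    std::hardware_destructive_interference_size;
#else
// Assumed fallback for older toolchains; 64 bytes is common, not universal.
constexpr std::size_t kExampleInterferenceSize = 64;
#endif

constexpr bool IsPowerOfTwo(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}
static_assert(IsPowerOfTwo(kExampleInterferenceSize),
              "alignas requires a power-of-two alignment");

// Hypothetical type padded so that two instances never share a line.
struct alignas(kExampleInterferenceSize) ExamplePaddedSlot {
  long value = 0;
};

// Debug-time check that an instance really starts on a boundary.
inline void CheckSlotAlignment(const ExamplePaddedSlot& slot) {
  assert(reinterpret_cast<std::uintptr_t>(&slot) % kExampleInterferenceSize == 0);
}
```

The feature-test guard matters because several standard libraries shipped C++17 support for years before providing this constant, so checking the language version alone would not be enough.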

@dmah42
Member

dmah42 commented Aug 16, 2024

> This seems plausible (although hard to tell without a test), but is this only about the State? What about ThreadTimer, ThreadManager, PerfCountersMeasurement?

we could try to run some tests, but i don't think it should block this PR (which i totally missed).

or just align all the things.

@dmah42
Member

dmah42 commented Aug 16, 2024

> Can you use std::hardware_destructive_interference_size from C++17 if available?

not in the public header. soon.

dmah42 previously approved these changes Aug 16, 2024
@dmah42
Member

dmah42 commented Aug 16, 2024

yeah your way is better because it separates the definition of the cacheline size from the macro that uses it.
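
To illustrate the separation being praised here, a rough sketch follows. The macro names `EXAMPLE_CACHELINE_SIZE` and `EXAMPLE_CACHELINE_ALIGNED`, the class `ExampleState`, and the per-architecture values are all hypothetical; the real header may use different names and numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Step 1: decide the size only. Values are illustrative assumptions.
#if defined(__i386__) || defined(__x86_64__)
#define EXAMPLE_CACHELINE_SIZE 64
#elif defined(__powerpc64__)
#define EXAMPLE_CACHELINE_SIZE 128
#else
#define EXAMPLE_CACHELINE_SIZE 64  // assumed default for unlisted targets
#endif

// Step 2: define the attribute in terms of the size, in one place.
#if defined(__GNUC__)
#define EXAMPLE_CACHELINE_ALIGNED \
  __attribute__((aligned(EXAMPLE_CACHELINE_SIZE)))
#else
#define EXAMPLE_CACHELINE_ALIGNED  // expands to nothing on other compilers
#endif

// Hypothetical class using the attribute; it never mentions the size itself.
class EXAMPLE_CACHELINE_ALIGNED ExampleState {
 public:
  std::uint64_t iterations() const { return iterations_; }
  void Step() { ++iterations_; }

 private:
  std::uint64_t iterations_ = 0;
};
```

Adding a new architecture then touches only step 1, and switching to a different alignment spelling touches only step 2.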

@@ -290,11 +290,50 @@ BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
#define BENCHMARK_OVERRIDE
#endif

#if defined(__GNUC__)
Collaborator

Not really sure if this outermost guard is needed, but then i don't know what set of per-cpu defines MSVC provides.
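
On the MSVC question: MSVC does predefine architecture macros such as `_M_IX86`, `_M_X64`, and `_M_ARM64`. A sketch of an architecture check that does not sit behind a compiler guard might look like the following; `kExampleCacheLine` and `ExampleTimer` are hypothetical, the sizes are assumptions, and C++11 `alignas` is used here in place of a compiler-specific attribute.

```cpp
#include <cassert>
#include <cstddef>

// GCC/Clang and MSVC spell the architecture macros differently, so
// each branch tests both spellings. All sizes are assumed values.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || \
    defined(_M_IX86)
constexpr std::size_t kExampleCacheLine = 64;
#elif defined(__powerpc64__)
constexpr std::size_t kExampleCacheLine = 128;
#elif defined(__aarch64__) || defined(_M_ARM64)
constexpr std::size_t kExampleCacheLine = 64;
#else
constexpr std::size_t kExampleCacheLine = 64;  // assumed default
#endif

// Hypothetical type; alignas is standard C++11 and needs no compiler guard.
struct alignas(kExampleCacheLine) ExampleTimer {
  double elapsed_seconds = 0.0;
};

static_assert(alignof(ExampleTimer) == kExampleCacheLine,
              "alignment follows the selected cache line size");
```

Whether this is preferable to the guarded form depends on which language standard the public header has to support.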

Collaborator

@LebedevRI LebedevRI left a comment

Yay, thank you!

@dmah42 dmah42 merged commit 6126d2a into google:main Aug 16, 2024
80 checks passed
5 participants