added INSTRUCTIONS and CYCLES hardware perf counters on MacOS. #1404

Open: wants to merge 1 commit into main

31 changes: 21 additions & 10 deletions docs/perf_counters.md
@@ -1,5 +1,3 @@
-<a name="perf-counters" />

Review comment (Member): Can you leave this in, please? (Or replace it with the equivalent GitHub Markdown anchor method.)


# User-Requested Performance Counters

When running benchmarks, the user may choose to request collection of
@@ -9,15 +7,31 @@ performance improvement matches expectations.

This feature is available if:

-* The benchmark is run on an architecture featuring a Performance Monitoring
-Unit (PMU),
-* The benchmark is compiled with support for collecting counters. Currently,
-this requires [libpfm](http://perfmon2.sourceforge.net/) be available at build
-time
* The benchmark is run on an architecture featuring a Performance Monitoring Unit (PMU),
* The benchmark is compiled with support for collecting counters.


The feature does not require modifying benchmark code. Counter collection is
handled at the boundaries where timer collection is also handled.

The counter values are reported back through the [User Counters](../README.md#custom-counters)
mechanism, which means they are available in all the formats (e.g. JSON)
supported by User Counters.

## macOS
macOS, on both Apple Silicon and Intel, has built-in support for per-thread
instruction and cycle counters. These counters can be queried through the
semi-undocumented libpthread function `thread_selfcounts`. Benchmark support for
these counters is always enabled, as it requires no additional dependencies.

To use, pass a comma-separated list of counter names through the
`--benchmark_perf_counters` flag. The only available counter names
are `CYCLES` and `INSTRUCTIONS`.

## Linux
Currently, this requires [libpfm](http://perfmon2.sourceforge.net/) to be
available at build time.

To opt in:

* Install `libpfm4-dev`, e.g. `apt-get install libpfm4-dev`.
@@ -29,6 +43,3 @@ they are platform specific, but some (e.g. `CYCLES` or `INSTRUCTIONS`) are
mapped by libpfm to platform-specifics - see libpfm
[documentation](http://perfmon2.sourceforge.net/docs.html) for more details.

-The counter values are reported back through the [User Counters](../README.md#custom-counters)
-mechanism, meaning, they are available in all the formats (e.g. JSON) supported
-by User Counters.
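Since counters are read at the same boundaries as the timer, each reported value comes from the difference between two snapshots. The following is a minimal C++ sketch of that arithmetic. The `Snapshot` alias and the `PerIterationCounters` helper are hypothetical, and dividing by the iteration count is an assumption about how the values are normalized, not something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot type: one raw reading per requested counter,
// in the same order as the counter names.
using Snapshot = std::vector<uint64_t>;

// Convert the snapshots taken at the start and end of a timed region into
// per-iteration values. Averaging over iterations is an assumption here.
std::vector<std::pair<std::string, double>> PerIterationCounters(
    const std::vector<std::string>& names, const Snapshot& start,
    const Snapshot& stop, uint64_t iterations) {
  assert(start.size() == names.size() && stop.size() == names.size());
  assert(iterations > 0);
  std::vector<std::pair<std::string, double>> result;
  for (std::size_t i = 0; i < names.size(); ++i) {
    const uint64_t delta = stop[i] - start[i];  // hardware counters only grow
    result.emplace_back(names[i], static_cast<double>(delta) /
                                      static_cast<double>(iterations));
  }
  return result;
}
```

For example, a start snapshot of `{1000, 4000}`, a stop snapshot of `{3000, 10000}` and 100 iterations give `CYCLES` = 20 and `INSTRUCTIONS` = 60.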
44 changes: 44 additions & 0 deletions src/perf_counters.cc
@@ -114,6 +114,50 @@ void PerfCounters::CloseCounters() const {
    close(fd);
  }
}
#elif defined(BENCHMARK_OS_MACOSX)

Review comment (Member): I wonder if it's worth splitting this file into perf_counters_linux.cc and perf_counters_osx.cc (then we can selectively compile on top of using these guards).

Reply (Author): Not a bad idea.

const bool PerfCounters::kSupported = true;

// Nothing to initialize. (We could check here that thread_selfcounts is
// available.)
bool PerfCounters::Initialize() { return true; }

// Nothing to close: unlike the Linux path, no file descriptors are opened.
void PerfCounters::CloseCounters() const {}

PerfCounters PerfCounters::Create(
    const std::vector<std::string>& counter_names) {
  if (counter_names.empty()) {
    return NoCounters();
  }

  if (counter_names.size() > PerfCounterValues::kMaxCounters) {
    GetErrorLogInstance()
        << counter_names.size()
        << " counters were requested. The minimum is 1, the maximum is "
        << PerfCounterValues::kMaxCounters << "\n";
    return NoCounters();
  }

  // Map each requested name to its index in the array that
  // thread_selfcounts fills in Snapshot().
  std::vector<int> counter_ids(counter_names.size());
  for (size_t i = 0; i < counter_names.size(); ++i) {
    if (counter_names[i] == "INSTRUCTIONS") {
      counter_ids[i] = 1;  // MT_CORE_INSTR: index of the instruction counter
    } else if (counter_names[i] == "CYCLES") {
      counter_ids[i] = 0;  // MT_CORE_CYCLES: index of the cycle counter
    } else {
      GetErrorLogInstance() << "Unknown counter " << counter_names[i] << "\n";
      return NoCounters();
    }
  }

  return PerfCounters(counter_names, std::move(counter_ids));
}

#else // defined HAVE_LIBPFM
const bool PerfCounters::kSupported = false;
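The name check in the macOS `Create` can be tried in isolation. The sketch below is a hypothetical, simplified helper (`ParseMacCounterList` is not part of the library). It splits a comma-separated value, which is how `--benchmark_perf_counters` is documented to take its list, maps `CYCLES` to index 0 and `INSTRUCTIONS` to index 1, and rejects unknown names, an empty list, and more than two counters. It does not trim whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the macOS PerfCounters::Create name check.
// Each requested name maps to its index in the {cycles, instructions} array
// filled by thread_selfcounts. Returns std::nullopt on any error.
std::optional<std::vector<int>> ParseMacCounterList(const std::string& flag) {
  constexpr std::size_t kMaxCounters = 2;  // only CYCLES and INSTRUCTIONS
  std::vector<int> ids;
  std::stringstream ss(flag);
  std::string name;
  while (std::getline(ss, name, ',')) {
    if (name == "CYCLES") {
      ids.push_back(0);
    } else if (name == "INSTRUCTIONS") {
      ids.push_back(1);
    } else {
      return std::nullopt;  // unknown counter name
    }
  }
  if (ids.empty() || ids.size() > kMaxCounters) {
    return std::nullopt;  // the minimum is 1 counter, the maximum is 2
  }
  return ids;
}
```

For example, `"INSTRUCTIONS,CYCLES"` yields `{1, 0}`, the reindexing that the macOS `Snapshot` then applies to the two-element counts array.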

31 changes: 31 additions & 0 deletions src/perf_counters.h
@@ -35,6 +35,13 @@
#pragma warning(disable : 4251)
#endif

#ifdef BENCHMARK_OS_MACOSX
// The macOS syscall wrapper, implemented in libpthread, that returns this
// thread's instruction and cycle counters. The counts remain correct across
// context switches and core migrations.
extern "C" int thread_selfcounts(int type, void* buf, size_t nbytes);
dmah42 marked this conversation as resolved.
#endif

namespace benchmark {
namespace internal {

@@ -55,7 +62,12 @@ class PerfCounterValues {

uint64_t operator[](size_t pos) const { return values_[kPadding + pos]; }

#ifdef BENCHMARK_OS_MACOSX
// macOS only exposes the cycle and instruction counters.
static constexpr size_t kMaxCounters = 2;
#else
static constexpr size_t kMaxCounters = 3;
#endif

private:
friend class PerfCounters;
@@ -66,7 +78,12 @@ class PerfCounterValues {
sizeof(uint64_t) * (kPadding + nr_counters_)};
}

#ifdef BENCHMARK_OS_MACOSX
// No padding slot is needed on macOS: Snapshot() writes the counter values
// directly, with no leading header word.
static constexpr size_t kPadding = 0;
#else
static constexpr size_t kPadding = 1;
#endif
std::array<uint64_t, kPadding + kMaxCounters> values_;
const size_t nr_counters_;
};
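The `kPadding` and `kMaxCounters` choices determine the layout of the raw buffer. Below is a simplified, hypothetical model of that layout (`CounterValues` and the two aliases are not the library's types). It assumes that the Linux padding slot holds the leading count word that a grouped perf `read()` writes before the values, while the macOS path writes values straight into slot 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Simplified, hypothetical model of PerfCounterValues: kPadding leading
// slots hold a header word, followed by one slot per active counter.
template <std::size_t kPadding, std::size_t kMaxCounters>
class CounterValues {
 public:
  explicit CounterValues(std::size_t nr_counters) : nr_counters_(nr_counters) {
    assert(nr_counters <= kMaxCounters);
  }

  // Logical counter `pos` lives after the padding.
  uint64_t operator[](std::size_t pos) const { return values_[kPadding + pos]; }

  // Raw buffer handed to the OS: padding plus the active counters only.
  std::pair<char*, std::size_t> get_data_buffer() {
    return {reinterpret_cast<char*>(values_.data()),
            sizeof(uint64_t) * (kPadding + nr_counters_)};
  }

 private:
  std::array<uint64_t, kPadding + kMaxCounters> values_{};
  const std::size_t nr_counters_;
};

using LinuxLikeValues = CounterValues<1, 3>;  // header word + up to 3 counters
using MacLikeValues = CounterValues<0, 2>;    // no header, up to 2 counters
```

With two active counters, the Linux-like buffer spans 24 bytes (header plus two values) and the macOS-like one 16 bytes. `operator[]` hides the difference from callers.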
@@ -104,11 +121,25 @@ class BENCHMARK_EXPORT PerfCounters final {
  // names()[i]'s value is (*values)[i]
  BENCHMARK_ALWAYS_INLINE bool Snapshot(PerfCounterValues* values) const {
#ifndef BENCHMARK_OS_WINDOWS
#ifdef BENCHMARK_OS_MACOSX
    assert(values != nullptr);
    assert(IsValid());
    // Call the semi-undocumented syscall wrapper to get this thread's cycle
    // and instruction counts. The OS maintains these counters across context
    // switches and CPU migrations.
    uint64_t counts[2] = {};  // counts[0] = cycles, counts[1] = instructions
    if (thread_selfcounts(1, counts, sizeof(counts)) != 0) {
      return false;
    }

    // Copy only the requested counters, in the order given by names().
    uint64_t* buffer =
        reinterpret_cast<uint64_t*>(values->get_data_buffer().first);
    for (size_t i = 0; i < counter_ids_.size(); ++i) {
      buffer[i] = counts[counter_ids_[i]];
    }
    return true;
#else
    assert(values != nullptr);
    assert(IsValid());
    auto buffer = values->get_data_buffer();
    auto read_bytes = ::read(counter_ids_[0], buffer.first, buffer.second);
    return static_cast<size_t>(read_bytes) == buffer.second;
#endif
#else
    (void)values;
    return false;
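The macOS `Snapshot` reads both counters in one call and then reindexes them. The sketch below isolates that step so it can run on any platform. `CountsReader` and `SnapshotRequested` are hypothetical names, and the reader is injected instead of calling `thread_selfcounts` directly, assuming only the contract the PR relies on: fill two `uint64_t` values and return 0 on success.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for thread_selfcounts: fills {cycles, instructions}
// and returns 0 on success.
using CountsReader = std::function<int(uint64_t* buf, std::size_t nbytes)>;

// Sketch of the macOS Snapshot step: read both counters at once, then copy
// only the requested ones, in request order, into `out`.
bool SnapshotRequested(const CountsReader& read_counts,
                       const std::vector<int>& counter_ids,
                       std::vector<uint64_t>& out) {
  uint64_t counts[2] = {};  // counts[0] = cycles, counts[1] = instructions
  if (read_counts(counts, sizeof(counts)) != 0) {
    return false;  // leave `out` untouched on failure
  }
  out.resize(counter_ids.size());
  for (std::size_t i = 0; i < counter_ids.size(); ++i) {
    assert(counter_ids[i] == 0 || counter_ids[i] == 1);
    out[i] = counts[counter_ids[i]];
  }
  return true;
}
```

With a fake reader that writes `{500, 1200}` and `counter_ids` of `{1, 0}`, the result is `{1200, 500}`: instructions first, then cycles.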