Modern applications often contain multiple instances of complex data structures, making it challenging to analyze their memory access patterns. While tools like Linux Perf and Intel VTune excel at identifying resource-intensive instructions, they cannot differentiate between different instances of the same data structure sharing identical code - for example, different nodes within a tree structure experiencing varying access patterns.
perf-cpp addresses this limitation through its Memory Access Analyzer component, which works in conjunction with memory-based sampling (detailed in the sampling documentation). The Memory Access Analyzer helps identify which specific memory addresses experience high access latency by:
- Mapping samples to individual data object instances
- Generating detailed access statistics including cache hits/misses, TLB performance, and average latency metrics
→ For a practical implementation, check out our random-access-benchmark example.
- Describing Data Types
- Registering Data Type Instances
- Mapping Samples to Data Type Instances
- Processing the Result
The Memory Access Analyzer requires information about the structure of your data types. Let's walk through an example using a binary tree node:
class BinaryTreeNode {
std::uint64_t value;
BinaryTreeNode* left_child;
BinaryTreeNode* right_child;
};
To analyze this structure, create a perf::analyzer::DataType
definition:
#include <perfcpp/analyzer/memory_access.h>
auto binary_tree_node = perf::analyzer::DataType{"BinaryTreeNode", sizeof(BinaryTreeNode)};
binary_tree_node.add("value", sizeof(std::uint64_t)); /// Describe the "value" attribute.
binary_tree_node.add("left_child", sizeof(BinaryTreeNode*)); /// Describe the "left_child" attribute.
binary_tree_node.add("right_child", sizeof(BinaryTreeNode*)); /// Describe the "right_child" attribute.
Hint: For accurate size and offset information, you can use pahole. See Paramoud Kumbhar's detailed guide for usage instructions.
Since each instance of a data structure may exhibit different access patterns, the Memory Access Analyzer needs to track individual instances. Here's how to register them:
#include <perfcpp/analyzer/memory_access.h>
auto memory_access_analyzer = perf::analyzer::MemoryAccess{};
/// Expose the data type to the Analyzer.
memory_access_analyzer.add(std::move(binary_tree_node));
/// Expose memory addresses to the Analyzer.
for (auto* node : tree->nodes()) {
/// The first argument is the name describing the data type.
/// The second argument is a pointer to the instance.
memory_access_analyzer.annotate("BinaryTreeNode", node);
}
To collect memory access data, use perf-cpp's sampling mechanism with the following key requirements:
- Include logical memory addresses
- Capture data source information
- Record latency data ("weight")
- Use a memory-address-capable sample trigger (e.g.,
mem-loads
on Intel,ibs_op
on AMD – see the documentation)
#include <perfcpp/sampler.h>
#include <perfcpp/analyzer/memory_access.h>
auto counter_definitions = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counter_definitions };
/// Set trigger that enables memory sampling.
sampler.trigger("mem-loads", perf::Precision::MustHaveZeroSkid, perf::Period{ 1000U });
/// Include addresses, data source, and latency.
sampler.values()
.logical_memory_address(true)
.data_src(true)
.weight_struct(true);
/// Run the workload while recording samples.
sampler.start();
///... execute ....
sampler.stop();
/// Get the samples and map to described and registered data types and instances.
const auto samples = sampler.result();
const auto result = memory_access_analyzer.map(samples);
The analyzer generates detailed statistics for each data type attribute. To view the results:
std::cout << result.to_string() << std::endl;
Example output:
DataType BinaryTreeNode (24B) {
| loads | cache hits | RAM hits | TLB | stores
samples | count latency | L1d LFB L2 L3 | local remote | L1 hits L2 hits misses | count latency
0: value (8B) 373 | 373 439 | 154 0 0 7 | 212 0 | 190 5 178 | 0 0
8: left_child (8B) 146 | 146 720 | 1 0 0 5 | 140 0 | 12 18 116 | 0 0
16: right_child (8B) 528 | 528 173 | 393 0 1 14 | 120 0 | 415 4 109 | 0 0
}
The output shows:
- Attribute details (offset, name, size)
- Sample counts
- Detailed performance metrics per attribute
For further analysis, export the results in structured formats:
result.to_json(); /// JSON format
result.to_csv(); /// CSV format