[Intel-SIG] Backport Arch-PEBS for Intel new platforms#118
Jason wants to merge 89 commits into 6.6-velinux
Conversation
commit d87d221 upstream. From PMU's perspective, the ADL e-core and newer SRF/GRR have a similar uarch. Most of the initialization code can be shared. Factor out intel_pmu_init_grt() for the common initialization code. The common part of the ADL e-core will be replaced by the later patch. Intel-SIG: commit d87d221 perf/x86/intel: Factor out the initialization code for ADL e-core Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230829125806.3016082-4-kan.liang@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 299a5fc upstream. Use the intel_pmu_init_glc() and intel_pmu_init_grt() to replace the duplicate code for ADL. The current code already checks the PERF_X86_EVENT_TOPDOWN flag before invoking the Topdown metrics functions. (The PERF_X86_EVENT_TOPDOWN flag is to indicate the Topdown metric feature, which is only available for the p-core.) Drop the unnecessary adl_set_topdown_event_period() and adl_update_topdown_event(). Intel-SIG: commit 299a5fc perf/x86/intel: Apply the common initialization code for ADL Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230829125806.3016082-5-kan.liang@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit b0560bf upstream. There is a fairly long list of grievances about the current code. The main beefs: 1. hybrid_big_small assumes that the *HARDWARE* (CPUID) provided core types are a bitmap. They are not. If Intel happened to make a core type of 0xff, hilarity would ensue. 2. adl_get_hybrid_cpu_type() is utterly inscrutable. There are precisely zero comments and zero changelog about what it is attempting to do. According to Kan, the adl_get_hybrid_cpu_type() is there because some Alder Lake (ADL) CPUs can do some silly things. Some ADL models are *supposed* to be hybrid CPUs with big and little cores, but there are some SKUs that only have big cores. CPUID(0x1a) on those CPUs does not say that the CPUs are big cores. It apparently just returns 0x0. It confuses perf because it expects to see either 0x40 (Core) or 0x20 (Atom). The perf workaround for this is to watch for a CPU core saying it is type 0x0. If that happens on an Alder Lake, it calls x86_pmu.get_hybrid_cpu_type() and just assumes that the core is a Core (0x40) CPU. To fix up the mess, separate out the CPU types and the 'pmu' types. This allows 'hybrid_pmu_type' bitmaps without worrying that some future CPU type will set multiple bits. Since the types are now separate, add a function to glue them back together again. Add an actual comment on the situation in the glue function (find_hybrid_pmu_for_cpu()). Also, give ->get_hybrid_cpu_type() a real return type and make it clear that it is overriding the *CPU* type, not the PMU type. Rename cpu_type to pmu_type in the struct x86_hybrid_pmu to reflect the change.
Intel-SIG: commit b0560bf perf/x86/intel: Clean up the hybrid CPU type handling code Backport CWF PMU support and dependency Originally-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230829125806.3016082-6-kan.liang@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> [jz: fix pmu->cpu_type in hybrid_td_is_visible() due to difference between 6.6 stable branch and vanilla upstream] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
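The CPU-type/PMU-type separation described above can be sketched in userspace C. This is a hypothetical, much-simplified model: the names mirror the commit, but the values and bodies are illustrative, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* CPUID-provided core types are plain values, NOT a bitmap: */
enum hybrid_cpu_type {
	HYBRID_INTEL_NONE = 0x00,
	HYBRID_INTEL_ATOM = 0x20,
	HYBRID_INTEL_CORE = 0x40,
};

/* The kernel-internal PMU types ARE a bitmap, so combined masks such
 * as big|small stay safe no matter what core types CPUID adds later: */
enum hybrid_pmu_type {
	hybrid_small     = 1 << 0,
	hybrid_big       = 1 << 1,
	hybrid_big_small = (1 << 1) | (1 << 0),
};

struct x86_hybrid_pmu {
	enum hybrid_pmu_type pmu_type; /* renamed from cpu_type */
	const char *name;
};

static struct x86_hybrid_pmu pmus[] = {
	{ hybrid_small, "cpu_atom" },
	{ hybrid_big,   "cpu_core" },
};

/* Glue that maps a CPU type back onto a PMU. Some ADL SKUs report
 * type 0x0 for big cores, so 0x0 is overridden to Core here. */
static struct x86_hybrid_pmu *find_hybrid_pmu_for_cpu(enum hybrid_cpu_type t)
{
	if (t == HYBRID_INTEL_NONE)
		t = HYBRID_INTEL_CORE;
	return t == HYBRID_INTEL_ATOM ? &pmus[0] : &pmus[1];
}
```

Keeping the two enums distinct means a future CPUID core type like 0xff can never be misread as several PMU-type bits at once.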
commit 97588df upstream. The current hybrid initialization code isn't well organized and is hard to read. Factor out intel_pmu_init_hybrid() to do a common setup for each hybrid PMU. The PMU-specific capability will be updated later via either hard-coded values (ADL) or CPUID hybrid enumeration (MTL). Split the ADL and MTL initialization code, since they have different uarches. The hard-coded PMU capabilities are not required for MTL either. They can be enumerated by the new leaf 0x23 and the IA32_PERF_CAPABILITIES MSR. The hybrid enumeration of the IA32_PERF_CAPABILITIES MSR is broken on MTL. Use the default value instead. Intel-SIG: commit 97588df perf/x86/intel: Add common intel_pmu_init_hybrid() Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230829125806.3016082-7-kan.liang@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> [jz: resolve conflicts in update_pmu_cap() due to following commit 47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF") already been backported by stable branch] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 950ecdc upstream. Unnecessary multiplexing is triggered when running an "instructions" event on an MTL. perf stat -e cpu_core/instructions/,cpu_core/instructions/ -a sleep 1 Performance counter stats for 'system wide': 115,489,000 cpu_core/instructions/ (50.02%) 127,433,777 cpu_core/instructions/ (49.98%) 1.002294504 seconds time elapsed Linux architectural perf events, e.g., cycles and instructions, usually have dedicated fixed counters. These events also have equivalent events which can be used in the general-purpose counters. The counters are precious. In intel_pmu_check_event_constraints(), perf checks/extends the event constraints of these events, so these events can utilize both fixed counters and general-purpose counters. The following cleanup commit: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()") forgot to add intel_pmu_check_event_constraints() into update_pmu_cap(). As a result, the architectural perf events cannot utilize the general-purpose counters. The code to check and update the counters, event constraints and extra_regs is the same among hybrid systems. Move intel_pmu_check_hybrid_pmus() to init_hybrid_pmu(), and remove the duplicate check in update_pmu_cap(). Intel-SIG: commit 950ecdc perf/x86/intel: Fix broken fixed event constraints extension Backport CWF PMU support and dependency Fixes: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230911135128.2322833-1-kan.liang@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> [jz: resolve context conflicts in update_pmu_cap() due to following commit 47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF") already been backported by stable branch] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 76db7aa upstream. Sync the new sample type for the branch counters feature. Intel-SIG: commit 76db7aa tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-6-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e8df9d9 upstream. When running the perf-stat command on an Intel hybrid platform, perf-stat reports the following errors: sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1 Opening: cpu/cycles/:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 sys_perf_event_open failed, error -16 Performance counter stats for 'sleep 1': <not counted> cpu_atom/instructions/ It looks like the cpu_atom/instructions/ event can't be enabled on the atom PMU even when the process is pinned on an atom core. Investigation shows that the exclusive_event_init() helper always returns -EBUSY error in the perf event creation. That's strange since the atom PMU should not be an exclusive PMU. Further investigation shows the issue was introduced by commit: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()") The commit originally intended to clear the bit PERF_PMU_CAP_AUX_OUTPUT from PMU capabilities if intel_cap.pebs_output_pt_available is not set, but it incorrectly uses an 'or' operation, which leads to all PMU capability bits being set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT. Testing this fix on Intel hybrid platforms, the observed issues disappear. Intel-SIG: commit e8df9d9 perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities Backport CWF PMU support and dependency Fixes: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
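The 'or'-vs-'and-not' mistake the fix corrects is easy to reproduce in isolation. A minimal sketch, using an illustrative bit value rather than the kernel's real PERF_PMU_CAP_AUX_OUTPUT definition:

```c
#include <assert.h>

/* Illustrative bit value; the real PERF_PMU_CAP_AUX_OUTPUT lives in
 * include/linux/perf_event.h and may differ. */
#define PERF_PMU_CAP_AUX_OUTPUT 0x80u

/* The bug: OR-ing with the complement sets every OTHER bit to 1
 * instead of clearing the target bit. */
static unsigned int clear_cap_buggy(unsigned int caps)
{
	return caps | ~PERF_PMU_CAP_AUX_OUTPUT;
}

/* The fix: AND-ing with the complement clears only the target bit. */
static unsigned int clear_cap_fixed(unsigned int caps)
{
	return caps & ~PERF_PMU_CAP_AUX_OUTPUT;
}
```

With the buggy form, spurious capability bits (such as an exclusive-PMU bit) end up set, which matches the -EBUSY from exclusive_event_init() seen above.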
commit 72b8b94 upstream. Sort header files alphabetically. Intel-SIG: commit 72b8b94 powercap: intel_rapl: Sort header files Backport CWF PMU support and dependency Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 575024a upstream. Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu(). The RAPL driver can invoke these APIs to expose its supported energy counters via a perf PMU. The new RAPL PMU is fully compatible with the current MSR RAPL PMU, including using the same PMU name and events name/id/unit/scale, etc. For example, use the below command perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO to get the energy consumption, if the power/energy-pkg/ and power/energy-ram/ events are available in the "perf list" output. This does not introduce any conflict because TPMI RAPL is the only user of these APIs currently, and it never co-exists with MSR RAPL. Note that RAPL Packages can be probed/removed dynamically, and the events supported by each TPMI RAPL device can be different. Thus the RAPL PMU support is done on demand, which means 1. PMU is registered only if it is needed by a RAPL Package. PMU events for unsupported counters are not exposed. 2. PMU is unregistered and registered when a new RAPL Package is probed and supports new counters that are not supported by the current PMU. For example, on a dual-package system using TPMI RAPL, it is possible that Package 1 behaves as TPMI domain root and supports the Psys domain. In this case, register the PMU without the Psys event when probing Package 0, and re-register the PMU with the Psys event when probing Package 1. 3. PMU is unregistered when all registered RAPL Packages no longer need the PMU. Intel-SIG: commit 575024a powercap: intel_rapl: Introduce APIs for PMU support Backport CWF PMU support and dependency Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 963a9ad upstream. Enable RAPL PMU support for TPMI RAPL driver. Intel-SIG: commit 963a9ad powercap: intel_rapl_tpmi: Enable PMU support Backport CWF PMU support and dependency Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit a23eb2f upstream. The current perf assumes that the counters that support PEBS are contiguous. But that's not guaranteed once the new leaf 0x23 is introduced. The counters are enumerated with a counter mask. There may be holes in the counter mask for future platforms or in a virtualization environment. Store the PEBS event mask rather than the maximum number of PEBS counters in the x86 PMU structures. Intel-SIG: commit a23eb2f perf/x86/intel: Support the PEBS event mask Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 722e42e upstream. The current perf assumes that both GP and fixed counters are contiguous. But it's not guaranteed on newer Intel platforms or in a virtualization environment. Use the counter mask to replace the number of counters for both GP and the fixed counters. For the other architectures or old platforms which don't support a counter mask, use GENMASK_ULL(num_counter - 1, 0) instead. There is no functional change for them. The interface to KVM is not changed. The number of counters is still passed to KVM. It can be updated later separately. Intel-SIG: commit 722e42e perf/x86: Support counter mask Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> [jz: resolve conflicts in update_pmu_cap() due to following commit 47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF") already been backported by stable branch] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
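The count-to-mask conversion above can be shown with a userspace re-implementation of the kernel's GENMASK_ULL macro. A sketch for illustration only; the real macro lives in linux/bits.h:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's GENMASK_ULL(h, l). */
#define GENMASK_ULL(h, l) \
	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Old scheme: "num_counters" implies counters 0..num-1 exist and are
 * contiguous. New scheme: an arbitrary mask that may contain holes,
 * as leaf 0x23 can enumerate on newer platforms or under
 * virtualization. The count is then just the population count: */
static int num_counters(uint64_t cntr_mask)
{
	return __builtin_popcountll(cntr_mask);
}
```

GENMASK_ULL(num_counter - 1, 0) reproduces the old contiguous behavior exactly, which is why platforms without a counter mask see no functional change.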
commit a932aa0 upstream. From PMU's perspective, Lunar Lake and Arrow Lake are similar to the previous generation Meteor Lake. Both are hybrid platforms, with e-core and p-core. The key differences include: - The e-core supports 3 new fixed counters - The p-core supports an updated PEBS Data Source format - More GP counters (Updated event constraint table) - New Architectural performance monitoring V6 (New Perfmon MSRs aliasing, umask2, eq). - New PEBS format V6 (Counters Snapshotting group) - New RDPMC metrics clear mode The legacy features, the 3 new fixed counters and updated event constraint table are enabled in this patch. The new PEBS data source format, the architectural performance monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode are supported in the following patches. Intel-SIG: commit a932aa0 perf/x86: Add Lunar Lake and Arrow Lake support Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0902624 upstream. The model-specific pebs_latency_data functions of ADL and MTL use "small" as a postfix to indicate the e-core. The postfix is too generic for a model-specific function: it does not provide useful information that directly maps the function to a specific uarch, which would facilitate development and maintenance. Use the abbreviation of the uarch to rename the model-specific functions. Intel-SIG: commit 0902624 perf/x86/intel: Rename model-specific pebs_latency_data functions Backport CWF PMU support and dependency Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 608f697 upstream. A new PEBS data source format is introduced for the p-core of Lunar Lake. The data source field is extended to 8 bits with new encodings. A new layout is introduced into the union intel_x86_pebs_dse. Introduce the lnl_latency_data() to parse the new format. Enlarge the pebs_data_source[] accordingly to include new encodings. Only the mem load and the mem store events can generate the data source. Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and INTEL_HYBRID_STLAT_CONSTRAINT to mark them. Add two new bits for the new cache-related data src, L2_MHB and MSC. L2_MHB is short for L2 Miss Handling Buffer, which is similar to the LFB (Line Fill Buffer), but tracks L2 cache misses. MSC stands for the memory-side cache. Intel-SIG: commit 608f697 perf/x86/intel: Support new data source for Lunar Lake Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e8fb5d6 upstream. Different vendors may support different fields in the EVENTSEL MSR; for example, Intel introduces the new umask2 and eq fields in the EVENTSEL MSR starting with Perfmon version 6. However, a fixed mask X86_RAW_EVENT_MASK is used to filter the attr.config. Introduce a new config_mask to record the real supported EVENTSEL bitmask. Only apply it to the existing code now. No functional change. Intel-SIG: commit e8fb5d6 perf/x86: Add config_mask to represent EVENTSEL bitmask Backport CWF PMU support and dependency Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit dce0c74 upstream. Two new fields (the unit mask2, and the equal flag) are added in the IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX. Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout of the PERFEVTSEL. Expose the new formats into sysfs if they are available. The umask extension reuses the same format attr name "umask" as the previous umask. Add umask2_show to determine/display the correct format for the current machine. Intel-SIG: commit dce0c74 perf/x86/intel: Support PERFEVTSEL extension Backport CWF PMU support and dependency Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> [jz: due to following commit 47a973fd75639 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF") has already been backported by stable branch, - resolve context conflict in update_pmu_cap(). - definition for ARCH_PERFMON_EXT_UMASK2 and ARCH_PERFMON_EXT_EQ are not added because upstream commit 47a973fd75639 have removed them] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 149fd47 upstream. The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Intel-SIG: commit 149fd47 perf/x86/intel: Support Perfmon MSRs aliasing Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
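The "step of 4" address math above is simple enough to sketch. The commit places the alias region in the 19xxH MSR range; the exact base constant used below is hypothetical, chosen only to make the arithmetic concrete:

```c
#include <assert.h>

#define V6_STEP 4            /* MSR stride between sibling counters */
#define MSR_V6_FIXED_CTR0 0x1980 /* hypothetical base of fixed counter 0 */

/* Offset of counter <index> relative to counter 0 in the alias range,
 * as intel_pmu_addr_offset() computes it per the commit message: */
static unsigned int intel_pmu_v6_addr_offset(int index)
{
	return V6_STEP * index;
}

/* The rest of the fixed counters are derived from the stored base
 * (the new fixedctr field in struct x86_pmu): */
static unsigned int v6_fixed_ctr_addr(int index)
{
	return MSR_V6_FIXED_CTR0 + intel_pmu_v6_addr_offset(index);
}
```

Storing only fixed counter 0's address is sufficient because every sibling is a fixed stride away; the fixed counter control MSR keeps its legacy address.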
commit 7087bfb0adc9a12ec3b463b1d38072c5efce5d6c upstream. Modify the pebs_basic and pebs_meminfo structs to make the bitfields more explicit to ease readability of the code. Intel-SIG: commit 7087bfb0adc9 perf/x86/intel/ds: Clarify adaptive PEBS processing Backport CWF PMU support and dependency Co-developed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119135504.1463839-3-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 3c00ed344cef4dbb57d8769b961af414132a173a upstream. Factor out functions to process normal and the last PEBS records, which can be shared with the later patch. Move the event-updating related code (intel_pmu_save_and_restart()) to the end, where all samples have been processed. For the current usage, it doesn't matter when perf updates event counts and resets the counter, because all counters are stopped when the PEBS buffer is drained. Drop the return of the !intel_pmu_save_and_restart(event) check, because it never happens. intel_pmu_save_and_restart(event) only returns 0 when !hwc->event_base or period_left > 0. - The !hwc->event_base is impossible for the PEBS event, since the PEBS event is only available on GP and fixed counters, which always have a valid hwc->event_base. - The check only happens for the case of non-AUTO_RELOAD and single PEBS, which implies that the event must be overflowed. The period_left must always be <= 0 for an overflowed event after the x86_pmu_update(). Intel-SIG: commit 3c00ed344cef perf/x86/intel/ds: Factor out functions for PEBS records processing Backport CWF PMU support and dependency Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119135504.1463839-4-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit ae55e308bde2267df79c4475daa85e174b7ab4c8 upstream. The current code may iterate all the PEBS records in the DS area several times. The first loop is to find all active events and calculate the available records for each event. Then iterate the whole buffer again and again to process available records until all active events are processed. The algorithm is inherited from the old generations. The old PEBS hardware does not deal well with the situation when events happen near each other. SW has to drop the error records. Multiple iterations are required. The hardware limit has been addressed on newer platforms with adaptive PEBS. A simple one-iteration algorithm is introduced. The samples are output by record order with the patch, rather than the event order. It doesn't impact the post-processing. The perf tool always sorts the records by time before presenting them to the end user. In an NMI, the last record has to be specially handled. Add a last[] variable to track the last unprocessed record of each event. Test: 11 PEBS events are used in the perf test. Only the basic information is collected. perf record -e instructions:up,...,instructions:up -c 2000003 benchmark ftrace is used to record the duration of intel_pmu_drain_pebs_icl(). The average duration is reduced from 62.04us to 57.94us. A small improvement can be observed with the new algorithm. Also, the implementation becomes simpler and more straightforward.
Intel-SIG: commit ae55e308bde2 perf/x86/intel/ds: Simplify the PEBS records processing for adaptive PEBS Backport CWF PMU support and dependency Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20241119135504.1463839-5-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
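The one-pass walk with a last[] array can be sketched with a toy record type. This is an illustrative model of the control flow described above, not the kernel's actual drain function:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 4

/* Toy stand-in for a PEBS record: which event it belongs to. */
struct pebs_rec { int event; };

/* Walk the DS buffer once in record order. Each time another record
 * for the same event shows up, the previously held one is emitted as
 * a normal sample; last[] keeps the newest record per event for the
 * special end-of-drain handling in the NMI. */
static int drain_one_pass(const struct pebs_rec *buf, int n,
			  const struct pebs_rec *last[MAX_EVENTS])
{
	int emitted = 0;

	for (int i = 0; i < n; i++) {
		int e = buf[i].event;

		if (last[e])
			emitted++;     /* emit the earlier record now */
		last[e] = &buf[i];     /* defer the newest record */
	}
	return emitted;                /* final records remain in last[] */
}
```

Samples come out in record (buffer) order rather than event order, which is fine because the perf tool sorts records by time before presenting them.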
commit 0e45818ec1896c2b4aee0ec6721022ad625ea531 upstream. The new RDPMC enhancement, metrics clear mode, is to clear the PERF_METRICS-related resources as well as the fixed-function performance monitoring counter 3 after the read is performed. It is available for ring 3. The feature is enumerated by the IA32_PERF_CAPABILITIES.RDPMC_CLEAR_METRICS[bit 19]. To enable the feature, the IA32_FIXED_CTR_CTRL.METRICS_CLEAR_EN[bit 14] must be set. Two ways were considered to enable the feature. - Expose a knob in the sysfs globally. One user may affect the measurement of other users when changing the knob. The solution is dropped. - Introduce a new event format, metrics_clear, for the slots event to disable/enable the feature only for the current process. Users can utilize the feature as needed. The latter solution is implemented in the patch. The current KVM doesn't support the perf metrics yet. For virtualization, the feature can be enabled later separately. Intel-SIG: commit 0e45818ec189 perf/x86/intel: Support RDPMC metrics clear mode Backport CWF PMU support and dependency Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20241211160318.235056-1-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
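The enable condition above (feature enumerated in IA32_PERF_CAPABILITIES, enable bit in IA32_FIXED_CTR_CTRL, and per-event opt-in via the new metrics_clear format) can be sketched as a pure bit computation. The bit positions are the ones quoted in the commit message; the helper name and shape are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PERF_CAP_RDPMC_CLEAR_METRICS (1ULL << 19) /* IA32_PERF_CAPABILITIES */
#define FIXED_CTRL_METRICS_CLEAR_EN  (1ULL << 14) /* IA32_FIXED_CTR_CTRL   */

/* Set the enable bit only when the CPU enumerates the feature AND the
 * event opted in via the metrics_clear format bit, so one user's
 * setting cannot affect other users (the rejected sysfs-knob design). */
static uint64_t fixed_ctrl_with_metrics_clear(uint64_t perf_caps,
					      uint64_t fixed_ctrl,
					      int event_opted_in)
{
	if (event_opted_in && (perf_caps & PERF_CAP_RDPMC_CLEAR_METRICS))
		fixed_ctrl |= FIXED_CTRL_METRICS_CLEAR_EN;
	return fixed_ctrl;
}
```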
commit b6ccddd6fe1fd49c7a82b6fbed01cccad21a29c7 upstream. From the perspective of the uncore PMU, the Clearwater Forest is the same as the previous Sierra Forest. The only difference is the event list, which will be supported in the perf tool later. Intel-SIG: commit b6ccddd6fe1f perf/x86/intel/uncore: Add Clearwater Forest support Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241211161146.235253-1-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit b8c3a2502a205321fe66c356f4b70cabd8e1a5fc upstream. The only difference between format 5 and format 6 is the new counters snapshotting group. Without the following counters snapshotting enabling patches, it's impossible to utilize the feature in a PEBS record, so it's safe to share the same code path with format 5. Add format 6, so the end user can at least utilize the legacy PEBS features. Intel-SIG: commit b8c3a2502a20 perf/x86/intel/ds: Add PEBS format 6 Backport CWF PMU support and dependency Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241216204505.748363-1-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 3f710be02ea648001ba18fb2c9fa7765e743dec2 upstream. The below warning may be triggered on GNR when the PCIE uncore units are exposed. WARNING: CPU: 4 PID: 1 at arch/x86/events/intel/uncore.c:1169 uncore_pci_pmu_register+0x158/0x190 The current uncore driver assumes that all the devices in the same PMU have the exact same devfn. It's true for the previous platforms. But it doesn't work for the new PCIE uncore units on GNR. The assumption doesn't make sense. There is no reason to limit the devices from the same PMU to the same devfn. Also, the current code just throws the warning, but still registers the device. The WARN_ON_ONCE() should be removed. The func_id is used by the later event_init() to check if an event->pmu has valid devices. For cpu and mmio uncore PMUs, they are always valid. For pci uncore PMUs, it's set when the PMU is registered. It can be replaced by pmu->registered. Clean up the func_id. Intel-SIG: commit 3f710be02ea6 perf/x86/intel/uncore: Clean up func_id Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Eric Hu <eric.hu@intel.com> Link: https://lkml.kernel.org/r/20250108143017.1793781-1-kan.liang@linux.intel.com [ Aubrey Li: amend commit log ] Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 6d642735cdb6cdb814d2b6c81652caa53ce04842 upstream. The same CXL PMON support is also available on GNR. Apply spr_uncore_cxlcm and spr_uncore_cxldp to GNR as well. The other units were broken on early HW samples, so they were ignored in the early enabling patch. The issue has been fixed and verified on the later production HW. Add UPI, B2UPI, B2HOT, PCIEX16 and PCIEX8 for GNR. Intel-SIG: commit 6d642735cdb6 perf/x86/intel/uncore: Support more units on Granite Rapids Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Eric Hu <eric.hu@intel.com> Link: https://lkml.kernel.org/r/20250108143017.1793781-2-kan.liang@linux.intel.com [ Aubrey Li: amend commit log ] Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e02e9b0374c378aab016ae8ace60d9d98ab8caa6 upstream. The counters snapshotting is a new adaptive PEBS extension, which can capture programmable counters, fixed-function counters, and performance metrics in a PEBS record. The feature is available in the PEBS format V6. The target counters can be configured in the new fields of MSR_PEBS_CFG. Then the PEBS HW will generate the bit mask of counters (Counters Group Header) followed by the content of all the requested counters into a PEBS record. The current Linux perf sample read feature can read all events in the group when any event in the group is overflowed. But the rdpmc in the NMI/overflow handler has a small gap from overflow. Also, there is some overhead for each rdpmc read. The counters snapshotting feature can be used as an accurate and low-overhead replacement. Extend intel_update_topdown_event() to accept the value from PEBS records. Add a new PEBS_CNTR flag to indicate a sample read group that utilizes the counters snapshotting feature. When the group is scheduled, the PEBS configuration can be updated accordingly. To prevent the case that a PEBS record value might be in the past relative to what is already in the event, perf always stops the PMU and drains the PEBS buffer before updating the corresponding event->count. Intel-SIG: commit e02e9b0374c3 perf/x86/intel: Support PEBS counters snapshotting Backport CWF PMU support and dependency Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250121152303.3128733-4-kan.liang@linux.intel.com [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e415c1493fa1e93afaec697385b8952d932c41bc upstream. Add events v1.00. Bring in the events from: https://github.com/intel/perfmon/tree/main/CWF/events Intel-SIG: commit e415c1493fa1 perf vendor events: Add Clearwaterforest events Backport CWF PMU support and dependency Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> [Dapeng Mi: resolve conflict and amend commit log] Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit c53e14f1ea4a8f8ddd9b2cd850fcbc0d934b79f5 upstream. Commit 97c79a3 ("perf core: Per event callchain limit") introduced a per-event term to allow finer tuning of callchain depth to save space. The same should be applied to the branch stack as well. For example, autoFDO collections require the maximum number of LBR entries, while other system-wide LBR users may only be interested in the most recent few LBRs. A per-event LBR depth would save perf output buffer space. The patch simply drops the uninteresting branches, but HW still collects the maximum number of branches. A model-specific optimization could reduce the HW depth in some cases to cut the overhead further, but it isn't included in this patch set because it isn't useful in all cases. For example, ARCH LBR can utilize PEBS and XSAVE to collect LBRs, so the depth should have less impact on the collection overhead there. The model-specific optimization may be implemented later separately. Intel-SIG: commit c53e14f1ea4a perf: Extend per event callchain limit to branch stack Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 4dfe3232cc04325a09e96f6c7f9546ba6c0b132b upstream. More and more features require a dynamic event constraint, e.g., branch counter logging, auto counter reload, Arch PEBS, etc. Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It avoids adding individual flags one by one in intel_cpuc_prepare(). Add a dyn_constraint variable to struct hw_perf_event to track the dynamic constraint of the event, and apply it if it's updated. Apply the generic dynamic constraint for branch counter logging. Many features on and after V6 require dynamic constraints, so unconditionally set the flag for V6+. Intel-SIG: commit 4dfe3232cc04 perf/x86: Add dynamic constraint Backport CWF PMU support and dependency Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit d9cf9c6884d21e01483c4e17479d27636ea4bb50 upstream.
After commit d971342d38bf ("perf/x86/intel: Decouple BTS
initialization from PEBS initialization"), x86_pmu.bts is
initialized in bts_init(), which is hooked by arch_initcall(),
whereas init_hw_perf_events() is hooked by early_initcall(). Once the
core PMU is initialized, NMI watchdog initialization is called
immediately, before bts_init() is called. As a result the BTS buffer
is not really initialized, since bts_init() has not run and
x86_pmu.bts is still false at that time. Worse, the BTS buffer would
never be initialized afterwards unless all core PMU events are freed
and reserve_ds_buffers() is called again.
Thus, aligning with init_hw_perf_events(), use early_initcall() to hook
bts_init() and ensure x86_pmu.bts is initialized before NMI watchdog
initialization.
Intel-SIG: commit d9cf9c6884d2 perf/x86/intel: Use early_initcall() to hook bts_init()
Fixes: d971342d38bf ("perf/x86/intel: Decouple BTS initialization from PEBS initialization")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250820023032.17128-2-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0c5caea762de31a85cbcce65d978cec83449f699 upstream. IA32_PERF_CAPABILITIES.PEBS_TIMING_INFO[bit 17] is introduced to indicate whether timed PEBS is supported. Timed PEBS adds a new "retired latency" field in the basic info group to show the timing info. Please find detailed information about timed PEBS in section 8.4.1 "Timed Processor Event Based Sampling" of "Intel Architecture Instruction Set Extensions and Future Features". This patch adds the PERF_CAP_PEBS_TIMING_INFO flag, which the KVM module leverages to expose the timed PEBS feature to guests. Moreover, opportunistically refine the indents and make the macros use consistent indentation. Intel-SIG: commit 0c5caea762de perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Link: https://lore.kernel.org/r/20250820023032.17128-5-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 75a9001bab36f0456f6aae1ab0aa487db456464a upstream. perf_sample_data_init() has already set the sample period, so there is no need to do it again. Intel-SIG: commit 75a9001bab36 perf/x86/intel/ds: Remove redundant assignments to sample.period Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250506094907.2724-1-changbin.du@huawei.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit f226805bc5f60adf03783d8e4cbfe303ccecd64e upstream. Check sample_type in perf_sample_save_callchain() to prevent saving callchain data when it isn't required. Intel-SIG: commit f226805bc5f6 perf/core: Check sample_type in perf_sample_save_callchain Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515193610.2350456-3-yabinc@google.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit faac6f105ef169e2e5678c14e1ffebf2a7d780b6 upstream. Check sample_type in perf_sample_save_brstack() to prevent saving branch stack data when it isn't required. Intel-SIG: commit faac6f105ef1 perf/core: Check sample_type in perf_sample_save_brstack Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515193610.2350456-4-yabinc@google.com Conflicts: include/linux/perf_event.h [jz: resolve simple context conflict] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0ba6502ce167fc3d598c08c2cc3b4ed7ca5aa251 upstream.
When running "perf mem record" command on CWF, the below KASAN
global-out-of-bounds warning is seen.
==================================================================
BUG: KASAN: global-out-of-bounds in cmt_latency_data+0x176/0x1b0
Read of size 4 at addr ffffffffb721d000 by task dtlb/9850
Call Trace:
kasan_report+0xb8/0xf0
cmt_latency_data+0x176/0x1b0
setup_arch_pebs_sample_data+0xf49/0x2560
intel_pmu_drain_arch_pebs+0x577/0xb00
handle_pmi_common+0x6c4/0xc80
The issue is caused by the code below in __grt_latency_data(), which
tries to access the x86_hybrid_pmu structure; that structure doesn't
exist on non-hybrid platforms like CWF.
WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big)
So add an is_hybrid() check before calling this WARN_ON_ONCE() to fix
the global-out-of-bounds access.
Intel-SIG: commit 0ba6502ce167 perf/x86/intel: Fix KASAN global-out-of-bounds warning
Fixes: 0902624 ("perf/x86/intel: Rename model-specific pebs_latency_data functions")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251028064214.1451968-1-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit c7f69dc073e51f1c448713320ccd2e2be63fb1f6 upstream. Two is_x86_event() prototypes are defined in perf_event.h; remove the redundant one. Intel-SIG: commit c7f69dc073e5 perf/x86: Remove redundant is_x86_event() prototype Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-2-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 7e772a93eb61cb6265bdd1c5bde17d0f2718b452 upstream.
When intel_pmu_drain_pebs_icl() is called to drain PEBS records,
perf_event_overflow() may be called to process the last PEBS record.
perf_event_overflow() can trigger the interrupt throttle and stop all
events of the group, as the call chain below shows.
perf_event_overflow()
-> __perf_event_overflow()
->__perf_event_account_interrupt()
-> perf_event_throttle_group()
-> perf_event_throttle()
-> event->pmu->stop()
-> x86_pmu_stop()
The side effect of stopping the events is that all corresponding event
pointers in cpuc->events[] array are cleared to NULL.
Assume there are two PEBS events (event a and event b) in a group. When
intel_pmu_drain_pebs_icl() calls perf_event_overflow() to process the
last PEBS record of PEBS event a, interrupt throttle is triggered and
all pointers of event a and event b are cleared to NULL. Then
intel_pmu_drain_pebs_icl() tries to process the last PEBS record of
event b and encounters a NULL pointer access.
To avoid this issue, move the cpuc->events[] clearing from
x86_pmu_stop() to x86_pmu_del(). It's safe since cpuc->active_mask or
cpuc->pebs_enabled is always checked before accessing the event pointer
in cpuc->events[].
Intel-SIG: commit 7e772a93eb61 perf/x86: Fix NULL event access and potential PEBS record loss
Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-3-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit ee98b8bfc7c4baca69a6852c4ecc399794f7e53b upstream. Use x86_pmu_drain_pebs static call to replace calling x86_pmu.drain_pebs function pointer. Intel-SIG: commit ee98b8bfc7c4 perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-4-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 5e4e355ae7cdeb0fef5dbe908866e1f895abfacc upstream. The current large PEBS flag check only checks whether sample_regs_user contains unsupported GPRs; it doesn't check whether sample_regs_intr does. Since current PEBS HW supports sampling all perf-supported GPRs, the missing check doesn't cause a real issue today. But that will no longer hold once the subsequent patches add support for sampling the SSP register: SSP sampling is not supported by adaptive PEBS HW and won't be supported until arch-PEBS HW. So correct this issue. Intel-SIG: commit 5e4e355ae7cd perf/x86/intel: Correct large PEBS flag check Fixes: a47ba4d ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-5-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit d243d0bb64af1e90ec18ac2fa6e7cadfe8895913 upstream. arch-PEBS leverages CPUID.23H.4/5 sub-leaves to enumerate the arch-PEBS supported capabilities and counters bitmap. This patch parses these 2 sub-leaves and initializes the arch-PEBS capabilities and corresponding structures. Since the IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs no longer exist for arch-PEBS, arch-PEBS doesn't need to manipulate these MSRs. Thus, add a simple pair of __intel_pmu_pebs_enable/disable() callbacks for arch-PEBS. Intel-SIG: commit d243d0bb64af perf/x86/intel: Initialize architectural PEBS Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-6-dapeng1.mi@linux.intel.com Conflicts: arch/x86/events/intel/ds.c arch/x86/events/perf_event.h [jz: resolve context conflict, use wrmsrl(), instead of wrmsrq()] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 8807d922705f0a137d8de5f636b50e7b4fbef155 upstream. Besides some PEBS record layout differences, arch-PEBS can share most PEBS record processing code with adaptive PEBS. Thus, factor out this common processing code into independent inline functions, so it can be reused by the subsequent arch-PEBS handler. Intel-SIG: commit 8807d922705f perf/x86/intel/ds: Factor out PEBS record processing code to functions Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-7-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 167cde7dc9b36b7a88f3c29d836fabce13023327 upstream. Adaptive PEBS and arch-PEBS share a lot of the same code to process PEBS groups, like the basic, GPR and meminfo groups. Extract this shared code into generic functions to avoid duplication. Intel-SIG: commit 167cde7dc9b3 perf/x86/intel/ds: Factor out PEBS group processing code to functions Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-8-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit d21954c8a0ffbc94ffdd65106fb6da5b59042e0a upstream. A significant difference from adaptive PEBS is that an arch-PEBS record supports fragments: a record can be split into several independent fragments, each with its own arch-PEBS header. This patch defines the architectural PEBS record layout structures and adds helpers to process arch-PEBS records or fragments. Only legacy PEBS groups like the basic, GPR, XMM and LBR groups are supported in this patch; capturing the newly added YMM/ZMM/OPMASK vector registers will be supported in the future. Intel-SIG: commit d21954c8a0ff perf/x86/intel: Process arch-PEBS records or record fragments Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-9-dapeng1.mi@linux.intel.com [jz: use rdmsrl/wrmsrl instead of rdmsrq/wrmsrq] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 2721e8da2de7271533ac36285332219f700d16ca upstream. Arch-PEBS introduces a new MSR, IA32_PEBS_BASE, to store the arch-PEBS buffer physical address. This patch allocates the arch-PEBS buffer and then initializes the IA32_PEBS_BASE MSR with the buffer's physical address. Intel-SIG: commit 2721e8da2de7 perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-10-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e89c5d1f290e8915e0aad10014f2241086ea95e4 upstream. arch-PEBS provides CPUIDs to enumerate which counters support PEBS sampling and precise-distribution PEBS sampling. Thus PEBS constraints should be dynamically configured based on these counter and precise-distribution bitmaps instead of being defined statically. Update the event's dyn_constraint based on the PEBS event precise level. Intel-SIG: commit e89c5d1f290e perf/x86/intel: Update dyn_constraint base on PEBS event precise level Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-11-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 52448a0a739002eca3d051a6ec314a0b178949a1 upstream. Different from legacy PEBS, arch-PEBS provides per-counter PEBS data configuration by programming the IA32_PMC_GPx/FXx_CFG_C MSRs. This patch obtains the PEBS data configuration from the event attribute and then writes it to the IA32_PMC_GPx/FXx_CFG_C MSRs and enables the corresponding PEBS groups. Note that this patch only enables XMM SIMD regs sampling for arch-PEBS; sampling the other SIMD regs (OPMASK/YMM/ZMM) on arch-PEBS will be supported after PMI based SIMD regs (OPMASK/YMM/ZMM) sampling is supported. Intel-SIG: commit 52448a0a7390 perf/x86/intel: Setup PEBS data configuration and enable legacy groups Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-12-dapeng1.mi@linux.intel.com [jz: use rdmsrl/wrmsrl instead of rdmsrq/wrmsrq] Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit bb5f13df3c455110c4468a31a5b21954268108c9 upstream. Based on the previous adaptive PEBS counter snapshot support, add counter group support for architectural PEBS. Since arch-PEBS shares the same counter group layout with adaptive PEBS, directly reuse the __setup_pebs_counter_group() helper to process arch-PEBS counter groups. Intel-SIG: commit bb5f13df3c45 perf/x86/intel: Add counter group support for arch-PEBS Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251029102136.61364-13-dapeng1.mi@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit bd24f9beed591422f45fa6d8d0d3bd3a755b8a48 upstream. The current event scheduler has a limit: if the counter constraint of an event is not a subset of any other counter constraint with an equal or higher weight, the counters may not be fully utilized. To work around it, commit bc1738f ("perf, x86: Fix event scheduler for constraints with overlapping counters") introduced an overlap flag, which is hardcoded into the event constraints that may trigger the limit. It only works for static constraints. Many features on and after Intel PMON v6 require dynamic constraints. An event constraint is decided by both static and dynamic constraints at runtime. See commit 4dfe3232cc04 ("perf/x86: Add dynamic constraint"). The dynamic constraints come from CPUID enumeration, so it's impossible to hardcode them in advance. It's also not practical to set the overlap flag on all events; that's harmful to the scheduler. For the existing Intel platforms, the dynamic constraints don't trigger the limit, so a real fix is not required. However, for virtualization, a VMM may give a weird CPUID enumeration to a guest, and it's impossible to predict what that enumeration looks like. A check is introduced, which can list the possible breakage if a weird enumeration is used. Check the dynamic constraints enumerated for normal, branch counters logging, and auto-counter reload. Check both PEBS and non-PEBS constraints. Intel-SIG: commit bd24f9beed59 perf/x86/intel: Add a check for dynamic constraints Closes: https://lore.kernel.org/lkml/20250416195610.GC38216@noisy.programming.kicks-ass.net/ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250512175542.2000708-1-kan.liang@linux.intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 02da693f6658b9f73b97fce3695358ef3f13d0d1 upstream.
Handle the interaction between ("perf/x86/intel: Update dyn_constraint
base on PEBS event precise level") and ("perf/x86/intel: Add a check
for dynamic constraints").
Intel-SIG: commit 02da693f6658 perf/x86/intel: Check PEBS dyn_constraints
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 2093d8cf80fa5552d1025a78a8f3a10bf3b6466e upstream. Similar to enable_acr_event, avoid the branch. Intel-SIG: commit 2093d8cf80fa perf/x86/intel: Optimize PEBS extended config Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 9929dffce5ed7e2988e0274f4db98035508b16d9 upstream.
The following commit introduced a build failure on x86-32:
d21954c8a0ff ("perf/x86/intel: Process arch-PEBS records or record fragments")
...
arch/x86/events/intel/ds.c:2983:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
The forced type conversions to 'u64' and 'void *' are not 32-bit clean,
but they are also entirely unnecessary: ->pebs_vaddr is 'void *' already,
and integer-compatible pointer arithmetic works just fine on it.
Fix & simplify the code.
Intel-SIG: commit 9929dffce5ed perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: d21954c8a0ff ("perf/x86/intel: Process arch-PEBS records or record fragments")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://patch.msgid.link/20251029102136.61364-10-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 9415f749d34b926b9e4853da1462f4d941f89a0d upstream.
handle_pmi_common() may observe an active bit set in cpuc->active_mask
while the corresponding cpuc->events[] entry has already been cleared,
which leads to a NULL pointer dereference.
This can happen when interrupt throttling stops all events in a group
while PEBS processing is still in progress. perf_event_overflow() can
trigger perf_event_throttle_group(), which stops the group and clears
the cpuc->events[] entry, but the active bit may still be set when
handle_pmi_common() iterates over the events.
The following recent fix:
7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and
relied on cpuc->active_mask/pebs_enabled checks. However,
handle_pmi_common() can still encounter a NULL cpuc->events[] entry
despite the active bit being set.
Add an explicit NULL check on the event pointer before using it,
to cover this legitimate scenario and avoid the NULL dereference crash.
Intel-SIG: commit 9415f749d34b perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Reported-by: kitta <kitta@linux.alibaba.com>
Co-developed-by: kitta <kitta@linux.alibaba.com>
Signed-off-by: Evan Li <evan.li@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 369e91bd201d15a711f952ee9ac253a8b91628a3 upstream.
To pick up changes from:
54de197c9a5e8f52 ("Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
679fcce0028bf101 ("Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD")
3767def18f4cc394 ("x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement")
f6106d41ec84e552 ("x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation")
7baadd463e147fdc ("x86/cpufeatures: Enumerate the LASS feature bits")
47955b58cf9b97fe ("x86/cpufeatures: Correct LKGS feature flag description")
5d0316e25defee47 ("x86/cpufeatures: Add X86_FEATURE_X2AVIC_EXT")
6ffdb49101f02313 ("x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag")
4793f990ea152330 ("KVM: x86: Advertise EferLmsleUnsupported to userspace")
bb5f13df3c455110 ("perf/x86/intel: Add counter group support for arch-PEBS")
52448a0a739002ec ("perf/x86/intel: Setup PEBS data configuration and enable legacy groups")
d21954c8a0ffbc94 ("perf/x86/intel: Process arch-PEBS records or record fragments")
bffeb2fd0b9c99d8 ("x86/microcode/intel: Enable staging when available")
740144bc6bde9d44 ("x86/microcode/intel: Establish staging control logic")
This should address these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Please see tools/include/uapi/README.
Intel-SIG: commit 369e91bd201d tools headers: Sync x86 headers with kernel sources
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[jz: only following 3 relevant commits are sync'ed:
bb5f13df3c455110 ("perf/x86/intel: Add counter group support for arch-PEBS")
52448a0a739002ec ("perf/x86/intel: Setup PEBS data configuration and enable legacy groups")
d21954c8a0ffbc94 ("perf/x86/intel: Process arch-PEBS records or record fragments")]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
Does LKVS already support PEBS testing? If yes, please provide a test case using LKVS.
Yes, LKVS already supports PEBS and Arch-PEBS. A test case using LKVS is already provided in the Pull Request Description.
@x56Jason Can the LKVS test cases cover the perf command tests? If yes, we can run only LKVS.
There are some minor differences between the LKVS test cases and the aforementioned dedicated perf commands. For example, the 5th command tests "--user-regs", and the last test is "perf test 100"; these are not covered by LKVS. But overall, the LKVS tests are quite complete and can be used for PMU subsystem testing.
commit 7e772a93eb6 ("perf/x86: Fix NULL event access and potential PEBS record loss") fixes 9734e25fbf5a ("perf: Fix the throttle logic for a group"). Since our 6.6 branch doesn't have 9734e25fbf5a ("perf: Fix the throttle logic for a group") backported, shall we skip 7e772a93eb6?
I would suggest including it, because it is more robust than the current code.
Note
This Pull Request is based on the previous PR64 for PMU support on the CWF platform, rebased onto the latest 6.6-velinux branch.
So it contains commits from PR64.
Description
This is to backport architectural PEBS support for Intel platforms like Clearwater Forest (CWF) and DMR.
Detailed information about arch-PEBS can be found in chapter 11, "Architectural PEBS", of "Intel Architecture Instruction Set Extensions and Future Features".
Test