Skip to content

[Intel-SIG] Backport Arch-PEBS for Intel new platforms#118

Open
x56Jason wants to merge 89 commits into6.6-velinuxfrom
arch-pebs-6.6-on-cwf-pmu
Open

[Intel-SIG] Backport Arch-PEBS for Intel new platforms#118
x56Jason wants to merge 89 commits into6.6-velinuxfrom
arch-pebs-6.6-on-cwf-pmu

Conversation

@x56Jason
Copy link
Copy Markdown

@x56Jason x56Jason commented Feb 27, 2026

Note

This Pull Request is based on previous PR64 of PMU support for CWF platform, and rebased to latest 6.6-velinux branch.
So it contains commits from PR64.

Description

This is to backport architectural PEBS support for Intel platforms like Clearwater Forest (CWF) and DMR.
The detailed information about arch-PEBS can be found in chapter 11 of "architectural PEBS" of "Intel Architecture Instruction Set Extensions and Future Features".

Test

  • Build and reboot, PASS
  • Sanity test on GNR platform, PASS
  • Do following LKVS testing on CWF platform, PASS
  pmu_tests.sh -t arch_pebs_cpuid
  pmu_tests.sh -t arch_pebs_gp_reg_group
  pmu_tests.sh -t arch_pebs_xer_group
  pmu_tests.sh -t arch_pebs_counter_group
  pmu_tests.sh -t arch_pebs_counter_group_stres
  pmu_tests.sh -t arch_pebs_gp_counter
  pmu_tests.sh -t arch_pebs_basic_group
  pmu_tests.sh -t basic
  pmu_tests.sh -t counting
  pmu_tests.sh -t sampling
  pmu_tests.sh -t counting_multi
  • Do following tests on CWF platform, PASS
  1. Basic perf counting case.
    perf stat -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  2. Basic PMI based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  3. Basic PEBS based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}:p' sleep 1

  4. PEBS sampling case with basic, GPRs, vector-registers and LBR groups
    perf record -e branches:p -Iax,bx,ip,xmm0 -b -c 10000 sleep 1

  5. User space PEBS sampling case with basic, GPRs and LBR groups
    perf record -e branches:p --user-regs=ax,bx,ip -b -c 10000 sleep 1

  6. PEBS sampling case with auxiliary (memory info) group
    perf mem record sleep 1

  7. PEBS sampling case with counter group
    perf record -e '{branches:p,branches,cycles}:S' -c 10000 sleep 1

  8. Perf stat and record test
    perf test 100

Kan Liang and others added 30 commits February 25, 2026 23:09
commit d87d221 upstream.

From PMU's perspective, the ADL e-core and newer SRF/GRR have a similar
uarch. Most of the initialization code can be shared.

Factor out intel_pmu_init_grt() for the common initialization code.
The common part of the ADL e-core will be replaced by the later patch.

Intel-SIG: commit d87d221 perf/x86/intel: Factor out the initialization code for ADL e-core
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-4-kan.liang@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 299a5fc upstream.

Use the intel_pmu_init_glc() and intel_pmu_init_grt() to replace the
duplicate code for ADL.

The current code already checks the PERF_X86_EVENT_TOPDOWN flag before
invoking the Topdown metrics functions. (The PERF_X86_EVENT_TOPDOWN flag
is to indicate the Topdown metric feature, which is only available for
the p-core.) Drop the unnecessary adl_set_topdown_event_period() and
adl_update_topdown_event().

Intel-SIG: commit 299a5fc perf/x86/intel: Apply the common initialization code for ADL
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-5-kan.liang@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit b0560bf upstream.

There is a fairly long list of grievances about the current code. The
main beefs:

   1. hybrid_big_small assumes that the *HARDWARE* (CPUID) provided
      core types are a bitmap. They are not. If Intel happened to
      make a core type of 0xff, hilarity would ensue.
   2. adl_get_hybrid_cpu_type() utterly inscrutable.  There are
      precisely zero comments and zero changelog about what it is
      attempting to do.

According to Kan, the adl_get_hybrid_cpu_type() is there because some
Alder Lake (ADL) CPUs can do some silly things. Some ADL models are
*supposed* to be hybrid CPUs with big and little cores, but there are
some SKUs that only have big cores. CPUID(0x1a) on those CPUs does
not say that the CPUs are big cores. It apparently just returns 0x0.
It confuses perf because it expects to see either 0x40 (Core) or
0x20 (Atom).

The perf workaround for this is to watch for a CPU core saying it is
type 0x0. If that happens on an Alder Lake, it calls
x86_pmu.get_hybrid_cpu_type() and just assumes that the core is a
Core (0x40) CPU.

To fix up the mess, separate out the CPU types and the 'pmu' types.
This allows 'hybrid_pmu_type' bitmaps without worrying that some
future CPU type will set multiple bits.

Since the types are now separate, add a function to glue them back
together again. Actual comment on the situation in the glue
function (find_hybrid_pmu_for_cpu()).

Also, give ->get_hybrid_cpu_type() a real return type and make it
clear that it is overriding the *CPU* type, not the PMU type.

Rename cpu_type to pmu_type in the struct x86_hybrid_pmu to reflect the
change.

Intel-SIG: commit b0560bf perf/x86/intel: Clean up the hybrid CPU type handling code
Backport CWF PMU support and dependency

Originally-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-6-kan.liang@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
[jz: fix pmu->cpu_type in hybrid_td_is_visible() due to difference between
     6.6 stable branch and vanilla upstream]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 97588df upstream.

The current hybrid initialization codes aren't well organized and are
hard to read.

Factor out intel_pmu_init_hybrid() to do a common setup for each
hybrid PMU. The PMU-specific capability will be updated later via either
hard code (ADL) or CPUID hybrid enumeration (MTL).

Splitting the ADL and MTL initialization codes, since they have
different uarches. The hard code PMU capabilities are not required for
MTL either. They can be enumerated by the new leaf 0x23 and
IA32_PERF_CAPABILITIES MSR.

The hybrid enumeration of the IA32_PERF_CAPABILITIES MSR is broken on
MTL. Using the default value.

Intel-SIG: commit 97588df perf/x86/intel: Add common intel_pmu_init_hybrid()
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-7-kan.liang@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
[jz: resolve conflicts in update_pmu_cap() due to following commit
	47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF")
     already been backported by stable branch]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 950ecdc upstream.

Unnecessary multiplexing is triggered when running an "instructions"
event on an MTL.

perf stat -e cpu_core/instructions/,cpu_core/instructions/ -a sleep 1

 Performance counter stats for 'system wide':

       115,489,000      cpu_core/instructions/                (50.02%)
       127,433,777      cpu_core/instructions/                (49.98%)

       1.002294504 seconds time elapsed

Linux architectural perf events, e.g., cycles and instructions, usually
have dedicated fixed counters. These events also have equivalent events
which can be used in the general-purpose counters. The counters are
precious. In the intel_pmu_check_event_constraints(), perf check/extend
the event constraints of these events. So these events can utilize both
fixed counters and general-purpose counters.

The following cleanup commit:

  97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

forgot adding the intel_pmu_check_event_constraints() into update_pmu_cap().
The architectural perf events cannot utilize the general-purpose counters.

The code to check and update the counters, event constraints and
extra_regs is the same among hybrid systems. Move
intel_pmu_check_hybrid_pmus() to init_hybrid_pmu(), and
emove the duplicate check in update_pmu_cap().

Intel-SIG: commit 950ecdc perf/x86/intel: Fix broken fixed event constraints extension
Backport CWF PMU support and dependency

Fixes: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230911135128.2322833-1-kan.liang@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
[jz: resolve context conflicts in update_pmu_cap() due to following commit
	47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF")
     already been backported by stable branch]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
…the kernel

commit 76db7aa upstream.

Sync the new sample type for the branch counters feature.

Intel-SIG: commit 76db7aa tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tinghao Zhang <tinghao.zhang@intel.com>
Link: https://lore.kernel.org/r/20231025201626.3000228-6-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e8df9d9 upstream.

When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:

  sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1

  Opening: cpu/cycles/:HG
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    config                           0xa00000000
    disabled                         1
  ------------------------------------------------------------
  sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -16

   Performance counter stats for 'sleep 1':

       <not counted>      cpu_atom/instructions/

It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.

Further investigation shows the issue was introduced by commit:

  97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.

Testing this fix on Intel hybrid platforms, the observed issues
disappear.

Intel-SIG: commit e8df9d9 perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
Backport CWF PMU support and dependency

Fixes: 97588df ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 72b8b94 upstream.

Sort header files alphabetically.

Intel-SIG: commit 72b8b94 powercap: intel_rapl: Sort header files
Backport CWF PMU support and dependency

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 575024a upstream.

Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu().

RAPL driver can invoke these APIs to expose its supported energy
counters via perf PMU. The new RAPL PMU is fully compatible with current
MSR RAPL PMU, including using the same PMU name and events
name/id/unit/scale, etc.

For example, use below command
 perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO
to get the energy consumption if power/energy-pkg/ and power/energy-ram/
events are available in the "perf list" output.

This does not introduce any conflict because TPMI RAPL is the only user
of these APIs currently, and it never co-exists with MSR RAPL.

Note that RAPL Packages can be probed/removed dynamically, and the
events supported by each TPMI RAPL device can be different. Thus the
RAPL PMU support is done on demand, which means
1. PMU is registered only if it is needed by a RAPL Package. PMU events
   for unsupported counters are not exposed.
2. PMU is unregistered and registered when a new RAPL Package is probed
   and supports new counters that are not supported by current PMU.
   For example, on a dual-package system using TPMI RAPL, it is possible
   that Package 1 behaves as TPMI domain root and supports Psys domain.
   In this case, register PMU without Psys event when probing Package 0,
   and re-register the PMU with Psys event when probing Package 1.
3. PMU is unregistered when all registered RAPL Packages don't need PMU.

Intel-SIG: commit 575024a powercap: intel_rapl: Introduce APIs for PMU support
Backport CWF PMU support and dependency

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 963a9ad upstream.

Enable RAPL PMU support for TPMI RAPL driver.

Intel-SIG: commit 963a9ad powercap: intel_rapl_tpmi: Enable PMU support
Backport CWF PMU support and dependency

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Yunying Sun: amend commit log ]
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit a23eb2f upstream.

The current perf assumes that the counters that support PEBS are
contiguous. But it's not guaranteed with the new leaf 0x23 introduced.
The counters are enumerated with a counter mask. There may be holes in
the counter mask for future platforms or in a virtualization
environment.

Store the PEBS event mask rather than the maximum number of PEBS
counters in the x86 PMU structures.

Intel-SIG: commit a23eb2f perf/x86/intel: Support the PEBS event mask
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 722e42e upstream.

The current perf assumes that both GP and fixed counters are contiguous.
But it's not guaranteed on newer Intel platforms or in a virtualization
environment.

Use the counter mask to replace the number of counters for both GP and
the fixed counters. For the other ARCHs or old platforms which don't
support a counter mask, using GENMASK_ULL(num_counter - 1, 0) to
replace. There is no functional change for them.

The interface to KVM is not changed. The number of counters still be
passed to KVM. It can be updated later separately.

Intel-SIG: commit 722e42e perf/x86: Support counter mask
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[jz: resolve conflicts in update_pmu_cap() due to following commit
	47a973fd7563 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF")
     already been backported by stable branch]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit a932aa0 upstream.

From PMU's perspective, Lunar Lake and Arrow Lake are similar to the
previous generation Meteor Lake. Both are hybrid platforms, with e-core
and p-core.

The key differences include:
- The e-core supports 3 new fixed counters
- The p-core supports an updated PEBS Data Source format
- More GP counters (Updated event constraint table)
- New Architectural performance monitoring V6
  (New Perfmon MSRs aliasing, umask2, eq).
- New PEBS format V6 (Counters Snapshotting group)
- New RDPMC metrics clear mode

The legacy features, the 3 new fixed counters and updated event
constraint table are enabled in this patch.

The new PEBS data source format, the architectural performance
monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode
are supported in the following patches.

Intel-SIG: commit a932aa0 perf/x86: Add Lunar Lake and Arrow Lake support
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0902624 upstream.

The model-specific pebs_latency_data functions of ADL and MTL use the
"small" as a postfix to indicate the e-core. The postfix is too generic
for a model-specific function. It cannot provide useful information that
can directly map it to a specific uarch, which can facilitate the
development and maintenance.
Use the abbr of the uarch to rename the model-specific functions.

Intel-SIG: commit 0902624 perf/x86/intel: Rename model-specific pebs_latency_data functions
Backport CWF PMU support and dependency

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 608f697 upstream.

A new PEBS data source format is introduced for the p-core of Lunar
Lake. The data source field is extended to 8 bits with new encodings.

A new layout is introduced into the union intel_x86_pebs_dse.
Introduce the lnl_latency_data() to parse the new format.
Enlarge the pebs_data_source[] accordingly to include new encodings.

Only the mem load and the mem store events can generate the data source.
Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and
INTEL_HYBRID_STLAT_CONSTRAINT to mark them.

Add two new bits for the new cache-related data src, L2_MHB and MSC.
The L2_MHB is short for L2 Miss Handling Buffer, which is similar to
LFB (Line Fill Buffer), but to track the L2 Cache misses.
The MSC stands for the memory-side cache.

Intel-SIG: commit 608f697 perf/x86/intel: Support new data source for Lunar Lake
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e8fb5d6 upstream.

Different vendors may support different fields in EVENTSEL MSR, such as
Intel would introduce new fields umask2 and eq bits in EVENTSEL MSR
since Perfmon version 6. However, a fixed mask X86_RAW_EVENT_MASK is
used to filter the attr.config.

Introduce a new config_mask to record the real supported EVENTSEL
bitmask.
Only apply it to the existing code now. No functional change.

Intel-SIG: commit e8fb5d6 perf/x86: Add config_mask to represent EVENTSEL bitmask
Backport CWF PMU support and dependency

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit dce0c74 upstream.

Two new fields (the unit mask2, and the equal flag) are added in the
IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX.

Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout
of the PERFEVTSEL.
Expose the new formats into sysfs if they are available. The umask
extension reuses the same format attr name "umask" as the previous
umask. Add umask2_show to determine/display the correct format
for the current machine.

Intel-SIG: commit dce0c74 perf/x86/intel: Support PERFEVTSEL extension
Backport CWF PMU support and dependency

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[jz: due to following commit
	47a973fd75639 ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF")
     has already been backported by stable branch,
     - resolve context conflict in update_pmu_cap().
     - definition for ARCH_PERFMON_EXT_UMASK2 and ARCH_PERFMON_EXT_EQ are
       not added because upstream commit 47a973fd75639 have removed them]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 149fd47 upstream.

The architectural performance monitoring V6 supports a new range of
counters' MSRs in the 19xxH address range. They include all the GP
counter MSRs, the GP control MSRs, and the fixed counter MSRs.

The step between each sibling counter is 4. Add intel_pmu_addr_offset()
to calculate the correct offset.

Add fixedctr in struct x86_pmu to store the address of the fixed counter
0. It can be used to calculate the rest of the fixed counters.

The MSR address of the fixed counter control is not changed.

Intel-SIG: commit 149fd47 perf/x86/intel: Support Perfmon MSRs aliasing
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 7087bfb0adc9a12ec3b463b1d38072c5efce5d6c upstream.

Modify the pebs_basic and pebs_meminfo structs to make the bitfields
more explicit to ease readability of the code.

Intel-SIG: commit 7087bfb0adc9 perf/x86/intel/ds: Clarify adaptive PEBS processing
Backport CWF PMU support and dependency

Co-developed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119135504.1463839-3-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 3c00ed344cef4dbb57d8769b961af414132a173a upstream.

Factor out functions to process normal and the last PEBS records, which
can be shared with the later patch.

Move the event updating related codes (intel_pmu_save_and_restart())
to the end, where all samples have been processed.
For the current usage, it doesn't matter when perf updates event counts
and reset the counter. Because all counters are stopped when the PEBS
buffer is drained.
Drop the return of the !intel_pmu_save_and_restart(event) check. Because
it never happen. The intel_pmu_save_and_restart(event) only returns 0,
when !hwc->event_base or the period_left > 0.
- The !hwc->event_base is impossible for the PEBS event, since the PEBS
  event is only available on GP and fixed counters, which always have
  a valid hwc->event_base.
- The check only happens for the case of non-AUTO_RELOAD and single
  PEBS, which implies that the event must be overflowed. The period_left
  must be always <= 0 for an overflowed event after the
  x86_pmu_update().

Intel-SIG: commit 3c00ed344cef perf/x86/intel/ds: Factor out functions for PEBS records processing
Backport CWF PMU support and dependency

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119135504.1463839-4-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
…PEBS

commit ae55e308bde2267df79c4475daa85e174b7ab4c8 upstream.

The current code may iterate all the PEBS records in the DS area several
times. The first loop is to find all active events and calculate the
available records for each event. Then iterate the whole buffer again
and again to process available records until all active events are
processed.

The algorithm is inherited from the old generations. The old PEBS
hardware does not deal well with the situation when events happen near
each other. SW has to drop the error records. Multiple iterations are
required.

The hardware limit has been addressed on newer platforms with adaptive
PEBS. A simple one-iteration algorithm is introduced.

The samples are output by record order with the patch, rather than the
event order. It doesn't impact the post-processing. The perf tool always
sorts the records by time before presenting them to the end user.

In an NMI, the last record has to be specially handled. Add a last[]
variable to track the last unprocessed record of each event.

Test:

11 PEBS events are used in the perf test. Only the basic information is
collected.
perf record -e instructions:up,...,instructions:up -c 2000003 benchmark

The ftrace is used to record the duration of the
intel_pmu_drain_pebs_icl().

The average duration reduced from 62.04us to 57.94us.

A small improvement can be observed with the new algorithm.
Also, the implementation becomes simpler and more straightforward.

Intel-SIG: commit ae55e308bde2 perf/x86/intel/ds: Simplify the PEBS records processing for adaptive PEBS
Backport CWF PMU support and dependency

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20241119135504.1463839-5-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0e45818ec1896c2b4aee0ec6721022ad625ea531 upstream.

The new RDPMC enhancement, metrics clear mode, is to clear the
PERF_METRICS-related resources as well as the fixed-function performance
monitoring counter 3 after the read is performed. It is available for
ring 3. The feature is enumerated by the
IA32_PERF_CAPABILITIES.RDPMC_CLEAR_METRICS[bit 19]. To enable the
feature, the IA32_FIXED_CTR_CTRL.METRICS_CLEAR_EN[bit 14] must be set.

Two ways were considered to enable the feature.
- Expose a knob in the sysfs globally. One user may affect the
  measurement of other users when changing the knob. The solution is
  dropped.
- Introduce a new event format, metrics_clear, for the slots event to
  disable/enable the feature only for the current process. Users can
  utilize the feature as needed.
The latter solution is implemented in the patch.

The current KVM doesn't support the perf metrics yet. For
virtualization, the feature can be enabled later separately.

Intel-SIG: commit 0e45818ec189 perf/x86/intel: Support RDPMC metrics clear mode
Backport CWF PMU support and dependency

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20241211160318.235056-1-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit b6ccddd6fe1fd49c7a82b6fbed01cccad21a29c7 upstream.

From the perspective of the uncore PMU, the Clearwater Forest is the
same as the previous Sierra Forest. The only difference is the event
list, which will be supported in the perf tool later.

Intel-SIG: commit b6ccddd6fe1f perf/x86/intel/uncore: Add Clearwater Forest support
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241211161146.235253-1-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit b8c3a2502a205321fe66c356f4b70cabd8e1a5fc upstream.

The only difference between 5 and 6 is the new counters snapshotting
group, without the following counters snapshotting enabling patches,
it's impossible to utilize the feature in a PEBS record. It's safe to
share the same code path with format 5.

Add format 6, so the end user can at least utilize the legacy PEBS
features.

Intel-SIG: commit b8c3a2502a20 perf/x86/intel/ds: Add PEBS format 6
Backport CWF PMU support and dependency

Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241216204505.748363-1-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 3f710be02ea648001ba18fb2c9fa7765e743dec2 upstream.

The below warning may be triggered on GNR when the PCIE uncore units are
exposed.

WARNING: CPU: 4 PID: 1 at arch/x86/events/intel/uncore.c:1169 uncore_pci_pmu_register+0x158/0x190

The current uncore driver assumes that all the devices in the same PMU
have the exact same devfn. It's true for the previous platforms. But it
doesn't work for the new PCIE uncore units on GNR.

The assumption doesn't make sense. There is no reason to limit the
devices from the same PMU to the same devfn. Also, the current code just
throws the warning, but still registers the device. The WARN_ON_ONCE()
should be removed.

The func_id is used by the later event_init() to check if a event->pmu
has valid devices. For cpu and mmio uncore PMUs, they are always valid.
For pci uncore PMUs, it's set when the PMU is registered. It can be
replaced by the pmu->registered. Clean up the func_id.

Intel-SIG: commit 3f710be02ea6 perf/x86/intel/uncore: Clean up func_id
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Eric Hu <eric.hu@intel.com>
Link: https://lkml.kernel.org/r/20250108143017.1793781-1-kan.liang@linux.intel.com
[ Aubrey Li: amend commit log ]
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 6d642735cdb6cdb814d2b6c81652caa53ce04842 upstream.

The same CXL PMONs support is also avaiable on GNR. Apply
spr_uncore_cxlcm and spr_uncore_cxldp to GNR as well.

The other units were broken on early HW samples, so they were ignored in
the early enabling patch. The issue has been fixed and verified on the
later production HW. Add UPI, B2UPI, B2HOT, PCIEX16 and PCIEX8 for GNR.

Intel-SIG: commit 6d642735cdb6 perf/x86/intel/uncore: Support more units on Granite Rapids
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Eric Hu <eric.hu@intel.com>
Link: https://lkml.kernel.org/r/20250108143017.1793781-2-kan.liang@linux.intel.com
[ Aubrey Li: amend commit log ]
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e02e9b0374c378aab016ae8ace60d9d98ab8caa6 upstream.

The counters snapshotting is a new adaptive PEBS extension, which can
capture programmable counters, fixed-function counters, and performance
metrics in a PEBS record. The feature is available in the PEBS format
V6.

The target counters can be configured in the new fields of MSR_PEBS_CFG.
Then the PEBS HW will generate the bit mask of counters (Counters Group
Header) followed by the content of all the requested counters into a
PEBS record.

The current Linux perf sample read feature can read all events in the
group when any event in the group is overflowed. But the rdpmc in the
NMI/overflow handler has a small gap from overflow. Also, there is some
overhead for each rdpmc read. The counters snapshotting feature can be
used as an accurate and low-overhead replacement.

Extend intel_update_topdown_event() to accept the value from PEBS
records.

Add a new PEBS_CNTR flag to indicate a sample read group that utilizes
the counters snapshotting feature. When the group is scheduled, the
PEBS configure can be updated accordingly.

To prevent the case that a PEBS record value might be in the past
relative to what is already in the event, perf always stops the PMU and
drains the PEBS buffer before updating the corresponding event->count.

Intel-SIG: commit e02e9b0374c3 perf/x86/intel: Support PEBS counters snapshotting
Backport CWF PMU support and dependency

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250121152303.3128733-4-kan.liang@linux.intel.com
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e415c1493fa1e93afaec697385b8952d932c41bc upstream.

Add events v1.00.

Bring in the events from:
https://github.com/intel/perfmon/tree/main/CWF/events

Intel-SIG: commit e415c1493fa1 perf vendor events: Add Clearwaterforest events
Backport CWF PMU support and dependency

Co-developed-by: Caleb Biggers <caleb.biggers@intel.com>
Signed-off-by: Caleb Biggers <caleb.biggers@intel.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250211213031.114209-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[Dapeng Mi: resolve conflict and amend commit log]
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit c53e14f1ea4a8f8ddd9b2cd850fcbc0d934b79f5 upstream.

The commit 97c79a3 ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.

It should be applied to the branch stack as well. For example, autoFDO
collections require maximum LBR entries. In the meantime, other
system-wide LBR users may only be interested in the latest a few number
of LBRs. A per-event LBR depth would save the perf output buffer.

The patch simply drops the uninterested branches, but HW still collects
the maximum branches. There may be a model-specific optimization that
can reduce the HW depth for some cases to reduce the overhead further.
But it isn't included in the patch set. Because it's not useful for all
cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect
LBRs. The depth should have less impact on the collecting overhead.
The model-specific optimization may be implemented later separately.

Intel-SIG: commit c53e14f1ea4a perf: Extend per event callchain limit to branch stack
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 4dfe3232cc04325a09e96f6c7f9546ba6c0b132b upstream.

More and more features require a dynamic event constraint, e.g., branch
counter logging, auto counter reload, Arch PEBS, etc.

Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It
avoids keeping adding the individual flag in intel_cpuc_prepare().

Add a variable dyn_constraint in the struct hw_perf_event to track the
dynamic constraint of the event. Apply it if it's updated.

Apply the generic dynamic constraint for branch counter logging.
Many features on and after V6 require dynamic constraint. So
unconditionally set the flag for V6+.

Intel-SIG: commit 4dfe3232cc04 perf/x86: Add dynamic constraint
Backport CWF PMU support and dependency

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
Dapeng Mi and others added 24 commits February 25, 2026 23:09
commit d9cf9c6884d21e01483c4e17479d27636ea4bb50 upstream.

After the commit 'd971342d38bf ("perf/x86/intel: Decouple BTS
 initialization from PEBS initialization")' is introduced, x86_pmu.bts
would initialized in bts_init() which is hooked by arch_initcall().

Whereas init_hw_perf_events() is hooked by early_initcall(). Once the
core PMU is initialized, nmi watchdog initialization is called
immediately before bts_init() is called. It leads to the BTS buffer is
not really initialized since bts_init() is not called and x86_pmu.bts is
still false at that time. Worse, BTS buffer would never be initialized
then unless all core PMU events are freed and reserve_ds_buffers()
is called again.

Thus aligning with init_hw_perf_events(), use early_initcall() to hook
bts_init() to ensure x86_pmu.bts is initialized before nmi watchdog
initialization.

Intel-SIG: commit d9cf9c6884d2 perf/x86/intel: Use early_initcall() to hook bts_init()

Fixes: d971342d38bf ("perf/x86/intel: Decouple BTS initialization from PEBS initialization")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250820023032.17128-2-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0c5caea762de31a85cbcce65d978cec83449f699 upstream.

IA32_PERF_CAPABILITIES.PEBS_TIMING_INFO[bit 17] is introduced to
indicate whether timed PEBS is supported. Timed PEBS adds a new "retired
latency" field in basic info group to show the timing info. Please find
detailed information about timed PEBS in section 8.4.1 "Timed Processor
Event Based Sampling" of "Intel Architecture Instruction Set Extensions
and Future Features".

This patch adds PERF_CAP_PEBS_TIMING_INFO flag and KVM module leverages
this flag to expose timed PEBS feature to guest.

Moreover, opportunistically refine the indents and make the macros
share consistent indents.

Intel-SIG: commit 0c5caea762de perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Link: https://lore.kernel.org/r/20250820023032.17128-5-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 75a9001bab36f0456f6aae1ab0aa487db456464a upstream.

The perf_sample_data_init() has already set the period of sample, so no
need to do it again.

Intel-SIG: commit 75a9001bab36 perf/x86/intel/ds: Remove redundant assignments to sample.period

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250506094907.2724-1-changbin.du@huawei.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit f226805bc5f60adf03783d8e4cbfe303ccecd64e upstream.

Check sample_type in perf_sample_save_callchain() to prevent
saving callchain data when it isn't required.

Intel-SIG: commit f226805bc5f6 perf/core: Check sample_type in perf_sample_save_callchain

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240515193610.2350456-3-yabinc@google.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit faac6f105ef169e2e5678c14e1ffebf2a7d780b6 upstream.

Check sample_type in perf_sample_save_brstack() to prevent
saving branch stack data when it isn't required.

Intel-SIG: commit faac6f105ef1 perf/core: Check sample_type in perf_sample_save_brstack

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240515193610.2350456-4-yabinc@google.com

 Conflicts:
	include/linux/perf_event.h
[jz: resolve simple context conflict]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 0ba6502ce167fc3d598c08c2cc3b4ed7ca5aa251 upstream.

When running "perf mem record" command on CWF, the below KASAN
global-out-of-bounds warning is seen.

  ==================================================================
  BUG: KASAN: global-out-of-bounds in cmt_latency_data+0x176/0x1b0
  Read of size 4 at addr ffffffffb721d000 by task dtlb/9850

  Call Trace:

   kasan_report+0xb8/0xf0
   cmt_latency_data+0x176/0x1b0
   setup_arch_pebs_sample_data+0xf49/0x2560
   intel_pmu_drain_arch_pebs+0x577/0xb00
   handle_pmi_common+0x6c4/0xc80

The issue is caused by below code in __grt_latency_data(). The code
tries to access x86_hybrid_pmu structure which doesn't exist on
non-hybrid platform like CWF.

        WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big)

So add is_hybrid() check before calling this WARN_ON_ONCE to fix the
global-out-of-bounds access issue.

Intel-SIG: commit 0ba6502ce167 perf/x86/intel: Fix KASAN global-out-of-bounds warning

Fixes: 0902624 ("perf/x86/intel: Rename model-specific pebs_latency_data functions")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251028064214.1451968-1-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit c7f69dc073e51f1c448713320ccd2e2be63fb1f6 upstream.

2 is_x86_event() prototypes are defined in perf_event.h. Remove the
redundant one.

Intel-SIG: commit c7f69dc073e5 perf/x86: Remove redundant is_x86_event() prototype

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-2-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 7e772a93eb61cb6265bdd1c5bde17d0f2718b452 upstream.

When intel_pmu_drain_pebs_icl() is called to drain PEBS records, the
perf_event_overflow() could be called to process the last PEBS record.

While perf_event_overflow() could trigger the interrupt throttle and
stop all events of the group, like what the below call-chain shows.

perf_event_overflow()
  -> __perf_event_overflow()
    ->__perf_event_account_interrupt()
      -> perf_event_throttle_group()
        -> perf_event_throttle()
          -> event->pmu->stop()
            -> x86_pmu_stop()

The side effect of stopping the events is that all corresponding event
pointers in cpuc->events[] array are cleared to NULL.

Assume there are two PEBS events (event a and event b) in a group. When
intel_pmu_drain_pebs_icl() calls perf_event_overflow() to process the
last PEBS record of PEBS event a, interrupt throttle is triggered and
all pointers of event a and event b are cleared to NULL. Then
intel_pmu_drain_pebs_icl() tries to process the last PEBS record of
event b and encounters NULL pointer access.

To avoid this issue, move cpuc->events[] clearing from x86_pmu_stop()
to x86_pmu_del(). It's safe since cpuc->active_mask or
cpuc->pebs_enabled is always checked before access the event pointer
from cpuc->events[].

Intel-SIG: commit 7e772a93eb61 perf/x86: Fix NULL event access and potential PEBS record loss

Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-3-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit ee98b8bfc7c4baca69a6852c4ecc399794f7e53b upstream.

Use x86_pmu_drain_pebs static call to replace calling x86_pmu.drain_pebs
function pointer.

Intel-SIG: commit ee98b8bfc7c4 perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-4-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 5e4e355ae7cdeb0fef5dbe908866e1f895abfacc upstream.

current large PEBS flag check only checks if sample_regs_user contains
unsupported GPRs but doesn't check if sample_regs_intr contains
unsupported GPRs.

Of course, currently PEBS HW supports to sample all perf supported GPRs,
the missed check doesn't cause real issue. But it won't be true any more
after the subsequent patches support to sample SSP register. SSP
sampling is not supported by adaptive PEBS HW and it would be supported
until arch-PEBS HW. So correct this issue.

Intel-SIG: commit 5e4e355ae7cd perf/x86/intel: Correct large PEBS flag check

Fixes: a47ba4d ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-5-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit d243d0bb64af1e90ec18ac2fa6e7cadfe8895913 upstream.

arch-PEBS leverages CPUID.23H.4/5 sub-leaves enumerate arch-PEBS
supported capabilities and counters bitmap. This patch parses these 2
sub-leaves and initializes arch-PEBS capabilities and corresponding
structures.

Since IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs are no longer existed
for arch-PEBS, arch-PEBS doesn't need to manipulate these MSRs. Thus add
a simple pair of __intel_pmu_pebs_enable/disable() callbacks for
arch-PEBS.

Intel-SIG: commit d243d0bb64af perf/x86/intel: Initialize architectural PEBS

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-6-dapeng1.mi@linux.intel.com

 Conflicts:
	arch/x86/events/intel/ds.c
	arch/x86/events/perf_event.h
[jz: resolve context conflict, use wrmsrl(), instead of wrmsrq()]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 8807d922705f0a137d8de5f636b50e7b4fbef155 upstream.

Beside some PEBS record layout difference, arch-PEBS can share most of
PEBS record processing code with adaptive PEBS. Thus, factor out these
common processing code to independent inline functions, so they can be
reused by subsequent arch-PEBS handler.

Intel-SIG: commit 8807d922705f perf/x86/intel/ds: Factor out PEBS record processing code to functions

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-7-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 167cde7dc9b36b7a88f3c29d836fabce13023327 upstream.

Adaptive PEBS and arch-PEBS share lots of same code to process these
PEBS groups, like basic, GPR and meminfo groups. Extract these shared
code to generic functions to avoid duplicated code.

Intel-SIG: commit 167cde7dc9b3 perf/x86/intel/ds: Factor out PEBS group processing code to functions

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-8-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit d21954c8a0ffbc94ffdd65106fb6da5b59042e0a upstream.

A significant difference with adaptive PEBS is that arch-PEBS record
supports fragments which means an arch-PEBS record could be split into
several independent fragments which have its own arch-PEBS header in
each fragment.

This patch defines architectural PEBS record layout structures and add
helpers to process arch-PEBS records or fragments. Only legacy PEBS
groups like basic, GPR, XMM and LBR groups are supported in this patch,
the new added YMM/ZMM/OPMASK vector registers capturing would be
supported in the future.

Intel-SIG: commit d21954c8a0ff perf/x86/intel: Process arch-PEBS records or record fragments

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-9-dapeng1.mi@linux.intel.com
[jz: use rdmsrl/wrmsrl instead of rdmsrq/wrmsrq]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 2721e8da2de7271533ac36285332219f700d16ca upstream.

Arch-PEBS introduces a new MSR IA32_PEBS_BASE to store the arch-PEBS
buffer physical address. This patch allocates arch-PEBS buffer and then
initialize IA32_PEBS_BASE MSR with the buffer physical address.

Intel-SIG: commit 2721e8da2de7 perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-10-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit e89c5d1f290e8915e0aad10014f2241086ea95e4 upstream.

arch-PEBS provides CPUIDs to enumerate which counters support PEBS
sampling and precise distribution PEBS sampling. Thus PEBS constraints
should be dynamically configured base on these counter and precise
distribution bitmap instead of defining them statically.

Update event dyn_constraint base on PEBS event precise level.

Intel-SIG: commit e89c5d1f290e perf/x86/intel: Update dyn_constraint base on PEBS event precise level

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-11-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 52448a0a739002eca3d051a6ec314a0b178949a1 upstream.

Different with legacy PEBS, arch-PEBS provides per-counter PEBS data
configuration by programing MSR IA32_PMC_GPx/FXx_CFG_C MSRs.

This patch obtains PEBS data configuration from event attribute and then
writes the PEBS data configuration to MSR IA32_PMC_GPx/FXx_CFG_C and
enable corresponding PEBS groups.

Please notice this patch only enables XMM SIMD regs sampling for
arch-PEBS, the other SIMD regs (OPMASK/YMM/ZMM) sampling on arch-PEBS
would be supported after PMI based SIMD regs (OPMASK/YMM/ZMM) sampling
is supported.

Intel-SIG: commit 52448a0a7390 perf/x86/intel: Setup PEBS data configuration and enable legacy groups

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-12-dapeng1.mi@linux.intel.com
[jz: use rdmsrl/wrmsrl instead of rdmsrq/wrmsrq]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit bb5f13df3c455110c4468a31a5b21954268108c9 upstream.

Base on previous adaptive PEBS counter snapshot support, add counter
group support for architectural PEBS. Since arch-PEBS shares same
counter group layout with adaptive PEBS, directly reuse
__setup_pebs_counter_group() helper to process arch-PEBS counter group.

Intel-SIG: commit bb5f13df3c45 perf/x86/intel: Add counter group support for arch-PEBS

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251029102136.61364-13-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit bd24f9beed591422f45fa6d8d0d3bd3a755b8a48 upstream.

The current event scheduler has a limit. If the counter constraint of an
event is not a subset of any other counter constraint with an equal or
higher weight. The counters may not be fully utilized.

To workaround it, the commit bc1738f ("perf, x86: Fix event
scheduler for constraints with overlapping counters") introduced an
overlap flag, which is hardcoded to the event constraint that may
trigger the limit. It only works for static constraints.

Many features on and after Intel PMON v6 require dynamic constraints. An
event constraint is decided by both static and dynamic constraints at
runtime. See commit 4dfe3232cc04 ("perf/x86: Add dynamic constraint").
The dynamic constraints are from CPUID enumeration. It's impossible to
hardcode it in advance. It's not practical to set the overlap flag to all
events. It's harmful to the scheduler.

For the existing Intel platforms, the dynamic constraints don't trigger
the limit. A real fix is not required.

However, for virtualization, VMM may give a weird CPUID enumeration to a
guest. It's impossible to indicate what the weird enumeration is. A
check is introduced, which can list the possible breaks if a weird
enumeration is used.

Check the dynamic constraints enumerated for normal, branch counters
logging, and auto-counter reload.
Check both PEBS and non-PEBS constratins.

Intel-SIG: commit bd24f9beed59 perf/x86/intel: Add a check for dynamic constraints

Closes: https://lore.kernel.org/lkml/20250416195610.GC38216@noisy.programming.kicks-ass.net/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20250512175542.2000708-1-kan.liang@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 02da693f6658b9f73b97fce3695358ef3f13d0d1 upstream.

Handle the interaction between ("perf/x86/intel: Update dyn_constraint
base on PEBS event precise level") and ("perf/x86/intel: Add a check
for dynamic constraints").

Intel-SIG: commit 02da693f6658 perf/x86/intel: Check PEBS dyn_constraints

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 2093d8cf80fa5552d1025a78a8f3a10bf3b6466e upstream.

Similar to enable_acr_event, avoid the branch.

Intel-SIG: commit 2093d8cf80fa perf/x86/intel: Optimize PEBS extended config

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 9929dffce5ed7e2988e0274f4db98035508b16d9 upstream.

The following commit introduced a build failure on x86-32:

  21954c8a0ff ("perf/x86/intel: Process arch-PEBS records or record fragments")

  ...

  arch/x86/events/intel/ds.c:2983:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]

The forced type conversion to 'u64' and 'void *' are not 32-bit clean,
but they are also entirely unnecessary: ->pebs_vaddr is 'void *' already,
and integer-compatible pointer arithmetics will work just fine on it.

Fix & simplify the code.

Intel-SIG: commit 9929dffce5ed perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: d21954c8a0ff ("perf/x86/intel: Process arch-PEBS records or record fragments")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://patch.msgid.link/20251029102136.61364-10-dapeng1.mi@linux.intel.com
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 9415f749d34b926b9e4853da1462f4d941f89a0d upstream.

handle_pmi_common() may observe an active bit set in cpuc->active_mask
while the corresponding cpuc->events[] entry has already been cleared,
which leads to a NULL pointer dereference.

This can happen when interrupt throttling stops all events in a group
while PEBS processing is still in progress. perf_event_overflow() can
trigger perf_event_throttle_group(), which stops the group and clears
the cpuc->events[] entry, but the active bit may still be set when
handle_pmi_common() iterates over the events.

The following recent fix:

  7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")

moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and
relied on cpuc->active_mask/pebs_enabled checks. However,
handle_pmi_common() can still encounter a NULL cpuc->events[] entry
despite the active bit being set.

Add an explicit NULL check on the event pointer before using it,
to cover this legitimate scenario and avoid the NULL dereference crash.

Intel-SIG: commit 9415f749d34b perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()

Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Reported-by: kitta <kitta@linux.alibaba.com>
Co-developed-by: kitta <kitta@linux.alibaba.com>
Signed-off-by: Evan Li <evan.li@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
commit 369e91bd201d15a711f952ee9ac253a8b91628a3 upstream.

To pick up changes from:

  54de197c9a5e8f52 ("Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
  679fcce0028bf101 ("Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD")
  3767def18f4cc394 ("x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement")
  f6106d41ec84e552 ("x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation")
  7baadd463e147fdc ("x86/cpufeatures: Enumerate the LASS feature bits")
  47955b58cf9b97fe ("x86/cpufeatures: Correct LKGS feature flag description")
  5d0316e25defee47 ("x86/cpufeatures: Add X86_FEATURE_X2AVIC_EXT")
  6ffdb49101f02313 ("x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag")
  4793f990ea152330 ("KVM: x86: Advertise EferLmsleUnsupported to userspace")
  bb5f13df3c455110 ("perf/x86/intel: Add counter group support for arch-PEBS")
  52448a0a739002ec ("perf/x86/intel: Setup PEBS data configuration and enable legacy groups")
  d21954c8a0ffbc94 ("perf/x86/intel: Process arch-PEBS records or record fragments")
  bffeb2fd0b9c99d8 ("x86/microcode/intel: Enable staging when available")
  740144bc6bde9d44 ("x86/microcode/intel: Establish staging control logic")

This should address these tools/perf build warnings:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Please see tools/include/uapi/README.

Intel-SIG: commit 369e91bd201d tools headers: Sync x86 headers with kernel sources

Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[jz: only following 3 relevant commits are sync'ed:
	bb5f13df3c455110 ("perf/x86/intel: Add counter group support for arch-PEBS")
	52448a0a739002ec ("perf/x86/intel: Setup PEBS data configuration and enable legacy groups")
	d21954c8a0ffbc94 ("perf/x86/intel: Process arch-PEBS records or record fragments")]
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
@guixiongwei
Copy link
Copy Markdown

Does LKVS already support PEBS testing? If yes, please provide a test case using LKVS.

@x56Jason
Copy link
Copy Markdown
Author

x56Jason commented Mar 9, 2026

Does LKVS already support PEBS testing? If yes, please provide a test case using LKVS.

Yes, LKVS already supports PEBS and Arch-PEBS.

Already provided test case using LKVS in Pull Request Description.

@guixiongwei
Copy link
Copy Markdown

@x56Jason can the lkvs test case cover the perf command test? if yes, we can only run lkvs.

@x56Jason
Copy link
Copy Markdown
Author

@x56Jason can the lkvs test case cover the perf command test? if yes, we can only run lkvs.

There are some minor differences between the LKVS test cases and the aforementioned dedicated perf commands. For example, the 5th command tests "--user-regs", the last test "perf test 100", they are not covered by LKVS.

But overall, the LKVS tests are quite complete, it can be used for PMU subsystem testing.

@Liangyan-bd
Copy link
Copy Markdown

commit 7e772a93eb6 ("perf/x86: Fix NULL event access and potential PEBS record loss") fixes 9734e25fbf5a ("perf: Fix the throttle logic for a group"), since in our 6.6 branch, we don't have 9734e25fbf5a ("perf: Fix the throttle logic for a group") backported, shall we skip 7e772a93eb6?

@x56Jason
Copy link
Copy Markdown
Author

commit 7e772a93eb6 ("perf/x86: Fix NULL event access and potential PEBS record loss") fixes 9734e25fbf5a ("perf: Fix the throttle logic for a group"), since in our 6.6 branch, we don't have 9734e25fbf5a ("perf: Fix the throttle logic for a group") backported, shall we skip 7e772a93eb6?

I would suggest to include it, because it is more robust than current code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.