AMD Venice MCA patches for VeLinux (6.6 kernel) #110
Open
mohanasv2 wants to merge 52 commits into openvelinux:6.6-velinux from
Conversation
commit 4c113a5b28bfd589e2010b5fc8867578b0135ed7 upstream. Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set. The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface. Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-1-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 3ed57b4 upstream. When a "storm" of corrected machine check interrupts (CMCI) is detected this code mitigates by disabling CMCI interrupt signalling from all of the banks owned by the CPU that saw the storm. There are problems with this approach: 1) It is very coarse grained. In all likelihood only one of the banks was generating the interrupts, but CMCI is disabled for all. This means Linux may delay seeing and processing errors logged from other banks. 2) Although CMCI stands for Corrected Machine Check Interrupt, it is also used to signal when an uncorrected error is logged. This is a problem because these errors should be handled in a timely manner. Delete all this code in preparation for a finer grained solution. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20231115195450.12963-2-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 7eae17c upstream. This is the core functionality to track CMCI storms at the machine check bank granularity. Subsequent patches will add the vendor specific hooks to supply input to the storm detection and take actions on the start/end of a storm. machine_check_poll() is called both by the CMCI interrupt code, and for periodic polls from a timer. Add a hook in this routine to maintain a bitmap history for each bank showing whether the bank logged a corrected error or not each time it is polled. In normal operation the interval between polls of these banks determines how far to shift the history. The 64 bit width corresponds to about one second. When a storm is observed a CPU vendor specific action is taken to reduce or stop CMCI from the bank that is the source of the storm. The bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. During a storm each bit in the history indicates the status of the bank each time it is polled. Thus the history covers just over a minute. Declare a storm for that bank if the number of corrected interrupts seen in that history is above some threshold (defined as 5 in this series, could be tuned later if there is data to suggest a better value). A storm on a bank ends if enough consecutive polls of the bank show no corrected errors (defined as 30, may also change). That calls the CPU vendor specific function to revert to normal operational mode, and changes the polling rate back to the default. [ bp: Massage. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231115195450.12963-3-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
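The per-bank history tracking described above can be sketched as follows. This is an illustrative stand-alone model, not the kernel's code: each poll shifts a 64-bit bitmap and records whether the bank logged a corrected error; a storm starts when the set-bit count crosses the threshold (5 in this series) and ends after enough consecutive clean polls (30). Struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORM_BEGIN_THRESHOLD    5   /* errors in history to declare a storm */
#define STORM_END_POLL_THRESHOLD 30  /* consecutive clean polls to end it */

struct storm_bank {
	uint64_t history;    /* bit per poll: 1 = corrected error was logged */
	bool in_storm;
	int clean_polls;     /* consecutive error-free polls while in storm */
};

static int popcount64(uint64_t x)
{
	int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/* Called once per poll of the bank; 'error' = corrected error logged. */
void track_cmci_storm(struct storm_bank *b, bool error)
{
	b->history = (b->history << 1) | (error ? 1 : 0);

	if (!b->in_storm) {
		if (popcount64(b->history) >= STORM_BEGIN_THRESHOLD)
			b->in_storm = true;  /* vendor hook would throttle CMCI here */
		return;
	}

	if (error) {
		b->clean_polls = 0;
	} else if (++b->clean_polls >= STORM_END_POLL_THRESHOLD) {
		b->in_storm = false;         /* vendor hook would re-enable CMCI */
		b->clean_polls = 0;
	}
}
```

With once-per-second polling during a storm, the 64-bit history indeed covers just over a minute, as the commit notes.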
commit 00c092de6f28ebd32208aef83b02d61af2229b60 upstream. Users can disable MCA polling by setting the "ignore_ce" parameter or by setting "check_interval=0". This tells the kernel to *not* start the MCE timer on a CPU. If the user did not disable CMCI, then storms can occur. When these happen, the MCE timer will be started with a fixed interval. After the storm subsides, the timer's next interval is set to check_interval. This disregards the user's input through "ignore_ce" and "check_interval". Furthermore, if "check_interval=0", then the new timer will run faster than expected. Create a new helper to check these conditions and use it when a CMCI storm ends. [ bp: Massage. ] Fixes: 7eae17c ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-2-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
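The condition the fix adds can be reduced to a one-line predicate. The helper below is a hypothetical sketch (the real kernel helper's name and plumbing differ): the poll timer may only be (re)started after a storm ends if the user left CE polling enabled.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: "ignore_ce" and "check_interval=0" are the two
 * user settings described above that disable MCA polling; honoring
 * either means the timer must not be restarted after a CMCI storm.
 */
bool mce_timer_allowed(bool ignore_ce, unsigned long check_interval)
{
	return !ignore_ce && check_interval != 0;
}
```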
commit d66e1e90b16055d2f0ee76e5384e3f119c3c2773 upstream. Ensure that sysfs init doesn't fail for new/unrecognized bank types or if a bank has additional blocks available. Most MCA banks have a single thresholding block, so the block takes the same name as the bank. Unified Memory Controllers (UMCs) are a special case where there are two blocks and each has a unique name. However, the microarchitecture allows for five blocks. Any new MCA bank types with more than one block will be missing names for the extra blocks. The MCE sysfs will fail to initialize in this case. Fixes: 87a6d40 ("x86/mce/AMD: Update sysfs bank names for SMCA systems") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-3-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 5f6e3b720694ad771911f637a51930f511427ce1 upstream. The MCA threshold limit must be reset after servicing the interrupt. Currently, the restart function doesn't have an explicit check for this. It makes some assumptions based on the current limit and what's in the registers. These assumptions don't always hold, so the limit won't be reset in some cases. Make the reset condition explicit. Either an interrupt/overflow has occurred or the bank is being initialized. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-4-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
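The explicit reset condition described above can be sketched like this (field and function names are illustrative, not the kernel's): instead of inferring from the current limit and register contents, reset exactly when an interrupt/overflow occurred or the block is being initialized.

```c
#include <stdbool.h>

/* Illustrative stand-in for the thresholding block state. */
struct thresh_block {
	bool overflow;  /* thresholding interrupt/overflow occurred */
	bool init;      /* block is being (re)initialized */
};

/* Explicit reset condition per the commit: overflow OR init. */
bool thresh_limit_needs_reset(const struct thresh_block *b)
{
	return b->overflow || b->init;
}
```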
commit 9af8b441cf6953f683b825fbf241a979ea7521e8 upstream. It operates per block rather than per bank. So rename it for clarity. No functional changes. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-5-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
…vice() commit 4d2161b9e8ba64076f520ec2f00eefb00722c15e upstream. The return values are not checked, so set return type to 'void'. Also, move function declarations to internal.h, since these functions are only used within the MCE subsystem. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-6-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit b249288abde5190bb113ea5acef8af4ceac4957c upstream. The MCx_MISC0[BlkPtr] field was used on legacy systems to hold a register offset for the next MCx_MISC* register. In this way, an implementation-specific number of registers can be discovered at runtime. The MCAX/SMCA register space simplifies this by always including the MCx_MISC[1-4] registers. The MCx_MISC0[BlkPtr] field is used to indicate (true/false) whether any MCx_MISC[1-4] registers are present. Currently, MCx_MISC0[BlkPtr] is checked early and cached to be used during sysfs init later. This is unnecessary as the MCx_MISC0 register is read again later anyway. Remove the smca_banks_map variable as it is effectively redundant, and use a direct register/bit check instead. [ bp: Zap smca_get_block_address() too. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-3-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit d35fb3121a36170bba951c529847a630440e4174 upstream. Legacy AMD systems include an integrated Northbridge that is represented by MCA bank 4. This is the only non-core MCA bank in legacy systems. The Northbridge is physically shared by all the CPUs within an AMD "Node". However, in practice the "shared" MCA bank can only be managed by a single CPU within that AMD Node. This is known as the "Node Base Core" (NBC). For example, only the NBC will be able to read the MCA bank 4 registers; they will be Read-as-Zero for other CPUs. Also, the MCA Thresholding interrupt will only signal the NBC; the other CPUs will not receive it. This is enforced by hardware, and it should not be managed by software. The current AMD Thresholding code attempts to deal with the "shared" MCA bank by micromanaging the bank's sysfs kobjects. However, this does not follow the intended kobject use cases. It is also fragile, and it has caused bugs in the past. Modern AMD systems do not need this shared MCA bank support, and it should not be needed on legacy systems either. Remove the shared threshold bank code. Also, move the threshold struct definitions to mce/amd.c, since they are no longer needed in amd_nb.c. [Backport Changes] 1. In arch/x86/include/asm/amd_nb.h, the upstream patch removes the refcount.h include, but this header is already removed in the current source tree. Therefore, the removal step was skipped since the expected change is already reflected in the existing code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241206161210.163701-2-yazen.ghannam@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit c4bac5c640e3782bf30c07c4d82042d0202fe224 upstream. The threshold_bank structure is a container for one or more threshold_block structures. Currently, the container has a single pointer to the 'first' threshold_block structure which then has a linked list of the remaining threshold_block structures. This results in an extra level of indirection where the 'first' block is checked before iterating over the remaining blocks. Remove the indirection by including the head of the block list in the threshold_bank structure which already acts as a container for all the bank's thresholding blocks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-8-236dd74f645f@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
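The data-structure change can be illustrated with a minimal intrusive list, standing in for the kernel's `list_head`: the bank embeds the list head of all its blocks directly, so iteration starts from the container rather than going through a separate 'first block' pointer. Struct and field names loosely follow the commit; the list implementation is a simplified sketch.

```c
#include <stddef.h>

/* Minimal singly-linked stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next;
};

struct threshold_block {
	int block_num;
	struct list_node miscj;  /* link into the bank's block list */
};

struct threshold_bank {
	struct list_node miscj;  /* head of ALL blocks in this bank */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk all blocks directly from the bank; no 'first block' special case. */
int bank_block_count(const struct threshold_bank *tb)
{
	int n = 0;

	for (struct list_node *p = tb->miscj.next; p; p = p->next)
		n++;
	return n;
}
```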
commit 9f34032ec0deef58bd0eb7475f1981adfa998648 upstream. The __mcheck_cpu_init_early() function was introduced so that some vendor-specific features are detected before the first MCA polling event done in __mcheck_cpu_init_generic(). Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and additional code will be needed to support various system configurations. However, the current and future vendor-specific code should be done during vendor init. This keeps all the vendor code in a common location and simplifies the generic init flow. Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init(). Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-6-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit cfffcf97997bd35f4a59e035523d1762568bdbad upstream. Set the CR4.MCE bit as the last step during init. This brings the MCA init order closer to what is described in the x86 docs.

x86 docs:
	AMD		Intel
	MCA_CONFIG	MCG_CTL
	MCi_CTL		MCG_EXT_CTL
	MCG_CTL		MCi_CTL
	CR4.MCE		CR4.MCE

Current Linux:
	AMD		Intel
	CR4.MCE		CR4.MCE
	MCG_CTL		MCG_CTL
	MCA_CONFIG	MCG_EXT_CTL
	MCi_CTL		MCi_CTL

Updated Linux:
	AMD		Intel
	MCG_CTL		MCG_CTL
	MCA_CONFIG	MCG_EXT_CTL
	MCi_CTL		MCi_CTL
	CR4.MCE		CR4.MCE

The new init flow will match Intel's docs, but there will still be a mismatch for AMD regarding MCG_CTL. However, there is no known issue with this ordering, so leave it for now. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 669ce4984b729ad5b4c6249d4a8721ae52398bfb upstream. Currently, MCA initialization is executed identically on each CPU as they are brought online. However, a number of MCA initialization tasks only need to be done once. Define a function to collect all 'global' init tasks and call this from the BSP only. Start with CPU features. [Backport Changes] 1. In file arch/x86/kernel/cpu/mce/core.c, within the newly added function mca_bsp_init(), the call to rdmsrq() was replaced with the existing equivalent call rdmsrl() because the upstream commit c435e608cf59f that globally renamed rdmsrl() to rdmsrq() is not available yet in the current source tree. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit c6e465b8d45a1bc717d196ee769ee5a9060de8e2 upstream. Currently, on AMD systems, MCA interrupt handler functions are set during CPU init. However, the functions only need to be set once for the whole system. Assign the handlers only during BSP init. Do so only for SMCA systems to maintain the old behavior for legacy systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 495a91d upstream. Define helper functions for legacy and SMCA systems in order to reuse individual checks in later changes. Describe what each function is checking for, and correct the XEC bitmask for SMCA. No functional change intended. [ bp: Use "else" in amd_mce_is_memory_error() to make the conditional balanced, for readability. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613141142.36801-2-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 48da1ad upstream. Currently, all valid MCA_ADDR values are assumed to be usable on AMD systems. However, this is not correct in most cases. Notifiers expecting usable addresses may then operate on inappropriate values. Define a helper function to do AMD-specific checks for a usable memory address. List out all known cases. [ bp: Tone down the capitalized words. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-3-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 1bae0cf upstream. Move Intel-specific checks into a helper function. Explicitly use "bool" for return type. No functional change intended. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-4-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 612905e upstream. mce_device_create() is called only from mce_cpu_online() which in turn will be called iff MCA support is available. That is, at the time of mce_device_create() call it's guaranteed that MCA support is available. No need to duplicate this check so remove it. [ bp: Massage commit message. ] Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231107165529.407349-1-nik.borisov@suse.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 108c649 upstream. Systems with a large number of CPUs may generate a large number of machine check records when things go seriously wrong. But Linux has a fixed-size buffer that can only capture a few dozen errors. Allocate space based on the number of CPUs (with a minimum value based on the historical fixed buffer that could store 80 records). [ bp: Rename local var from tmpp to something more telling: gpool. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Avadhut Naik <avadhut.naik@amd.com> Link: https://lore.kernel.org/r/20240307192704.37213-1-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
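The sizing rule can be sketched as below. The 80-record minimum comes from the commit text (the historical fixed buffer); the per-CPU multiplier here is an illustrative assumption, not necessarily the value the kernel uses.

```c
#define MCE_MIN_ENTRIES 80  /* historical fixed-buffer capacity */
#define MCE_PER_CPU      2  /* illustrative records reserved per CPU */

/* Scale the MCE record pool with CPU count, but never below the old size. */
unsigned int mce_pool_records(unsigned int num_possible_cpus)
{
	unsigned int n = num_possible_cpus * MCE_PER_CPU;

	return n < MCE_MIN_ENTRIES ? MCE_MIN_ENTRIES : n;
}
```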
commit ac5e80e upstream. - Only capitalize entries where that makes sense - Print separate values separately - Rename 'PROCESSOR' to vendor & CPUID Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Avadhut Naik <avadhut.naik@amd.com> Cc: "Tony Luck" <tony.luck@intel.com> Link: https://lore.kernel.org/r/ZgZpn/zbCJWYdL5y@gmail.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 9843064 upstream. Machine Check Error information from 'struct mce' is exposed to userspace through the mce_record tracepoint. Currently, however, the PPIN (Protected Processor Inventory Number) field of 'struct mce' is not exposed. Add a PPIN field to the tracepoint as it provides a unique identifier for the system (or socket in case of multi-socket systems) on which the MCE has been received. Also, add a comment explaining the kind of information that can be and should be added to the tracepoint. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240401171455.1737976-2-avadhut.naik@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 186d7ef upstream. Currently, the microcode field (Microcode Revision) of 'struct mce' is not exposed to userspace through the mce_record tracepoint. Knowing the microcode version on which the MCE was received is critical information for debugging. If the version is not recorded, later attempts to acquire the version might result in discrepancies since it can be changed at runtime. Add microcode version to the tracepoint to prevent ambiguity over the active version on the system when the MCE was received. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240401171455.1737976-3-avadhut.naik@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 4a5f2dd upstream. New CPU #defines encode vendor and family as well as model. [ bp: Squash *three* mce patches into one, fold in fix: https://lore.kernel.org/r/20240429022051.63360-1-tony.luck@intel.com ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181511.41772-1-tony.luck%40intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 5b9d292 upstream. The recent CMCI storm handling rework removed the last case that checks the return value of machine_check_poll(). Therefore the "error_seen" variable is no longer used, so remove it. Fixes: 3ed57b4 ("x86/mce: Remove old CMCI storm mitigation code") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240523155641.2805411-3-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 5ad21a2 upstream. There is no MCE "setup" done in mce_setup(). Rather, this function initializes and prepares an MCE record. Rename the function to highlight what it does. No functional change is intended. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-2-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit f9bbb8a upstream. Generally, MCA information for an error is gathered on the CPU that reported the error. In this case, CPU-specific information from the running CPU will be correct. However, this will be incorrect if the MCA information is gathered while running on a CPU that didn't report the error. One example is creating an MCA record using mce_prep_record() for errors reported from ACPI. Split mce_prep_record() so that there is a helper function to gather common, i.e. not CPU-specific, information and another helper for CPU-specific information. Leave mce_prep_record() defined as-is for the common case when running on the reporting CPU. Get MCG_CAP in the global helper even though the register is per-CPU. This value is not already cached per-CPU like other values. And it does not assist with any per-CPU decoding or handling. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-3-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 793aa4b upstream. Current AMD systems can report MCA errors using the ACPI Boot Error Record Table (BERT). The BERT entries for MCA errors will be an x86 Common Platform Error Record (CPER) with an MSR register context that matches the MCAX/SMCA register space. However, the BERT will not necessarily be processed on the CPU that reported the MCA errors. Therefore, the correct CPU number needs to be determined and the information saved in struct mce. Use the newly defined mce_prep_record_*() helpers to get the correct data. Also, add an explicit check to verify that a valid CPU number was found from the APIC ID search. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-4-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 750fd23926f1507cc826b5a4fdd4bfc7283e7723 upstream. Currently, exporting new additional machine check error information involves adding new fields for the same at the end of the struct mce. This additional information can then be consumed through mcelog or tracepoint. However, as new MSRs are being added (and will be added in the future) by CPU vendors on their newer CPUs with additional machine check error information to be exported, the size of struct mce will balloon on some CPUs, unnecessarily, since those fields are vendor-specific. Moreover, different CPU vendors may export the additional information in varying sizes. The problem particularly intensifies since struct mce is exposed to userspace as part of UAPI. Its bloating through vendor-specific data should be avoided to limit the information being sent out to userspace. Add a new structure mce_hw_err to wrap the existing struct mce. The same will prevent its ballooning since vendor-specific data, if any, can now be exported through a union within the wrapper structure and through __dynamic_array in mce_record tracepoint. Furthermore, new internal kernel fields can be added to the wrapper struct without impacting the user space API. [ bp: Restore reverse x-mas tree order of function vars declarations. ] [Backport Changes] 1. In arch/x86/kernel/cpu/mce/core.c, within the function mce_panic(), deviations are shown due to line number changes. This is because the declaration of struct page *p was removed from the top of the function and moved inside the if condition (if (final && (final->status & MCI_STATUS_ADDRV))) in upstream merge commit b4442ca. Backporting that commit would introduce additional dependencies.
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
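The shape of the wrapper can be sketched as below, heavily reduced: the real struct mce carries many more fields, and the vendor union carries vendor MSR data rather than the two placeholder registers shown here. The point is that vendor-specific data lives in the kernel-internal wrapper while the UAPI struct mce stays fixed.

```c
#include <stdint.h>

/* Greatly reduced stand-in for the fixed UAPI record. */
struct mce {
	uint64_t status;
	uint64_t addr;
};

/* Kernel-internal wrapper: new/vendor fields go here, not in UAPI. */
struct mce_hw_err {
	struct mce m;             /* unchanged user-visible record */
	union {                   /* vendor-specific extension (illustrative) */
		struct {
			uint64_t reg1;    /* hypothetical extra vendor MSR data */
			uint64_t reg2;
		} amd;
	} vendor;
};
```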
commit c845cb8dbd2e1a804babfd13648026c3a7cfbc0b upstream. Make several functions that return 0 or 1 return a boolean value for better readability. No functional changes are intended. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-2-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit c46945c9cac8437a674edb9d8fbe71511fb4acee upstream. Make those functions whose callers only care about success or failure return a boolean value for better readability. Also, update the call sites accordingly as the polarities of all the return values have been flipped. No functional changes. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-4-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
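The conversion pattern behind both of the boolean-return commits above looks like this (function names are made up for the example): a 0-on-success helper becomes a true-on-success bool, so every call site's polarity flips.

```c
#include <stdbool.h>

/* Before: 0 meant success, non-zero meant failure. */
int setup_bank_old(int bank)
{
	return bank < 0 ? -1 : 0;
}

/* After: true means success -- callers test the result directly. */
bool setup_bank(int bank)
{
	return bank >= 0;
}
```

Call sites change from `if (setup_bank_old(b)) goto err;` to `if (!setup_bank(b)) goto err;`, which is the polarity flip the commit message describes.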
commit 51a12c28bb9a043e9444db5bd214b00ec161a639 upstream. Split each vendor specific part into its own helper function. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241212140103.66964-5-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit a46b2bbe1e36e7faab5010f68324b7d191c5c09f upstream. The 'UNKNOWN' vendor check is handled as a quirk that is run on each online CPU. However, all CPUs are expected to have the same vendor. Move the 'UNKNOWN' vendor check to the BSP-only init so it is done early and once. Remove the unnecessary return value from the quirks check. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 0f134c53246366c00664b640f9edc9be5db255b3 upstream. Unify the bank preparation into __mcheck_cpu_init_clear_banks(), rename that function to what it does now - prepares banks. Do this so that generic and vendor banks init goes first so that settings done during that init can take effect before the first bank polling takes place. Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks() as it already loops over the banks. The MCP_DONTLOG flag is no longer needed, since the MCA polling function is now called only if boot-time logging should be done. [Backport Changes] 1. In file arch/x86/kernel/cpu/mce/core.c, within the function __mcheck_cpu_check_banks(), the call to wrmsrq() was replaced with the existing equivalent call wrmsrl() because the upstream commit 78255eb239733 that globally renamed wrmsrl() to wrmsrq() is not available yet in the current source tree. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-5-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
…mbolic IFM references commit fd82221 upstream. There's an erratum that prevents the PAT from working correctly: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf # Document 316515 Version 010 The kernel currently disables PAT support on those CPUs, but it does it with some magic numbers. Replace the magic numbers with the new "IFM" macros. Make the check refer to the last affected CPU (INTEL_CORE_YONAH) rather than the first fixed one. This makes it easier to find the documentation of the erratum since Intel documents where it is broken and not where it is fixed. I don't think the Pentium Pro (or Pentium II) is actually affected. But the old check included them, so it can't hurt to keep doing the same. I'm also not completely sure about the "Pentium M" CPUs (models 0x9 and 0xd). But, again, they were included in the old checks and were close Pentium III derivatives, so are likely affected. While we're at it, revise the comment to refer to the erratum by name and make sure it quotes the language from the actual errata doc. That should make it easier to find in the future when the URL inevitably changes. Why bother with this in the first place? It actually gets rid of one of the very few remaining direct references to c->x86{,_model}. No change in functionality intended. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Len Brown <len.brown@intel.com> Link: https://lore.kernel.org/r/20240829220042.1007820-1-dave.hansen@linux.intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 359d7a98e3e3f88dbf45411427b284bb3bbbaea5 upstream.
Convert family/model mixed checks to VFM-based checks to make the code
more compact. Simplify.
[ bp: Drop the "what" from the commit message - it should be visible from
the diff alone. ]
Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-6-qiuxu.zhuo@intel.com
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
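As a rough illustration of why VFM-based checks are more compact: the VFM scheme packs vendor, family, and model into one integer, so a mixed family/model check collapses into a single comparison. The bit positions and macro names below follow my reading of the kernel's `arch/x86/include/asm/cpu_device_id.h` and should be treated as an assumption, not a quote of the backported code.

```c
#include <stdint.h>

/* Assumed VFM packing: model in bits 7:0, family in bits 15:8,
 * vendor in bits 23:16 (mirrors VFM_MODEL_BIT/VFM_FAMILY_BIT/
 * VFM_VENDOR_BIT as I understand them). */
#define VFM_MODEL_BIT  0
#define VFM_FAMILY_BIT 8
#define VFM_VENDOR_BIT 16

#define VFM_MAKE(vendor, family, model)            \
	(((uint32_t)(vendor) << VFM_VENDOR_BIT) |  \
	 ((uint32_t)(family) << VFM_FAMILY_BIT) |  \
	 ((uint32_t)(model)  << VFM_MODEL_BIT))

#define X86_VENDOR_INTEL 0

/* Instead of "c->x86 == 6 && c->x86_model == 0xe", the packed form
 * needs only one integer compare against a named constant. */
static inline int vfm_matches(uint32_t vfm, uint8_t family, uint8_t model)
{
	return vfm == VFM_MAKE(X86_VENDOR_INTEL, family, model);
}
```

The named constants upstream (e.g. INTEL_CORE_YONAH) are just such packed values, which is what lets the erratum check name the affected CPU directly.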
commit 7eee1e92684507f64ec6a75fecbd27e37174b888 upstream.
Many quirks are global configuration settings and a handful apply to
each CPU. Move the per-CPU quirks to vendor init to execute them on each
online CPU. Set the global quirks during BSP-only init so they're only
executed once and early.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 91af6842e9945d064401ed2d6e91539a619760d1 upstream.
There are a number of generic and vendor-specific status checks in
machine_check_poll(). These are used to determine if an error should be
skipped. Move these into helper functions. Future vendor-specific checks
will be added to the helpers.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit d4fca1358ea9096f2f6ed942e2cb3a820073dfc1 upstream.
Starting with Zen4, AMD's Scalable MCA systems incorporate two new
registers: MCA_SYND1 and MCA_SYND2. These registers will include
supplemental error information in addition to the existing MCA_SYND
register. The data within these registers is considered valid if
MCA_STATUS[SyndV] is set.
Userspace error decoding tools like rasdaemon gather related hardware
error information through the tracepoints. Therefore, export these two
registers through the mce_record tracepoint so that tools like rasdaemon
can parse them and output the supplemental error information like FRU
text contained in them.
[ bp: Massage. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
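A minimal sketch of the validity gate described above: SYND1/SYND2, like SYND, are only meaningful when MCA_STATUS[SyndV] is set. The SyndV bit position (53) is my assumption from the SMCA register layout, and the struct is a stand-in for the tracepoint record, not the real kernel type.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit position for SyndV in MCA_STATUS (not stated in the
 * commit text; treat as illustrative). */
#define MCA_STATUS_SYNDV (1ULL << 53)

/* Stand-in for the fields a decoder would pull into the trace record. */
struct synd_record {
	uint64_t synd1;
	uint64_t synd2;
	bool valid;
};

/* Only capture the supplemental syndromes when SyndV says they hold
 * valid data; otherwise report them as absent. */
struct synd_record read_synds(uint64_t status, uint64_t s1, uint64_t s2)
{
	struct synd_record rec = { 0, 0, false };

	if (status & MCA_STATUS_SYNDV) {
		rec.synd1 = s1;
		rec.synd2 = s2;
		rec.valid = true;
	}
	return rec;
}
```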
commit ebe29309c4d2821d5fdccd5393eba9c77540e260 upstream.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 8e44e83f57c3289a41507eb79a315400629978ae upstream.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 5c6f123c419b6e20f84ac1683089a52f449273aa upstream.
Add a helper at the end of the MCA polling function to collect vendor
and/or feature actions. Start with a basic skeleton for now. Actions for
AMD thresholding and deferred errors will be added later.
[ bp: Drop the obvious comment too. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com
Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 34da4a5d6814ca4cd0116144e37433bf55cf0189 upstream.
AMD systems optionally support an MCA thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.
AMD MCA thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.
Furthermore, future systems will have the ability to send an MCA
thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.
Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.
Add a callback to the MCA polling function to check and reset any
threshold blocks that have reached their threshold limit.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 7cb735d7c0cb4307b2072aae6268b5b2069a8658 upstream.
AMD systems optionally support a deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar to
how other MCA interrupts are handled.
Deferred errors do not require any special handling related to the interrupt,
e.g. resetting or rearming the interrupt, etc.
However, Scalable MCA systems include a pair of registers, MCA_DESTAT and
MCA_DEADDR, that should be checked for valid errors. This check should be done
whenever MCA registers are polled. Currently, the deferred error interrupt
does this check, but the MCA polling function does not.
Call the MCA polling function when handling the deferred error interrupt. This
keeps all "polling" cases in a common function.
Add an SMCA status check helper. This will do the same status check and
register clearing that the interrupt handler has done. And it extends the
common polling flow to find AMD deferred errors.
Clear the MCA_DESTAT register at the end of the handler rather than the
beginning. This maintains the procedure that the 'status' register must be
cleared as the final step.
[Backport Changes]
1. Commits 78255eb239733 ("Rename 'wrmsrl()' to 'wrmsrq()'") and
c435e608cf59f ("Rename 'rdmsrl()' to 'rdmsrq()'"), which renamed
wrmsrl() and rdmsrl() to wrmsrq() and rdmsrq() globally, are not
available in the current source tree, and backporting them would
introduce large, unrelated changes. Therefore, the changes intended
for wrmsrq() and rdmsrq() are instead applied to the functionally
equivalent wrmsrl() and rdmsrl() calls in this backport.
Accordingly, in file arch/x86/kernel/cpu/mce/core.c, wrmsrl() and
rdmsrl() calls are removed and added as needed to follow the upstream
changes while preserving the original behavior in the following
functions:
1. __log_error()
2. _log_error_bank()
3. _log_error_deferred()
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
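The clear-last ordering described above can be sketched as follows. This is a toy model, not the kernel code: the "bank" struct and function name are stand-ins for MSR reads/writes of MCA_DESTAT/MCA_DEADDR, and only the valid-bit layout (MCA_STATUS[63] as the valid bit) reflects the real registers.

```c
#include <stdint.h>
#include <stdbool.h>

#define MCA_STATUS_VAL (1ULL << 63)  /* "error valid" bit */

/* Stand-ins for the per-bank deferred-error MSRs. */
struct fake_bank {
	uint64_t destat;  /* models MCA_DESTAT */
	uint64_t deaddr;  /* models MCA_DEADDR */
};

/* Check the deferred-error status, capture the address while the
 * status is still live, and clear MCA_DESTAT only as the final step,
 * mirroring the "status register is cleared last" rule. Returns true
 * if a valid deferred error was found. */
bool log_error_deferred(struct fake_bank *b, uint64_t *logged_addr)
{
	uint64_t status = b->destat;

	if (!(status & MCA_STATUS_VAL))
		return false;

	/* "Log" the error before any register is cleared. */
	*logged_addr = b->deaddr;

	/* Final step: clear the status register. */
	b->destat = 0;
	return true;
}
```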
commit 134b1eabe6d9df8873bd018c9465994db8bff945 upstream.
Scalable MCA systems have a per-CPU register that gives the APIC LVT
offset for the thresholding and deferred error interrupts. Currently,
this register is read once to set up the deferred error interrupt and
then read again for each thresholding block. Furthermore, the APIC LVT
registers are configured each time, but they only need to be configured
once per-CPU.
Move the APIC LVT setup to the early part of CPU init, so that the
registers are set up once. Also, this ensures that the kernel is ready
to service the interrupts before the individual error sources (each MCA
bank) are enabled.
Apply this change only to SMCA systems to avoid breaking any legacy
behavior. The deferred error interrupt is technically advertised by the
SUCCOR feature. However, this was first made available on SMCA systems.
Therefore, only set up the deferred error interrupt on SMCA systems and
simplify the code.
Guidance from hardware designers is that the LVT offsets provided from
the platform should be used. The kernel should not try to enforce
specific values. However, the kernel should check that an LVT offset is
not reused for multiple sources. Therefore, remove the extra checking
and value enforcement from the MCE code. The "reuse/conflict" case is
already handled in setup_APIC_eilvt().
[Backport Changes]
1. In file arch/x86/kernel/cpu/mce/amd.c, within the newly added
function smca_enable_interrupt_vectors(), the call to rdmsrq_safe() was
replaced with the existing equivalent call rdmsrl_safe() because the
upstream commit 6fe22abacd40e2 that globally renamed rdmsrl_safe() to
rdmsrq_safe() is not available yet in the current source tree and
backporting it would introduce large, unrelated changes.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 4efaec6e16c249b64d389c85c3ef01345580483a upstream.
AMD systems optionally support MCA thresholding which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather
than polling.
Scalable MCA systems allow the platform to take control of this
feature. In this case, the OS will not see the feature configuration
and control bits in the MCA_MISC* registers. The OS will not receive
the MCA thresholding interrupt, and it will need to poll for
correctable errors.
A "corrected error interrupt" will be available on Scalable MCA
systems. This will be used in the same configuration where the platform
controls MCA thresholding. However, the platform will now be able to
send the MCA thresholding interrupt to the OS.
Check for, and enable, this feature during per-CPU SMCA init.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 3206b41604f80bb84c19ae3ed7c01d9d671ece2a upstream.
Many of the checks in reset_block() are done again in the block reset
function. So drop the redundant checks.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 56f17be67a332d146821d1a812ab16388d07ace7 upstream.
Prepare for CMCI storm support by moving the common bank/block iterator
code to a helper function. Include a parameter to switch the interrupt
enable. This will be used by the CMCI storm handling function.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit eeb3f76d73baed4c8ecc883e1eaafba3cb8aae1d upstream.
The MCA threshold limit generally is not something that needs to change
during runtime. It is common for a system administrator to decide on a
policy for their managed systems.
If MCA thresholding is OS-managed, then the threshold limit must be set
at every boot. However, many systems allow the user to set a value in
their BIOS. And this is reported through an APEI HEST entry even if
thresholding is not in FW-First mode.
Use this value, if available, to set the OS-managed threshold limit.
Users can still override it through sysfs if desired for testing or
debug.
APEI is parsed after MCE is initialized. So reset the thresholding
blocks later to pick up the threshold limit.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
commit 821f5fe4dbcb82357f0dfcfca3dd98f2b4097e53 upstream.
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.
Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
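The "prefer PhysAddrV" policy can be sketched as below. The bit positions (MCA_CONFIG[11], MCA_STATUS[54]) come from the commit text; the function name and the legacy-heuristic fallback parameter are illustrative stand-ins for the kernel's existing usable-address logic.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions per the commit text. */
#define MCA_CONFIG_PHYS_ADDR_V_SUPP (1ULL << 11)  /* PhysAddrValidSupported */
#define MCA_STATUS_PHYS_ADDR_V      (1ULL << 54)  /* PhysAddrV */

/* If the bank architecturally supports PhysAddrV, trust that bit
 * alone to decide whether MCA_ADDR holds a usable SPA. Otherwise fall
 * back to the legacy implementation-specific heuristic (passed in
 * here as a precomputed boolean for illustration). */
bool smca_usable_address(uint64_t mca_config, uint64_t mca_status,
			 bool legacy_usable)
{
	if (mca_config & MCA_CONFIG_PHYS_ADDR_V_SUPP)
		return (mca_status & MCA_STATUS_PHYS_ADDR_V) != 0;

	return legacy_usable;
}
```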
commit d7ac083f095d894a0b8ac0573516bfd035e6b25a upstream.
Currently, when a CMCI storm detected on a Machine Check bank subsides,
the bank's corresponding bit in the mce_poll_banks per-CPU variable is
cleared unconditionally by cmci_storm_end(). On AMD SMCA systems, this
essentially disables polling on that particular bank on that CPU.
Consequently, any subsequent correctable errors or storms will not be
logged.
Since AMD SMCA systems allow banks to be managed by both polling and
interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks,
should not be modified when a storm subsides.
Fixes: 7eae17c ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
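The fix reduces to making the bitmap update conditional, along these lines. This is a toy model: the function name and the `smca` flag are illustrative, but the bit manipulation mirrors what clearing a bank's bit in a per-CPU polling bitmap looks like.

```c
#include <stdint.h>
#include <stdbool.h>

/* When a storm on 'bank' subsides, clear its bit in the polling
 * bitmap only on systems where a bank is either polled or
 * interrupt-driven, never both (Intel CMCI). On AMD SMCA, where a
 * bank can be managed by polling and interrupts at the same time,
 * leave the bitmap untouched so polling keeps logging errors. */
uint64_t storm_end_poll_banks(uint64_t poll_banks, int bank, bool smca)
{
	if (!smca)
		poll_banks &= ~(1ULL << bank);
	return poll_banks;
}
```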
commit 5c4663ed1eac01987a1421f059380db48ab7b1a3 upstream.
Extend the logic of handling CMCI storms to AMD threshold interrupts.
Rely on an approach similar to Intel's CMCI to mitigate storms per CPU
and per bank. But, unlike CMCI, do not set thresholds and reduce the
interrupt rate on a storm. Rather, disable the interrupt on the
corresponding CPU and bank.
Re-enable the interrupts once enough consecutive polls of the bank show
no corrected errors (30, as programmed by Intel).
Turning off the threshold interrupts would be a better solution on AMD systems
as other error severities will still be handled even if the threshold
interrupts are disabled.
[ Tony: Small tweak because mce_handle_storm() isn't a pointer now ]
[ Yazen: Rebase and simplify ]
[ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix
checkpatch warnings. ]
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com
Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com>
Signed-off-by: mohanasv2 <mohanasv@amd.com>
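The storm-exit policy above can be modeled as a small per-bank state machine: while a storm is active the interrupt stays off, and it is re-armed only after 30 consecutive error-free polls (the threshold the commit borrows from Intel). The constant name, struct, and function below are illustrative, not the kernel's.

```c
#include <stdbool.h>

/* 30 consecutive clean polls end a storm, per the commit text. */
#define STORM_END_POLL_THRESHOLD 30

struct bank_storm {
	bool in_storm;    /* interrupt currently disabled */
	int clean_polls;  /* consecutive error-free polls seen */
};

/* Called once per poll of the bank. Returns true on the poll where
 * the storm is declared over and the interrupt should be re-enabled. */
bool storm_poll(struct bank_storm *b, bool saw_error)
{
	if (!b->in_storm)
		return false;

	if (saw_error) {
		/* Any corrected error restarts the clean-poll count. */
		b->clean_polls = 0;
		return false;
	}

	if (++b->clean_polls >= STORM_END_POLL_THRESHOLD) {
		b->in_storm = false;
		b->clean_polls = 0;
		return true;  /* re-enable the threshold interrupt */
	}
	return false;
}
```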
AMD Venice MCA patches for VeLinux (6.6 kernel)
This patch series contains 52 commits that modernize and refactor x86 Machine Check Exception (MCE) handling across Intel and AMD platforms. The changes improve error reporting, tracepoint exposure, helper abstractions, vendor-specific quirk handling, and initialization robustness, while aligning with upstream kernel conventions in naming, readability, and sysfs infrastructure.
Key Bug Fixes
- Bank initialization cleanup: Unified bank preparation into __mcheck_cpu_init_prepare_banks(), removing redundant flags and ensuring vendor settings apply before polling.
- Return type consistency: Converted multiple functions to return bool instead of 0/1 for clarity and correctness.
- Redundant checks removed: Eliminated unnecessary MCA support checks in mce_device_create().
- Timeout/error handling: Simplified machine_check_poll() by removing unused variables and return values after CMCI storm rework.
- MCE/APEI bug fixes.
Feature Additions
- New AMD registers: Added support for MCA_SYND1 and MCA_SYND2 on Zen4 systems, exporting supplemental error info (e.g., FRU text).
- Tracepoint extensions:
- Added ::microcode field to record active microcode revision.
- Added ::ppin field to expose Protected Processor Inventory Number.
- Cleaned up TP_printk() output for better readability.
- Wrapper struct: Introduced mce_hw_err to encapsulate struct mce, preventing UAPI bloat and enabling vendor-specific extensions.
- Dynamic buffer sizing: Allocated machine check record space based on CPU count, scaling beyond the historical fixed buffer.
- SMCA feature enablement.
Logic & Performance Improvements
- Helper abstractions:
- Split mce_prep_record() into common and per-CPU helpers.
- Defined clear_bank() and status-check helpers for vendor-specific actions.
- Split amd_mce_is_memory_error() into legacy and SMCA-specific helpers.
- Quirk handling:
- Separated global vs per-CPU quirks.
- Moved “UNKNOWN vendor” check to BSP-only init.
- Broke up __mcheck_cpu_apply_quirks() into vendor-specific helpers.
- Naming consistency:
- Renamed mce_setup() → mce_prep_record().
- Renamed MSR accessors mce_wrmsrl() → mce_wrmsrq() and mce_rdmsrl() → mce_rdmsrq().
- Intel errata handling: Replaced magic family/model numbers with symbolic IFM macros for PAT erratum checks.
- Unified AMD THR (thresholding) and DFR (deferred error) interrupt handling with MCA polling.
Robustness & Safety
- Initialization resilience:
- BSP-only SMCA init ensures handlers are set once per system.
- Vendor-specific quirks applied consistently across CPUs.
- Error address usability:
- Defined amd_mce_usable_address() for AMD-specific validation.
- Cleaned up mce_usable_address() with Intel-specific helpers.
- Strengthened robustness against threshold interrupt storms.
Unit Test:
NOTE: The test cases listed below depend on both Phase 1 and Phase 2 MCA patches; run them only after all Venice MCA patches have been integrated.
Without patches:
dmesg | grep -i mce
[ 0.000000] Linux version 6.6.95-base-mce+ (amd@host) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) [Intel-SIG] backporting KVM: x86: Advertise AVX10.1 CPUID to userspace #59 SMP PREEMPT_DYNAMIC Mon Jan 12 16:22:36 IST 2026
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.024602] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.843764] BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+
[ 1.009275] usb usb1: Manufacturer: Linux 6.6.95-base-mce+ ehci_hcd
[ 1.011461] usb usb2: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.011585] usb usb3: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.012162] usb usb4: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.012250] usb usb5: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.013054] usb usb6: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.013129] usb usb7: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 2.213820] MCE: In-kernel MCE decoding enabled.
With patch:
dmesg | grep -i mce
[ 0.000000] Linux version 6.6.95-mce-full+ (amd@host) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) Ve5.15 brbe #57 SMP PREEMPT_DYNAMIC Mon Jan 12 15:16:20 IST 2026
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.024368] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.533450] mce: HEST corrected error threshold limit: 10
[ 0.844504] BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+
[ 1.009294] usb usb1: Manufacturer: Linux 6.6.95-mce-full+ ehci_hcd
[ 1.011542] usb usb2: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.011663] usb usb3: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.012220] usb usb4: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.012309] usb usb5: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.013122] usb usb6: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.013202] usb usb7: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 2.066804] MCE: In-kernel MCE decoding enabled.