[LTS 9.2] CVE-2024-25742, CVE-2024-25743, CVE-2024-25744 #611
Open: pvts-mat wants to merge 13 commits into ctrliq:ciqlts9_2 from pvts-mat:ciqlts9_2-CVE-batch-7
+305 −115
jira VULN-772
cve-pre CVE-2024-25744
commit-author Lukas Bulwahn <[email protected]>
commit 6bf8a55
Fix misspelled Kconfig symbols as detected by scripts/checkkconfigsymbols.py.
[ bp: Combine into a single patch. ]
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 6bf8a55)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-772
cve-pre CVE-2024-25744
commit-author Nikolay Borisov <[email protected]>
commit 1da5c9b
IA32 support on 64bit kernels depends on whether CONFIG_IA32_EMULATION is selected or not. As it is a compile time option it doesn't provide the flexibility to have distributions set their own policy for IA32 support and give the user the flexibility to override it.
As a first step introduce ia32_enabled() which abstracts whether IA32 compat is turned on or off. Upcoming patches will implement the ability to set IA32 compat state at boot time.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 1da5c9b)
Signed-off-by: Marcin Wcisło <[email protected]>
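As a rough illustration of the abstraction introduced by 1da5c9b, here is a minimal userspace sketch of a helper that folds the compile-time configuration into one boolean query, so callers can use an ordinary `if` instead of `#ifdef`. The `CONFIG_` macros are simulated with `-D` compiler flags, and `sketch_ia32_enabled()` / `sketch_setup_compat()` are hypothetical names, not the kernel's code.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: a single query "is IA32 compat available?".
 * Build with -DCONFIG_IA32_EMULATION to simulate a kernel that has
 * IA32 emulation compiled in; the real kernel uses IS_ENABLED().
 */
bool sketch_ia32_enabled(void)
{
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
	return true;
#else
	return false;
#endif
}

/* Hypothetical caller: compat setup becomes a plain runtime branch. */
int sketch_setup_compat(void)
{
	if (!sketch_ia32_enabled())
		return 0; /* nothing to set up */
	/* ... install 32-bit entry points here ... */
	return 1;
}
```

Once every caller goes through such a helper, later patches only need to change the helper's body (for example, to return a boot-time variable) rather than every `#ifdef` site.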
jira VULN-772
cve-pre CVE-2024-25744
commit-author Nikolay Borisov <[email protected]>
commit f71e1d2
The SYSCALL instruction cannot really be disabled in compatibility mode. The best that can be done is to configure the CSTAR msr to point to a minimal handler. Currently this handler has a rather misleading name - ignore_sysret() as it's not really doing anything with sysret.
Give it a more descriptive name.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit f71e1d2)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-772
cve-pre CVE-2024-25744
commit-author Nikolay Borisov <[email protected]>
commit 370dcd5
To limit the IA32 exposure on 64bit kernels while keeping the flexibility for the user to enable it when required, the compile time enable/disable via CONFIG_IA32_EMULATION is not good enough and will be complemented with a kernel command line option.
Right now entry_SYSCALL32_ignore() is only compiled when CONFIG_IA32_EMULATION=n, but boot-time enable- / disablement obviously requires it to be unconditionally available.
Remove the #ifndef CONFIG_IA32_EMULATION guard.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 370dcd5)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-772
cve-pre CVE-2024-25744
commit-author Nikolay Borisov <[email protected]>
commit 6138228
upstream-diff Upstream code between the #ifdef / #else / #endif in `arch/x86/kernel/cpu/common.c' differs slightly from `ciqlts9_2' (`wrmsrl_cstar' function used instead of `wrmsrl'). Applied the same logic of #ifdef / #else -> if / else conversion to the existing codebase.
Another major aspect of supporting running of 32bit processes is the ability to access 32bit syscalls. Such syscalls can be invoked by using the legacy int 0x80 handler and sysenter/syscall instructions.
If IA32 emulation is disabled ensure that each of those 3 distinct mechanisms are also disabled. For int 0x80 a #GP exception would be generated since the respective descriptor is not going to be loaded at all. Invoking sysenter will also result in a #GP since IA32_SYSENTER_CS contains an invalid segment. Finally, syscall instruction cannot really be disabled so it's configured to execute a minimal handler.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 6138228)
Signed-off-by: Marcin Wcisło <[email protected]>
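The #ifdef / #else -> if / else conversion described in 6138228 can be pictured with a small userspace simulation of the three 32-bit syscall entry mechanisms. The MSR indices are the architectural ones, but the per-CPU state is a plain struct, the entry addresses and selectors are hypothetical placeholders, and all `fake_*` / `sketch_*` names are invented for illustration. In the kernel the MSR and IDT setup also live in different places; they are combined here for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR indices. */
#define MSR_CSTAR             0xc0000083u
#define MSR_IA32_SYSENTER_CS  0x174u
#define MSR_IA32_SYSENTER_ESP 0x175u
#define MSR_IA32_SYSENTER_EIP 0x176u

/* Hypothetical placeholder values; the kernel uses real symbol addresses. */
#define FAKE_ENTRY_SYSCALL_COMPAT   0x1000u
#define FAKE_ENTRY_SYSCALL32_IGNORE 0x2000u
#define FAKE_ENTRY_SYSENTER_COMPAT  0x3000u
#define FAKE_KERNEL_CS              0x10u
#define FAKE_INVALID_SEG            0x0u

/* Simulated per-CPU state: four MSRs plus "is the INT 0x80 IDT gate installed". */
struct fake_cpu {
	uint64_t cstar;
	uint64_t sysenter_cs;
	uint64_t sysenter_esp;
	uint64_t sysenter_eip;
	bool idt_0x80_present;
};

/* Hypothetical stand-in for wrmsr: stores into the simulated MSR file. */
static void fake_wrmsr(struct fake_cpu *cpu, uint32_t msr, uint64_t val)
{
	switch (msr) {
	case MSR_CSTAR:             cpu->cstar = val; break;
	case MSR_IA32_SYSENTER_CS:  cpu->sysenter_cs = val; break;
	case MSR_IA32_SYSENTER_ESP: cpu->sysenter_esp = val; break;
	case MSR_IA32_SYSENTER_EIP: cpu->sysenter_eip = val; break;
	default: break; /* other MSRs are not modelled */
	}
}

/* Runtime if/else replacing the former #ifdef CONFIG_IA32_EMULATION / #else. */
void sketch_syscall_init(struct fake_cpu *cpu, bool ia32_enabled, uint64_t entry_stack_top)
{
	if (ia32_enabled) {
		fake_wrmsr(cpu, MSR_CSTAR, FAKE_ENTRY_SYSCALL_COMPAT);
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_CS, FAKE_KERNEL_CS);
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_ESP, entry_stack_top);
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_EIP, FAKE_ENTRY_SYSENTER_COMPAT);
	} else {
		/* SYSCALL cannot be disabled, so CSTAR points at a minimal handler. */
		fake_wrmsr(cpu, MSR_CSTAR, FAKE_ENTRY_SYSCALL32_IGNORE);
		/* An invalid SYSENTER_CS selector makes SYSENTER raise #GP. */
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_CS, FAKE_INVALID_SEG);
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_ESP, 0);
		fake_wrmsr(cpu, MSR_IA32_SYSENTER_EIP, 0);
	}
	/* INT 0x80: per the commit message, no descriptor means INT 0x80 raises #GP. */
	cpu->idt_0x80_present = ia32_enabled;
}
```

The point of the runtime branch is that both paths are always compiled in, which is what later allows the choice to be made at boot time.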
jira VULN-772
cve-pre CVE-2024-25744
commit-author Nikolay Borisov <[email protected]>
commit a11e097
Distributions would like to reduce their attack surface as much as possible but at the same time they'd want to retain flexibility to cater to a variety of legacy software. This stems from the conjecture that compat layer is likely rarely tested and could have latent security bugs. Ideally distributions will set their default policy and also give users the ability to override it as appropriate.
To enable this use case, introduce CONFIG_IA32_EMULATION_DEFAULT_DISABLED compile time option, which controls whether 32bit processes/syscalls should be allowed or not. This option is aimed mainly at distributions to set their preferred default behavior in their kernels.
To allow users to override the distro's policy, introduce the 'ia32_emulation' parameter which allows overriding CONFIG_IA32_EMULATION_DEFAULT_DISABLED state at boot time.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit a11e097)
Signed-off-by: Marcin Wcisło <[email protected]>
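A minimal userspace model of the policy a11e097 describes: a compile-time default (CONFIG_IA32_EMULATION_DEFAULT_DISABLED, simulated here with a `-D` flag) that the `ia32_emulation=` boot parameter can override. The value parser is assumed to follow kstrtobool()-style boolean parsing and is a simplified stand-in for it; all `sketch_*` names are hypothetical, and this is not the backported code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: build with -DCONFIG_IA32_EMULATION_DEFAULT_DISABLED to flip the default. */
#ifdef CONFIG_IA32_EMULATION_DEFAULT_DISABLED
#define SKETCH_IA32_DEFAULT false
#else
#define SKETCH_IA32_DEFAULT true
#endif

/* Simplified kstrtobool()-like parser: y/Y/1, n/N/0, on/off. *res is untouched on error. */
int sketch_strtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

/* Effective policy: the distro default unless the user passed ia32_emulation=<value>. */
bool sketch_ia32_policy(const char *cmdline_value)
{
	bool enabled = SKETCH_IA32_DEFAULT;

	/* NULL means the parameter was absent; an unparsable value keeps the default. */
	if (cmdline_value)
		(void)sketch_strtobool(cmdline_value, &enabled);
	return enabled;
}
```

For example, on a build where emulation is enabled by default, `sketch_ia32_policy("0")` returns false, while an unparsable value such as `"bogus"` leaves the default in place.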
jira VULN-772
cve-pre CVE-2024-25744
commit-author Vitaly Kuznetsov <[email protected]>
commit d55f31e
ia32_emulation_override_cmdline() is an early_param() arg and these are only needed at boot time. In fact, all other early_param() functions in arch/x86 seem to have '__init' annotation and ia32_emulation_override_cmdline() is the only exception.
Fixes: a11e097 ("x86: Make IA32_EMULATION boot time configurable")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Link: https://lore.kernel.org/all/20241210151650.1746022-1-vkuznets%40redhat.com
(cherry picked from commit d55f31e)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-772
cve CVE-2024-25744
commit-author Kirill A. Shutemov <[email protected]>
commit b82a8db
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The kernel expects to receive a software interrupt as a result of the INT 0x80 instruction. However, an external interrupt on the same vector triggers the same handler.
The kernel interprets an external interrupt on vector 0x80 as a 32-bit system call that came from userspace.
A VMM can inject external interrupts on any arbitrary vector at any time. This remains true even for TDX and SEV guests where the VMM is untrusted.
Put together, this allows an untrusted VMM to trigger int80 syscall handling at any given point. The content of the guest register file at that moment defines what syscall is triggered and its arguments. It opens the guest OS to manipulation from the VMM side.
Disable 32-bit emulation by default for TDX and SEV. User can override it with the ia32_emulation=y command line option.
[ dhansen: reword the changelog ]
Reported-by: Supraja Sridhara <[email protected]>
Reported-by: Benedict Schlüter <[email protected]>
Reported-by: Mark Kuhne <[email protected]>
Reported-by: Andrin Bertschi <[email protected]>
Reported-by: Shweta Shinde <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]> # v6.0+: 1da5c9b x86: Introduce ia32_enabled()
Cc: <[email protected]> # v6.0+
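For `ia32_emulation=y` to be able to undo the confidential-guest default from b82a8db, the default must be applied before the boot parameter is evaluated. The sketch models that ordering with a single global flag. The function names are hypothetical, and the exact early-init / early_param() sequencing is an assumption of this sketch rather than a description of the kernel's call chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global playing the role of the kernel's IA32 state flag. */
static bool sketch_ia32_state = true;

/* Step 1 (assumed to run first): TDX/SEV guests switch IA32 emulation off. */
void sketch_coco_early_init(bool is_tdx_or_sev_guest)
{
	if (is_tdx_or_sev_guest)
		sketch_ia32_state = false;
}

/* Step 2 (assumed to run later): an explicit ia32_emulation= value wins. */
void sketch_parse_ia32_emulation(const char *value)
{
	if (value == NULL)
		return; /* parameter not given: keep whatever step 1 decided */
	if (value[0] == 'y' || value[0] == 'Y' || value[0] == '1')
		sketch_ia32_state = true;
	else if (value[0] == 'n' || value[0] == 'N' || value[0] == '0')
		sketch_ia32_state = false;
}

/* Simulate one boot and return the resulting IA32 emulation state. */
bool sketch_boot(bool coco_guest, const char *ia32_emulation_value)
{
	sketch_ia32_state = true; /* distro default: enabled */
	sketch_coco_early_init(coco_guest);
	sketch_parse_ia32_emulation(ia32_emulation_value);
	return sketch_ia32_state;
}
```

With this ordering a confidential guest boots with IA32 emulation off (`sketch_boot(true, NULL)` is false) unless the user explicitly asks for it (`sketch_boot(true, "y")` is true).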
jira VULN-772
cve CVE-2024-25744
commit-author Thomas Gleixner <[email protected]>
commit be5341e
There is no real reason to have a separate ASM entry point implementation for the legacy INT 0x80 syscall emulation on 64-bit.
IDTENTRY provides all the functionality needed with the only difference that it does not:
- save the syscall number (AX) into pt_regs::orig_ax
- set pt_regs::ax to -ENOSYS
Both can be done safely in the C code of an IDTENTRY before invoking any of the syscall related functions which depend on this convention.
Aside of ASM code reduction this prepares for detecting and handling a local APIC injected vector 0x80.
[ kirill.shutemov: More verbose comments ]
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]> # v6.0+
(cherry picked from commit be5341e)
Signed-off-by: Marcin Wcisło <[email protected]>
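The two steps that be5341e moves from assembly into the C part of the IDTENTRY handler are small enough to show directly. This is a sketch over a heavily reduced, hypothetical register struct (`struct sketch_regs`), not the kernel's `struct pt_regs` or its int80_emulation() code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct sketch_regs {
	uint64_t ax;      /* on INT 0x80 entry: the syscall number */
	uint64_t orig_ax; /* value left by the generic entry code */
};

/*
 * Establish the syscall convention in C before any syscall code runs:
 * - save the syscall number (AX) into orig_ax,
 * - preload ax with -ENOSYS, so a syscall that is never dispatched
 *   returns -ENOSYS to user space.
 */
void sketch_int80_prepare(struct sketch_regs *regs)
{
	regs->orig_ax = regs->ax;
	regs->ax = (uint64_t)(int64_t)-ENOSYS;
}
```

Doing this in C rather than in a dedicated assembly stub is what makes room for extra checks, such as the external-interrupt detection added by the next commit, before the syscall is dispatched.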
jira VULN-772
cve CVE-2024-25744
commit-author Thomas Gleixner <[email protected]>
commit 55617fb
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The kernel expects to receive a software interrupt as a result of the INT 0x80 instruction. However, an external interrupt on the same vector also triggers the same codepath.
An external interrupt on vector 0x80 will currently be interpreted as a 32-bit system call, and assuming that it was a user context.
Panic on external interrupts on the vector.
To distinguish software interrupts from external ones, the kernel checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]> # v6.0+
(cherry picked from commit 55617fb)
Signed-off-by: Marcin Wcisło <[email protected]>
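The ISR check from 55617fb comes down to locating one bit in the local APIC's In-Service Register. The xAPIC ISR consists of eight 32-bit registers starting at offset 0x100, spaced 0x10 apart, each covering 32 vectors, so vector 0x80 maps to bit 0 of the register at offset 0x140. In this sketch the ISR is a plain array standing in for reads from the APIC page, and the `sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_APIC_ISR_BASE 0x100u /* offset of the first ISR register */
#define SKETCH_INT80_VECTOR  0x80u

/* Each 32-bit ISR register covers 32 vectors; registers sit 0x10 apart. */
uint32_t sketch_isr_offset(uint8_t vector)
{
	return SKETCH_APIC_ISR_BASE + (uint32_t)(vector / 32) * 0x10u;
}

/* Bit within that register. */
uint32_t sketch_isr_mask(uint8_t vector)
{
	return 1u << (vector % 32);
}

/*
 * isr[] is a hypothetical snapshot of the eight ISR registers.
 * A software INT 0x80 does not go through the APIC, so its ISR bit stays 0;
 * an external (e.g. VMM-injected) interrupt on vector 0x80 is in service
 * and has the bit set.
 */
bool sketch_int80_is_external(const uint32_t isr[8])
{
	return (isr[SKETCH_INT80_VECTOR / 32] & sketch_isr_mask(SKETCH_INT80_VECTOR)) != 0;
}
```

Per the commit message, the kernel panics when this check reports an external interrupt instead of treating it as a 32-bit syscall.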
jira VULN-772
cve CVE-2024-25744
commit-author Kirill A. Shutemov <[email protected]>
commit f4116bf
32-bit emulation was disabled on TDX to prevent a possible attack by a VMM injecting an interrupt on vector 0x80.
Now that int80_emulation() has a check for external interrupts the limitation can be lifted.
To distinguish software interrupts from external ones, int80_emulation() checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0.
On TDX, the VAPIC state (including ISR) is protected and cannot be manipulated by the VMM. The ISR bit is set by the microcode flow during the handling of posted interrupts.
[ dhansen: more changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]> # v6.0+
(cherry picked from commit f4116bf)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-756
cve CVE-2024-25742
commit-author Borislav Petkov (AMD) <[email protected]>
commit e3ef461
upstream-diff Added `#else' case for the `#ifndef __BOOT_COMPRESSED' which was modified in upstream but not present in `ciqlts9_2'.
Compare the opcode bytes at rIP for each #VC exit reason to verify the instruction which raised the #VC exception is actually the right one.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
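The idea of e3ef461, checking that the bytes at rIP actually encode an instruction that can produce the reported #VC exit reason before emulating it, can be sketched over raw instruction bytes. Assumptions of this sketch: only four exit reasons are covered, instruction prefixes are ignored (the kernel works on a decoded instruction), IO and MMIO exits are out of scope, and the enum and function names are hypothetical. The exit codes and opcodes themselves are the architectural values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A few SVM exit codes (architectural values). */
enum sketch_exit_code {
	SKETCH_EXIT_RDTSC  = 0x6e,
	SKETCH_EXIT_CPUID  = 0x72,
	SKETCH_EXIT_MSR    = 0x7c,
	SKETCH_EXIT_WBINVD = 0x89,
};

/*
 * Return true only if the bytes at rIP encode an instruction that can
 * legitimately cause this exit reason. Anything else is rejected rather
 * than emulated, so a #VC injected at an arbitrary point is not blindly
 * acted upon.
 */
bool sketch_vc_opcode_ok(unsigned int exit_code, const uint8_t *ip, size_t len)
{
	if (len < 2 || ip[0] != 0x0f)
		return false;

	switch (exit_code) {
	case SKETCH_EXIT_RDTSC:
		return ip[1] == 0x31;                  /* 0F 31: RDTSC */
	case SKETCH_EXIT_CPUID:
		return ip[1] == 0xa2;                  /* 0F A2: CPUID */
	case SKETCH_EXIT_MSR:
		return ip[1] == 0x32 || ip[1] == 0x30; /* 0F 32: RDMSR, 0F 30: WRMSR */
	case SKETCH_EXIT_WBINVD:
		return ip[1] == 0x09;                  /* 0F 09: WBINVD */
	default:
		return false;                          /* unhandled exit reason */
	}
}
```

A table-free switch keeps each accepted encoding next to its exit reason, which makes it easy to audit and to extend when an instruction is found to share an exit code with another one.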
jira VULN-756
cve-bf CVE-2024-25742
commit-author Tom Lendacky <[email protected]>
commit e70316d
The MWAITX and MONITORX instructions generate the same #VC error code as the MWAIT and MONITOR instructions, respectively. Update the #VC handler opcode checking to also support the MWAITX and MONITORX opcodes.
Fixes: e3ef461 ("x86/sev: Harden #VC instruction emulation somewhat")
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/453d5a7cfb4b9fe818b6fb67f93ae25468bc9e23.1713793161.git.thomas.lendacky@amd.com
(cherry picked from commit e70316d)
Signed-off-by: Marcin Wcisło <[email protected]>
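The fix in e70316d amounts to accepting two encodings per exit reason. A minimal sketch over raw bytes, assuming no instruction prefixes; the macro and function names are hypothetical, while the exit codes and the 0F 01 xx encodings are the architectural ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXIT_MONITOR 0x8au
#define SKETCH_EXIT_MWAIT   0x8bu

/*
 * MONITOR  = 0F 01 C8, MONITORX = 0F 01 FA -> same #VC error code
 * MWAIT    = 0F 01 C9, MWAITX   = 0F 01 FB -> same #VC error code
 */
bool sketch_vc_monitor_mwait_ok(unsigned int exit_code, const uint8_t *ip, size_t len)
{
	if (len < 3 || ip[0] != 0x0f || ip[1] != 0x01)
		return false;

	switch (exit_code) {
	case SKETCH_EXIT_MONITOR:
		return ip[2] == 0xc8 || ip[2] == 0xfa;
	case SKETCH_EXIT_MWAIT:
		return ip[2] == 0xc9 || ip[2] == 0xfb;
	default:
		return false;
	}
}
```

Without the second encoding, a legitimate MONITORX or MWAITX in the guest would be rejected by the stricter check introduced in e3ef461.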
[LTS 9.2]
CVE-2024-25742 VULN-756
CVE-2024-25743 VULN-764
CVE-2024-25744 VULN-772
Background
CVE-2024-25742, CVE-2024-25743 and CVE-2024-25744 all stem from the same research by ETH Zurich. The attack target is a Linux virtual machine and the attacker is its own hypervisor. In the classical setting the hypervisor has full control over all its guests, so there is no need for any "attack". The scenario applies to VMs whose memory is encrypted and cannot be read or (unnoticeably) modified by the hypervisor. Such VMs are called isolated or confidential VMs (CVMs), and their hypervisor is considered untrusted. This is an important use case in modern cloud computing, where virtual machines are often run on third-party physical machines.
Keeping a VM's memory confidential from the hypervisor is only possible with hardware support, because the (necessarily unencrypted) key to the guest's encrypted memory must be kept outside hypervisor-controlled memory. This hardware support is called a Trusted Execution Environment (TEE). The technologies relevant to these CVEs are AMD SEV-SNP (Secure Encrypted Virtualization - Secure Nested Paging) and Intel TDX (Trust Domain Extensions). Both serve roughly the same purpose: encrypting a VM's entire memory.
Although the hypervisor cannot read the VM's memory, it still needs some means of controlling the VM, including the ability to raise interrupts in it. The attack leverages the hypervisor's ability to inject arbitrary interrupts into its guest to obtain unauthorized access. The different CVE numbers relate to different interrupt families. Refer to the following table for a summary:
Misconceptions
Applicability
LTS 9.2 supports both Intel TDX and AMD SEV-SNP, so the problem applies in full:
kernel-src-tree/configs/kernel-x86_64-rhel.config, line 2667 in b4f997e
kernel-src-tree/configs/kernel-x86_64-rhel.config, line 2356 in b4f997e
IA32 emulation is enabled, which exposes LTS 9.2 to CVE-2024-25744:
kernel-src-tree/configs/kernel-x86_64-rhel.config, line 2109 in b4f997e
Solution
Fixes are provided for CVE-2024-25744 and CVE-2024-25742.
ia32_emulation kernel command line parameter, in contrast to having only the compile-time option CONFIG_IA32_EMULATION. This allows for the CVE-2024-25744 mitigation in b82a8db.
ia32_emulation parameter being introduced. From the perspective of patching CVE-2024-25744 this is fine, because the vulnerability is not related to running 32-bit processes.
ciqlts9_2.
Commit grouping summary:
kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Reference
kselftests–ciqlts9_2–run1.log
kselftests–ciqlts9_2–run2.log
Patch
kselftests–ciqlts9_2-CVE-batch-7–run1.log
kselftests–ciqlts9_2-CVE-batch-7–run2.log
Comparison
The test results for the reference and the patched kernel are the same.
Specific tests: passed
The cloud image was modified with the help of qemu-nbd to check whether the kernel recognizes the new boot argument ia32_emulation: boot-test-ia32_emulation=0.log
The key boot log fragment:
Compare with the clearly unsupported boot argument ia32_emulationxx: boot-test-ia32_emulationxx=0.log
This shows that the patched kernel properly recognizes the new parameter ia32_emulation.
Functional tests were not carried out due to the lack of appropriate testing infrastructure (they would need to run on a bare-metal Intel or AMD machine with the appropriate TEE support).
Footnotes
1 https://nvd.nist.gov/vuln/detail/CVE-2024-25742
2 https://nvd.nist.gov/vuln/detail/CVE-2024-25743
3 https://nvd.nist.gov/vuln/detail/CVE-2024-25744
4 https://ahoi-attacks.github.io/wesee/
5 https://ahoi-attacks.github.io/heckler/
6 https://arxiv.org/html/2404.03526v1
7 https://ahoi-attacks.github.io/heckler/
8 https://ahoi-attacks.github.io/heckler/
9 https://ahoi-attacks.github.io/wesee/
10 https://ahoi-attacks.github.io/heckler/
11 https://ahoi-attacks.github.io/heckler/
12 https://www.intel.com/content/www/us/en/security-center/announcement/intel-security-announcement-2024-04-08-001.html