Commit 11e22c3

Merge remote-tracking branch 'origin/upstreams/develop' into develop
2 parents 22f6364 + 47cd938 commit 11e22c3

654 files changed, +16601 -7456 lines changed

Documentation/ABI/testing/sysfs-devices-system-cpu (+1)

@@ -517,6 +517,7 @@ What: /sys/devices/system/cpu/vulnerabilities
 /sys/devices/system/cpu/vulnerabilities/mds
 /sys/devices/system/cpu/vulnerabilities/meltdown
 /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
 /sys/devices/system/cpu/vulnerabilities/retbleed
 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
 /sys/devices/system/cpu/vulnerabilities/spectre_v1
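
The ABI entry above only names the files. As a minimal sketch (not part of this commit), a userspace C helper could build the path for one entry and read its status. The names `vuln_path` and `read_status_file` are hypothetical, and the sketch assumes each file holds a single newline-terminated line.

```c
#include <stdio.h>
#include <string.h>

/* Directory documented by the ABI entry above. */
#define VULN_DIR "/sys/devices/system/cpu/vulnerabilities"

/* Build the sysfs path for one entry, e.g. "reg_file_data_sampling".
 * Returns -1 if the result would not fit in out. */
static int vuln_path(const char *name, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/%s", VULN_DIR, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Read the one-line status from a vulnerability file and strip the
 * trailing newline. Returns -1 if the file cannot be opened or read,
 * for example on a kernel that predates the entry.
 */
static int read_status_file(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)buflen, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if any */
    return 0;
}
```

For example, `vuln_path("reg_file_data_sampling", ...)` followed by `read_status_file()` should yield one of the strings listed for RFDS, or -1 where the entry does not exist.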

Documentation/admin-guide/filesystem-monitoring.rst (+74)

@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+File system Error Reporting
+===========================
+
+Fanotify supports the FAN_FS_ERROR event type for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons, which listen for these events and take actions (notify
+sysadmin, start recovery) when a file system problem is detected.
+
+By design, a FAN_FS_ERROR notification exposes sufficient information
+for a monitoring tool to know a problem in the file system has happened.
+It doesn't necessarily provide a user space application with semantics
+to verify an IO operation was successfully executed. That is out of
+scope for this feature. Instead, it is only meant as a framework for
+early file system problem detection and reporting recovery tools.
+
+When a file system operation fails, it is common for dozens of kernel
+errors to cascade after the initial failure, hiding the original failure
+log, which is usually the most useful debug data to troubleshoot the
+problem. For this reason, FAN_FS_ERROR tries to report only the first
+error that occurred for a file system since the last notification, and
+it simply counts additional errors. This ensures that the most
+important pieces of information are never lost.
+
+FAN_FS_ERROR requires the fanotify group to be set up with the
+FAN_REPORT_FID flag.
+
+At the time of this writing, the only file system that emits FAN_FS_ERROR
+notifications is Ext4.
+
+A FAN_FS_ERROR notification has the following format::
+
+  [ Notification Metadata (Mandatory) ]
+  [ Generic Error Record  (Mandatory) ]
+  [ FID record            (Mandatory) ]
+
+The order of records is not guaranteed, and new records might be added
+in the future. Therefore, applications must not rely on the order and
+must be prepared to skip over unknown records. Please refer to
+``samples/fanotify/fs-monitor.c`` for an example parser.
+
+Generic error record
+--------------------
+
+The generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+providing any additional details about the problem. This record is
+identified by ``struct fanotify_event_info_header.info_type`` being set
+to FAN_EVENT_INFO_TYPE_ERROR.
+
+  struct fanotify_event_info_error {
+      struct fanotify_event_info_header hdr;
+      __s32 error;
+      __u32 error_count;
+  };
+
+The ``error`` field identifies the type of error using errno values.
+``error_count`` tracks the number of errors that occurred and were
+suppressed to preserve the original error information, since the last
+notification.
+
+FID record
+----------
+
+The FID record can be used to uniquely identify the inode that triggered
+the error through the combination of fsid and file handle. A file system
+specific application can use that information to attempt a recovery
+procedure. Errors that are not related to an inode are reported with an
+empty file handle of type FILEID_INVALID.
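
The text above requires parsers to walk records by their header length rather than by position. The following C sketch assumes the uapi header `<linux/fanotify.h>`; the fallback definitions only mirror the struct and constant above for headers older than Linux 5.16 (an assumption about the build environment), and `find_fs_error` is a hypothetical name. It handles one event already read from a group created with FAN_REPORT_FID; setting up the group and reading events are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/fanotify.h>

#ifndef FAN_EVENT_INFO_TYPE_ERROR
/* Fallback for uapi headers older than Linux 5.16; mirrors the struct above. */
#define FAN_EVENT_INFO_TYPE_ERROR 5
struct fanotify_event_info_error {
    struct fanotify_event_info_header hdr;
    __s32 error;
    __u32 error_count;
};
#endif

/*
 * Walk the info records that follow one event's metadata and extract the
 * generic error record. Records are visited in whatever order they appear;
 * FID and unknown record types are skipped using hdr.len.
 * Returns 0 on success, -1 if no well-formed error record is present.
 */
static int find_fs_error(const struct fanotify_event_metadata *meta,
                         int *error, unsigned int *error_count)
{
    assert(meta != NULL && error != NULL && error_count != NULL);
    const unsigned char *base = (const unsigned char *)meta;
    size_t off = meta->metadata_len;

    while (off + sizeof(struct fanotify_event_info_header) <= meta->event_len) {
        const struct fanotify_event_info_header *hdr =
            (const struct fanotify_event_info_header *)(base + off);

        /* A zero or overlong length would loop forever or overrun. */
        if (hdr->len == 0 || off + hdr->len > meta->event_len)
            return -1;

        if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR &&
            hdr->len >= sizeof(struct fanotify_event_info_error)) {
            const struct fanotify_event_info_error *rec =
                (const struct fanotify_event_info_error *)hdr;
            *error = rec->error;
            *error_count = rec->error_count;
            return 0;
        }
        off += hdr->len;   /* skip a record we do not need */
    }
    return -1;
}
```

Skipping by `hdr.len` is what lets the parser tolerate record types added in the future, as the text requires.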

Documentation/admin-guide/hw-vuln/index.rst (+1)

@@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
    cross-thread-rsb.rst
    gather_data_sampling.rst
    srso
+   reg-file-data-sampling

Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst (+104)

@@ -0,0 +1,104 @@
+==================================
+Register File Data Sampling (RFDS)
+==================================
+
+Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
+only affects Intel Atom parts (also branded as E-cores). RFDS may allow
+a malicious actor to infer data values previously used in floating point
+registers, vector registers, or integer registers. RFDS does not provide the
+ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.
+
+Affected Processors
+===================
+Below is the list of affected Intel processors [#f1]_:
+
+=================== ============
+Common name         Family_Model
+=================== ============
+ATOM_GOLDMONT       06_5CH
+ATOM_GOLDMONT_D     06_5FH
+ATOM_GOLDMONT_PLUS  06_7AH
+ATOM_TREMONT_D      06_86H
+ATOM_TREMONT        06_96H
+ALDERLAKE           06_97H
+ALDERLAKE_L         06_9AH
+ATOM_TREMONT_L      06_9CH
+RAPTORLAKE          06_B7H
+RAPTORLAKE_P        06_BAH
+ALDERLAKE_N         06_BEH
+RAPTORLAKE_S        06_BFH
+=================== ============
+
+As an exception to this table, Intel Xeon E family parts ALDERLAKE (06_97H) and
+RAPTORLAKE (06_B7H) codenamed Catlow are not affected. They are reported as
+vulnerable in Linux because they share the same family/model with an affected
+part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
+CPUID.HYBRID. This information could be used to distinguish between the
+affected and unaffected parts, but it is deemed not worth adding complexity as
+the reporting is fixed automatically when these parts enumerate RFDS_NO.
+
+Mitigation
+==========
+Intel released a microcode update that enables software to clear sensitive
+information using the VERW instruction. Like MDS, RFDS deploys the same
+mitigation strategy to force the CPU to clear the affected buffers before an
+attacker can extract the secrets. This is achieved by using the otherwise
+unused and obsolete VERW instruction in combination with a microcode update.
+The microcode clears the affected CPU buffers when the VERW instruction is
+executed.
+
+Mitigation points
+-----------------
+VERW is executed by the kernel before returning to user space, and by KVM
+before VMentry. None of the affected cores support SMT, so VERW is not required
+at C-state transitions.
+
+New bits in IA32_ARCH_CAPABILITIES
+----------------------------------
+Newer processors and microcode updates on existing affected processors added new
+bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
+vulnerability and mitigation capability:
+
+- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
+- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
+  microcode that clears the affected buffers on VERW execution.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The kernel command line can control the RFDS mitigation at boot time with the
+parameter "reg_file_data_sampling=". The valid arguments are:
+
+========== =================================================================
+on         If the CPU is vulnerable, enable mitigation; CPU buffer clearing
+           on exit to userspace and before entering a VM.
+off        Disables mitigation.
+========== =================================================================
+
+Mitigation default is selected by CONFIG_MITIGATION_RFDS.
+
+Mitigation status information
+-----------------------------
+The Linux kernel provides a sysfs interface to enumerate the current
+vulnerability status of the system: whether the system is vulnerable, and
+which mitigations are active. The relevant sysfs file is:
+
+  /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
+
+The possible values in this file are:
+
+.. list-table::
+
+     * - 'Not affected'
+       - The processor is not vulnerable
+     * - 'Vulnerable'
+       - The processor is vulnerable, but no mitigation is enabled
+     * - 'Vulnerable: No microcode'
+       - The processor is vulnerable but microcode is not updated.
+     * - 'Mitigation: Clear Register File'
+       - The processor is vulnerable and the CPU buffer clearing mitigation is
+         enabled.
+
+References
+----------
+.. [#f1] Affected Processors
+   https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
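
The rules in this document can be condensed into a small C function that predicts which sysfs string applies. This is a sketch of the documented decision only, not the kernel's implementation: `rfds_expected_status` and the local macros are hypothetical names, `on_affected_list` stands for a Family_Model lookup in the table above, reading IA32_ARCH_CAPABILITIES itself is omitted, and the rule from the reg_file_data_sampling= entry in kernel-parameters.txt (other VERW based mitigations such as MDS keep buffer clearing active) is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES bits listed in the section above (local macros). */
#define ARCH_CAP_RFDS_NO    (UINT64_C(1) << 27)
#define ARCH_CAP_RFDS_CLEAR (UINT64_C(1) << 28)

/*
 * Predict which documented sysfs string applies (sketch).
 * on_affected_list: the CPU's Family_Model appears in the table above.
 * mitigation_on:    reg_file_data_sampling=on, or the CONFIG_MITIGATION_RFDS
 *                   default when the parameter is absent.
 */
static const char *rfds_expected_status(uint64_t arch_caps,
                                        bool on_affected_list,
                                        bool mitigation_on)
{
    if (!on_affected_list || (arch_caps & ARCH_CAP_RFDS_NO))
        return "Not affected";
    if (!mitigation_on)
        return "Vulnerable";
    if (!(arch_caps & ARCH_CAP_RFDS_CLEAR))
        return "Vulnerable: No microcode";   /* VERW does not clear buffers */
    return "Mitigation: Clear Register File";
}
```

Under these assumptions, a listed part with RFDS_CLEAR set and the mitigation enabled maps to "Mitigation: Clear Register File", while the same part without the microcode maps to "Vulnerable: No microcode".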

Documentation/admin-guide/hw-vuln/spectre.rst (+19 -18)

@@ -439,12 +439,12 @@ The possible values in this file are:
      - System is protected by retpoline
    * - BHI: BHI_DIS_S
      - System is protected by BHI_DIS_S
-   * - BHI: SW loop; KVM SW loop
+   * - BHI: SW loop, KVM: SW loop
      - System is protected by software clearing sequence
-   * - BHI: Syscall hardening
-     - Syscalls are hardened against BHI
-   * - BHI: Syscall hardening; KVM: SW loop
-     - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
+   * - BHI: Vulnerable
+     - System is vulnerable to BHI
+   * - BHI: Vulnerable, KVM: SW loop
+     - System is vulnerable; KVM is protected by software clearing sequence

    Full mitigation might require a microcode update from the CPU
    vendor. When the necessary microcode is not available, the kernel will
@@ -506,8 +506,12 @@ Spectre variant 2
    between modes. Systems which support BHI_DIS_S will set it to protect against
    BHI attacks.

-   Legacy IBRS systems clear the IBRS bit on exit to userspace and
-   therefore explicitly enable STIBP for that
+   On Intel's enhanced IBRS systems, this includes cross-thread branch target
+   injections on SMT systems (STIBP). In other words, Intel eIBRS enables
+   STIBP, too.
+
+   AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
+   the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.

    The retpoline mitigation is turned on by default on vulnerable
    CPUs. It can be forced on or off by the administrator
@@ -641,9 +645,10 @@ kernel command line.
    retpoline,generic  Retpolines
    retpoline,lfence   LFENCE; indirect branch
    retpoline,amd      alias for retpoline,lfence
-   eibrs              enhanced IBRS
-   eibrs,retpoline    enhanced IBRS + Retpolines
-   eibrs,lfence       enhanced IBRS + LFENCE
+   eibrs              Enhanced/Auto IBRS
+   eibrs,retpoline    Enhanced/Auto IBRS + Retpolines
+   eibrs,lfence       Enhanced/Auto IBRS + LFENCE
+   ibrs               use IBRS to protect kernel

    Not specifying this option is equivalent to
    spectre_v2=auto.
@@ -706,18 +711,14 @@ For user space mitigation:
    spectre_bhi=

        [X86] Control mitigation of Branch History Injection
-       (BHI) vulnerability. Syscalls are hardened against BHI
-       regardless of this setting. This setting affects the deployment
+       (BHI) vulnerability. This setting affects the deployment
        of the HW BHI control and the SW BHB clearing sequence.

        on
-           unconditionally enable.
+           (default) Enable the HW or SW mitigation as
+           needed.
        off
-           unconditionally disable.
-       auto
-           enable if hardware mitigation
-           control(BHI_DIS_S) is available, otherwise
-           enable alternate mitigation in KVM.
+           Disable the mitigation.

    For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

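The BHI rows above are one field of the spectre_v2 status line. A C sketch for extracting that field follows, assuming the fields of `/sys/devices/system/cpu/vulnerabilities/spectre_v2` are separated by `;` (the change above moves the KVM part of the BHI value behind a comma, so splitting on `;` keeps it intact). `spectre_v2_bhi_field` is a hypothetical helper, not a kernel or libc API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the "BHI: ..." field of a spectre_v2 status line into out.
 * Assumption: fields are separated by ';' plus optional spaces, and the
 * BHI field itself contains no ';'. Returns 0 on success, -1 if the field
 * is absent or does not fit in out.
 */
static int spectre_v2_bhi_field(const char *status, char *out, size_t outlen)
{
    const char *p = status;

    while (*p != '\0') {
        while (*p == ' ')
            p++;                            /* skip the space after ';' */
        size_t n = strcspn(p, ";\n");       /* length of the current field */
        if (strncmp(p, "BHI:", 4) == 0) {
            if (n >= outlen)
                return -1;                  /* caller's buffer is too small */
            memcpy(out, p, n);
            out[n] = '\0';
            return 0;
        }
        p += n;
        if (*p != '\0')
            p++;                            /* step over ';' or '\n' */
    }
    return -1;
}
```

Given an illustrative line ending in `; BHI: Vulnerable, KVM: SW loop`, the helper copies out `BHI: Vulnerable, KVM: SW loop`, which matches a row of the table above.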

Documentation/admin-guide/index.rst (+1)

@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
    edid
    efi-stub
    ext4
+   filesystem-monitoring
    nfs/index
    gpio/index
    highuid

Documentation/admin-guide/kernel-parameters.txt (+29 -10)

@@ -1053,6 +1053,26 @@
                 The filter can be disabled or changed to another
                 driver later using sysfs.

+        reg_file_data_sampling=
+                [X86] Controls mitigation for Register File Data
+                Sampling (RFDS) vulnerability. RFDS is a CPU
+                vulnerability which may allow userspace to infer
+                kernel data values previously stored in floating point
+                registers, vector registers, or integer registers.
+                RFDS only affects Intel Atom processors.
+
+                on:     Turns ON the mitigation.
+                off:    Turns OFF the mitigation.
+
+                This parameter overrides the compile time default set
+                by CONFIG_MITIGATION_RFDS. Mitigation cannot be
+                disabled when other VERW based mitigations (like MDS)
+                are enabled. In order to disable RFDS mitigation all
+                VERW based mitigations need to be disabled.
+
+                For details see:
+                Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
+
         driver_async_probe=  [KNL]
                 List of driver names to be probed asynchronously.
                 Format: <driver_name1>,<driver_name2>...
@@ -3087,8 +3107,10 @@
                 nospectre_bhb [ARM64]
                 nospectre_v1 [X86,PPC]
                 nospectre_v2 [X86,PPC,S390,ARM64]
+                reg_file_data_sampling=off [X86]
                 retbleed=off [X86]
                 spec_store_bypass_disable=off [X86,PPC]
+                spectre_bhi=off [X86]
                 spectre_v2_user=off [X86]
                 srbds=off [X86,INTEL]
                 ssbd=force-off [ARM64]
@@ -5417,16 +5439,13 @@
                 See Documentation/admin-guide/laptops/sonypi.rst

         spectre_bhi=    [X86] Control mitigation of Branch History Injection
-                (BHI) vulnerability. Syscalls are hardened against BHI
-                reglardless of this setting. This setting affects the
+                (BHI) vulnerability. This setting affects the
                 deployment of the HW BHI control and the SW BHB
                 clearing sequence.

-                on   - unconditionally enable.
-                off  - unconditionally disable.
-                auto - (default) enable hardware mitigation
-                       (BHI_DIS_S) if available, otherwise enable
-                       alternate mitigation in KVM.
+                on   - (default) Enable the HW or SW mitigation
+                       as needed.
+                off  - Disable the mitigation.

         spectre_v2=     [X86] Control mitigation of Spectre variant 2
                 (indirect branch speculation) vulnerability.
@@ -5458,9 +5477,9 @@
                 retpoline,generic - Retpolines
                 retpoline,lfence  - LFENCE; indirect branch
                 retpoline,amd     - alias for retpoline,lfence
-                eibrs             - enhanced IBRS
-                eibrs,retpoline   - enhanced IBRS + Retpolines
-                eibrs,lfence      - enhanced IBRS + LFENCE
+                eibrs             - Enhanced/Auto IBRS
+                eibrs,retpoline   - Enhanced/Auto IBRS + Retpolines
+                eibrs,lfence      - Enhanced/Auto IBRS + LFENCE
                 ibrs              - use IBRS to protect kernel

                 Not specifying this option is equivalent to
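
Both new parameters are plain `key=value` options. A C sketch of looking one up in a command line string (such as the contents of /proc/cmdline) follows. `cmdline_get` is hypothetical; it ignores quoting and keeps the last occurrence of a repeated key, which are simplifying assumptions rather than a description of the kernel's own parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up "key=value" among the whitespace-separated tokens of a kernel
 * command line string. Simplifications (assumptions): quoting is ignored,
 * and if the key repeats, the last occurrence wins.
 * Returns true and copies the value into val if found and it fits.
 */
static bool cmdline_get(const char *cmdline, const char *key,
                        char *val, size_t vlen)
{
    size_t klen = strlen(key);
    bool found = false;
    const char *p = cmdline;

    while (*p != '\0') {
        p += strspn(p, " \t\n");                /* skip separators */
        size_t tlen = strcspn(p, " \t\n");      /* length of this token */
        if (tlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t n = tlen - klen - 1;         /* length of the value */
            if (n < vlen) {
                memcpy(val, p + klen + 1, n);
                val[n] = '\0';
                found = true;
            }
        }
        p += tlen;
    }
    return found;
}
```

Per the reg_file_data_sampling= entry above, finding `reg_file_data_sampling=off` is not sufficient on its own: the mitigation stays enabled while other VERW based mitigations such as MDS are on, so the sysfs file remains the authoritative view of the result.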

Documentation/core-api/dma-api.rst (+14)

@@ -204,6 +204,20 @@ Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
 others should not be larger than the returned value.

+::
+
+    size_t
+    dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device.
+
+Mapping larger buffers may take much longer in certain scenarios. In
+addition, for high-rate short-lived streaming mappings, the upfront time
+spent on the mapping may account for an appreciable part of the total
+request lifetime. As such, if splitting larger requests incurs no
+significant performance penalty, then device drivers are advised to
+limit the total DMA streaming mapping length to the returned value.
+
 ::

     bool
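
The advice to cap streaming mappings at the optimal size amounts to chunking a request. The sketch below models only that arithmetic in userspace C: `split_request` and its `map_chunk` callback are hypothetical, and in a real driver `opt` would come from `dma_opt_mapping_size(dev)` while each chunk would be mapped with a streaming call such as dma_map_single().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model: split a request of `total` bytes into consecutive chunks
 * of at most `opt` bytes and report each one to a hypothetical callback
 * (which may be NULL). Returns the number of chunks.
 */
static size_t split_request(size_t total, size_t opt,
                            void (*map_chunk)(size_t off, size_t len, void *ctx),
                            void *ctx)
{
    assert(opt > 0);   /* a zero limit would never make progress */
    size_t chunks = 0;
    size_t off = 0;

    while (off < total) {
        size_t len = total - off < opt ? total - off : opt;
        if (map_chunk != NULL)
            map_chunk(off, len, ctx);
        off += len;    /* len <= total - off, so this cannot overflow */
        chunks++;
    }
    return chunks;
}
```

For example, a 10-byte request with `opt` = 4 yields three chunks of 4, 4 and 2 bytes. Whether the split is worthwhile depends on the condition in the text: it should not incur a significant performance penalty.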

Documentation/filesystems/locking.rst (+7 -3)

@@ -442,17 +442,21 @@ prototypes::
     void (*lm_break)(struct file_lock *); /* break_lease callback */
     int (*lm_change)(struct file_lock **, int);
     bool (*lm_breaker_owns_lease)(struct file_lock *);
+    bool (*lm_lock_expirable)(struct file_lock *);
+    void (*lm_expire_lock)(void);

 locking rules:

 ====================== ============= ================= =========
-ops                    inode->i_lock blocked_lock_lock may block
+ops                    flc_lock      blocked_lock_lock may block
 ====================== ============= ================= =========
-lm_notify:             yes           yes               no
+lm_notify:             no            yes               no
 lm_grant:              no            no                no
 lm_break:              yes           no                no
 lm_change              yes           no                no
-lm_breaker_owns_lease: no            no                no
+lm_breaker_owns_lease: yes           no                no
+lm_lock_expirable      yes           no                no
+lm_expire_lock         no            no                yes
 ====================== ============= ================= =========

 buffer_head
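
The new rows say lm_lock_expirable runs with flc_lock held and must not block, while lm_expire_lock runs with no locks held and may block. One way a caller could honor both rules is to drop flc_lock before expiring and then rescan. The following userspace C model is a sketch under stated assumptions: a boolean stands in for the flc_lock spinlock, locks are identified by plain ints, and every name in it is hypothetical rather than taken from fs/locks.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two new lock manager operations. */
struct lm_ops_model {
    bool (*lock_expirable)(int lock_id);  /* table: flc_lock held, no blocking */
    void (*expire_lock)(void);            /* table: no locks held, may block */
};

static bool flc_lock_held;  /* models flc_lock, a spinlock in the kernel */

static void flc_lock_acquire(void) { assert(!flc_lock_held); flc_lock_held = true; }
static void flc_lock_release(void) { assert(flc_lock_held); flc_lock_held = false; }

/*
 * Scan conflicting locks under flc_lock. If one is expirable, drop flc_lock
 * before calling expire_lock (which may sleep), then rescan from the start.
 * Returns how many expiry rounds were needed.
 */
static int resolve_conflicts(const struct lm_ops_model *ops,
                             const int *conflicts, size_t n)
{
    int rounds = 0;
    bool retry = true;

    while (retry) {
        retry = false;
        flc_lock_acquire();
        for (size_t i = 0; i < n; i++) {
            if (ops->lock_expirable != NULL && ops->lock_expirable(conflicts[i])) {
                retry = true;
                break;
            }
        }
        flc_lock_release();
        if (retry) {
            ops->expire_lock();   /* called with no locks held */
            rounds++;
        }
    }
    return rounds;
}

/* Demo callbacks (hypothetical): report a lock as expirable until
 * demo_pending expiries have run. */
static int demo_pending;

static bool demo_expirable(int lock_id)
{
    (void)lock_id;
    assert(flc_lock_held);        /* lm_lock_expirable: flc_lock is held */
    return demo_pending > 0;
}

static void demo_expire(void)
{
    assert(!flc_lock_held);       /* lm_expire_lock: flc_lock is not held */
    demo_pending--;
}
```

With `demo_pending = 2`, `resolve_conflicts` runs two expiry rounds and returns 2; the asserts inside the demo callbacks check the flc_lock column of the table.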
