Skip to content

2.2.8 staging prep #17325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: zfs-2.2.8-staging
Choose a base branch
from
Open

Conversation

robn
Copy link
Member

@robn robn commented May 12, 2025

Motivation and Context

Bringing various things back to 2.2. Linux kernel version update, the recent dataset hold fixes, handful of other bits.

Description

Includes:

How Has This Been Tested?

Compile checked latest release of all LTS kernels back to 4.19, and all 6.x kernels.

Also compile checked on FreeBSD 14.2.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

robn and others added 7 commits May 12, 2025 13:40
For UIO_ITER, we are just wrapping a kernel iterator. It will take care
of its own offsets if necessary. We don't need to do anything, and if we
do try to do anything with it (like advancing the iterator by the skip
in zfs_uio_advance) we're just confusing the kernel iterator, ending up
at the wrong position or worse, off the end of the memory region.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17298
(cherry picked from commit 2ee5b51)
This is a convenience for filesystems that need the inode of their
parent or their own name, as its often complicated to get that
information. We don't need those things, so this is just detecting which
prototype is expected and adjusting our callback to match.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 7ef6b70)
According to the upstream change, all callers set it, and all block
devices either honoured it or ignored it, so removing it entirely allows
a bunch of handling for the "unset" case to be removed, and it becomes
effectively implied.

We follow suit, and keep setting it for older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 2ca91ba)
since 4.10, bio->bi_opf needs to be checked to determine all kinds of
flush requests. this was the case prior to the commit referenced below,
but the order of ifdefs was not the usual one (newest up top), which
might have caused this to slip through.

this fixes a regression when using zvols as Qemu block devices, but
might have broken other use cases as well. the symptoms are that all
sync writes from within a VM configured to use such a virtual block
devices are ignored and treated as async writes by the host ZFS layer.

this can be verified using fio in sync mode inside the VM, for example
with

 fio \
 --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \
 --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \
 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \
 --numjobs=1 --sync=1

which shows an IOPS number way above what the physical device underneath
supports, with "zpool iostat -r 1" on the hypervisor side showing no
sync IO occuring during the benchmark.

with the regression fixed, both fio inside the VM and the IO stats on
the host show the expected numbers.

Fixes: 846b598
"config: remove HAVE_REQ_OP_* and HAVE_REQ_*"

Signed-off-by: Fabian-Gruenbichler <[email protected]>
Co-authored-by: Alexander Motin <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Closes openzfs#17131
(cherry picked from commit 41823a0)
It turns out that approach taken in the original version of the patch
was wrong. So now, we're taking approach in-line with how kernel
actually does it - when sb is being torn down, access to it
is serialized via sb->s_umount rwsem, only when that lock is taken
is it okay to work with s_flags - and the other mistake I was doing
was trying to make SB_ACTIVE work, but apparently the kernel checks
the negative variant - not SB_DYING and not SB_BORN.

Kernels pre-6.6 don't have SB_DYING, but check if sb is hashed
instead.

Signed-off-by: Pavel Snajdr <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Closes: openzfs#17121
(cherry picked from commit a0e6271)
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
(cherry picked from commit 7b6e967)
The 6.0 kernel removes the 'migratepage' VFS op. Check for
migratepage.

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]
Closes openzfs#17217
(cherry picked from commit ab9bb19)
@robn
Copy link
Member Author

robn commented May 12, 2025

@tonyhutter I dunno if a 2.2.8 is planned (probably entirely dependent on demand). If it does happen, I'm very happy to assist with testing to get it out the door.

@tonyhutter
Copy link
Contributor

I tried this on Fedora 42 (6.14 kernel) and wasn't able to get it to build. Pulling in 155847c helped, but then I hit:

make[2]: Entering directory '/home/hutter/zfs'
  CC       module/zfs/libzpool_la-ddt.lo
module/zfs/ddt.c: In function 'ddt_stat_update':
module/zfs/ddt.c:445:9: error: array subscript -1 is below array bounds of 'ddt_stat_t[64]' {aka 'struct ddt_stat[64]'} [-Werror=array-bounds=]
  445 |         ddt_stat_add(&ddh->ddh_stat[bucket], &dds, neg);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ./include/sys/spa.h:44,
                 from module/zfs/ddt.c:29:
./include/sys/fs/zfs.h:1331:25: note: while referencing 'ddh_stat'
 1331 |         ddt_stat_t      ddh_stat[64];   /* power-of-two histogram buckets */
      |                         ^~~~~~~~
cc1: all warnings being treated as errors

@tonyhutter
Copy link
Contributor

@tonyhutter I dunno if a 2.2.8 is planned (probably entirely dependent on demand).

I didn't have any plans one way or the other, but I'd be open to a 2.2.8 release.

@amotin
Copy link
Member

amotin commented May 14, 2025

It seems #17334 would be good too. Though it is not in 2.3 yet.

@lzsaver
Copy link

lzsaver commented May 17, 2025

#17340 and #17343 look important. Both build well on top of 2.2.7.

@behlendorf
Copy link
Contributor

These are all now in master #17334, #17340 and #17343 and would be good to include in a 2.2.8 release.

@ryao
Copy link
Contributor

ryao commented May 19, 2025

We might as well include #17353 once it is merged.

@shodanshok
Copy link
Contributor

What about #17255 ?

For what it is worth, I welcome a 2.2.8 release - maybe even as the new default/stable branch for EL 9 (which is at 2.1.16 actually).

amotin and others added 3 commits May 23, 2025 22:25
I've found that QEMU/KVM guest memory accounted as shared also
included into NR_FILE_PAGES.  But it is actually a non-evictable
anonymous memory.  Using it as a base for zfs_arc_pc_percent
parameter makes ARC to ignore shrinker requests while page cache
does not really have anything to evict, ending up in OOM killer
killing the QEMU process.

Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent
the part of a page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17334
(cherry picked from commit 0aa83dc)
Bisecting identified the redacted send/receive as the source of the bug
for issue openzfs#12014. Specifically the call to
dsl_dataset_hold_obj(&fromds) has been replaced by
dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates
a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing
and the key mapping is not cleared. This may be inadvertedly used, which
results in arc_untransform failing with ECKSUM in:
 arc_untransform+0x96/0xb0 [zfs]
 dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
 dbuf_read+0x56/0x770 [zfs]
 dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
 zap_lockdir+0x87/0xf0 [zfs]
 zap_lookup_norm+0x5c/0xd0 [zfs]
 zap_lookup+0x16/0x20 [zfs]
 zfs_get_zplprop+0x8d/0x1d0 [zfs]
 setup_featureflags+0x267/0x2e0 [zfs]
 dmu_send_impl+0xe7/0xcb0 [zfs]
 dmu_send_obj+0x265/0x360 [zfs]
 zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

Contributions-by: Hank Barta <[email protected]>
Contributions-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#12014
Closes openzfs#17340
(cherry picked from commit ea74cde)
d634d20 had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.

When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler remove `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#17343
(cherry picked from commit d8a33bc)
@robn robn changed the title [2.2.8] various Linux kernel updates 2.2.8 staging prep May 23, 2025
robn and others added 4 commits May 23, 2025 22:56
The intent is that the filesystem may have a reference to an "old"
version of the new directory, eg if it was keeping it alive because a
remote NFS client still had it open.

We don't need anything like that, so this really just changes things so
we return error codes encoded in pointers.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17229
(cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or
functional change apart from that, so a very minimal update for us.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17229
(cherry picked from commit 841be1d)
(cherry picked from commit 1dc9bec)
This was caught when doing a manual check to see if openzfs#17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in openzfs#17340.

Reviewed-by: George Amanakis <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#17353
(cherry picked from commit 83fa80a)
(cherry picked from commit 07d815f2a06573514f51c8601aa60db6fe1a76ad)
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes openzfs#17210
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit b1ccab1)
(cherry picked from commit 677cb91)
@robn robn force-pushed the zfs-2.2.8-staging branch from 81f2feb to a6865d7 Compare May 23, 2025 12:57
@robn
Copy link
Member Author

robn commented May 23, 2025

All suggestions included, top posted updated. Looks like we're putting a release together!

I went back through the ~600 commits on master that aren't on the 2.2 series, and there's a good amount of fixes for leaks, overflows, null derefs and so on. I didn't really want to load up 2.2 this late though, feels like we should be very sparing with on the LTS branch.

Anyway, feel free to nominate stuff I guess :)

@aerusso
Copy link
Contributor

aerusso commented May 23, 2025

Could you include 8bf1e83 please too? It fixes some ZTS failures on Debian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.