bmastbergen

https://ciqinc.atlassian.net/browse/LE-4335

When testing an lts-9.4 content release in a VM using the kernel-kselftest-internal kselftest runner, the tests hang at the following point:

...
...
...
# [SKIP]
# running: ./gup_test
# -----------------------------------------
# running ./gup_test -ct -F 0x1 0 19 0x1000
# -----------------------------------------
# check if CONFIG_GUP_TEST is enabled in kernel config
# [SKIP]
# running: ./userfaultfd
# --------------------------------
# running ./userfaultfd anon 20 16
# --------------------------------
# nr_pages: 5120, nr_pages_per_cpu: 320
# bounces: 15, mode: rnd racing ver poll,

At the same time, the following warning can be seen in the VM's dmesg:

[ 4821.197689] WARNING: CPU: 0 PID: 48482 at include/linux/swapops.h:426 change_pte_range+0x48e/0x930
[ 4821.200535] Modules linked in: mpls_iptunnel mpls_router vrf sit act_gact cls_flower echainiv geneve ip6_gre ip6_tunnel ip_gre gre netdevsim psample ib_core xfrm_interface xfrm6_tunnel tunnel6 esp4 bonding vxlan ip6_udp_tunnel udp_tunnel ext4 mbcache jbd2 loop sch_fq 8021q garp mrp stp llc dummy tun ipip tunnel4 ip_tunnel veth cls_bpf sch_ingress tls rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm bochs drm_vram_helper drm_ttm_helper irqbypass ttm rapl drm_kms_helper syscopyarea sysfillrect joydev pcspkr sysimgblt fb_sys_fops i2c_piix4 drm fuse xfs libcrc32c sr_mod cdrom sg ata_generic virtio_net crct10dif_pclmul net_failover crc32_pclmul ata_piix libata crc32c_intel failover virtio_blk ghash_clmulni_intel serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: test_bpf]
[ 4821.204381] CPU: 0 PID: 48482 Comm: userfaultfd Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.42.1.el9_4.94ciq_lts.10.1.x86_64 #1
[ 4821.205078] Hardware name: Red Hat KVM, BIOS 1.16.3-4.el9 04/01/2014
[ 4821.205412] RIP: 0010:change_pte_range+0x48e/0x930
[ 4821.205670] Code: c1 fe ff ff e9 9a fe ff ff 48 81 c5 00 10 00 00 4d 85 c9 74 97 41 f6 42 51 10 74 90 49 83 ba 90 00 00 00 00 0f 84 37 04 00 00 <0f> 0b 48 b8 00 fe ff ff ff ff ff 07 49 83 c0 01 48 89 03 e9 6a ff
[ 4821.206628] RSP: 0018:ffffa2738065ba68 EFLAGS: 00010202
[ 4821.206906] RAX: 0000000000000001 RBX: ffff8a25078aea00 RCX: ffffa2738065bc38
[ 4821.207280] RDX: 0000000000000000 RSI: fff0000000000fff RDI: ffff8a253a9eb878
[ 4821.207650] RBP: 00007f741f141000 R08: 0000000000000140 R09: 0000000000000004
[ 4821.208026] R10: ffff8a253a9eb878 R11: 000fffffffffffff R12: 0000000000000000
[ 4821.208399] R13: 800000010c5c7425 R14: 00007f741f140000 R15: 00007f741f200000
[ 4821.208770] FS:  00007f741cffd640(0000) GS:ffff8a282fc00000(0000) knlGS:0000000000000000
[ 4821.209194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4821.209495] CR2: 00007f742034a000 CR3: 0000000104c40000 CR4: 0000000000750ef0
[ 4821.209878] PKRU: 55555554
[ 4821.210027] Call Trace:
[ 4821.210164]  <TASK>
[ 4821.210283]  ? show_trace_log_lvl+0x1c4/0x2df
[ 4821.210516]  ? show_trace_log_lvl+0x1c4/0x2df
[ 4821.210748]  ? change_pmd_range.isra.0+0x18d/0x3f0
[ 4821.211002]  ? change_pte_range+0x48e/0x930
[ 4821.211221]  ? __warn+0x81/0x110
[ 4821.211385]  ? change_pte_range+0x48e/0x930
[ 4821.211591]  ? report_bug+0x10a/0x140
[ 4821.211775]  ? handle_bug+0x3c/0x70
[ 4821.211951]  ? exc_invalid_op+0x14/0x70
[ 4821.212157]  ? asm_exc_invalid_op+0x16/0x20
[ 4821.212380]  ? change_pte_range+0x48e/0x930
[ 4821.212602]  ? change_pte_range+0x8e1/0x930
[ 4821.212824]  change_pmd_range.isra.0+0x18d/0x3f0

The site of the WARN looks like this in ciqlts9_4:


#else /* CONFIG_PTE_MARKER */

static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
{
	/* This should never be called if !CONFIG_PTE_MARKER */
	WARN_ON_ONCE(1);
	return swp_entry(0, 0);
}

CONFIG_PTE_MARKER is not set in the ciqlts9_4 config and so, according to the comment, we should never be calling this function. This led me to look at the commits that have been made to swapops.h hoping that I could see how we got into this situation. That led me to the following commit:

804153f

mm: use pte markers for swap errors
JIRA: https://issues.redhat.com/browse/RHEL-1349
Upstream Status: v6.2-rc1

commit https://github.com/ctrliq/kernel-src-tree/commit/15520a3f046998e3f57e695743e99b0875e2dae7
Author:     Peter Xu <[email protected]>
AuthorDate: Sun Oct 30 17:41:51 2022 -0400
Commit:     Andrew Morton <[email protected]>
CommitDate: Wed Nov 30 15:58:46 2022 -0800

    PTE markers are ideal mechanism for things like SWP_SWAPIN_ERROR.  Using a
    whole swap entry type for this purpose can be an overkill, especially if
    we already have PTE markers.  Define a new bit for swapin error and
    replace it with pte markers.  Then we can safely drop SWP_SWAPIN_ERROR and
    give one device slot back to swap.

    We used to have SWP_SWAPIN_ERROR taking the page pfn as part of the swap
    entry, but it's never used.  Neither do I see how it can be useful because
    normally the swapin failure should not be caused by a bad page but bad
    swap device.  Drop it alongside.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Peter Xu <[email protected]>
    Reviewed-by: Huang Ying <[email protected]>
    Reviewed-by: Miaohe Lin <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Naoya Horiguchi <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Mika Penttilä <[email protected]>

This commit adds a caller of make_pte_marker_entry that is outside of the CONFIG_PTE_MARKER check. It turns out that this commit was made to the upstream kernel as the second of a two-commit series, the first of which changes the kernel to always compile in pte markers:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/swapops.h?id=ca92ea3dc5a2b01f98e9f02b7a6bc03be06fe124

mm: always compile in pte markers
Patch series "mm: Use pte marker for swapin errors".

This series uses the pte marker to replace the swapin error swap entry,
then we save one more swap entry slot for swap devices.  A new pte marker
bit is defined.


This patch (of 2):

The PTE markers code is tiny and now it's enabled for most of the
distributions.  It's fine to keep it as-is, but to make a broader use of
it (e.g.  replacing read error swap entry) it needs to be there always
otherwise we need special code path to take care of !PTE_MARKER case.

It'll be easier just make pte marker always exist.  Use this chance to
extend its usage to anonymous too by simply touching up some of the old
comments, because it'll be used for anonymous pages in the follow up
patches.

This should have been backported at the same time as the "mm: use pte markers for swap errors" commit. If I add this commit to ciqlts9_4 and run the userfaultfd test again, the warning is not seen and the test does not hang.
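
To make the failure mode concrete, here is a minimal userspace sketch of the pattern, not the kernel code itself. It assumes a stub that only warns when CONFIG_PTE_MARKER is unset, plus a caller that is compiled regardless of the config. The names install_marker and warn_count and the PTE_MARKER_SWAPIN_ERROR bit value are hypothetical and exist only for illustration; the simplified swp_entry_t is a stand-in for the kernel type.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Leave CONFIG_PTE_MARKER undefined to model the ciqlts9_4 config
 * described above. Define it to model a config with markers built in.
 */
/* #define CONFIG_PTE_MARKER 1 */

/* Simplified stand-ins for the kernel types (assumption). */
typedef struct { unsigned long val; } swp_entry_t;
typedef unsigned long pte_marker;

/* Hypothetical bit value, chosen only for this sketch. */
#define PTE_MARKER_SWAPIN_ERROR (1UL << 1)

/*
 * Stand-in for WARN_ON_ONCE(1). This counter increments on every call,
 * whereas the kernel macro would only log the first hit.
 */
static int warn_count;

#ifdef CONFIG_PTE_MARKER
static swp_entry_t make_pte_marker_entry(pte_marker marker)
{
	/* Real implementation: the marker bits are carried in the entry. */
	return (swp_entry_t){ .val = marker };
}
#else
static swp_entry_t make_pte_marker_entry(pte_marker marker)
{
	/* This should never be called if !CONFIG_PTE_MARKER */
	(void)marker;
	warn_count++;
	fprintf(stderr, "WARNING: make_pte_marker_entry without CONFIG_PTE_MARKER\n");
	return (swp_entry_t){ .val = 0 };
}
#endif

/*
 * Hypothetical caller that is NOT wrapped in a CONFIG_PTE_MARKER check.
 * With the config unset it reaches the stub, and the marker is lost.
 */
static swp_entry_t install_marker(pte_marker marker)
{
	return make_pte_marker_entry(marker);
}
```

In this sketch, calling install_marker(PTE_MARKER_SWAPIN_ERROR) with CONFIG_PTE_MARKER undefined bumps warn_count and returns an empty entry, which mirrors a caller reaching the stub. Making the real implementation unconditional, as "mm: always compile in pte markers" does, removes the stub path entirely.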

jira LE-4335
commit-author Peter Xu <[email protected]>
commit ca92ea3

Patch series "mm: Use pte marker for swapin errors".

This series uses the pte marker to replace the swapin error swap entry,
then we save one more swap entry slot for swap devices.  A new pte marker
bit is defined.

This patch (of 2):

The PTE markers code is tiny and now it's enabled for most of the
distributions.  It's fine to keep it as-is, but to make a broader use of
it (e.g.  replacing read error swap entry) it needs to be there always
otherwise we need special code path to take care of !PTE_MARKER case.

It'll be easier just make pte marker always exist.  Use this chance to
extend its usage to anonymous too by simply touching up some of the old
comments, because it'll be used for anonymous pages in the follow up
patches.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
	Signed-off-by: Peter Xu <[email protected]>
	Reviewed-by: Huang Ying <[email protected]>
	Reviewed-by: Miaohe Lin <[email protected]>
	Acked-by: David Hildenbrand <[email protected]>
	Cc: Andrea Arcangeli <[email protected]>
	Cc: Naoya Horiguchi <[email protected]>
	Cc: Peter Xu <[email protected]>
	Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit ca92ea3)
	Signed-off-by: Brett Mastbergen <[email protected]>
@bmastbergen bmastbergen requested a review from a team October 10, 2025 15:39