PANIC: zfs: adding existent segment to range tree #15030

Open
@reefland

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  22.04.1
Kernel Version        5.15.0-75-generic
Architecture          x86_64
OpenZFS Version       2.1.5

Describe the problem you're observing

At random times (not at boot or pool import), the system shows panic messages (see below) on the console. The system seemed to remain operational. When I tried to reboot, I could not get a clean shutdown: there were several "Failed unmounting" messages referencing /root and /var/log, along with a "failed to start Journal Service" message. Eventually it hung at "systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID ...". I had to cut the power after waiting ~20 minutes.

After the reboot, everything seemed fine: no issues with pool import, and zpool status was good. After an hour or so I would see the first panic message. Usually within 24 to 48 hours I had to reboot again, and it was never a clean shutdown; it required a manual power-off.

The PANIC messages I see are very close to those in #13483; however, that discussion appears to be about pool import issues. I'm not having any issues with pool import, so I'm filing a new issue, as the panic seems to have a different cause.

Include any warning/errors/backtraces from the system logs

Jul 02 16:09:43 k3s06 kernel: PANIC: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 16:09:43 k3s06 kernel: Showing stack for process 1111
Jul 02 16:09:43 k3s06 kernel: CPU: 10 PID: 1111 Comm: txg_sync Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 16:09:43 k3s06 kernel: Hardware name: ASRock B660M Steel Legend/B660M Steel Legend, BIOS 10.02 02/10/2023
Jul 02 16:09:43 k3s06 kernel: Call Trace:
Jul 02 16:09:43 k3s06 kernel:  <TASK>
Jul 02 16:09:43 k3s06 kernel:  show_stack+0x52/0x5c
Jul 02 16:09:43 k3s06 kernel:  dump_stack_lvl+0x4a/0x63
Jul 02 16:09:43 k3s06 kernel:  dump_stack+0x10/0x16
Jul 02 16:09:43 k3s06 kernel:  spl_dumpstack+0x29/0x2f [spl]
Jul 02 16:09:43 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 16:09:43 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:09:43 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 16:09:43 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 16:09:43 k3s06 kernel:  kthread+0x127/0x150
Jul 02 16:09:43 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 16:09:43 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 16:09:43 k3s06 kernel:  </TASK>


Jul 02 16:13:19 k3s06 kernel: INFO: task txg_sync:1111 blocked for more than 120 seconds.
Jul 02 16:13:19 k3s06 kernel:       Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 16:13:19 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 02 16:13:19 k3s06 kernel: task:txg_sync        state:D stack:    0 pid: 1111 ppid:     2 flags:0x00004004
Jul 02 16:13:19 k3s06 kernel: Call Trace:
Jul 02 16:13:19 k3s06 kernel:  <TASK>
Jul 02 16:13:19 k3s06 kernel:  __schedule+0x24e/0x590
Jul 02 16:13:19 k3s06 kernel:  schedule+0x69/0x110
Jul 02 16:13:19 k3s06 kernel:  vcmn_err.cold+0x76/0x78 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 16:13:19 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:13:19 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 16:13:19 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 16:13:19 k3s06 kernel:  kthread+0x127/0x150
Jul 02 16:13:19 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 16:13:19 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 16:13:19 k3s06 kernel:  </TASK>
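
For anyone comparing traces like the ones above, here is a rough Python sketch that keeps only the frames not prefixed with "?" (the kernel marks those as unreliable). It is not part of ZFS or any official tooling; the regex and the reliable_frames helper are hypothetical names made up for this illustration, and the sample lines are copied from the first trace.

```python
import re

# Matches a kernel stack frame such as "range_tree_add+0x11/0x20 [zfs]".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unreliable>\? )?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)

def reliable_frames(log_lines):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module")))
    return frames

sample = [
    "Jul 02 16:09:43 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]",
    "Jul 02 16:09:43 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]",
    "Jul 02 16:09:43 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]",
    "Jul 02 16:09:43 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]",
    "Jul 02 16:09:43 k3s06 kernel:  kthread+0x127/0x150",
]

# Frames without a "[module]" suffix belong to the core kernel.
for func, module in reliable_frames(sample):
    print(f"{func} [{module or 'kernel'}]")
```

Feeding it the full journal output for both panics should make it easy to see whether the reliable call chain (txg_sync_thread down to range_tree_add_impl) is the same each time.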

NOTE: The system uses ZFS Boot Menu, so this is ZFS on root, but it does not require the commonly used separate BOOT and ROOT pools. This method allows for a single ZFS pool (with encryption).

After reboot, status is good:

$ zpool status
  pool: k3s06
 state: ONLINE
  scan: scrub repaired 0B in 08:41:02 with 0 errors on Sun Jun 11 09:05:03 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        k3s06                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA109YVK-part3  ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA1040NK-part3  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VGJJ0Y5G-part3     ONLINE       0     0     0
            ata-WDC_WD120EFBX-68B0EN0_5QJ6U62B-part3    ONLINE       0     0     0
        cache
          nvme0n1                                       ONLINE       0     0     0

errors: No known data errors

I tried to issue a sudo zpool scrub k3s06 and it hung (did not return to the prompt). I opened another SSH session; zpool status did not indicate a scrub in progress. I checked the console and it had panic messages scrolling on the screen.

Jul 02 19:45:26 k3s06 kernel: PANIC: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 19:45:26 k3s06 kernel: Showing stack for process 1341
Jul 02 19:45:26 k3s06 kernel: CPU: 2 PID: 1341 Comm: txg_sync Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 19:45:26 k3s06 kernel: Hardware name: ASRock B660M Steel Legend/B660M Steel Legend, BIOS 10.02 02/10/2023
Jul 02 19:45:26 k3s06 kernel: Call Trace:
Jul 02 19:45:26 k3s06 kernel:  <TASK>
Jul 02 19:45:26 k3s06 kernel:  show_stack+0x52/0x5c
Jul 02 19:45:26 k3s06 kernel:  dump_stack_lvl+0x4a/0x63
Jul 02 19:45:26 k3s06 kernel:  dump_stack+0x10/0x16
Jul 02 19:45:26 k3s06 kernel:  spl_dumpstack+0x29/0x2f [spl]
Jul 02 19:45:26 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 19:45:26 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? kmem_cache_free+0x24f/0x290
Jul 02 19:45:26 k3s06 kernel:  ? percpu_counter_dec+0x10/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? spl_kmem_cache_free+0xda/0x140 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:45:26 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 19:45:26 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  kthread+0x127/0x150
Jul 02 19:45:26 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 19:45:26 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 19:45:26 k3s06 kernel:  </TASK>
Jul 02 19:49:02 k3s06 kernel: INFO: task txg_sync:1341 blocked for more than 120 seconds.
Jul 02 19:49:02 k3s06 kernel:       Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 19:49:02 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 02 19:49:02 k3s06 kernel: task:txg_sync        state:D stack:    0 pid: 1341 ppid:     2 flags:0x00004004
Jul 02 19:49:02 k3s06 kernel: Call Trace:
Jul 02 19:49:02 k3s06 kernel:  <TASK>
Jul 02 19:49:02 k3s06 kernel:  __schedule+0x24e/0x590
Jul 02 19:49:02 k3s06 kernel:  schedule+0x69/0x110
Jul 02 19:49:02 k3s06 kernel:  vcmn_err.cold+0x76/0x78 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 19:49:02 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? kmem_cache_free+0x24f/0x290
Jul 02 19:49:02 k3s06 kernel:  ? percpu_counter_dec+0x10/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? spl_kmem_cache_free+0xda/0x140 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:49:02 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 19:49:02 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  kthread+0x127/0x150
Jul 02 19:49:02 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 19:49:02 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 19:49:02 k3s06 kernel:  </TASK>

I did another unclean shutdown and rebooted, then tried some of the suggestions from #13483:

echo 1 > /sys/module/zfs/parameters/zil_replay_disable
echo 1 > /sys/module/zfs/parameters/zfs_recover
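
Since values written this way are runtime settings and do not persist across a reboot, it can help to confirm what is currently active. A minimal Python sketch, assuming the same sysfs directory as the commands above; read_tunables is a hypothetical helper name, and the param_dir argument exists only so the function can be tried against a scratch directory.

```python
from pathlib import Path

# Default location taken from the echo commands above.
PARAM_DIR = Path("/sys/module/zfs/parameters")

def read_tunables(names, param_dir=PARAM_DIR):
    """Return {name: value-or-None} for each module parameter file."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(param_dir) / name).read_text().strip()
        except OSError:
            # File missing (e.g. module not loaded) or not readable.
            values[name] = None
    return values

# Prints None for a parameter when the zfs module is not loaded.
for name, value in read_tunables(["zil_replay_disable", "zfs_recover"]).items():
    print(f"{name} = {value}")
```

After a reboot without the tunables, both would be expected to show their defaults again.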

Then issued another scrub, which completed successfully:

$ zpool status
  pool: k3s06
 state: ONLINE
  scan: scrub repaired 0B in 09:07:12 with 0 errors on Mon Jul  3 05:54:45 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        k3s06                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA109YVK-part3  ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA1040NK-part3  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VGJJ0Y5G-part3     ONLINE       0     0     0
            ata-WDC_WD120EFBX-68B0EN0_5QJ6U62B-part3    ONLINE       0     0     0
        cache
          nvme0n1                                       ONLINE       0     0     0

errors: No known data errors

As expected from the tunables, the panic messages have now become warning messages:

Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27691830000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=276917fa000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279691bc000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d90833000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=276917fb000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d907b0000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=197224c4000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=291de47e000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28a52005000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2c13ab29000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2c118da9000 size=2000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27ecdfbe000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27341d0b000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2915ddb2000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d4ea04000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=196d5824000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2529b350000 size=f000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279350b0000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27ac475e000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2824d6d6000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2824d072000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28afcbcf000 size=1000)
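
To get an overview of how many distinct segments are affected, the messages can be parsed and de-duplicated. This is a rough Python sketch, assuming the offset and size values are hexadecimal byte counts (the f000 size suggests hex); parse_segments is a hypothetical helper and the sample lines are a small subset of the log above.

```python
import re

# Matches both the PANIC and WARNING forms of the message shown above.
SEGMENT_RE = re.compile(
    r"(?P<level>PANIC|WARNING): zfs: adding existent segment to range tree "
    r"\(offset=(?P<offset>[0-9a-f]+) size=(?P<size>[0-9a-f]+)\)"
)

def parse_segments(log_lines):
    """Yield (level, offset, size) with offset and size parsed as hex."""
    for line in log_lines:
        m = SEGMENT_RE.search(line)
        if m:
            yield (m.group("level"),
                   int(m.group("offset"), 16),
                   int(m.group("size"), 16))

prefix = "k3s06 kernel: "
msg = "zfs: adding existent segment to range tree"
sample = [
    f"Jul 02 16:09:43 {prefix}PANIC: {msg} (offset=279691ea000 size=1000)",
    f"Jul 02 21:01:47 {prefix}WARNING: {msg} (offset=279691ea000 size=1000)",
    f"Jul 02 21:01:47 {prefix}WARNING: {msg} (offset=2c118da9000 size=2000)",
    f"Jul 02 21:01:47 {prefix}WARNING: {msg} (offset=2529b350000 size=f000)",
]

segments = list(parse_segments(sample))
# The same segment can be reported more than once, so de-duplicate.
unique = {(offset, size) for _level, offset, size in segments}
total_bytes = sum(size for _offset, size in unique)
print(f"{len(segments)} messages, {len(unique)} unique segments, "
      f"{total_bytes} bytes")
```

Note that the offset from the original PANIC (279691ea000) shows up again as the first WARNING, which is why de-duplicating matters before counting.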

This seems to run clean as well:

sudo zdb -AAA -b k3s06

Traversing all blocks to verify nothing leaked ...

loading concrete vdev 1, metaslab 464 of 465 ...
8.91T completed (10118MB/s) estimated time remaining: 0hr 00min 00sec        
        No leaks (block sum matches space maps exactly)

        bp count:              68746233
        ganged count:                 0
        bp logical:       9955783857664      avg: 144819
        bp physical:      9789308441088      avg: 142397     compression:   1.02
        bp allocated:     9798470672384      avg: 142531     compression:   1.02
        bp deduped:                   0    ref>1:      0   deduplication:   1.00
        Normal class:     9798686769152     used: 61.46%
        Embedded log class         192512     used:  0.00%

        additional, non-pointer bps of type 0:      10841
        Dittoed blocks on same vdev: 77922
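
As a sanity check on the figures above, the averages and ratios can be recomputed by hand. This Python sketch assumes (from the numbers, not from reading zdb's source) that each "avg" is the byte total divided by the bp count, that each "compression" value is bp logical divided by that row's total, and that the "T" in the progress line means TiB; summarize is a hypothetical helper.

```python
# Figures copied from the zdb output above.
bp_count = 68746233
bp_logical = 9955783857664
bp_physical = 9789308441088
bp_allocated = 9798470672384

def summarize(total, count, logical):
    """Return (average bytes per block, logical-to-total ratio)."""
    return total // count, round(logical / total, 2)

print("logical  :", summarize(bp_logical, bp_count, bp_logical))
print("physical :", summarize(bp_physical, bp_count, bp_logical))
print("allocated:", summarize(bp_allocated, bp_count, bp_logical))

# Allocated bytes expressed in TiB, for comparison with "8.91T completed".
print(f"allocated: {bp_allocated / 2**40:.2f} TiB")
```

Under those assumptions the recomputed values line up with the reported averages (144819, 142397, 142531), the 1.02 ratios, and the 8.91T traversal total.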

This time I was able to get a clean shutdown. Rebooted without the tunables.

The system has now been running for 3+ hours under normal load, and no panic messages have been seen since. Will keep monitoring.

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
