Only interrupt active disk I/Os in failmode=continue #17372
Conversation
failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable. With this patch, I choose the latter.

While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it can at least sometimes be useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck IOs.

Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
To be clear: this is about zfs_deadman_failmode=continue (tunable/modparam), not failmode=continue (pool property); that is, spa->spa_deadman_failmode vs spa->spa_failmode.

(Important to me, because I'm in the middle of rehabilitating failmode=continue proper, via #17355 and beyond. So if we are bumping into each other, we should have a chat about that. But I don't think we are!)
Property lawyering aside, this is good for what it is! My gut feel is that at least some of this should be "inside" the queue, but that's vague and this function obviously needs more work than this, so we good. Good stuff!
It may crash less often, as you say, but damn it is dirty! I'd vote for removal. Every I/O sent to the block layer must complete. Period. Otherwise we are looking at an indefinite number of problems, including random memory corruptions, etc.
Do we know if this is actually still useful for anybody? This code was originally added long ago as an ugly last resort to deal with some broken SCSI error handling observed in a now ancient and unsupported Linux kernel. Things should be better now; personally I haven't needed to set this in many years. Unless we have a real need, I'd be inclined to remove it as well.
The reason I filed the PR is that we had a customer who ran into a case where their disks dropped an I/O on the floor. The pool hung waiting for the I/O to complete for multiple hours, so it was clearly not going to complete. While forcing a reset would not have been the end of the world, it would have been better if they could have forced the I/O to retry and kept the system running, even if it resulted in a memory leak of the I/O.

I would prefer not to remove the functionality, personally. It's true that it's not super stable, and it's impossible to guarantee that it will work correctly. But in the cases where it comes up, it is reasonable for the user to want to attempt something, rather than just letting the system hang or panic. That said, I do think that we should either try to improve it or get rid of it. I think this patch makes it usable enough to be worth keeping, but I am open to the argument that it's not even worth attempting.
If some disk is misbehaving, it should be detached or disabled in some way, so that its HBA can complete all stuck I/Os. Just making ZFS issue new I/Os does not mean they won't get stuck either. At the same time, if the previous I/Os proceed, they might read/write memory that is no longer allocated, or even worse, repurposed. And if they complete at some point, ZFS may no longer be waiting for them, so we'll get some double completion. Considering this is a pretty isolated chunk of code, I don't think I mind this patch going in, but I personally won't touch it even with a 10-foot pole.
Ok, well since this functionality is still clearly handy for its original purpose, I agree let's improve it as much as we reasonably can. It'll never be entirely safe, but sometimes you're just left with no good options.
failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable. With this patch, I choose the latter.

While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it can at least sometimes be useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck IOs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Closes #17372
Sponsored by: [Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable.
Description
With this patch, I choose the latter. While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it can at least sometimes be useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck IOs.

Advancing a ZIO from the VDEV_IO_START stage to the VDEV_IO_DONE stage causes a problem if the IO hadn't actually been issued yet, since the IO_DONE stage unconditionally removes the zio from the active list. That fails if the IO hasn't been added to the active list.
To prevent this, we only wake IOs that are actually active leaf vdev IOs (see the sketch below). Any other I/O will get ignored; this does reduce the scope of what the continue failmode can address, but I think it does so in a helpful way. Feedback on other ideas is welcome, though.
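As a rough illustration of that check, here is a minimal sketch in the style of the ZFS kernel code. It is not the actual diff; the helper name and the exact guard are assumptions, but it shows the shape of the test: only a leaf-vdev ZIO that was really issued (and is therefore on the vdev queue's active list) may be interrupted and advanced to VDEV_IO_DONE.

```c
#include <sys/zio.h>
#include <sys/zio_impl.h>
#include <sys/vdev_impl.h>

/*
 * Illustrative sketch only, not the merged change.  The idea is that the
 * deadman path would call a helper like this before zio_interrupt(), so
 * that only active leaf-vdev I/Os are woken.
 */
static boolean_t
zio_deadman_interruptible(zio_t *zio)
{
	vdev_t *vd = zio->io_vd;

	/* Only leaf vdev I/Os were actually handed to a disk. */
	if (vd == NULL || !vd->vdev_ops->vdev_op_leaf)
		return (B_FALSE);

	/*
	 * Only a ZIO sitting in VDEV_IO_START was issued and placed on the
	 * vdev queue's active list.  Anything still waiting in line behind
	 * the stuck I/O is not on that list, and advancing it to
	 * VDEV_IO_DONE would try to remove it from a list it was never
	 * added to.
	 */
	if (zio->io_stage != ZIO_STAGE_VDEV_IO_START)
		return (B_FALSE);

	return (B_TRUE);
}
```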
How Has This Been Tested?

I used mdadm to create virtual devices, and then built a pool on top of them. I then used mdadm suspend to cause the IOs to one of the devices to hang. I then set the failmode to continue after the deadman messages started to appear in the log. Without the patch, the system reliably kernel panics. With the patch, the IOs are reexecuted.
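For reference, a hedged sketch of that kind of reproduction. It uses dmsetup rather than mdadm as one way to stall I/O to a single backing device; the device name ("stuck-leg") is hypothetical, and only the zfs_deadman_failmode tunable shown is the real knob being exercised here.

```sh
# Illustrative reproduction sketch; "stuck-leg" is a made-up dm device
# backing one vdev of a test pool.

# 1. Stall all I/O to one leg of the pool.
dmsetup suspend --noflush stuck-leg

# 2. Wait for deadman warnings in dmesg, then switch the deadman failure
#    mode so that hung, active leaf-vdev I/Os are interrupted and retried.
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode

# 3. Resume the device. With the patch the interrupted I/Os are
#    re-executed; without it the system reliably panics.
dmsetup resume stuck-leg
```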