Drop the ChannelMonitorUpdateStatus::PermanentFailure variant #2562
Conversation
lightning/src/ln/channelmanager.rs
Outdated
res.map(|_| ())
} else {
log_error!(self.logger, "Persisting initial ChannelMonitor failed, implying the funding outpoint was duplicated");
return Err(MsgHandleErrInternal::send_err_msg_no_close( |
Should we generate an event to notify the user in this case, or is it rare enough that we won't bother?
I think for an inbound channel we should not - someone can spam us with these all day and the user doesn't need to care.
if let Ok(persist_state) = monitor_res {
match self.id_to_peer.lock().unwrap().entry(chan.context.channel_id()) {
hash_map::Entry::Occupied(_) => {
return Err(MsgHandleErrInternal::send_err_msg_no_close( |
Should be unreachable now, right? Could debug_assert!(false)
Yes, but post-v2-channels maybe not? In any case I'm actually gonna reorder the checks here anyway.
Gonna wait to address feedback until #2495.
force-pushed from 6494cd4 to dc5d362
Rebased and addressed comments. Apologies to reviewers - I wrote this, realized I needed to wait on #2495, then when that got closed I opened it, forgetting that I had a few things left to clean up before opening. In any case, this now removes a ton of code in the monitor handling pipeline.
I am not sure if we should be doing this, IIUC from the overall description of the PR. What if I as a client don't want to complicate my architecture and don't want to remember any non-persisted updates? PermanentFailure is a perfectly valid use case to return after repeated retries have failed. Now LDK might choose to either shut down or force-close. (We can always shut down if we think that is right for this kind of failure.) Our async is not a future that can fail but more of a "Delayable", IIUC (because there is no channel_monitor_update_failed option). In a perfect world, wouldn't we want to keep our node running (for chain-watch) and fail all other operations until persistence works?
Wouldn't it be better for us to have future control/visibility over this failure rather than expecting the client to call shutdown on this failure?
This doesn't seem equivalent from an API perspective; it is async-on-fail.
The big issue with a failure variant here is that we really can't handle it in any reasonable way - we currently force-close the channel, but that is actually incredibly unsafe! If we force-close with the latest monitor state we will broadcast a transaction which may have outputs which are only claimable with the latest monitor state (which the user told us they failed to persist!). We do this under the assumption that maybe the disk caught fire and we should try to get our money back ASAP, but that's certainly not a given - the disk may simply be disconnected (especially true as more folks move towards remote storage). Really, if I/O fails for any reason we need to get a human; there's just not much else we can do in Lightning.
I chatted a bit with @domZippilli and we concluded probably all the in-LDK persistence stuff needs to expose some kind of "an error happened, please get a human and probably shut down" notification, but there's not a hell of a lot more we can do. In the future, we may want to add a …
@TheBlueMatt did we simply come up with a more complex way of doing this ^? In other words, rather than introducing some kind of novel notification, could this PR just change the way …
If the current handling for PermanentFailure is not satisfactory, we could replace it with shutdown instead of getting rid of the failure return. It is better for the node to be crash-looping until the DB/disk comes back up than to add a constraint about storage not failing. Are shutdown and restart unsafe? AFAIU we wouldn't have committed without the updated persist, and with persistence failing we can't really do much.
If we keep the failure return, it is easier to improve this in the future without an interface change or impacting the customer's architecture.
Right, I think we agree here, there's just currently no mechanism in LDK to trigger the user to shut down the node. We can certainly build it (in the form of refusing to talk to peers until a resolution or just giving the user a notification from the I/O driver that we need to shut down), but it doesn't exist today.
I think the API change here kinda encourages that? We could mention in the documentation that the user may want to consider panic'ing to trigger immediate shutdown in the most aggressive way possible :).
No, it's safe, as long as we don't get a …. Even if we skip the part of this PR that drops the …, we should still encourage users to use …
Yes, my only concern is with dropping permanent failure completely.
In this rare case, I am in favor of panic! 😅 - won't that work? (panic in channel-manager)
Not sure, but I think this encourages unsafe behaviour; we are overloading the meaning of InProgress.
And in the worst case, this encourages silent failures: if there is a failure, the monitor update is InProgress-persist, and I can't handle the failure any other way (I don't want to remember updates). Now the channel is in a disabled state and the user doesn't even know it is.
Right, we need to have a conversation about how to design any such failure case handling. If there's not something that immediately makes sense this late in a release cycle, I'm not sure that having an error code that we can't handle is a great design.
Possibly - I kinda hate the Rust ecosystem here, since by default panics actually unwind rather than abort, and tokio by default catches the unwind and just continues chugging along (which is more normal behavior in something like a webserver - if a request comes in which causes some bad behavior, fail the request but don't kill the whole server - but it doesn't make sense here). We do recommend users use ….
We do panic elsewhere in LDK, and panics in a mutex are kinda "viral" (later attempts to take the lock will also panic when we unwrap the lock), so the node would still (mostly) shut down, just not in the most orderly fashion.
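(Aside for readers following along: one common way to get abort-on-panic behaviour without changing the Cargo profile's `panic` setting is a process-wide panic hook that aborts before any unwinding can be caught by a runtime. A minimal std-only sketch, not LDK-specific:)

```rust
use std::panic;
use std::process;

fn main() {
    // Abort the whole process on any panic, even if a runtime such as tokio
    // would otherwise catch the unwind and keep its other tasks running.
    panic::set_hook(Box::new(|info| {
        eprintln!("fatal panic: {info}");
        process::abort();
    }));

    // ... start the node / async runtime here ...
}
```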
It wouldn't grow unbounded/huge - a few updates per channel you have open pending; after that we just kinda start locking up and not handling/sending new messages until persistence starts completing again.
Users can always list the pending monitor updates, go through all the ChannelMonitors with pending updates, persist the full current state, and mark them complete. We could even expose some kind of replay-attempt function which just calls …
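(To make that suggestion concrete, here is a rough sketch of such a background "drain" step. The closure parameters stand in for the `ChainMonitor` methods referenced in this thread - `list_pending_monitor_updates`, `get_monitor` plus serialization, and `channel_monitor_updated` - while the real LDK signatures and generics are elided, so treat the shape as an assumption:)

```rust
use std::collections::HashMap;

/// Illustrative only: drive all pending monitor updates to completion.
/// `list_pending` stands in for `ChainMonitor::list_pending_monitor_updates`,
/// `persist_full_monitor` for serializing and writing the monitor obtained via
/// `ChainMonitor::get_monitor`, and `mark_completed` for
/// `ChainMonitor::channel_monitor_updated`.
fn drain_pending_monitor_updates<FundingOutpoint, UpdateId>(
    list_pending: impl Fn() -> HashMap<FundingOutpoint, Vec<UpdateId>>,
    persist_full_monitor: impl Fn(&FundingOutpoint) -> std::io::Result<()>,
    mark_completed: impl Fn(&FundingOutpoint, UpdateId) -> Result<(), ()>,
) -> std::io::Result<()> {
    for (funding_outpoint, pending_update_ids) in list_pending() {
        // Re-persist the *full* current ChannelMonitor; a single full write
        // covers every in-flight update for this channel at once.
        persist_full_monitor(&funding_outpoint)?;
        // Only once the full monitor is durably written, tell the
        // ChainMonitor that each pending update id has completed.
        for update_id in pending_update_ids {
            let _ = mark_completed(&funding_outpoint, update_id);
        }
    }
    Ok(())
}
```

In practice the closures would wrap a shared `ChainMonitor` and the node's storage backend; the point being illustrated is the ordering - update ids are only marked complete after the full monitor write succeeds.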
I'm not sure why it wouldn't make sense here? If correctly implemented, wouldn't it be exactly what we want a server to do, in particular if multiple nodes are handled by one server? I.e., we'd want it to propagate the panic of a failing task sufficiently upwards, catch it there, and restart the branch of tasks belonging to the failing node.
Right, I was thinking more of the ldk-sample/ldk-node world where you have a single node with potentially many tokio tasks. If you're going the multi-tenancy route you probably do want to catch the panic, you just want to make sure that all the tasks for the given node shut down, rather than having that node limp along half-running.
I think what's key to making this understandable is how InProgress is defined in an "infallible" world (I think "infallible" implies "panic if there's an error", at least that is what we do). Intuitively I expect:
I guess if PF is out of the enum, I'm going to panic myself if there's a write failure. There's nothing else I can really do. If you make this trait method fallible, I would return an ….
For me, the most confusing possibility is saying InProgress is what you should do for transient errors. Switching to an async mode in order to handle transient errors is a little surprising, and makes me feel more exposed. Knowing that LDK is blocking on your write is very reassuring, because I think (implicitly) it means that the channel is locked during that time, and losing that "lock" is much more fraught.
In fact, I think transient errors should never be presented to this trait, if the separation of concerns is correct. They should be handled in a storage backend that knows the right amount of retrying/backing off etc., and when to give up. The backend's methods should be definitively fallible to this layer -- they either succeed within parameters, or the storage is now outside of parameters (failing), and we should panic. This is of special interest to me right now, as I think if the backend returns transient errors, it's much harder to write a generic persister implementation of any kind.
You could substitute panic for "signal an orderly shutdown" and I think the only difference is whether one attempts to unwind in some way. Incidentally, panic-on-fail is how we (c=) handle the other infallible persistence methods, like failing to write the manager/graph/scorer.
I think the appeal of signaling an error to LDK is that it can take some sort of action. It sounds like there's actually no safe action to take, so guiding customers to panic when infallible methods fail is probably the right thing? And to use …
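(A minimal sketch of the "retry in the storage backend, then give up" idea described above, using only the standard library; the retry count, delays, and helper name are illustrative, not anything prescribed by LDK:)

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a monitor write with backoff for transient errors; once retries are
/// exhausted, treat the storage as outside its availability parameters and
/// stop the node rather than limp along with unpersisted channel state.
fn write_or_die(mut attempt: impl FnMut() -> std::io::Result<()>) {
    let mut delay = Duration::from_millis(100);
    for _ in 0..5 {
        match attempt() {
            Ok(()) => return,
            Err(e) => {
                eprintln!("monitor persist failed, retrying in {delay:?}: {e}");
                sleep(delay);
                delay *= 2;
            }
        }
    }
    // Retries exhausted: there is no safe way to keep operating the node.
    panic!("persistent storage unavailable; shutting the node down");
}
```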
Ok, I was hoping panic would work because we are relying on this in other parts of our code assuming abort functionality; if it doesn't function as per our requirements then ok.
LDK pending events won't grow unbounded, but I was more concerned about the client's threads waiting on LDK, which LDK will block - that is unbounded, and all other requests to the service will probably be dropped/stuck. (So I probably wouldn't recommend an infinite wait inside persist.)
Yes, I guess people who are using it as an async API can do that.
I am a bit confused - we were not discussing exposing transient errors from the trait without retries. The only worry was about what to do after retries are exhausted (or the failure can't be recovered from in the short duration of a retry).
From what you describe, I think this supports Matt's thesis to remove permanent failure for now, because it doesn't seem feasible to implement panic-at-failure. In summary, I am ok with the current approach, given we will document for the client to panic/shut down on non-recoverable failure. (Somehow.)
Very possible, even probable, that I'm the confused one. The opening comment on the PR includes:
Between that, and comments on #2359, I thought the idea was that we expected implementations of …
Fair. FWIW I only suggested panic there because there's no mechanism I know of to call back with a failure. That would also be an option, but AFAIK once you return …
Right, I guess we should define "transient errors" here - I assume a storage backend will mostly retry or have a reliable backend such that any error that happens is a "Real Error" - the disk is disconnected, the network is offline, etc. However, those errors are still mostly "transient" in that they will eventually recover - our disk isn't literally on fire, we're not hosted at OVH, etc. Thus, we shouldn't force-close, but also shouldn't really do anything... there's nothing we can do, aside from ensuring we go get a human.
So I think we could do something fancier in 0.0.118 (disconnect all peers, stop making progress until the user recovers the node, but stay alive, maybe have some kind of …). Of the two I marginally prefer the second, as it makes the behavior more visible to the user, though only marginally so, as often the user is delegating to the LDK filesystem/sqlite/etc persister, which does the panic for the user. But at least those not using those persisters can choose if they want to panic or mark it in-flight and do their own clean-ish shutdowns.
They could always do that by just using InProgress. And clients must already be using async if they have existing logic for it. (I don't see a reason to mark in-progress and then do a shutdown; if anything we would want to shut down before returning.) If we want to build some better handling in the near future, I prefer keeping the PermanentFailure variant and just improving its handling later. For now, if immediate panic works, we can do that and maybe make it clear in the docs (or suggest a clean shutdown in case of permanent failure). This will also be consistent with how MUP or kvstore implementations will handle it currently. Will leave it to you to decide either method.
I've heard that naming things is hard. Perhaps this terminology works:
I'm saying this for docs -- "intervention" is the key distinction you're making between these two categories of recoverable error. Feels weird to invent a term here; I feel sure I read about this in the SRE book. You may or may not consider introducing a variant like …
FWIW, I'm also leaning towards this side, as going forward we should find a solution where it's not the …
I think my point here is that you can't generally do a shutdown in the middle of a sync call stack (aside from panic). If you want to do a "semi-orderly" shutdown (even if your persistence is borked) you probably need to trigger some other code, then return so that the ChannelManager can quiet down.
Mmm, okay, I'm fine with that naming and leaving it as a failure - can re-add an "InterventionRequiredError" variant to the enum which would cause a panic. We could then leave …
I am ok with either, but could this be "UnrecoverableError"?
This part is still confusing. It is either of: …
The second recommendation makes more sense for clients using async IMO, but of course it can be followed by anyone.
Squashed all the commits; the doc changes since yesterday in total are:
$ git diff-tree -U3 d168ba6c83be92009b7a59d4ead6f742f1891b51 b0ff3050
diff --git a/lightning/src/chain/chainmonitor.rs b/lightning/src/chain/chainmonitor.rs
index df56bda11..268b29257 100644
--- a/lightning/src/chain/chainmonitor.rs
+++ b/lightning/src/chain/chainmonitor.rs
@@ -78,34 +78,48 @@ impl MonitorUpdateId {
/// `Persist` defines behavior for persisting channel monitors: this could mean
/// writing once to disk, and/or uploading to one or more backup services.
///
-/// Each method can return three possible values:
-/// * If persistence (including any relevant `fsync()` calls) happens immediately, the
-/// implementation should return [`ChannelMonitorUpdateStatus::Completed`], indicating normal
-/// channel operation should continue.
+/// Persistence can happen in one of two ways - synchronously completing before the trait method
+/// calls return or asynchronously in the background.
///
-/// * If persistence happens asynchronously, implementations can return
-/// [`ChannelMonitorUpdateStatus::InProgress`] while the update continues in the background.
-/// Once the update completes, [`ChainMonitor::channel_monitor_updated`] should be called with
-/// the corresponding [`MonitorUpdateId`].
+/// # For those implementing synchronous persistence
///
-/// Note that unlike the direct [`chain::Watch`] interface,
-/// [`ChainMonitor::channel_monitor_updated`] must be called once for *each* update which occurs.
+/// * If persistence completes fully (including any relevant `fsync()` calls), the implementation
+/// should return [`ChannelMonitorUpdateStatus::Completed`], indicating normal channel operation
+/// should continue.
///
-/// If persistence fails for some reason, implementations should consider returning
-/// [`ChannelMonitorUpdateStatus::InProgress`] and attempt to shut down or retry the persistence
-/// operation in the background through [`ChainMonitor::list_pending_monitor_updates`] and
+/// * If persistence fails for some reason, implementations should consider returning
+/// [`ChannelMonitorUpdateStatus::InProgress`] and retry all pending persistence operations in
+/// the background with [`ChainMonitor::list_pending_monitor_updates`] and
/// [`ChainMonitor::get_monitor`].
///
/// Once a full [`ChannelMonitor`] has been persisted, all pending updates for that channel can
/// be marked as complete via [`ChainMonitor::channel_monitor_updated`].
///
+/// If at some point no further progress can be made towards persisting the pending updates, the
+/// node should simply shut down.
+///
/// * If the persistence has failed and cannot be retried,
/// [`ChannelMonitorUpdateStatus::UnrecoverableError`] can be used, though this will result in
/// an immediate panic and future operations in LDK generally failing.
///
-/// Third-party watchtowers may be built as a part of an implementation of this trait, with the
-/// advantage that you can control whether to resume channel operation depending on if an update
-/// has been persisted to a watchtower. For this, you may find the following methods useful:
+/// # For those implementing asynchronous persistence
+///
+/// All calls should generally spawn a background task and immediately return
+/// [`ChannelMonitorUpdateStatus::InProgress`]. Once the update completes,
+/// [`ChainMonitor::channel_monitor_updated`] should be called with the corresponding
+/// [`MonitorUpdateId`].
+///
+/// Note that unlike the direct [`chain::Watch`] interface,
+/// [`ChainMonitor::channel_monitor_updated`] must be called once for *each* update which occurs.
+///
+/// If at some point no further progress can be made towards persisting a pending update, the node
+/// should simply shut down.
+///
+/// # Using remote watchtowers
+///
+/// Watchtowers may be updated as a part of an implementation of this trait, utilizing the async
+/// update process described above while the watchtower is being updated. The following methods are
+/// provided for bulding transactions for a watchtower:
/// [`ChannelMonitor::initial_counterparty_commitment_tx`],
/// [`ChannelMonitor::counterparty_commitment_txs_from_update`],
/// [`ChannelMonitor::sign_to_local_justice_tx`], [`TrustedCommitmentTransaction::revokeable_output_index`],
$
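(As a quick illustration of the decision the rewritten docs describe, here is a sketch of how an implementation might map a storage write result onto the three statuses. The `lightning::chain::ChannelMonitorUpdateStatus` import path is assumed from memory, and `can_retry_in_background` is an illustrative flag that would be true only if the node keeps the pending data around and runs a background retry loop as in the earlier sketch:)

```rust
use lightning::chain::ChannelMonitorUpdateStatus;

// Sketch only: what a Persist implementation might return for a write result.
fn status_for_write(res: std::io::Result<()>, can_retry_in_background: bool) -> ChannelMonitorUpdateStatus {
    match res {
        // Data (and any relevant fsync) is durably written - normal channel operation continues.
        Ok(()) => ChannelMonitorUpdateStatus::Completed,
        // Failed for now, but we kept the data and will retry in the background,
        // later completing via ChainMonitor::channel_monitor_updated.
        Err(_) if can_retry_in_background => ChannelMonitorUpdateStatus::InProgress,
        // No way to make further progress: per the docs above, this results in an immediate panic.
        Err(_) => ChannelMonitorUpdateStatus::UnrecoverableError,
    }
}
```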
/// [`ChannelMonitorUpdateStatus::PermanentFailure`], in which case the channel will likely be
/// closed without broadcasting the latest state. See
/// [`ChannelMonitorUpdateStatus::PermanentFailure`] for more details.
/// * If persistence fails for some reason, implementations should consider returning |
"If persistence fails for some reason:" - currently in this we have two options:
- either return UnrecoverableError after retries have been exhausted, OR
- return InProgress and retry out of band using list_pending_monitor_updates.
I would prefer the first option to be mentioned before the second one, since that is what sync really means: take the decision then and there. I would also suggest removing the second option from sync, since a user can use async persistence if they already have a process to retry out-of-band at zero extra cost. Otherwise, we are mixing the two.
My thought process for this is: if someone is server-side and persistence has failed after exhausted retries, then there is really something wrong with my high-availability (99.99%) persistence (infra outage). I am ok with shutting down instead of creating a separate process for retrying async, to keep things simple.
I kinda prefer this order because we prefer users to do more retries in the background, but I understand many folks won't really bother with that because it's hard :)
Sounds good, it's not that hard, just additional complexity some may want to avoid. :)
/// # For those implementing asynchronous persistence
///
/// All calls should generally spawn a background task and immediately return
/// [`ChannelMonitorUpdateStatus::InProgress`]. Once the update completes, |
Persistence can still fail in async; a `list_pending_monitor_updates` retrying process is necessary in this case as well. In fact, if a user has a `list_pending_monitor_updates` retrying process, they should just use async.
I'm not quite sure I understand this - the documentation mentions how to handle failure in both cases.
No, it only mentions "Once the update completes". There is still a case where the update failed, and the client needs to have a mechanism to "retry all pending persistence operations in the background with [`ChainMonitor::list_pending_monitor_updates`] and [`ChainMonitor::get_monitor`]" while using async persistence, depending on how their async works.
force-pushed from 80e41b1 to 914c299
Pushed some (squashed) grammar/spelling fixes:
$ git diff-tree -U1 b0ff3050 914c2999
diff --git a/lightning-persister/src/fs_store.rs b/lightning-persister/src/fs_store.rs
index 4a70d238c..42b28018f 100644
--- a/lightning-persister/src/fs_store.rs
+++ b/lightning-persister/src/fs_store.rs
@@ -438,3 +438,3 @@ mod tests {
// Test that if the store's path to channel data is read-only, writing a
- // monitor to it results in the store returning a InProgress.
+ // monitor to it results in the store returning an InProgress.
// Windows ignores the read-only flag for folders, so this test is Unix-only.
diff --git a/lightning/src/chain/chainmonitor.rs b/lightning/src/chain/chainmonitor.rs
index 268b29257..f41512b14 100644
--- a/lightning/src/chain/chainmonitor.rs
+++ b/lightning/src/chain/chainmonitor.rs
@@ -100,3 +100,3 @@ impl MonitorUpdateId {
///
-/// * If the persistence has failed and cannot be retried,
+/// * If the persistence has failed and cannot be retried further (e.g. because of some timeout),
/// [`ChannelMonitorUpdateStatus::UnrecoverableError`] can be used, though this will result in
diff --git a/lightning/src/ln/chanmon_update_fail_tests.rs b/lightning/src/ln/chanmon_update_fail_tests.rs
index dc4b3d57a..dc9bbd3e4 100644
--- a/lightning/src/ln/chanmon_update_fail_tests.rs
+++ b/lightning/src/ln/chanmon_update_fail_tests.rs
@@ -106,3 +106,3 @@ fn test_monitor_and_persister_update_fail() {
- // Apply the montior update to the original ChainMonitor, ensuring the
+ // Apply the monitor update to the original ChainMonitor, ensuring the
// ChannelManager and ChannelMonitor aren't out of sync.
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 93e8f617d..8b524d8f3 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -928,3 +928,3 @@ where
/// [`chain::Watch::watch_channel`]/[`update_channel`] or before completing async writes. With
-/// ChannelManagers, writing updates happens out-of-band (and will prevent any other
+/// `ChannelManager`s, writing updates happens out-of-band (and will prevent any other
/// `ChannelManager` operations from occurring during the serialization process). If the
$
LGTM, happy to have any further (documentation) changes happen in follow-ups.
CI needs a fix.
When a `ChannelMonitorUpdate` fails to apply, it generally means we cannot reach our storage backend. This, in general, is a critical issue, but is often only a transient issue. Sadly, users see the failure variant and return it on any I/O error, resulting in channel force-closures due to transient issues. Users don't generally expect force-closes in most cases, and luckily with async `ChannelMonitorUpdate`s supported we don't take any risk by "delaying" the `ChannelMonitorUpdate` indefinitely. Thus, here we drop the `PermanentFailure` variant entirely, making all failures instead be "the update is in progress, but won't ever complete", which is equivalent if we do not close the channel automatically.
Now that `PermanentFailure` is not a possible return value, we can simply remove handling of it in `ChannelMonitor`.
The `MonitorEvent::CommitmentTxConfirmed` has always been a result of us force-closing the channel, not the counterparty doing so. Thus, it was always a bit of a misnomer. Worse, it carried over into the channel's `ClosureReason` in the event API. Here we simply rename it and use the proper `ClosureReason`.
Since we now (almost) support async monitor update persistence, the documentation on `ChannelMonitorUpdateStatus` can be updated to no longer suggest users must keep a local copy that persists before returning. However, because there are still a few remaining issues, we note that async support is currently beta and explicitly warn of potential for funds-loss. Fixes lightningdevkit#1684
In the previous commit various dead code was removed. Here we finish that cleanup by removing unnecessary indentation and syntax.
Now that `handle_new_monitor_update` can no longer return an error, it similarly no longer needs any code to handle errors. Here we remove that code, substantially reducing macro variants.
In general, doc comments on trait impl blocks are not very visible in rustdoc output, and unless they provide useful information they should be elided. Here we drop useless doc comments on `ChainMonitor`'s `Watch` impl methods.
While there is no great way to handle a true failure to persist a `ChannelMonitorUpdate`, it is confusing for users for there to be no error variant at all on an I/O operation. Thus, here we re-add the error variant removed over the past handful of commits, but rather than handle it in a truly unsafe way, we simply panic, optimizing for maximum mutex poisoning to ensure any future operations fail and return immediately. In the future, we may consider changing the handling of this to instead set some "disconnect all peers and fail all operations" bool to give the user a better chance to shutdown in a semi-orderly fashion, but there's only so much that can be done in lightning if we truly cannot persist new updates.
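(As an aside on the "maximum mutex poisoning" phrasing above, here is a tiny standalone illustration of the std behaviour being relied on - nothing LDK-specific, purely to show why later lock attempts also fail once one panic has happened under the lock:)

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let state = Arc::new(Mutex::new(0u32));
    let state2 = Arc::clone(&state);
    // Panic while holding the lock on another thread; the unwind poisons the mutex.
    let _ = thread::spawn(move || {
        let _guard = state2.lock().unwrap();
        panic!("simulated persistence failure");
    })
    .join();
    // Every later lock().unwrap() now panics too, so the failure propagates
    // to any code path that touches the shared state.
    assert!(state.lock().is_err());
}
```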
force-pushed from 914c299 to f254c56
Grrrr, sorry about that. Somehow the …
$ git diff-tree -U1 914c2999 f254c565
diff --git a/fuzz/src/chanmon_consistency.rs b/fuzz/src/chanmon_consistency.rs
index b6d41fb99..6a2007165 100644
--- a/fuzz/src/chanmon_consistency.rs
+++ b/fuzz/src/chanmon_consistency.rs
@@ -140,3 +140,3 @@ impl TestChainMonitor {
impl chain::Watch<TestChannelSigner> for TestChainMonitor {
- fn watch_channel(&self, funding_txo: OutPoint, monitor: channelmonitor::ChannelMonitor<TestChannelSigner>) -> chain::ChannelMonitorUpdateStatus {
+ fn watch_channel(&self, funding_txo: OutPoint, monitor: channelmonitor::ChannelMonitor<TestChannelSigner>) -> Result<chain::ChannelMonitorUpdateStatus, ()> {
let mut ser = VecWriter(Vec::new());
@@ -502,3 +502,3 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out) {
assert_eq!(chain_monitor.chain_monitor.watch_channel(funding_txo, mon),
- ChannelMonitorUpdateStatus::Completed);
+ Ok(ChannelMonitorUpdateStatus::Completed));
}
diff --git a/lightning/src/chain/chainmonitor.rs b/lightning/src/chain/chainmonitor.rs
index f41512b14..b6909cb3e 100644
--- a/lightning/src/chain/chainmonitor.rs
+++ b/lightning/src/chain/chainmonitor.rs
@@ -1034,2 +1034,3 @@ mod tests {
#[test]
+ #[cfg(feature = "std")]
fn update_during_chainsync_poisons_channel() {
diff --git a/lightning/src/ln/functional_test_utils.rs b/lightning/src/ln/functional_test_utils.rs
index 9a5a3466a..5a1a21e29 100644
--- a/lightning/src/ln/functional_test_utils.rs
+++ b/lightning/src/ln/functional_test_utils.rs
@@ -424,4 +424,6 @@ pub struct Node<'chan_man, 'node_cfg: 'chan_man, 'chan_mon_cfg: 'node_cfg> {
}
-impl<'a, 'b, 'c> core::panic::UnwindSafe for Node<'a, 'b, 'c> {}
-impl<'a, 'b, 'c> core::panic::RefUnwindSafe for Node<'a, 'b, 'c> {}
+#[cfg(feature = "std")]
+impl<'a, 'b, 'c> std::panic::UnwindSafe for Node<'a, 'b, 'c> {}
+#[cfg(feature = "std")]
+impl<'a, 'b, 'c> std::panic::RefUnwindSafe for Node<'a, 'b, 'c> {}
impl<'a, 'b, 'c> Node<'a, 'b, 'c> {
$
lightning/src/ln/channelmanager.rs
Outdated
$self.get_channel_update_for_broadcast(&$chan).ok(), $chan.context.get_value_satoshis()));
$remove;
res
},
ChannelMonitorUpdateStatus::Completed => {
$completed;
Ok(true) |
channelmanager.rs L3379 has a reference to permanent monitor failure.
@@ -470,7 +470,7 @@ mod tests {
index: 0
10 lines up is a reference to permanent failure, though I suppose it's not really incorrect per se. Also in the test name.
@@ -436,7 +436,7 @@ mod tests {
}

// Test that if the store's path to channel data is read-only, writing a
// monitor to it results in the store returning a PermanentFailure.
// monitor to it results in the store returning an InProgress.
Should be `UnrecoverableError` now.
/// If at some point no further progress can be made towards persisting the pending updates, the
/// node should simply shut down.
///
/// * If the persistence has failed and cannot be retried further (e.g. because of some timeout), |
nit (non-blocking): a timeout is generally transient and a retry should fix that; instead it should be "e.g. because of some outage".
Opened a followup PR at #2591
Doc and comment followups to #2562
@@ -134,7 +134,7 @@ pub enum MonitorEvent {
HTLCEvent(HTLCUpdate),

/// A monitor event that the Channel's commitment transaction was confirmed. |
Is this piece of docs still correct?
Ah, good point, it should be changed, yes!
Fixed at #2668
Based on #2169. When a `ChannelMonitorUpdate` fails to apply, it generally means we cannot reach our storage backend. This, in general, is a critical issue, but is often only a transient issue. Sadly, users see the failure variant and return it on any I/O error, resulting in channel force-closures due to transient issues. Users don't generally expect force-closes in most cases, and luckily with async `ChannelMonitorUpdate`s supported we don't take any risk by "delaying" the `ChannelMonitorUpdate` indefinitely. Thus, here we drop the `PermanentFailure` variant entirely, making all failures instead be "the update is in progress, but won't ever complete", which is equivalent if we do not close the channel automatically.