Skip to content

Commit 00ae67d

Browse files
committed
Stop re-hydrating pending payments once they are fully resolved
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fasion - if the `ChannelManager` fetches the pending `MonitorEvent`s, then the `ChannelMonitor` gets persisted (i.e. due to a block update) then the node crashes, prior to persisting the `ChannelManager` again, the `MonitorEvent` and its effects on the `ChannelManger` will be lost. This isn't likely in a sync persist environment, but in an async one this could be an issue. Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post closure. Of the three, only the last is problematic to lose prior to a reload. In previous commits we ensured that HTLC resolutions which came to `ChannelManager` via a `MonitorEvent` were replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it in cases where `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it. Because constantly re-adding a payment state and then failing it would generate lots of noise for users on startup (not to mention risk of confusing stale payment events for the latest state of a payment when the `PaymentId` has been reused to retry a payment). Thus, `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on chain with a preimage or HTLCs which were resolved on chain with a timeout after `ANTI_REORG_DELAY` confirmations. This critera matches the critera for generating a `MonitorEvent`, and works great under the assumption that `MonitorEvent`s are reliably delivered. However, if they are not, and our `ChannelManager` is lost or substantially old (or, in a future where we do not persist `ChannelManager` at all), we will not end up seeing payment resolution events for an HTLC. Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete. Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs` as there's nothing to do - we always attempt to replay the success/failure and figure out if it mattered based on whether there was still an HTLC to claim/fail. Here we, finally, begin actually using the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, skipping re-hydration of pending payments once they have been fully resolved through to a user `Event`.
1 parent 845b398 commit 00ae67d

File tree

4 files changed

+55
-129
lines changed

4 files changed

+55
-129
lines changed

lightning/src/chain/channelmonitor.rs

Lines changed: 10 additions & 96 deletions
Original file line numberDiff line numberDiff line change
@@ -3031,10 +3031,6 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
30313031
/// Gets the set of outbound HTLCs which can be (or have been) resolved by this
30323032
/// `ChannelMonitor`. This is used to determine if an HTLC was removed from the channel prior
30333033
/// to the `ChannelManager` having been persisted.
3034-
///
3035-
/// This is similar to [`Self::get_pending_or_resolved_outbound_htlcs`] except it includes
3036-
/// HTLCs which were resolved on-chain (i.e. where the final HTLC resolution was done by an
3037-
/// event from this `ChannelMonitor`).
30383034
pub(crate) fn get_all_current_outbound_htlcs(
30393035
&self,
30403036
) -> HashMap<HTLCSource, (HTLCOutputInCommitment, Option<PaymentPreimage>)> {
@@ -3047,8 +3043,11 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
30473043
for &(ref htlc, ref source_option) in latest_outpoints.iter() {
30483044
if let &Some(ref source) = source_option {
30493045
let htlc_id = SentHTLCId::from_source(source);
3050-
let preimage_opt = us.counterparty_fulfilled_htlcs.get(&htlc_id).cloned();
3051-
res.insert((**source).clone(), (htlc.clone(), preimage_opt));
3046+
if !us.htlcs_resolved_to_user.contains(&htlc_id) {
3047+
let preimage_opt =
3048+
us.counterparty_fulfilled_htlcs.get(&htlc_id).cloned();
3049+
res.insert((**source).clone(), (htlc.clone(), preimage_opt));
3050+
}
30523051
}
30533052
}
30543053
}
@@ -3104,6 +3103,11 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
31043103
} else {
31053104
continue;
31063105
};
3106+
let htlc_id = SentHTLCId::from_source(source);
3107+
if us.htlcs_resolved_to_user.contains(&htlc_id) {
3108+
continue;
3109+
}
3110+
31073111
let confirmed = $htlc_iter.find(|(_, conf_src)| Some(source) == *conf_src);
31083112
if let Some((confirmed_htlc, _)) = confirmed {
31093113
let filter = |v: &&IrrevocablyResolvedHTLC| {
@@ -3176,96 +3180,6 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
31763180
res
31773181
}
31783182

3179-
/// Gets the set of outbound HTLCs which are pending resolution in this channel or which were
3180-
/// resolved with a preimage from our counterparty.
3181-
///
3182-
/// This is used to reconstruct pending outbound payments on restart in the ChannelManager.
3183-
///
3184-
/// Currently, the preimage is unused, however if it is present in the relevant internal state
3185-
/// an HTLC is always included even if it has been resolved.
3186-
#[rustfmt::skip]
3187-
pub(crate) fn get_pending_or_resolved_outbound_htlcs(&self) -> HashMap<HTLCSource, (HTLCOutputInCommitment, Option<PaymentPreimage>)> {
3188-
let us = self.inner.lock().unwrap();
3189-
// We're only concerned with the confirmation count of HTLC transactions, and don't
3190-
// actually care how many confirmations a commitment transaction may or may not have. Thus,
3191-
// we look for either a FundingSpendConfirmation event or a funding_spend_confirmed.
3192-
let confirmed_txid = us.funding_spend_confirmed.or_else(|| {
3193-
us.onchain_events_awaiting_threshold_conf.iter().find_map(|event| {
3194-
if let OnchainEvent::FundingSpendConfirmation { .. } = event.event {
3195-
Some(event.txid)
3196-
} else { None }
3197-
})
3198-
});
3199-
3200-
if confirmed_txid.is_none() {
3201-
// If we have not seen a commitment transaction on-chain (ie the channel is not yet
3202-
// closed), just get the full set.
3203-
mem::drop(us);
3204-
return self.get_all_current_outbound_htlcs();
3205-
}
3206-
3207-
let mut res = new_hash_map();
3208-
macro_rules! walk_htlcs {
3209-
($holder_commitment: expr, $htlc_iter: expr) => {
3210-
for (htlc, source) in $htlc_iter {
3211-
if us.htlcs_resolved_on_chain.iter().any(|v| v.commitment_tx_output_idx == htlc.transaction_output_index) {
3212-
// We should assert that funding_spend_confirmed is_some() here, but we
3213-
// have some unit tests which violate HTLC transaction CSVs entirely and
3214-
// would fail.
3215-
// TODO: Once tests all connect transactions at consensus-valid times, we
3216-
// should assert here like we do in `get_claimable_balances`.
3217-
} else if htlc.offered == $holder_commitment {
3218-
// If the payment was outbound, check if there's an HTLCUpdate
3219-
// indicating we have spent this HTLC with a timeout, claiming it back
3220-
// and awaiting confirmations on it.
3221-
let htlc_update_confd = us.onchain_events_awaiting_threshold_conf.iter().any(|event| {
3222-
if let OnchainEvent::HTLCUpdate { commitment_tx_output_idx: Some(commitment_tx_output_idx), .. } = event.event {
3223-
// If the HTLC was timed out, we wait for ANTI_REORG_DELAY blocks
3224-
// before considering it "no longer pending" - this matches when we
3225-
// provide the ChannelManager an HTLC failure event.
3226-
Some(commitment_tx_output_idx) == htlc.transaction_output_index &&
3227-
us.best_block.height >= event.height + ANTI_REORG_DELAY - 1
3228-
} else if let OnchainEvent::HTLCSpendConfirmation { commitment_tx_output_idx, .. } = event.event {
3229-
// If the HTLC was fulfilled with a preimage, we consider the HTLC
3230-
// immediately non-pending, matching when we provide ChannelManager
3231-
// the preimage.
3232-
Some(commitment_tx_output_idx) == htlc.transaction_output_index
3233-
} else { false }
3234-
});
3235-
if let Some(source) = source {
3236-
let counterparty_resolved_preimage_opt =
3237-
us.counterparty_fulfilled_htlcs.get(&SentHTLCId::from_source(source)).cloned();
3238-
if !htlc_update_confd || counterparty_resolved_preimage_opt.is_some() {
3239-
res.insert(source.clone(), (htlc.clone(), counterparty_resolved_preimage_opt));
3240-
}
3241-
} else {
3242-
panic!("Outbound HTLCs should have a source");
3243-
}
3244-
}
3245-
}
3246-
}
3247-
}
3248-
3249-
let commitment_txid = confirmed_txid.unwrap();
3250-
let funding_spent = get_confirmed_funding_scope!(us);
3251-
3252-
if Some(commitment_txid) == funding_spent.current_counterparty_commitment_txid || Some(commitment_txid) == funding_spent.prev_counterparty_commitment_txid {
3253-
walk_htlcs!(false, funding_spent.counterparty_claimable_outpoints.get(&commitment_txid).unwrap().iter().filter_map(|(a, b)| {
3254-
if let &Some(ref source) = b {
3255-
Some((a, Some(&**source)))
3256-
} else { None }
3257-
}));
3258-
} else if commitment_txid == funding_spent.current_holder_commitment_tx.trust().txid() {
3259-
walk_htlcs!(true, holder_commitment_htlcs!(us, CURRENT_WITH_SOURCES));
3260-
} else if let Some(prev_commitment_tx) = &funding_spent.prev_holder_commitment_tx {
3261-
if commitment_txid == prev_commitment_tx.trust().txid() {
3262-
walk_htlcs!(true, holder_commitment_htlcs!(us, PREV_WITH_SOURCES).unwrap());
3263-
}
3264-
}
3265-
3266-
res
3267-
}
3268-
32693183
pub(crate) fn get_stored_preimages(
32703184
&self,
32713185
) -> HashMap<PaymentHash, (PaymentPreimage, Vec<PaymentClaimDetails>)> {

lightning/src/ln/channelmanager.rs

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16601,8 +16601,7 @@ where
1660116601
}
1660216602

1660316603
if is_channel_closed {
16604-
for (htlc_source, (htlc, _)) in monitor.get_pending_or_resolved_outbound_htlcs()
16605-
{
16604+
for (htlc_source, (htlc, _)) in monitor.get_all_current_outbound_htlcs() {
1660616605
let logger = WithChannelMonitor::from(
1660716606
&args.logger,
1660816607
monitor,

lightning/src/ln/functional_tests.rs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2043,6 +2043,8 @@ fn do_test_commitment_revoked_fail_backward_exhaustive(
20432043
check_added_monitors(&nodes[1], 0);
20442044
let events = nodes[1].node.get_and_clear_pending_events();
20452045
if deliver_bs_raa {
2046+
check_added_monitors(&nodes[1], 2);
2047+
} else {
20462048
check_added_monitors(&nodes[1], 1);
20472049
}
20482050
assert_eq!(events.len(), if deliver_bs_raa { 3 + nodes.len() - 1 } else { 3 + nodes.len() });
@@ -2060,7 +2062,7 @@ fn do_test_commitment_revoked_fail_backward_exhaustive(
20602062
)));
20612063

20622064
nodes[1].node.process_pending_htlc_forwards();
2063-
check_added_monitors(&nodes[1], 2);
2065+
check_added_monitors(&nodes[1], 1);
20642066

20652067
let mut events = nodes[1].node.get_and_clear_pending_msg_events();
20662068
assert_eq!(events.len(), if deliver_bs_raa { 4 } else { 3 });

lightning/src/ln/payment_tests.rs

Lines changed: 41 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1182,9 +1182,6 @@ fn do_test_completed_payment_not_retryable_on_reload(use_dust: bool) {
11821182
nodes[1].node.peer_disconnected(node_a_id);
11831183

11841184
nodes[0].node.test_process_background_events();
1185-
check_added_monitors(&nodes[0], 1); // TODO: Removed in the next commit as this only required
1186-
// when we are still seeing all payments, even resolved
1187-
// ones.
11881185

11891186
let mut reconnect_args = ReconnectArgs::new(&nodes[0], &nodes[1]);
11901187
reconnect_args.send_channel_ready = (true, true);
@@ -1217,9 +1214,6 @@ fn do_test_completed_payment_not_retryable_on_reload(use_dust: bool) {
12171214
nodes[1].node.peer_disconnected(node_a_id);
12181215

12191216
nodes[0].node.test_process_background_events();
1220-
check_added_monitors(&nodes[0], 1); // TODO: Removed in the next commit as this only required
1221-
// when we are still seeing all payments, even resolved
1222-
// ones.
12231217

12241218
reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[1]));
12251219

@@ -1238,7 +1232,8 @@ fn test_completed_payment_not_retryable_on_reload() {
12381232
}
12391233

12401234
fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(
1241-
persist_manager_post_event: bool, confirm_commitment_tx: bool, payment_timeout: bool,
1235+
persist_manager_post_event: bool, persist_monitor_after_events: bool,
1236+
confirm_commitment_tx: bool, payment_timeout: bool,
12421237
) {
12431238
// When a Channel is closed, any outbound HTLCs which were relayed through it are simply
12441239
// dropped. From there, the ChannelManager relies on the ChannelMonitor having a copy of the
@@ -1338,36 +1333,58 @@ fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(
13381333
node_a_ser = nodes[0].node.encode();
13391334
}
13401335

1341-
let mon_ser = get_monitor!(nodes[0], chan_id).encode();
1336+
let mut mon_ser = Vec::new();
1337+
if !persist_monitor_after_events {
1338+
mon_ser = get_monitor!(nodes[0], chan_id).encode();
1339+
}
13421340
if payment_timeout {
13431341
let conditions = PaymentFailedConditions::new().from_mon_update();
13441342
expect_payment_failed_conditions(&nodes[0], payment_hash, false, conditions);
13451343
} else {
13461344
expect_payment_sent(&nodes[0], payment_preimage, None, true, true);
13471345
}
1346+
// Note that if we persist the monitor before processing the events, above, we'll always get
1347+
// them replayed on restart no matter what
1348+
if persist_monitor_after_events {
1349+
mon_ser = get_monitor!(nodes[0], chan_id).encode();
1350+
}
13481351

13491352
// If we persist the ChannelManager after we get the PaymentSent event, we shouldn't get it
13501353
// twice.
13511354
if persist_manager_post_event {
13521355
node_a_ser = nodes[0].node.encode();
1356+
} else if persist_monitor_after_events {
1357+
// Persisting the monitor after the events (resulting in a new monitor being persisted) but
1358+
// didn't persist the manager will result in an FC, which we don't test here.
1359+
panic!();
13531360
}
13541361

13551362
// Now reload nodes[0]...
13561363
reload_node!(nodes[0], &node_a_ser, &[&mon_ser], persister, chain_monitor, node_a_reload);
13571364

13581365
check_added_monitors(&nodes[0], 0);
1359-
if persist_manager_post_event {
1366+
if persist_manager_post_event && persist_monitor_after_events {
13601367
assert!(nodes[0].node.get_and_clear_pending_events().is_empty());
1361-
check_added_monitors(&nodes[0], 2);
1368+
check_added_monitors(&nodes[0], 0);
13621369
} else if payment_timeout {
1363-
let conditions = PaymentFailedConditions::new().from_mon_update();
1370+
let mut conditions = PaymentFailedConditions::new();
1371+
if !persist_monitor_after_events {
1372+
conditions = conditions.from_mon_update();
1373+
}
13641374
expect_payment_failed_conditions(&nodes[0], payment_hash, false, conditions);
1375+
check_added_monitors(&nodes[0], 0);
13651376
} else {
1377+
if persist_manager_post_event {
1378+
assert!(nodes[0].node.get_and_clear_pending_events().is_empty());
1379+
} else {
1380+
expect_payment_sent(&nodes[0], payment_preimage, None, true, false);
1381+
}
13661382
// After reload, the ChannelManager identified the failed payment and queued up the
1367-
// PaymentSent and corresponding ChannelMonitorUpdate to mark the payment handled, but
1368-
// while processing the pending `MonitorEvent`s (which were not processed before the
1369-
// monitor was persisted) we will end up with a duplicate ChannelMonitorUpdate.
1370-
expect_payment_sent(&nodes[0], payment_preimage, None, true, false);
1383+
// PaymentSent (or not, if `persist_manager_post_event` resulted in us detecting we
1384+
// already did that) and corresponding ChannelMonitorUpdate to mark the payment
1385+
// handled, but while processing the pending `MonitorEvent`s (which were not processed
1386+
// before the monitor was persisted) we will end up with a duplicate
1387+
// ChannelMonitorUpdate.
13711388
check_added_monitors(&nodes[0], 2);
13721389
}
13731390

@@ -1381,12 +1398,15 @@ fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(
13811398

13821399
#[test]
13831400
fn test_dup_htlc_onchain_doesnt_fail_on_reload() {
1384-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true);
1385-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, false);
1386-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, false);
1387-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, true, true);
1388-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, true, false);
1389-
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, false);
1401+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true, true);
1402+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true, false);
1403+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, false, false);
1404+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, true, true);
1405+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, true, false);
1406+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, false, false);
1407+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, true, true);
1408+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, true, false);
1409+
do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, false, false);
13901410
}
13911411

13921412
#[test]
@@ -4140,15 +4160,9 @@ fn do_no_missing_sent_on_reload(persist_manager_with_payment: bool, at_midpoint:
41404160
let config = test_default_channel_config();
41414161
reload_node!(nodes[0], config, &node_ser, &[&mon_ser], persist_b, chain_monitor_b, node_a_2);
41424162

4143-
// Because the pending payment will currently stick around forever, we'll apply a
4144-
// ChannelMonitorUpdate on each startup to attempt to remove it.
4145-
// TODO: This will be dropped in the next commit after we actually remove the payment!
4146-
check_added_monitors(&nodes[0], 0);
41474163
nodes[0].node.test_process_background_events();
4148-
check_added_monitors(&nodes[0], 1);
41494164
let events = nodes[0].node.get_and_clear_pending_events();
41504165
assert!(events.is_empty());
4151-
check_added_monitors(&nodes[0], 0);
41524166

41534167
// Ensure that we don't generate any further events even after the channel-closing commitment
41544168
// transaction is confirmed on-chain.
@@ -4167,9 +4181,6 @@ fn do_no_missing_sent_on_reload(persist_manager_with_payment: bool, at_midpoint:
41674181
reload_node!(nodes[0], config, &node_ser, &[&mon_ser], persist_c, chain_monitor_c, node_a_3);
41684182
let events = nodes[0].node.get_and_clear_pending_events();
41694183
assert!(events.is_empty());
4170-
4171-
// TODO: This will be dropped in the next commit after we actually remove the payment!
4172-
check_added_monitors(&nodes[0], 1);
41734184
}
41744185

41754186
#[test]

0 commit comments

Comments
 (0)