Let BackgroundProcessor drive HTLC forwarding #3891
Conversation
Does this in any way limit users who don't want delays or batching, assuming that's what they want?
On the contrary, actually: it effectively reduces the (mean and min) forwarding delay quite a bit, which we can allow as we're going to add larger receiver-side delays in the next step. And, while it gets rid of the event, users are still free to call `process_pending_htlc_forwards` themselves.
Force-pushed from ceb3335 to 9ba691c.
Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?
LDK always processes HTLCs in batches (note that `process_pending_htlc_forwards` handles all pending HTLCs at once).
Polling may be cheap, but forcing users to poll when there is an event mechanism available: is that really the right choice? Perhaps the event is beneficial for testing, debugging, and monitoring too?
The event never featured any information, so it is not helpful for debugging or 'informational' purposes. Plus, it means at least 1-2 more rounds of unnecessary persistence.
But at least the event could wake up the background processor, whereas now nothing is waking it up for forwards and the user is forced to call into the channel manager at a high frequency? Not sure if there is a lighter way to wake up the BP without persistence involved. Also, if you have to call into the channel manager always anyway, aren't there more events/notifiers that can be dropped?
I may have missed this deciding moment. If the assertions were useless to begin with, no problem dropping them, of course. I can imagine, though, that at some points a peek into the pending HTLC state is still required to not reduce the coverage of the tests?
Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/race-y). This is what is implemented here. If users want to bend the recommended/default approach they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if it's not used anymore.
I don't think this is generally the case, no. The 'assertion' that is mainly dropped is 'we generated an event'; everything else remains the same.
Force-pushed from 9ba691c to b38c19e.
This doesn't rule out a notification when there's something to forward, to at least not keep spinning when there's nothing to do?
Force-pushed from c1a0b35 to d35c944.
Force-pushed from d35c944 to c21aeab.
Finished for now with the test refactoring post-dropping `PendingHTLCsForwardable`.
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
				break;
			}

			if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
				$channel_manager.get_cm().process_pending_htlc_forwards();
Looked a bit closer at this function. There is a lot of logic in there, and various locks are obtained.
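For readers skimming the thread, the `$timer_elapsed` check in the hunk above behaves roughly like the following standalone sketch (the helper name and its reset-on-fire behavior are assumptions based on the hunk, not the actual macro internals):

```rust
use std::time::{Duration, Instant};

// Sketch: returns true (and resets the timer) once `delay` has elapsed
// since the last reset. The background processor loop calls this on every
// iteration and only runs `process_pending_htlc_forwards` when it fires.
fn timer_elapsed(last_call: &mut Instant, delay: Duration) -> bool {
	if last_call.elapsed() >= delay {
		*last_call = Instant::now();
		true
	} else {
		false
	}
}
```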
Force-pushed from 1ad6ce7 to e088025.
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3891      +/-   ##
==========================================
- Coverage   88.82%   88.78%   -0.05%
==========================================
  Files         165      166       +1
  Lines      119075   119576     +501
  Branches   119075   119576     +501
==========================================
+ Hits       105769   106165     +396
- Misses      10986    11099     +113
+ Partials     2320     2312       -8
==========================================
Force-pushed from 7920f35 to 8a67f2a.
lightning/src/ln/channelmanager.rs (outdated diff)

@@ -6337,6 +6337,14 @@ where
	///
	/// Will regularly be called by the background processor.
	pub fn process_pending_htlc_forwards(&self) {
		static REENTRANCY_GUARD: AtomicBool = AtomicBool::new(false);
Did this happen, another round of processing still underway? Also wondering if processing can be skipped accidentally.
> Did this happen, another round of processing still underway?

Yes, for example if users were to manually call `process_pending_htlc_forwards` in addition to the background processor.
> Also wondering if processing can be skipped accidentally.
No? How would this happen?
> No? How would this happen?
`process_pending_htlc_forwards` is executing and, while that's happening, new forwards arrive. Then concurrently another call to `process_pending_htlc_forwards` is initiated, which becomes a silent no-op. At that point there are forwards that haven't been processed and must wait until the next round, if it comes, depending on the implementation.

Also, there can be a race condition between the persistence guard and the atomic bool, I think?
> `process_pending_htlc_forwards` is executing and, while that's happening, new forwards arrive. Then concurrently another call to `process_pending_htlc_forwards` is initiated, which becomes a silent no-op. At that point there are forwards that haven't been processed and must wait until the next round, if it comes, depending on the implementation.
That's not an accidental skip, though; they will be processed as part of the next batch.
> Also, there can be a race condition between the persistence guard and the atomic bool, I think?
What do you mean by that?
> That's not an accidental skip, though; they will be processed as part of the next batch.
Not everyone may use our background processor. I think the expectation is that `process_pending_htlc_forwards` does what it says? Or otherwise it should perhaps return an error.
> What do you mean by that?
The persistence lock is released and then the atomic bool is reset. In between, another forward may come in that is then not processed, because the atomic bool is still set. The same argument, basically: it isn't fully safe unless you keep retrying.
> The persistence lock is released and then the atomic bool is reset.
No, the persistence guard will be dropped and trigger persistence at the end of the scope, i.e., first the atomic bool is set to false, then we'll trigger persistence.
Ah, I see, indeed. But is it really necessary to have the reentrancy guard? If users call this a bit more often, the only thing that would happen is that they may have to wait for the lock. And waiting for the lock may happen much more often anyway, because we are now polling.
> Ah, I see, indeed. But is it really necessary to have the reentrancy guard? If users call this a bit more often, the only thing that would happen is that they may have to wait for the lock. And waiting for the lock may happen much more often anyway, because we are now polling.
I think it's safer to have it, as otherwise individual calls might get stacked up, all waiting on the locks, which might get more and more congested if callers end up calling in faster than we can process.
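To make the ordering discussed in this thread concrete, here is a minimal sketch of the guard pattern (assumptions: `PersistGuard` is a hypothetical stand-in for LDK's persistence-notifier guard, and the real method contains far more logic):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static REENTRANCY_GUARD: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for the persistence-notifier guard: persistence
// is triggered when it is dropped at the end of the scope.
struct PersistGuard;
impl Drop for PersistGuard {
	fn drop(&mut self) { /* trigger repersistence here */ }
}

fn process_pending_htlc_forwards_sketch() {
	// If another round is already underway, skip; pending forwards will be
	// picked up by the in-flight round or the next timer tick.
	if REENTRANCY_GUARD.swap(true, Ordering::AcqRel) {
		return;
	}
	let _persist_guard = PersistGuard;
	// ... batch-process all pending HTLC forwards ...
	REENTRANCY_GUARD.store(false, Ordering::Release);
	// `_persist_guard` drops after the flag reset above, so persistence
	// fires only once the guard has already been released.
}
```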
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
				break;
			}

			if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
				$channel_manager.get_cm().process_pending_htlc_forwards();
I was also wondering whether this function needs to be called twice per HTLC (add and fail/settle), and hits the delay twice? And whether it is also required to be called at the sender and at the receiver node? (I know in lnd the abstraction was such that, for example, on the final hop it would 'forward' to the invoice.)
use core::time::Duration;

pub(crate) struct BatchDelay {
Can this and the below be `pub(super)`?
Given they are at the first hierarchy level, it's the same thing?
// log_normal_data <- round(rlnorm(n, meanlog = meanlog, sdlog = sdlog))
// cat(log_normal_data, file = "log_normal_data.txt", sep = ", ")
// ```
const FWD_DELAYS_MILLIS: [u16; 10000] = [
Hmm, I'm still not convinced this achieves much for the AS-level attacker. The Revelio paper states "...the adversary can perfectly group payments (i.e., maintaining the success rate of 100%) with a per-channel transaction rate of up to 0.33 tx/s. ... While this transaction rate may look small at first, it is in fact, 4-orders of magnitude larger than the estimated average transaction rate in current LN (i.e., 0.000019 tx/s per channel)." It also doesn't mention forwarding delays as a potential mitigation in the "Countermeasures" section, though I can see why that's somewhat intuitive.
I guess for me, for the AS threat model it all seems a bit security theater until we have constant bandwidth, basically? If this will be reused for receiver-side delays, it probably isn't worth holding up the PR over it though.
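For context, a minimal sketch of how a precomputed table like `FWD_DELAYS_MILLIS` might be consumed at runtime (the tiny table, the counter-based indexing, and the method names are illustrative assumptions, not the PR's actual `BatchDelay` implementation):

```rust
use core::time::Duration;

// Stand-in for FWD_DELAYS_MILLIS: the real table holds 10_000 log-normally
// distributed values with a mean of roughly 50ms.
const FWD_DELAYS_MILLIS_SKETCH: [u16; 8] = [12, 23, 31, 38, 47, 61, 89, 142];

// Cycling through a precomputed table avoids pulling an RNG and a
// log-normal sampler into the runtime dependency tree.
pub(crate) struct BatchDelaySketch {
	idx: usize,
}

impl BatchDelaySketch {
	pub(crate) fn new() -> Self {
		Self { idx: 0 }
	}

	// Return the next batch delay, wrapping around at the end of the table.
	pub(crate) fn next(&mut self) -> Duration {
		let delay = FWD_DELAYS_MILLIS_SKETCH[self.idx % FWD_DELAYS_MILLIS_SKETCH.len()];
		self.idx += 1;
		Duration::from_millis(delay as u64)
	}
}
```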
Previously, all `TIMER` constants were `u64`s implicitly assumed to represent seconds. Here, we switch them over to be `Duration`s, which allows for the introduction of sub-second timers. Moreover, it avoids any future confusion due to the implicitly assumed units.
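A before/after sketch of what this switch looks like (the constant names are illustrative, not the exact set in `lightning-background-processor`):

```rust
use core::time::Duration;

// Before: a bare integer whose unit (seconds) was only implied.
// const FRESHNESS_TIMER: u64 = 60;

// After: the unit is explicit, and sub-second timers become expressible.
const FRESHNESS_TIMER: Duration = Duration::from_secs(60);
const SUB_SECOND_TIMER: Duration = Duration::from_millis(50);
```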
Previously, we'd require the user to manually call `process_pending_htlc_forwards` as part of `PendingHTLCsForwardable` event handling. Here, we rather move this responsibility to `BackgroundProcessor`, which simplifies the flow and allows us to implement reasonable forwarding delays on our side rather than delegating to users' implementations. Note this also introduces batching rounds rather than calling `process_pending_htlc_forwards` individually for each `PendingHTLCsForwardable` event, which had been unintuitive anyway, as subsequent `PendingHTLCsForwardable` events could lead to overlapping batch intervals, resulting in the shortest timespan 'winning' every time, as `process_pending_htlc_forwards` would of course handle all pending HTLCs at once.
Now that we have `BackgroundProcessor` drive the batch forwarding of HTLCs, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms.
We move the code into the `optionally_notify` closure, but maintain the behavior for now. In the next step, we'll use this to make sure we only repersist when necessary. Best reviewed via `git diff --ignore-all-space`
We skip repersisting `ChannelManager` when nothing is actually processed.
We add a reentrancy guard to disallow entering `process_pending_htlc_forwards` multiple times concurrently. This makes sure that we skip any additional processing calls if a prior round/batch of processing is still underway.
Force-pushed from a8c1d66 to eb83451.
Closes #3768.
Previously, we'd require the user to manually call `process_pending_htlc_forwards` as part of `PendingHTLCsForwardable` event handling. Here, we rather move this responsibility to `BackgroundProcessor`, which simplifies the flow and allows us to implement reasonable forwarding delays on our side rather than delegating to users' implementations.

Note this also introduces batching rounds rather than calling `process_pending_htlc_forwards` individually for each `PendingHTLCsForwardable` event, which had been unintuitive anyway, as subsequent `PendingHTLCsForwardable` events could lead to overlapping batch intervals, resulting in the shortest timespan 'winning' every time, as `process_pending_htlc_forwards` would of course handle all pending HTLCs at once.

To this end, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms and drop the `PendingHTLCsForwardable` event.

Draft for now as I'm still cleaning up the code base as part of the final commit dropping `PendingHTLCsForwardable`.