calc_forwarding_channel can pick a channel that is offline, leading to unilateral closure #7917
Comments
It absolutely seems the peer was connected. As your logs show, we were talking to it. We don't create a channeld for it until after it has connected to us. Look for 03…ee-connectd in the logs: I anticipate you should see it connecting...
Of course, I had been connected to the peer previously, as I had an active channel with them. However, at the time when CLN added an HTLC to the channel, the peer was disconnected. I'll show you… For reference, here is an example of what I expect to happen when a forward comes in while the peer is disconnected:
And what follows is what actually happened. This is a grep for the peer's node ID with no lines filtered out or manually elided. You can see the last message exchange I had with them, wherein they updated the fee at 16:34:47 and then disconnected about ten minutes later at 16:44:33. My node attempted to reconnect to them every five minutes until eventually it added an HTLC to the channel at 03:59:50, even though it was disconnected at the time.
For completeness, here are all log lines that contain the string "connect" (not filtered to only those containing the node ID in question) from a little before the incident to a little after it:
Wow, neat. We have a phantom channeld (which should only be created once we have a connection). Connectd is still trying to connect, so it doesn't believe we're connected. So maybe channeld is hanging around but shouldn't be? connectd gives subds 5 seconds before closing on them, and channeld should shut down once that happens, worst case. Hmm, OK, grep for '#255525': you should see channeld getting created, something like this from my own logs:
Also, just check for 'messages suppressed' in case that's dropping stuff.
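Both checks suggested above (lines mentioning '#255525' and lines mentioning 'messages suppressed') can be run in one pass over the log. The following is only a sketch: the function name `scan_log` and its return shape are made up for illustration, and it assumes the log is plain text with one entry per line.

```python
from typing import Dict, Iterable, List


def scan_log(lines: Iterable[str], channel_tag: str = "#255525") -> Dict[str, List[str]]:
    """Collect log lines that name the channel and lines that report suppression.

    `channel_tag` defaults to the channel discussed in this issue; the
    'messages suppressed' marker is the string quoted in the thread.
    """
    channel_lines: List[str] = []
    suppressed_lines: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        if channel_tag in text:
            channel_lines.append(text)
        if "messages suppressed" in text:
            suppressed_lines.append(text)
    return {"channel": channel_lines, "suppressed": suppressed_lines}
```

Passing an open file object (for example the node's log file, wherever it lives on your system) as `lines` works because files iterate line by line. A non-empty "suppressed" list means the channel lines may be incomplete.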
Here's everything from December 1-6:
Lots of "messages suppressed". That's supremely unhelpful. I would have thought that that misfeature would apply only to the in-memory log, which I actually have patched out of my CLN to save RAM. How can I tell CLN never to "suppress" any log messages?
We ratelimited DEBUG messages, but that can be annoying and cause us to miss things. We demoted the worst offenders in the last release, to TRACE level. Now, only log trace if it's wanted, and never suppress DEBUG.

Changelog-Changed: Logging: we no longer suppress DEBUG messages from subdaemons.
Signed-off-by: Rusty Russell <[email protected]>
Fixes: ElementsProject#7917
Issue and Steps to Reproduce
I had an open channel #255525 with peer 03…ee, but the peer had been disconnected for some time. (The peer is Tor-only, and I don't have Tor connectivity, so there was no way for my node to connect outbound to the peer.)

An HTLC came in over a different channel, #255563, with a different peer, 02…9d, and evidently CLN "decided" to forward the payment over channel #255525 with the offline peer 03…ee:

(How could 03…ee-channeld-chan#255525 possibly have sent WIRE_UPDATE_ADD_HTLC to peer 03…ee while the peer was disconnected?)

Eventually the downstream HTLC 172 hit its deadline, causing CLN to unilaterally close the downstream channel #255525:
CLN noticed that the deadlined downstream HTLC 172 was related to an upstream HTLC 5272 in channel #255563 and correctly failed that upstream HTLC, saving the upstream channel from also hitting its deadline. (Kudos! This used to not work.)
So, the question is: why did CLN add an HTLC to a channel with a disconnected peer? That's just asking to hit a deadline and to be forced to unilaterally close that channel, when that could have been avoided simply by refusing to add an HTLC to a disconnected channel. Moreover, this behavior causes a bad user experience because it creates a stuck payment that only unsticks when the HTLC in the disconnected channel eventually hits its deadline (or the downstream peer comes back online).
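The behavior I would expect can be sketched as follows. This is not CLN's actual calc_forwarding_channel logic; the `Channel` type, its fields, and `pick_forwarding_channel` are hypothetical names used only to illustrate refusing to forward over a channel whose peer is disconnected.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Channel:
    # Hypothetical fields for illustration; not CLN's internal channel struct.
    name: str
    peer_connected: bool
    spendable_msat: int


def pick_forwarding_channel(
    channels: Iterable[Channel], amount_msat: int
) -> Optional[Channel]:
    """Pick a channel for a forward, never one whose peer is offline.

    Returns None when no connected channel can carry the amount, so the
    caller can fail the upstream HTLC right away instead of adding an HTLC
    that can only time out.
    """
    usable = [
        c for c in channels
        if c.peer_connected and c.spendable_msat >= amount_msat
    ]
    if not usable:
        return None
    # Assumption: prefer the channel with the most spendable balance.
    return max(usable, key=lambda c: c.spendable_msat)
```

With a filter like this, a forward destined for channel #255525 while 03…ee is offline would be rejected immediately, and the sender could retry over another route instead of waiting for the deadline.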
getinfo output
This happened with CLN v24.08.2.