
Attributable failures #2256

Merged: joostjager merged 6 commits into main from attr-errs on Apr 3, 2025
Conversation

@joostjager (Contributor) commented May 2, 2023

Implements lightning/bolts#1044

This PR implements the optional `attribution_data` field on `update_fail_htlc` as TLV type 101 rather than type 1, until interop with other node software is achieved.

Until that time, the feature bit doesn't need to be set either.
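For context, attaching an optional TLV to `update_fail_htlc` with LDK's `impl_writeable_msg!`-style macros looks roughly like this. A sketch only, not the PR's actual code; the field list is assumed from the message definition in the spec:

```rust
// Odd TLV types are ignorable by peers that don't understand them, so an
// experimental type 101 can ship before interop and move to type 1 later.
impl_writeable_msg!(UpdateFailHTLC, {
	channel_id,
	htlc_id,
	reason
}, {
	(101, attribution_data, option)
});
```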

@joostjager (Contributor Author)

Single-hop interop testing passes.

Multi-hop interop testing seems to be blocked by gossip propagation issues between lnd and rust-lightning.

@TheBlueMatt (Collaborator)

Nice! That's not too bad, thanks for working on this. I'll dig into the crypto in a day or two. Have a number of comments on the code itself and structure, but I assume it's not really worth digging into until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

> Multi-hop interop testing seems to be blocked by gossip propagation issues between lnd and rust-lightning.

Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

@joostjager (Contributor Author)

> Have a number of comments on the code itself and structure, but I assume it's not really worth digging into until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

This doesn't surprise me. This is my first venture into Rust land. It is a bit of a switch from golang, and I need to get used to how things are done. The language is not bad though, I can see why people fall in love with it! But yes, can hold off on code-level comments for now.

> Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

Not sure if it is a known issue. I've had nagging gossip issues before when I tried to do a pathfinding benchmark. For this PR, I applied a patch to rust-lightning somewhere to always force-send the node announcement, and then it worked better. Will try to come up with a reasonable repro scenario.

@TheBlueMatt (Collaborator)

What are you using to do the testing? I assume ldk-sample of some form? If you change the timer at https://github.com/lightningdevkit/ldk-sample/blob/main/src/main.rs#L791 it will rebroadcast a fresh node announcement more aggressively.

@joostjager (Contributor Author)

Yes, ldk-sample. The timer was firing alright, but somehow the announcement got filtered out in the timer handler.

@TheBlueMatt (Collaborator)

Would be happy to take a look at logs. At the TRACE level we should basically be writing everything that is going out or coming in on the wire.

@joostjager (Contributor Author)

Yes, so I did see that the announcement wasn't going out. Will continue tomorrow and get back with more data.

@TheBlueMatt (Collaborator)

Errr, right, so to avoid dumping tons of gossip crap, that message is only really logged at a high level (`Broadcasting NodeAnnouncement after passing it to our own RoutingMessageHandler`), with the remaining per-peer logs at the GOSSIP level (disabled by default because it's verbose): `Sending message to all peers except {:?} or the announced node: {:?}` and `Skipping broadcast message to {:?} as its outbound buffer is full`.

@joostjager (Contributor Author)

Looks like the timer is at 60s. I definitely waited much longer than that, so that doesn't seem to be the problem. Enabled gossip logging and saved the log files. Continuing in #2259

@joostjager (Contributor Author)

With gossip fixed via the hint in #2259, I was able to run through a few of the multi-hop interop scenarios:
- LND -> LDK -> LDK
- LDK -> LDK -> LND
- LND -> LDK (intermediate failure)

Obviously there are more, but I think this is a good enough sanity check for now.

As mentioned above, attention should go to the crypto part of this feature first.

@joostjager changed the title from "Convert to attributable errors" to "Attributable failures" on Mar 11, 2025
@joostjager marked this pull request as ready for review on March 11, 2025 15:41
@joostjager (Contributor Author)

Pushed 2025 version of the code to this PR.

@TheBlueMatt (Collaborator) left a comment

One quick comment from a glance

@@ -597,12 +597,18 @@ impl InvoiceBuilder<tb::False, tb::False, tb::False, tb::False, tb::False, tb::F
/// Construct new, empty `InvoiceBuilder`. All necessary fields have to be filled first before
/// `InvoiceBuilder::build(self)` becomes available.
pub fn new(currency: Currency) -> Self {
let mut features = Bolt11InvoiceFeatures::empty();
features.set_attributable_failures_optional();
TheBlueMatt (Collaborator):
Generally we set features elsewhere, afaiu. Setting it here forces everyone using this to include the set features, but it's possible someone builds an invoice using this logic that isn't terminating at a ChannelManager.

joostjager (Contributor Author):
Is there a different place in LDK where this feature can be set, or is it left fully up to the caller to not forget?

TheBlueMatt (Collaborator):
The invoice builder mostly has setters that set individual flags, called by invoice_utils / `channelmanager::create_bolt11_invoice`, e.g. `basic_mpp`.

joostjager (Contributor Author):
Will leave it out of this PR except for the definition of the bits. Until interop is completed we don't need the feature bits.
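For reference, the per-flag setter pattern described above, modeled loosely on `basic_mpp`; the types below are simplified stand-ins rather than LDK's actual builder:

```rust
// Simplified stand-ins for LDK's feature and builder types.
struct Bolt11InvoiceFeatures;
impl Bolt11InvoiceFeatures {
	fn set_attributable_failures_optional(&mut self) { /* flip the bit */ }
}

struct InvoiceBuilder { features: Bolt11InvoiceFeatures }
impl InvoiceBuilder {
	// Callers that actually terminate at a ChannelManager opt in
	// explicitly, rather than the builder forcing the feature onto every
	// invoice it constructs.
	fn attributable_failures(mut self) -> Self {
		self.features.set_attributable_failures_optional();
		self
	}
}
```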

@joostjager marked this pull request as draft on March 11, 2025 16:37
@joostjager force-pushed the attr-errs branch 4 times, most recently from 3b957d4 to 369fe76 on March 12, 2025 13:25
@joostjager requested a review from arik-so on March 12, 2025 13:36
@joostjager (Contributor Author)

Ready for a first pass

@TheBlueMatt (Collaborator) left a comment

Note that the last commit's message is wrong. It says "Signal to senders that the node will return attributable failures", but we don't actually do any signaling, we just define the feature bit.

@@ -10664,6 +10688,7 @@ impl<'a, 'b, 'c, ES: Deref, SP: Deref> ReadableArgs<(&'a ES, &'b SP, &'c Channel
},
skimmed_fee_msat: None,
blinding_point: None,
timestamp: None,
TheBlueMatt (Collaborator):
Do we want to store this, or would we simply use 0 for the timestamp after a restart?

joostjager (Contributor Author):
It could work with 0 as the magic value for not available, but I think it's better to be explicit with an option?
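A minimal illustration of the trade-off (simplified struct, field name from the diff):

```rust
use core::time::Duration;

struct OutboundHtlc {
	// None means "unknown", e.g. for HTLCs restored from disk after a
	// restart; that is distinct from an actual timestamp of 0.
	timestamp: Option<Duration>,
}

fn hold_time(htlc: &OutboundHtlc, now: Duration) -> Duration {
	match htlc.timestamp {
		Some(sent) => now.saturating_sub(sent),
		// With a 0 sentinel this case would silently become "held since
		// the epoch"; the Option forces it to be handled explicitly.
		None => Duration::ZERO,
	}
}
```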

Comment on lines 925 to 928
let mut writer = VecWriter(Vec::new());
failuremsg.write(&mut writer).unwrap();
pad.write(&mut writer).unwrap();
let encoded_msg = writer.0;
TheBlueMatt (Collaborator):
Ha, this is such a mess... I think this does all of the ser logic much cleaner.

let mut failuremsg = VecWriter(Vec::with_capacity(1024));
failuremsg.0.extend_from_slice(&[0; 32][..]); // leave room for the hmac
(failure_data.len() as u16 + 2).write(&mut failuremsg).expect("Cannot fail to write to a vec");
failure_type.write(&mut failuremsg).expect("Cannot fail to write to a vec");
failuremsg.0.extend_from_slice(&failure_data[..]);
if failuremsg.0.len() < 1024 { failuremsg.0.resize(1024, 0); }

let data = failuremsg.0;
let hmac = Hmac(&data[32..]);
data[..32].copy_from_slice(&hmac);

joostjager (Contributor Author), Mar 14, 2025:
I don't think this code is correct, because it doesn't write the pad length. I'm also not sure it's all that much more readable. But I did take inspiration from it and removed one of the writers I originally had.

TheBlueMatt (Collaborator), Mar 14, 2025:
One of the reasons to do it this way is that it avoids three entire Vecs, including potentially quite a few allocations in the VecWriter. I'd strongly prefer to do it all in one Vec if possible. (In general, C/Rust applications suffer from memory fragmentation when they allocate a lot, leading to ~2x the total memory usage they actually need. Without a GC no defragmentation can occur, so we end up stuck with it... hence we try to be pretty careful about unnecessary allocations.)

joostjager (Contributor Author):
Added a refactor commit that does it all in a single pre-allocated vec.
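For illustration, a single-allocation construction along the lines of that refactor, following the BOLT 4 failure packet layout (hmac, failure length, failure message, pad length, padding). This is a hedged sketch rather than the PR's code, and `hmac_sha256` is a hypothetical stand-in for the real keyed HMAC:

```rust
fn build_failure_packet(
	um: &[u8; 32], failure_type: u16, failure_data: &[u8], packet_len: usize,
) -> Vec<u8> {
	let failure_len = 2 + failure_data.len(); // 2-byte type + data
	let pad_len = packet_len - 32 - 2 - failure_len - 2;

	// One Vec, sized up front, so nothing reallocates.
	let mut packet = Vec::with_capacity(packet_len);
	packet.extend_from_slice(&[0u8; 32]); // leave room for the hmac
	packet.extend_from_slice(&(failure_len as u16).to_be_bytes());
	packet.extend_from_slice(&failure_type.to_be_bytes());
	packet.extend_from_slice(failure_data);
	packet.extend_from_slice(&(pad_len as u16).to_be_bytes());
	packet.resize(packet_len, 0); // zero padding, same allocation

	// The hmac covers everything after its own slot.
	let hmac = hmac_sha256(um, &packet[32..]);
	packet[..32].copy_from_slice(&hmac);
	packet
}

// Hypothetical helper; LDK would use its existing HMAC machinery here.
fn hmac_sha256(_key: &[u8; 32], _msg: &[u8]) -> [u8; 32] { unimplemented!() }
```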

@@ -1049,6 +1082,73 @@ where
crypt_failure_packet(shared_secret.as_ref(), &mut encrypted_packet);

let um = gen_um_from_shared_secret(shared_secret.as_ref());

// Check attr error hmacs if present.
if let Some(ref attribution_data) = encrypted_packet.attribution_data {
TheBlueMatt (Collaborator):
This looks like a good opportunity to add a fuzzer directly on process_onion_failure.

joostjager (Contributor Author):
Discussed offline, leaving it outside the scope of this PR. This new data block indeed seems like a good candidate for fuzzing.

@@ -8671,6 +8680,7 @@ impl<SP: Deref> FundedChannel<SP> where
return Ok(None);
}

let timestamp = duration_since_epoch();
TheBlueMatt (Collaborator):
Can you add a comment here describing why we're picking the timestamp when adding the HTLC here? Presumably something about how we'd prefer to blame the downstream counterparty as much as possible, but that doing so thoroughly would require passing timestamps around from the inbound channel, so we don't bother.

joostjager (Contributor Author), Mar 14, 2025:
Added a comment describing that it doesn't matter all that much which measuring points we pick.

I'm not convinced that we should blame the downstream as much as possible, because if our direct upstream peer is the sender, we're unnecessarily blaming ourselves.

@@ -323,6 +324,7 @@ struct OutboundHTLCOutput {
source: HTLCSource,
blinding_point: Option<PublicKey>,
skimmed_fee_msat: Option<u64>,
timestamp: Option<Duration>,
TheBlueMatt (Collaborator):
nit: Should we call this like "queued_timestamp" or so?

joostjager (Contributor Author):
I've made it `send_timestamp`, in line with where it's set in `send_htlc`.

@joostjager (Contributor Author)

Rebased onto main, which now includes the fuzzer for this.

@TheBlueMatt (Collaborator)

CI is quite sad:


thread 'ln::blinded_payment_tests::test_trampoline_single_hop_receive' panicked at lightning/src/ln/onion_utils.rs:1221:28:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692:5
   1: core::panicking::panic_fmt
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75:14
   2: core::panicking::panic_const::panic_const_sub_overflow
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:178:21
   3: lightning::ln::onion_utils::process_onion_failure_inner
             at ./src/ln/onion_utils.rs:1221:19
   4: lightning::ln::onion_utils::process_onion_failure
             at ./src/ln/onion_utils.rs:1018:3
   5: lightning::ln::onion_utils::HTLCFailReason::decode_onion_failure
             at ./src/ln/onion_utils.rs:1650:5
   6: lightning::ln::outbound_payment::OutboundPayments::fail_htlc
             at ./src/ln/outbound_payment.rs:2167:7
   7: lightning::ln::channelmanager::ChannelManager<M,T,ES,NS,SP,F,R,MR,L>::fail_htlc_backwards_internal_without_forward_event
             at ./src/ln/channelmanager.rs:7020:26
   8: lightning::ln::channelmanager::ChannelManager<M,T,ES,NS,SP,F,R,MR,L>::fail_htlc_backwards_internal
             at ./src/ln/channelmanager.rs:6993:28
   9: lightning::ln::channelmanager::ChannelManager<M,T,ES,NS,SP,F,R,MR,L>::internal_revoke_and_ack
             at ./src/ln/channelmanager.rs:3301:4
  10: <lightning::ln::channelmanager::ChannelManager<M,T,ES,NS,SP,F,R,MR,L> as lightning::ln::msgs::ChannelMessageHandler>::handle_revoke_and_ack
             at ./src/ln/channelmanager.rs:12158:31
  11: lightning::ln::functional_test_utils::commitment_signed_dance_through_cp_raa
             at ./src/ln/functional_test_utils.rs:2075:2
  12: lightning::ln::functional_test_utils::do_commitment_signed_dance
             at ./src/ln/functional_test_utils.rs:2056:11
  13: lightning::ln::blinded_payment_tests::do_test_trampoline_single_hop_receive
             at ./src/ln/blinded_payment_tests.rs:2159:4
  14: lightning::ln::blinded_payment_tests::test_trampoline_single_hop_receive::{{closure}}
             at ./src/ln/blinded_payment_tests.rs:2170:40
  15: core::ops::function::FnOnce::call_once
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/ops/function.rs:250:5
  16: core::ops::function::FnOnce::call_once
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@joostjager (Contributor Author)

CI should be happy again now.

@carlaKC (Contributor) left a comment

LGTM, I'm hitting the limits of my context in the codebase, so my remaining comments are nitty/non-blocking.

The only thing left for me to review thoroughly is the tests in the last commit.

) -> OnionErrorPacket {
assert_eq!(shared_secret.len(), 32);
assert!(failure_data.len() <= 256 - 2);
carlaKC (Contributor):
Rather than removing this completely, assert that it's less than u16::MAX - 2? In theory, `(failure_len as u16)` below could overflow.

joostjager (Contributor Author):
Brought it back for the actual maximum failure_data length that brings the total message size to 65535, the spec max. Added a test for that too.
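The bound works out as follows; an arithmetic sketch assuming zero padding (the exact constant in the PR may differ):

```rust
// BOLT 4 failure packet: hmac (32) || failure_len (2) || failure message
// (2-byte type + failure_data) || pad_len (2) || pad. With no padding, the
// largest failure_data that keeps the packet within the spec max is:
const SPEC_MAX_PACKET: usize = 65535;
const MAX_FAILURE_DATA_LEN: usize = SPEC_MAX_PACKET - 32 - 2 - 2 - 2; // 65497

fn sanity_check(failure_data: &[u8]) {
	// Also guarantees the `as u16` length casts cannot overflow.
	assert!(failure_data.len() <= MAX_FAILURE_DATA_LEN);
}
```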

@@ -2172,6 +2268,156 @@ impl_writeable!(AttributionData, {
hmacs
});

impl AttributionData {
carlaKC (Contributor):
Really nice comments here 🤩

joostjager (Contributor Author):
Thank you! This is what we were used to 😉


/// Shifts hold times and HMACs to the left, taking into account HMAC pruning. This is the inverse operation of what
/// hops do when back-propagating the failure.
fn shift_left(&mut self) {
TheBlueMatt (Collaborator):
I mean, this is fine, I guess, but it seems pretty redundant to shift the HMACs left every time we verify. We already pass the position to verify, and it already picks the HMACs it's HMACing based on that position, so we might as well just... look at the HMACs we're HMACing based on the position?

joostjager (Contributor Author):
I had this very same thought, but not shifting also means that the chacha pass needs to be performed in a complicated way, skipping the slots that would otherwise have been pruned. It seems simpler to just do the inverse operation, but I'm open to suggestions.

TheBlueMatt (Collaborator):
Hmm, yea, that seems not worth it, nevermind.
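For readers following along: the structure being shifted comes from the attributable-failures spec (constants here are assumed from bolts#1044). The hmacs form a pyramid in which block b, added by the node b hops downstream, has been pruned by one hmac per hop of back-propagation. A sketch of that arithmetic, plus the simple hold-time half of the shift, under those assumptions:

```rust
const MAX_HOPS: usize = 20;
const HOLD_TIME_LEN: usize = 4; // 4-byte hold time per hop
const HMAC_LEN: usize = 4; // truncated hmacs

// Block b holds MAX_HOPS - b hmacs, so the field carries
// 20 + 19 + ... + 1 = 210 hmacs (840 bytes) next to 80 bytes of hold times.
const TOTAL_HMACS: usize = MAX_HOPS * (MAX_HOPS + 1) / 2;

// Byte offset of hmac block b within the hmacs field.
fn hmac_block_offset(b: usize) -> usize {
	(0..b).map(|i| MAX_HOPS - i).sum::<usize>() * HMAC_LEN
}

// The hold-time half of a left shift is a plain copy; the hmac half has to
// walk the pyramid blocks, which is why skipping the shift would force the
// chacha pass to hop over the pruned slots.
fn shift_hold_times_left(hold_times: &mut [u8; MAX_HOPS * HOLD_TIME_LEN]) {
	hold_times.copy_within(HOLD_TIME_LEN.., 0);
	let len = hold_times.len();
	hold_times[len - HOLD_TIME_LEN..].fill(0);
}
```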

@TheBlueMatt (Collaborator)

Otherwise largely LGTM

joostjager and others added 5 commits April 1, 2025 10:57

- Easier to verify that the correct key is used.
- This commit does not yet introduce attribution data, but just adds the field and required serialization logic to the relevant types. (Co-authored-by: Matt Corallo <[email protected]>)
- Record a timestamp when the HTLC is sent out and record the hold duration alongside the failure reason.
- Improve efficiency by not utilizing multiple Vecs.
- Prepares for new test vectors that pad to 1024 bytes.
@TheBlueMatt previously approved these changes Apr 2, 2025
@TheBlueMatt (Collaborator) left a comment

LGTM

// just shift sender-applied penalties between our incoming and outgoing side. So we choose measuring points
// that are simple to implement, and we do it on the outgoing side because then the failure message that encodes
// the hold time still needs to be built in channel manager.
let send_timestamp = duration_since_epoch();
TheBlueMatt (Collaborator):
I think it'd be nice to set this when we push into the holding cell as well, since in practice something getting stuck in the holding cell may be the fault of our peer.

joostjager (Contributor Author):
When it gets out of the holding cell, I think it will pass through this point again. The reported hold time will be shorter because the timestamp is later, but I don't think that's a problem, for the reason mentioned in the code comment. Definitely good enough for this stage, I think.

@carlaKC (Contributor) left a comment

Nice tests, LGTM!

@TheBlueMatt (Collaborator) left a comment

CI is sad:

  INFO [lightning::ln::onion_utils:1417] Onion Error[from 02edabbd16b41c8371b92ef2f04c1185b4f03b6dcd52ba9b78d9d7c89c8f221145: incorrect_or_unknown_payment_details(0x400f)] The final node indicated the payment hash is unknown or amount is incorrect
thread 'ln::onion_utils::tests::test_attributable_failure_packet_onion_mutations' panicked at 'assertion failed: decrypted_failure.short_channel_id == Some(mutating_node as u64)', lightning/src/ln/onion_utils.rs:2769:17
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
   3: lightning::ln::onion_utils::tests::test_attributable_failure_packet_onion_mutations
             at ./src/ln/onion_utils.rs:2769:5
   4: lightning::ln::onion_utils::tests::test_attributable_failure_packet_onion_mutations::{{closure}}
             at ./src/ln/onion_utils.rs:2753:2
   5: core::ops::function::FnOnce::call_once
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    ln::onion_utils::tests::test_attributable_failure_packet_onion_mutations

This commit extends the generation, forwarding and interpretation of HTLC fail messages with attribution data. This allows senders to identify the failing node even when that node does not want to be identified and generates a failure message without a valid HMAC.

For more information, see the BOLT spec.
@joostjager (Contributor Author) commented Apr 3, 2025

Oops, had to be 0x4000 | 15 for the failure code. Fixed.
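For reference, the flag arithmetic, matching the `0x400f` in the CI log above:

```rust
// BOLT 4 failure code flags (values from the spec).
const PERM: u16 = 0x4000;
// incorrect_or_unknown_payment_details = PERM | 15 = 0x400f.
const INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: u16 = PERM | 15;
```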

@joostjager merged commit f9b8d63 into lightningdevkit:main on Apr 3, 2025
25 of 27 checks passed