- 
                Notifications
    You must be signed in to change notification settings 
- Fork 422
Fix SCID removal when an HTLC is pending from 0.1 before upgrading #4167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix SCID removal when an HTLC is pending from 0.1 before upgrading #4167
Conversation
If while a splice is pending, the channel happens to not have any commitment updates, but did prior to the splice being negotiated, it's possible that we end up with bogus holder HTLC data for the previous commitment. After the splice becomes locked, we've successfully transitioned to the new funding transaction, but that funding transaction never had a commitment transaction negotiated for the previous state.
| 👋 Thanks for assigning @jkczyz as a reviewer! | 
| Codecov Report❌ Patch coverage is  
 Additional details and impacted files@@            Coverage Diff             @@
##             main    #4167      +/-   ##
==========================================
+ Coverage   88.78%   88.80%   +0.01%     
==========================================
  Files         180      180              
  Lines      137004   137262     +258     
  Branches   137004   137262     +258     
==========================================
+ Hits       121642   121896     +254     
- Misses      12538    12561      +23     
+ Partials     2824     2805      -19     
 Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
54881f5    to
    545854f      
    Compare
  
    | 🔔 1st Reminder Hey @jkczyz! This PR has been waiting for your review. | 
        
          
                lightning/src/ln/channel.rs
              
                Outdated
          
        
      | // `position` above will also return `None` if we have historical scids but they all | ||
| // need to be removed, so `end` should point to the last index in such cases. | ||
| .unwrap_or(self.context.historical_scids.len()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be safe, could you move this up after position and change that and_then to a map? I think it should currently work as written, but arguably that change is more correct. Otherwise, we're assuming historical_scids is empty when the current funding doesn't have an SCID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the whole thing with filter+count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, that requires traversing the entire list where the previous approach only needs to traverse the ones to drop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do that many on-chain splice transactions, I think you can afford the CPU :)
| 👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. | 
We relied on `position` giving us the last index we need to prune, but this may return `None` when all known legacy SCIDs need to be pruned. In such cases, we ended up not pruning any of the legacy SCIDs at all. Rewritten by: Matt Corallo <[email protected]>
Test tweaked by: Matt Corallo <[email protected]>
When splicing, we're required by protocol to retain all the existing keys material except the funding key which we're allowed to rotate. In the original implementation we acknowledged that but figured we'd stick with a single `pubkey` method in the `ChannelSigner` anyway cause adding a specific method for it is annoying. Sadly, this was ultimately broken - in `FundingScope::for_splice`, we called the signer's `new_pubkeys` method (renamed from `pubkeys` after splicing initially landed), replacing all of the public keys the `Channel` would use rather than just the funding key. This can result in commitment signature mismatches if the signer changes any keys aside from the funding one. `InMemorySigner` did not do so, however, so we didn't notice the bug. Luckily-ish, in 189b8ac we started generating a fresh `remote_key` when splicing (at least when upgrading from 0.1 to 0.2 or when setting `KeysManager` to use v1 `remote_key` derivation). This breaks splicing cause we can't communicate the new `remote_key` to the counterparty during the splicing handshake. Ultimately this bug is because the API we had didn't communicate to the signer that we weren't allowed to change anything except the funding key, and allowed returning a `ChannelPublicKeys` which would break the channel. Here we fix this by renaming `new_pubkeys` `pubkeys` again (partially reverting 9d291e0 but keeping the changed requirements that `pubkeys` only be called once) and adding a new `ChannelSigner:new_funding_pubkey` method specifically for splicing. We also update `channel.rs` to correctly fetch the new funding pubkey before sending `splice_init`, storing it in the `PendingFunding` untl we build a `FundingScope`.
545854f    to
    cf745d9      
    Compare
  
    cf745d9    to
    3160be1      
    Compare
  
    3160be1    to
    41d95a6      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failing :'(
        
          
                lightning/src/ln/channel.rs
              
                Outdated
          
        
      | // `position` above will also return `None` if we have historical scids but they all | ||
| // need to be removed, so `end` should point to the last index in such cases. | ||
| .unwrap_or(self.context.historical_scids.len()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, that requires traversing the entire list where the previous approach only needs to traverse the ones to drop.
In the previous commit we partially reverted 9d291e0 renaming `ChannelSigner::new_pubkeys` to `pubkeys` again, but we still don't want to go back to requiring that `pubkeys` return the same contents on each call. Thus, here, we add test logic to check that `pubkeys` isn't called more than once.
`build_commitment_transaction`'s fifth argument is supposed to be whether we're the one generating the commitment (i.e. because we're signing rather than validating the commitment). During splicing, this doesn't matter because there should be no async HTLC addition/removal happening so the commitment generated wil be the same in either case, but its still good to pass the correct bool.
If an HTLC was forwarded in 0.1, but waiting to be failed back, it will ultimately be failed by adding it to the `ChannelManager::pending_forwards` map with the channel's original SCID. If that channel is spliced between when the HTLC was forwarded (on 0.1) and when the HTLC is failed back (on 0.2), that SCID may no longer exist, causing the HTLC fail-back to be lost. Luckily, delaying when an SCID is expired is cheap - its just storing an extra `u64` or two and generating one requires an on-chain splice, so we simply delay removal of SCIDs for two months at which point any incoming HTLCs should have been expired for six weeks and the counterparty should have force-closed anyway.
41d95a6    to
    fb7ce22      
    Compare
  
    | 
 Oops, fixed one issue, but most of it is an MSRV issue on syn. | 
No description provided.