Introduce O(n) canonicalization algorithm #1670

Merged
merged 16 commits into from
Dec 11, 2024


@evanlinjin evanlinjin commented Nov 3, 2024

Fixes #1665
Replaces #1659

Description

Previously, getting the canonical history of transactions/UTXOs required calling TxGraph::get_chain_position on each transaction. This was highly inefficient and resulted in an O(n^2) algorithm. The situation is especially problematic when we have many unconfirmed conflicts.

This PR introduces an O(n) algorithm to determine the canonical set of transactions in TxGraph. The algorithm's premise is as follows:

  1. If transaction A is determined to be canonical, all of A's ancestors must also be canonical.
  2. If transaction B is determined to be NOT canonical, all of B's descendants must also be NOT canonical.
  3. If a transaction is anchored in the best chain, it is canonical.
  4. If a transaction conflicts with a canonical transaction, it is NOT canonical.
  5. A transaction with a higher last-seen has precedence.
  6. Last-seen values are transitive. A transaction's collective last-seen value is the max of its own last-seen value and those of all of its descendants.
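Premise 6 can be illustrated with a small sketch. It is not code from this PR: `Txid` is a plain integer stand-in, and `collective_last_seen` and the `children` map are hypothetical names chosen for the example. It only spells out the definition; the algorithm itself does not need to compute this value per transaction.

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for a real transaction id (assumption for this sketch).
type Txid = u32;

/// Collective last-seen of `txid`: the max of its own last-seen value and
/// the last-seen values of all of its descendants (premise 6).
/// `children` maps a tx to the txs that spend its outputs.
fn collective_last_seen(
    txid: Txid,
    last_seen: &HashMap<Txid, u64>,
    children: &HashMap<Txid, Vec<Txid>>,
) -> Option<u64> {
    let mut best: Option<u64> = None;
    let mut visited = HashSet::new();
    let mut stack = vec![txid];
    while let Some(cur) = stack.pop() {
        // Skip txs reached through more than one spending path.
        if !visited.insert(cur) {
            continue;
        }
        // `None < Some(_)`, so `max` keeps the largest known value.
        best = best.max(last_seen.get(&cur).copied());
        if let Some(kids) = children.get(&cur) {
            stack.extend(kids.iter().copied());
        }
    }
    best
}
```

For a chain 1 → 2 → 3 where tx 1 was last seen at 10, tx 2 has no last-seen value and tx 3 was last seen at 50, all three transactions get a collective last-seen of 50.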

We maintain two mutually-exclusive txid sets: canonical and not_canonical.

Imagine a method mark_canonical(A) that is based on premises 1 and 2. This method marks transaction A and all of its ancestors as canonical. For each transaction that is marked canonical, we iterate over all of its conflicts and mark those as not_canonical. If a transaction already exists in canonical or not_canonical, we can break early, avoiding duplicate work.
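A minimal sketch of this idea follows. It is not the implementation in canonical_iter.rs: `ToyGraph`, `Sets` and `mark_not_canonical` are hypothetical names, txids are plain integers, and the graph is assumed to be consistent (no transaction is both an ancestor of and a conflict of a canonical transaction). Following premise 2, conflicts are marked together with their descendants.

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for a real transaction id (assumption for this sketch).
type Txid = u32;

/// Toy graph (hypothetical): `parents` lists the txs a tx spends from,
/// `children` the txs spending from it, `conflicts` its double-spends.
#[derive(Default)]
struct ToyGraph {
    parents: HashMap<Txid, Vec<Txid>>,
    children: HashMap<Txid, Vec<Txid>>,
    conflicts: HashMap<Txid, Vec<Txid>>,
}

#[derive(Default)]
struct Sets {
    canonical: HashSet<Txid>,
    not_canonical: HashSet<Txid>,
}

/// Premise 2: mark `txid` and all of its descendants as not canonical.
fn mark_not_canonical(g: &ToyGraph, sets: &mut Sets, txid: Txid) {
    let mut stack = vec![txid];
    while let Some(cur) = stack.pop() {
        // Already handled: break early.
        if !sets.not_canonical.insert(cur) {
            continue;
        }
        stack.extend(g.children.get(&cur).into_iter().flatten().copied());
    }
}

/// Premise 1: mark `txid` and all of its ancestors as canonical, and
/// mark the conflicts of every newly canonical tx as not canonical.
fn mark_canonical(g: &ToyGraph, sets: &mut Sets, txid: Txid) {
    let mut stack = vec![txid];
    while let Some(cur) = stack.pop() {
        // Already decided either way: break early to avoid duplicate work.
        if sets.canonical.contains(&cur) || sets.not_canonical.contains(&cur) {
            continue;
        }
        sets.canonical.insert(cur);
        for &conflict in g.conflicts.get(&cur).into_iter().flatten() {
            mark_not_canonical(g, sets, conflict);
        }
        stack.extend(g.parents.get(&cur).into_iter().flatten().copied());
    }
}
```

Because every transaction enters one of the two sets at most once, and is skipped on any later visit, the total work stays proportional to the size of the graph.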

This algorithm iterates transactions in 3 runs.

  1. Iterate over all transactions with anchors in descending anchor-height order. For any transaction that has an anchor pointing to the best chain, we call mark_canonical on it. We iterate in descending-height order to reduce the number of anchors we need to check against the ChainOracle (premise 1). The purpose of this run is to populate not_canonical with all transactions that directly conflict with anchored transactions, and to populate canonical with all anchored transactions and ancestors of anchored transactions (transitive anchors).
  2. Iterate over all transactions with last-seen values, in descending last-seen order. We can call mark_canonical on all of these that do not already exist in canonical or not_canonical.
  3. Iterate over the remaining transactions that contain anchors (but not in the best chain) and have no last-seen value. We treat these transactions in the same way as we do in run 2.
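The order in which the three runs visit candidates can be sketched as below. This is an illustration only: `ToyTx`, `Run` and `candidate_order` are hypothetical names, and the `in_best_chain` closure stands in for a ChainOracle query. The sketch leaves out the early skip of transactions that are already in canonical or not_canonical, so it consults the oracle more often than the real algorithm would.

```rust
use std::collections::BTreeSet;

// Stand-in for a real transaction id (assumption for this sketch).
type Txid = u32;

/// Toy transaction record (hypothetical).
struct ToyTx {
    txid: Txid,
    anchor_heights: Vec<u32>,
    last_seen: Option<u64>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Run {
    Anchored,
    LastSeen,
    Leftover,
}

/// Returns the candidates for `mark_canonical` in visiting order.
fn candidate_order(
    txs: &[ToyTx],
    in_best_chain: impl Fn(Txid, u32) -> bool,
) -> Vec<(Run, Txid)> {
    let mut order = Vec::new();
    let mut anchored_in_best = BTreeSet::new();

    // Run 1: anchors in descending height order, kept only if in the best chain.
    let mut by_height: Vec<(u32, Txid)> = txs
        .iter()
        .flat_map(|tx| tx.anchor_heights.iter().map(move |&h| (h, tx.txid)))
        .collect();
    by_height.sort_unstable_by(|a, b| b.cmp(a));
    for (height, txid) in by_height {
        if !anchored_in_best.contains(&txid) && in_best_chain(txid, height) {
            anchored_in_best.insert(txid);
            order.push((Run::Anchored, txid));
        }
    }

    // Run 2: txs with a last-seen value, in descending last-seen order.
    let mut by_seen: Vec<(u64, Txid)> = txs
        .iter()
        .filter_map(|tx| tx.last_seen.map(|seen| (seen, tx.txid)))
        .collect();
    by_seen.sort_unstable_by(|a, b| b.cmp(a));
    for (_, txid) in by_seen {
        if !anchored_in_best.contains(&txid) {
            order.push((Run::LastSeen, txid));
        }
    }

    // Run 3: txs with anchors outside the best chain and no last-seen value.
    for tx in txs {
        let leftover = !tx.anchor_heights.is_empty()
            && tx.last_seen.is_none()
            && !anchored_in_best.contains(&tx.txid);
        if leftover {
            order.push((Run::Leftover, tx.txid));
        }
    }
    order
}
```

Each returned candidate would then be passed to mark_canonical unless it has already been placed in one of the two sets.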

Benchmarks

Thank you to @ValuedMammal for working on this.

$ cargo bench -p bdk_chain --bench canonicalization

Benchmark results (this PR):

many_conflicting_unconfirmed::list_canonical_txs                                                                            
                        time:   [709.46 us 710.36 us 711.35 us]
many_conflicting_unconfirmed::filter_chain_txouts                                                                            
                        time:   [712.59 us 713.23 us 713.90 us]
many_conflicting_unconfirmed::filter_chain_unspents                                                                            
                        time:   [709.95 us 711.16 us 712.45 us]
many_chained_unconfirmed::list_canonical_txs                                                                             
                        time:   [2.2604 ms 2.2641 ms 2.2680 ms]
many_chained_unconfirmed::filter_chain_txouts                                                                             
                        time:   [3.5763 ms 3.5869 ms 3.5979 ms]
many_chained_unconfirmed::filter_chain_unspents                                                                             
                        time:   [3.5540 ms 3.5596 ms 3.5652 ms]
nested_conflicts_unconfirmed::list_canonical_txs                                                                            
                        time:   [660.06 us 661.75 us 663.60 us]
nested_conflicts_unconfirmed::filter_chain_txouts                                                                            
                        time:   [650.15 us 651.36 us 652.71 us]
nested_conflicts_unconfirmed::filter_chain_unspents                                                                            
                        time:   [658.37 us 661.54 us 664.81 us]

Benchmark results (master): https://github.com/evanlinjin/bdk/tree/fix/1665-master-bench

many_conflicting_unconfirmed::list_canonical_txs                                                                             
                        time:   [94.618 ms 94.966 ms 95.338 ms]
many_conflicting_unconfirmed::filter_chain_txouts                                                                             
                        time:   [159.31 ms 159.76 ms 160.22 ms]
many_conflicting_unconfirmed::filter_chain_unspents                                                                             
                        time:   [163.29 ms 163.61 ms 163.96 ms]

# I gave up running the rest of the benchmarks since they were taking too long.
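For a rough sense of scale, the median times for the many_conflicting_unconfirmed scenario (the only one measured on both branches) can be compared directly. The ratios below are derived from the figures above and rounded to the nearest integer:

```latex
\begin{aligned}
\texttt{list\_canonical\_txs}:    &\quad \frac{94.966\ \text{ms}}{0.71036\ \text{ms}} \approx 134\times \\
\texttt{filter\_chain\_txouts}:   &\quad \frac{159.76\ \text{ms}}{0.71323\ \text{ms}} \approx 224\times \\
\texttt{filter\_chain\_unspents}: &\quad \frac{163.61\ \text{ms}}{0.71116\ \text{ms}} \approx 230\times
\end{aligned}
```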

Notes to the reviewers

  • PLEASE MERGE feat(chain,wallet)!: Transitive ChainPosition #1733 BEFORE THIS PR! We had to change the signature of ChainPosition to account for transitive anchors and unconfirmed transactions with no last-seen value.

  • The canonicalization algorithm is contained in /crates/chain/src/canonical_iter.rs.

  • Since the algorithm requires traversing transactions ordered by anchor height, and then by last-seen values, we introduce two index fields in TxGraph: txs_by_anchor and txs_by_last_seen. Methods insert_anchor and insert_seen_at are changed to populate these index fields.

  • An ADR is added: docs/adr/0003_canonicalization_algorithm.md. This is based on the work in Architectural Decision Records #1592.
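As a rough illustration of the pre-indexing idea in the notes above, ordered sets keyed by (height, txid) and (last-seen, txid) can be maintained on insert and walked in reverse. This is a sketch under assumptions: `ToyIndexes` and its fields are hypothetical, txids are plain integers, and the actual field types in TxGraph are not shown in this description.

```rust
use std::collections::{BTreeSet, HashMap};

// Stand-in for a real transaction id (assumption for this sketch).
type Txid = u32;

/// Hypothetical stand-in for the two index fields.
#[derive(Default)]
struct ToyIndexes {
    by_anchor_height: BTreeSet<(u32, Txid)>,
    by_last_seen: BTreeSet<(u64, Txid)>,
    // Current last-seen per tx, needed to replace stale index entries.
    last_seen: HashMap<Txid, u64>,
}

impl ToyIndexes {
    /// Record an anchor and index it by height.
    fn insert_anchor(&mut self, txid: Txid, height: u32) {
        self.by_anchor_height.insert((height, txid));
    }

    /// Record a last-seen value, keeping only the highest one per tx.
    fn insert_seen_at(&mut self, txid: Txid, seen_at: u64) {
        match self.last_seen.get(&txid).copied() {
            // An equal or newer value is already indexed: nothing to do.
            Some(old) if old >= seen_at => {}
            old => {
                if let Some(old) = old {
                    self.by_last_seen.remove(&(old, txid));
                }
                self.last_seen.insert(txid, seen_at);
                self.by_last_seen.insert((seen_at, txid));
            }
        }
    }

    /// Txids in descending anchor-height order (one entry per anchor).
    fn anchored_desc(&self) -> Vec<Txid> {
        self.by_anchor_height.iter().rev().map(|&(_, txid)| txid).collect()
    }

    /// Txids in descending last-seen order.
    fn last_seen_desc(&self) -> Vec<Txid> {
        self.by_last_seen.iter().rev().map(|&(_, txid)| txid).collect()
    }
}
```

Keeping the sets sorted at insertion time means the canonicalization pass can iterate in the required order without sorting on every call.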

Changelog notice

  • Added: Introduce an O(n) canonicalization algorithm. This logic is contained in /crates/chain/src/canonical_iter.rs.
  • Added: Indexing fields in TxGraph; txs_by_anchor_height and txs_by_last_seen. Pre-indexing allows us to construct the canonical history more efficiently.
  • Removed: TxGraph methods: try_get_chain_position and get_chain_position. This is superseded by the new canonicalization algorithm.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@evanlinjin evanlinjin added the bug Something isn't working label Nov 3, 2024
@evanlinjin evanlinjin added this to the 1.0.0-beta milestone Nov 3, 2024
@evanlinjin evanlinjin self-assigned this Nov 3, 2024
@notmandatory
Member

Per call today @evanlinjin will open a new PR for the 1.0.0-beta milestone that only makes expected breaking changes for Wallet, likely only the ChainPosition enum. Then the algorithm improvements in this PR can be released in a future non-breaking patch version.

@notmandatory notmandatory removed this from the 1.0.0-beta milestone Nov 19, 2024
@notmandatory notmandatory added the audit Suggested as result of external code audit label Nov 19, 2024
@evanlinjin evanlinjin force-pushed the fix/1665 branch 8 times, most recently from 5278b81 to 7610b65 Compare November 19, 2024 06:44
@notmandatory notmandatory added this to the 1.0.0-beta milestone Nov 21, 2024
@notmandatory
Member

On further discussion today at the release planning call, this PR was moved back into the 1.0 milestone, provided it can be completed and reviewed in time. If not, it will have to wait for a 2.0 milestone because it requires breaking changes to chain crate APIs that are exposed in the Wallet API.

Alternatively, we could remove the following Wallet functions and move any error types from chain into the core module, which should insulate the wallet crate from these chain crate API changes.

    /// Get a reference to the inner [`TxGraph`].
    pub fn tx_graph(&self) -> &TxGraph<ConfirmationBlockTime> {
        self.indexed_graph.graph()
    }

    /// Get a reference to the inner [`KeychainTxOutIndex`].
    pub fn spk_index(&self) -> &KeychainTxOutIndex<KeychainKind> {
        &self.indexed_graph.index
    }

    /// Get a reference to the inner [`LocalChain`].
    pub fn local_chain(&self) -> &LocalChain {
        &self.chain
    }

@notmandatory notmandatory added the api A breaking API change label Nov 21, 2024
@evanlinjin
Member Author

@notmandatory how does this solution compare with just giving wallet a major version bump?

@notmandatory
Member

Removing the above functions and moving the required chain error types to core should allow us to make breaking chain crate API changes without having to do a major wallet crate release. I also don't see why Wallet users need to access the inner chain types directly instead of using higher-level functions like transactions(), and we can add new helper functions without breaking any APIs.

But all that said if it's less risky to just keep everything as is and do a 2.0 release in 6 mo or so I'd be fine with that too.

@evanlinjin evanlinjin force-pushed the fix/1665 branch 2 times, most recently from 78c9b0f to 64733ca Compare November 26, 2024 01:43
@evanlinjin evanlinjin changed the title Fixes #1665: Introduce O(n) canonicalization algorithm Nov 26, 2024
Add `run_until_finished` methods for `TxAncestors` and `TxDescendants`.
This is useful for traversing until the internal closure returns `None`.

Signatures of `TxAncestors` and `TxDescendants` are changed to enforce
generic bounds in the type definition.
evanlinjin and others added 10 commits December 10, 2024 21:39
This is an O(n) algorithm to determine the canonical set of txids.

* Run 1: Iterate txs with anchors, starting from highest anchor height
  txs.
* Run 2: Iterate txs with last-seen values, starting from highest
  last-seen values.
* Run 3: Iterate txs that are remaining from run 1 which are not
  anchored in the best chain.

Since all transitively-anchored txs are added to the `canonical` set in
run 1, and anything that conflicts with anchored txs is already added to
`not_canonical`, we can guarantee that run 2 will not traverse anything
that directly or indirectly conflicts with anything that is anchored.

Run 3 is needed in case a tx does not have a last-seen value, but is
seen in a conflicting chain.

`TxGraph` is updated to include indexes `txids_by_anchor_height` and
`txids_by_last_seen`. These are populated by the `insert_anchor` and
`insert_seen_at` methods. Generic constraints needed to be tightened as
these methods need to be aware of the anchor height to create
`LastSeenIn`.
This is mostly taken from bitcoindevkit#1735 except we inline many of the functions
and test `list_canonical_txs`, `filter_chain_unspents` and
`filter_chain_txouts` on all scenarios.

CI and README are updated to pin `csv`.

Co-authored-by: valued mammal <[email protected]>
Also removed extra derives on `ObservedIn` and updated docs for
`CanonicalTx`.
Tx anchored in orphaned block and not seen in the mempool should be
canon.
evanlinjin and others added 4 commits December 10, 2024 21:56
In `Wallet::preselect_utxos()`, the code used to obtain chain position
of the UTXO's transaction from the graph, however the chain position
is already recorded within the UTXO's representation (`LocalOutput`).
This patch reuses the existing chain position instead of obtaining a
fresh one.
Contributor

@ValuedMammal ValuedMammal left a comment

ACK 956d0a9

Contributor

@nymius nymius left a comment

ACK 956d0a9

Contributor

@oleonardolima oleonardolima left a comment

utACK 956d0a9

Contributor

@jirijakes jirijakes left a comment

ACK 956d0a9

🔥

@evanlinjin evanlinjin merged commit 955593c into bitcoindevkit:master Dec 11, 2024
21 checks passed
@notmandatory
Member

Thanks @evanlinjin and @ValuedMammal for all the work on getting this coded, documented, and benchmark tested, and to everyone who helped with review!

Labels
api A breaking API change audit Suggested as result of external code audit bug Something isn't working module-blockchain
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Performance issue with get_chain_position and associated methods for unconfirmed transactions
8 participants