perf: cache x11 block header hash, reduce reindex hashes from 4->2 per block #6610

Open
wants to merge 1 commit into develop

Conversation

PastaPastaPasta (Member)

Issue being fixed or feature implemented

We currently hash the header up to 4 times per block loaded from disk; this change drops that to 2.

before:
./src/dashd --printtoconsole --testnet --nowallet --reindex --stopatheight=1 76.34s user 6.15s system 95% cpu 1:25.98 total

after:

./src/dashd --printtoconsole --testnet --nowallet --reindex --stopatheight=1 62.87s user 5.70s system 95% cpu 1:11.52 total

Shaves roughly 20-25% off header reindex times.
The user-time figures above work out to ((76.34-62.87)/76.34) ≈ 17.6%, but that measurement includes startup/shutdown overhead, so the saving on the hashing itself is likely higher.

What was done?

How Has This Been Tested?

Reindexed headers; a full reindex should probably be done as well.

Breaking Changes

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

PastaPastaPasta added this to the 23 milestone Mar 6, 2025

coderabbitai bot commented Mar 6, 2025

Walkthrough

The changes implement and propagate caching of computed block header hashes. A mutable variable, cached_hash, is added to the block header class to store the hash value once it is calculated. The GetHash method in the block header now checks if this cache is valid before performing any computation, thereby avoiding redundant calculations. Serialization of the block header is updated to reset the cached hash using modified serialization macros and the SetNull method. Additionally, when a block header is extracted from a block, the cached hash is transferred to maintain consistency. In the block validation process, the cached hash is set immediately after computing the block's hash during the external block loading routine. These modifications are purely functional changes aimed at optimizing the hash retrieval process without altering the interface or overall control flow of the system.
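
A condensed sketch of the block.h side of the change as described above; the field names follow the walkthrough, the class is simplified, and the validation.cpp change that seeds the cache during external block loading is not shown:

// Condensed sketch; the real CBlockHeader in src/primitives/block.h differs in detail.
class CBlockHeader
{
public:
    // Consensus header fields (serialized).
    int32_t nVersion;
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    uint32_t nTime;
    uint32_t nBits;
    uint32_t nNonce;

    // Memory-only cache of the expensive X11 hash. Declared mutable so the
    // const GetHash() can fill it in lazily; never serialized itself.
    mutable uint256 cached_hash;

    SERIALIZE_METHODS(CBlockHeader, obj)
    {
        READWRITE(obj.nVersion, obj.hashPrevBlock, obj.hashMerkleRoot,
                  obj.nTime, obj.nBits, obj.nNonce);
        // (De)serialization may change the fields, so the cache is reset here.
        obj.cached_hash.SetNull();
    }

    void SetNull()
    {
        nVersion = 0;
        hashPrevBlock.SetNull();
        hashMerkleRoot.SetNull();
        nTime = nBits = nNonce = 0;
        cached_hash.SetNull();  // keep the cache consistent with the cleared fields
    }

    uint256 GetHash() const;  // returns cached_hash when valid, otherwise computes and stores it
};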


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fc9a7b and 761577c.

📒 Files selected for processing (3)
  • src/primitives/block.cpp (1 hunks)
  • src/primitives/block.h (3 hunks)
  • src/validation.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/validation.cpp
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: x86_64-w64-mingw32 / Build depends
  • GitHub Check: x86_64-apple-darwin / Build depends
  • GitHub Check: x86_64-pc-linux-gnu / Build depends
  • GitHub Check: x86_64-pc-linux-gnu_nowallet / Build depends
  • GitHub Check: arm-linux-gnueabihf / Build depends
  • GitHub Check: x86_64-pc-linux-gnu_multiprocess / Build depends
🔇 Additional comments (6)
src/primitives/block.cpp (1)

15-21: Well-implemented hash caching optimization.

This is a good implementation of hash caching for the X11 block header. The method now checks if the hash has already been computed before performing the expensive calculation, which should significantly reduce redundant hashing operations during reindex operations.

The PR objective mentioned reducing hashes from 4 to 2 per block during reindexing, and this implementation achieves that goal by caching the result of HashX11 computations.
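
For illustration, the cached GetHash() presumably looks something like the sketch below; the exact X11 call and macros in src/primitives/block.cpp may differ:

uint256 CBlockHeader::GetHash() const
{
    // Fast path: the hash has already been computed for this header.
    if (!cached_hash.IsNull()) {
        return cached_hash;
    }
    // Slow path: run X11 over the 80-byte serialized header and memoize the result.
    cached_hash = HashX11(BEGIN(nVersion), END(nNonce));
    return cached_hash;
}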

src/primitives/block.h (5)

34-35: Good addition of cached_hash with clear documentation.

The addition of the cached_hash member variable with the explanatory comment clearly indicates its purpose as a memory-only optimization for the expensive X11 hash operation.


42-45: Proper handling of serialization.

The updated SERIALIZE_METHODS correctly resets the cached hash during serialization, ensuring that any modification to the block header will trigger a fresh hash computation when needed.


55-55: Proper null state handling.

The SetNull() method now properly resets the cached hash, ensuring consistency when a block header is reset.


235-235: Good propagation of cached hash to extracted headers.

Copying the cached hash when extracting a block header from a block is an important optimization that avoids redundant hash calculations when working with both blocks and their headers.
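
A rough sketch of that propagation (member names assumed from the walkthrough; the real CBlock carries transactions and other fields):

CBlockHeader CBlock::GetBlockHeader() const
{
    CBlockHeader header;
    header.nVersion       = nVersion;
    header.hashPrevBlock  = hashPrevBlock;
    header.hashMerkleRoot = hashMerkleRoot;
    header.nTime          = nTime;
    header.nBits          = nBits;
    header.nNonce         = nNonce;
    header.cached_hash    = cached_hash; // carry the memoized hash across so X11 is not re-run
    return header;
}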


13-236:

❓ Verification inconclusive

Verify thread safety for cached_hash access.

The implementation looks solid, but since cached_hash is declared as mutable and could be modified from const methods like GetHash(), please verify that access from multiple threads is safe. If multiple threads call GetHash() simultaneously on the same object, there could be race conditions when initializing the cache.


🏁 Script executed:

#!/bin/bash
# Search for places where GetHash is called from different threads
rg "GetHash" -A 3 -B 3 | grep -E "thread|Task|Worker|Pool|parallel|mutex|lock" -A 3 -B 3

Length of output: 69721


Action Required: Verify Thread Safety of Cached Hash in CBlockHeader

Our investigation did not reveal any explicit use of synchronization (e.g., mutexes or locks) around calls to GetHash() or modifications to cached_hash. Although the grep results from thread-related keywords did not pinpoint concurrent usages of GetHash(), this does not guarantee that concurrent access is safe.

Please verify that:

  • Instances of CBlockHeader are not accessed concurrently from multiple threads without external synchronization.
  • The design assumptions either restrict GetHash() to a single-threaded context or ensure that external locking is always applied.
  • If concurrent access is possible, consider implementing internal synchronization (for example, by using a std::mutex) to protect updates to cached_hash.
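
For illustration only, internal synchronization along the lines suggested above could look like the sketch below. The PR as reviewed does not add this, and cache_mutex / ComputeX11Hash are hypothetical names:

#include <mutex>

class CBlockHeader
{
    // ...consensus fields as before...
    mutable uint256 cached_hash;
    mutable std::mutex cache_mutex;  // hypothetical; also makes the class non-copyable by default

public:
    uint256 GetHash() const
    {
        std::lock_guard<std::mutex> lock(cache_mutex);
        if (cached_hash.IsNull()) {
            cached_hash = ComputeX11Hash();  // hypothetical helper that hashes without touching the cache
        }
        return cached_hash;
    }

    uint256 ComputeX11Hash() const;
};

A per-object mutex would require hand-written copy/move operations, since CBlockHeader is copied freely throughout the codebase; that trade-off is one reason to instead guarantee single-threaded initialization externally.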

UdjinM6 commented Mar 7, 2025

Nice! 👍

Could also avoid resetting cached_hash on writes and squeeze a bit more out of it:

diff --git a/src/primitives/block.h b/src/primitives/block.h
index 158c42b2c1..d80fb30215 100644
--- a/src/primitives/block.h
+++ b/src/primitives/block.h
@@ -41,7 +41,7 @@ public:
 
     SERIALIZE_METHODS(CBlockHeader, obj) {
         READWRITE(obj.nVersion, obj.hashPrevBlock, obj.hashMerkleRoot, obj.nTime, obj.nBits, obj.nNonce);
-        obj.cached_hash.SetNull();
+        SER_READ(obj, obj.cached_hash.SetNull());
     }
 
     void SetNull()

develop:
119.50s user 13.16s system 98% cpu 2:14.32 total

This PR:
101.09s user 12.94s system 97% cpu 1:56.42 total

This PR with the patch above:
89.55s user 12.55s system 97% cpu 1:44.35 total
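
For context on why the patch above helps: SER_READ(obj, expr) in the Bitcoin Core-derived serialization framework runs expr only when deserializing, so the cache is still invalidated whenever new data is read into the header but survives plain writes (such as saving already-hashed headers back to disk). A conceptual illustration, not the actual macro expansion:

// Conceptual illustration of the SER_READ behaviour relied on above;
// the real macros in serialize.h expand differently.
template <typename Stream, typename Operation>
void SerializationOps(CBlockHeader& obj, Stream& s, Operation ser_action)
{
    READWRITE(obj.nVersion, obj.hashPrevBlock, obj.hashMerkleRoot,
              obj.nTime, obj.nBits, obj.nNonce);
    if (ser_action.ForRead()) {
        // Only deserialization can change the header fields, so only then
        // must the cached hash be thrown away. Pure writes keep the cache,
        // which is where the additional speed-up in the numbers above comes from.
        obj.cached_hash.SetNull();
    }
}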
