
hashdb cap and commit batch size config options #452


Open

wants to merge 6 commits into master

Conversation

magicxyyz (Contributor) commented May 7, 2025

This PR:

  • adds two new config options to core.CacheConfig:
    • TrieCapBatchSize - write batch size threshold used when capping the triedb size
    • TrieCommitBatchSize - write batch size threshold used when committing the triedb to disk
  • adds internal batch size checks in ethdb/pebble.batch.Put and ethdb/pebble.batch.Delete so they return an error instead of triggering a panic inside pebble

part of NIT-3204
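
A minimal sketch of how the two new options could be set; only TrieCapBatchSize and TrieCommitBatchSize come from this PR, the other fields exist upstream in core.CacheConfig, and all values are illustrative:

	cacheConfig := &core.CacheConfig{
		TrieCleanLimit:      256,             // existing option: clean-node cache size (MB)
		TrieDirtyLimit:      256,             // existing option: dirty-node cache size (MB)
		TrieCapBatchSize:    4 * 1024 * 1024, // new: write batch threshold (bytes) used while capping the triedb
		TrieCommitBatchSize: 4 * 1024 * 1024, // new: write batch threshold (bytes) used while committing the triedb to disk
	}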

@magicxyyz magicxyyz self-assigned this May 23, 2025
@magicxyyz magicxyyz marked this pull request as ready for review May 27, 2025 21:01
@magicxyyz magicxyyz removed their assignment May 27, 2025
@magicxyyz magicxyyz marked this pull request as draft May 27, 2025 21:30
@magicxyyz magicxyyz marked this pull request as ready for review May 28, 2025 12:58
Comment on lines +56 to +64
// The max batch size is limited by the uint32 offsets stored in
// internal/batchskl.node, DeferredBatchOp, and flushableBatchEntry.
//
// Pebble limits the size to MaxUint32 (just short of 4GB) so that the exclusive
// end of an allocation fits in uint32.
//
// On 32-bit systems, slices are naturally limited to MaxInt (just short of
// 2GB).
// see: cockroachdb/pebble.maxBatchSize

This comment was copied and pasted from the cockroachdb/pebble repo.
It is a bit confusing here though: for example, DeferredBatchOp is defined in cockroachdb/pebble, and geth doesn't reference it directly.
I had a hard time tracking this down, and I couldn't find internal/batchskl.node in cockroachdb/pebble at all.

// On 32-bit systems, slices are naturally limited to MaxInt (just short of
// 2GB).
// see: cockroachdb/pebble.maxBatchSize
maxBatchSize = (1<<31)<<(^uint(0)>>63) - 1

This was inspired by cockroachdb/pebble, but as written here it is quite difficult to understand why it is defined the way it is.

How about adding the comment that cockroachdb/pebble has here:

	// oneIf64Bit is 1 on 64-bit platforms and 0 on 32-bit platforms.
	oneIf64Bit = ^uint(0) >> 63

	// MaxUint32OrInt returns min(MaxUint32, MaxInt), i.e
	// - MaxUint32 on 64-bit platforms;
	// - MaxInt on 32-bit platforms.
	// It is used when slices are limited to Uint32 on 64-bit platforms (the
	// length limit for slices is naturally MaxInt on 32-bit platforms).
	MaxUint32OrInt = (1<<31)<<oneIf64Bit - 1

And also explicitly comment that it was inspired by cockroachdb/pebble.

func (b *batch) Put(key, value []byte) error {
// The size increase is argument in call to cockroachdb/pebble.Batch.grow in cockroachdb/pebble.Batch.prepareDeferredKeyValueRecord. pebble.Batch.grow may panic if the batch data size plus the increase reaches cockroachdb/pebble.maxBatchSize

Suggested change
// The size increase is argument in call to cockroachdb/pebble.Batch.grow in cockroachdb/pebble.Batch.prepareDeferredKeyValueRecord. pebble.Batch.grow may panic if the batch data size plus the increase reaches cockroachdb/pebble.maxBatchSize
// The size increase is an argument to the cockroachdb/pebble.Batch.grow call in cockroachdb/pebble.Batch.prepareDeferredKeyValueRecord. pebble.Batch.grow may panic if the batch data size plus the increase reaches cockroachdb/pebble.maxBatchSize

func (b *batch) Put(key, value []byte) error {
// The size increase is argument in call to cockroachdb/pebble.Batch.grow in cockroachdb/pebble.Batch.prepareDeferredKeyValueRecord. pebble.Batch.grow may panic if the batch data size plus the increase reaches cockroachdb/pebble.maxBatchSize
sizeIncrease := 1 + uint64(2*binary.MaxVarintLen32) + uint64(len(key)) + uint64(len(value))

I understand the comment before this line, but I don't understand why we need to add 1 + uint64(2*binary.MaxVarintLen32) to sizeIncrease.
I don't know how a batch record is encoded before being written to the DB, so I don't understand why sizeIncrease is computed the way it is.
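
For context, a rough sketch of where that bound appears to come from, assuming the computation mirrors pebble's batch record encoding (a 1-byte kind tag, a varint-encoded key length, the key bytes, a varint-encoded value length, and the value bytes; a varint of a 32-bit length takes at most binary.MaxVarintLen32 bytes). The helper name below is just a placeholder, not part of the PR:

	// worstCasePutRecordSize returns an upper bound on the encoded size of a
	// single set-record in a pebble batch: 1 kind byte, plus worst-case varint
	// encodings of the key and value lengths, plus the key and value bytes.
	func worstCasePutRecordSize(key, value []byte) uint64 {
		return 1 + uint64(2*binary.MaxVarintLen32) + uint64(len(key)) + uint64(len(value))
	}

For a delete record there is no value, which would explain the 1 + uint64(binary.MaxVarintLen32) + uint64(len(key)) bound used in batch.Delete below.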

Comment on lines +682 to +688
// The size increase is argument in call to cockroachdb/pebble.Batch.grow in cockroachdb/pebble.Batch.prepareDeferredKeyRecord. pebble.Batch.grow may panic if the batch data size plus the increase reaches cockroachdb/pebble.maxBatchSize
sizeIncrease := 1 + uint64(binary.MaxVarintLen32) + uint64(len(key))
// check if we fit within maxBatchSize
if uint64(b.b.Len())+sizeIncrease >= maxBatchSize {
// return an error instead of letting b.b.Delete to panic
return ethdb.ErrBatchTooLarge
}

This same code block is also defined in batch.Put; how about creating a function to abstract it? A possible shape is sketched below.
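
For example, a small helper on batch (the name checkBatchSize is just a placeholder, not part of the PR):

	// checkBatchSize returns ethdb.ErrBatchTooLarge if adding sizeIncrease
	// bytes would push the underlying pebble batch past maxBatchSize.
	func (b *batch) checkBatchSize(sizeIncrease uint64) error {
		if uint64(b.b.Len())+sizeIncrease >= maxBatchSize {
			return ethdb.ErrBatchTooLarge
		}
		return nil
	}

Both Put and Delete could then call it before touching the underlying batch.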

CleanCacheSize int // Maximum memory allowance (in bytes) for caching clean nodes
}

// Defaults is the default setting for database if it's not specified.
// Notably, clean cache is disabled explicitly,
var Defaults = &Config{
// Arbitrum:
// default zeroes used to prevent need for correct initialization in all places used upstream

I'm not sure I correctly understand this comment.
By default, Config will be initialized with a zero IdealCapBatchSize anyway, without explicitly setting it to zero.

if errors.Is(err, ethdb.ErrBatchTooLarge) {
log.Warn("Pebble batch limit reached in hashdb Cap operation, flushing batch. Consider setting ideal cap batch size to a lower value.", "pebbleError", err)
// flush batch & retry the write
if err = batch.Write(); err != nil {

Should we add a log.Error("Failed to write flush list to disk", "err", err) here before returning? The error returned by the batch.Write() call in the next code block has a log.Error before the error is returned; something like the sketch below.
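
Roughly (the return err is assumed from the surrounding control flow, which is cut off in the quoted snippet):

	if err = batch.Write(); err != nil {
		log.Error("Failed to write flush list to disk", "err", err) // suggested addition
		return err
	}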

Comment on lines +360 to +363
idealBatchSize := uint(db.config.IdealCapBatchSize)
if idealBatchSize == 0 {
idealBatchSize = uint(ethdb.IdealBatchSize)
}

Instead of checking this in every Cap and Commit call, how about setting db.config.IdealCapBatchSize to ethdb.IdealBatchSize when db.config is created, in case TrieCapBatchSize is zero? See the sketch below.
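
A sketch of the suggestion; exactly where the hashdb config is derived from CacheConfig is assumed here:

	// Resolve the default once when the config is built instead of in every
	// Cap/Commit call.
	if config.IdealCapBatchSize == 0 {
		config.IdealCapBatchSize = ethdb.IdealBatchSize
	}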

Labels: CLA signed
Projects: None yet
2 participants