feature/support large keys (#42) #49
base: master
Conversation
Modify index, record, and tombstone file serialization so that the prior version byte and key size byte now represent a 5-bit version and an 11-bit key length.

Because we are at version zero, old code can still read data written by new code as long as key sizes are not larger than 127 bytes; beyond that, old code will interpret large key sizes as an incompatible version number. Data written by old code is fully compatible with new code, because a version 0 byte still represents 0 after this change and does not affect the key size byte.

Prior:
version byte -> version number, 0 to 255
key size byte -> key size, 0 to 127

After this commit:
version byte -> top 5 bits are the version number, 0 to 31; bottom 3 bits are the three MSB of the key size
key size byte -> lower 8 bits of the key size

-------------------
| vbyte  | ksbyte |   old version and key size -- 8 bits each
-------------------
| ver  | key size |   new version and key size -- 5 / 11 bit split
-------------------
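The 5/11-bit split described above can be sketched as follows. This is a minimal illustration of the bit layout, not the project's actual serialization code; the class and method names are invented for the example.

```java
public class VersionKeySize {
    static final int MAX_VERSION = 31;    // 5 bits
    static final int MAX_KEY_SIZE = 2047; // 11 bits

    // First byte: top 5 bits = version, bottom 3 bits = the 3 MSB of keySize.
    // Second byte: lower 8 bits of keySize.
    static byte[] pack(int version, int keySize) {
        byte b0 = (byte) ((version << 3) | (keySize >>> 8));
        byte b1 = (byte) keySize;
        return new byte[] { b0, b1 };
    }

    static int version(byte[] b) {
        return (b[0] & 0xFF) >>> 3;
    }

    static int keySize(byte[] b) {
        return ((b[0] & 0x07) << 8) | (b[1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] packed = pack(0, 1500);
        System.out.println(version(packed)); // 0
        System.out.println(keySize(packed)); // 1500

        // Backward compatibility: at version 0 with keySize <= 127,
        // both bytes are identical to the old one-byte-each format.
        byte[] small = pack(0, 100);
        System.out.println(small[0]); // 0
        System.out.println(small[1]); // 100
    }
}
```

Note how the compatibility claim above falls out of the layout: with version 0, the first byte is just `keySize >>> 8`, which is zero whenever the key fits in 7 bits.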
The two stack traces in the failed build make no sense. A byte array's length field is being passed into another byte array initializer, which throws a negative array size exception, but a byte array can't have a negative size. I ran the tests locally a dozen times without issue. I'll kick off Travis again.
I take that back: I can see that these tests have a roughly 4% chance of failing due to the test set-up.
Move key length into InMemoryIndexMetaData. Use 5 bytes to represent the key size and value size: 11 bits for the key size and 29 for the value size, limiting keys to 2047 bytes and values to 512MB. Introduce HashEntry and HashEntrySerializer to abstract over key access inside the hash entry, and to disambiguate user values from the in-memory hash data.
Record is the public, user-visible class that contains minimal data. RecordEntry represents an entry in the data file. RecordIterated is needed internally to transmit the sequenceNumber through for testing.
Key/value sizes are serialized in a common way shared across HashEntry implementations: 5 bytes are split into 11 bits for the key size and 29 bits for the value size. Location information -- such as the file and offset of a record -- is handled separately, as Segment implementations can choose where to serialize these bits of data relative to the key and to each other.
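The shared 5-byte size encoding (11 + 29 = 40 bits) can be sketched like this. The class and method names are illustrative, not the actual HashEntrySerializer API.

```java
public class SizeCodec {
    static final int MAX_KEY_SIZE = 2047;        // 11 bits
    static final int MAX_VALUE_SIZE = 536870911; // 29 bits, ~512MB

    // Pack both sizes into 40 bits, written big-endian into 5 bytes.
    static byte[] encode(int keySize, int valueSize) {
        long packed = ((long) keySize << 29) | (valueSize & 0x1FFFFFFFL);
        byte[] out = new byte[5];
        for (int i = 0; i < 5; i++) {
            out[i] = (byte) (packed >>> (8 * (4 - i)));
        }
        return out;
    }

    static int keySize(byte[] b) {
        return (int) (toLong(b) >>> 29);
    }

    static int valueSize(byte[] b) {
        return (int) (toLong(b) & 0x1FFFFFFFL);
    }

    private static long toLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 5; i++) {
            v = (v << 8) | (b[i] & 0xFF);
        }
        return v;
    }
}
```

Packing both sizes into one long before serializing keeps the bit boundary in one place, so the 11/29 split only has to be correct in the shift constants.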
Change the valid chunk range from 0-127 to 1-255 and treat the byte used for the chunk as unsigned. A chunk value of zero indicates an empty MemoryPoolAddress. This nearly doubles the effective maximum index size and efficiently uses all of the bits in the chunk index byte.
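The unsigned-byte convention above maps directly onto Java's `Byte.toUnsignedInt`. A minimal sketch, with invented names (the real MemoryPoolAddress stores more than the chunk byte):

```java
public class ChunkByte {
    // Decode the stored chunk byte as an unsigned value in 0-255.
    static int chunkIndex(byte stored) {
        return Byte.toUnsignedInt(stored);
    }

    // A chunk value of zero marks an empty MemoryPoolAddress,
    // so valid chunks occupy 1-255.
    static boolean isEmpty(byte stored) {
        return chunkIndex(stored) == 0;
    }

    public static void main(String[] args) {
        byte stored = (byte) 200;  // negative (-56) when read as signed
        System.out.println(chunkIndex(stored)); // 200
        System.out.println(isEmpty((byte) 0));  // true
    }
}
```

Treating the byte as unsigned is what recovers the 128-255 range that a signed interpretation would waste, which is where the near-doubling of the effective maximum index size comes from.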
Slot layout: place the reference to the next slot first, followed by the key/value lengths, then the fixedKeyLength reserved key space, then the location data (fileId, fileOffset, sequenceId).
Keys longer than fixedKeyLength 'overflow' into the next slot in the chain. An entry for a key that is equal to or smaller than fixedKeyLength remains the same as before, fitting into a single slot. If the key is longer, 'overflow' slots are allocated that contain only the 'next' pointer and additional key data. Additionally, the tests related to the memory pool variant have been significantly enhanced to improve coverage.
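The slot accounting implied by the chaining scheme can be sketched as follows. The overflow capacity per slot (the slot's bytes minus the 'next' pointer) is an assumed parameter here, not a value taken from the project.

```java
public class SlotChain {
    // How many pool slots a key of the given length occupies.
    static int slotsNeeded(int keyLength, int fixedKeyLength, int overflowBytesPerSlot) {
        if (keyLength <= fixedKeyLength) {
            return 1; // fits in a single slot, exactly as before this change
        }
        int remaining = keyLength - fixedKeyLength;
        // Each overflow slot carries only a 'next' pointer plus key data,
        // so ceil-divide the remainder by the per-slot overflow capacity.
        return 1 + (remaining + overflowBytesPerSlot - 1) / overflowBytesPerSlot;
    }

    public static void main(String[] args) {
        // Assume fixedKeyLength = 8 and 29 usable overflow bytes per slot.
        System.out.println(slotsNeeded(8, 8, 29));   // 1
        System.out.println(slotsNeeded(100, 8, 29)); // 1 + ceil(92/29) = 5
    }
}
```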
MemoryPoolAddress now contains a byte for the chunk and an int for the slot, instead of containing the offset of the slot. This will allow MemoryPoolAddress to become smaller in a later commit.
…oryPool MemoryPoolChunk.Slot is introduced in order to encapsulate access to a slot. This leads to cleaner code in SegmentWithMemoryPool, as it no longer needs to manage slotOffset information and weave it between the chunk and MemoryPoolAddress. The result is significantly safer as well: the slot offset is hidden inside the Slot and cannot accidentally cause out-of-bounds access. This reduces the number of places we have to validate the offset in a typical operation, which may improve performance slightly and, if not, at least decreases the exception paths to test.
MemoryPoolAddress is changed from 5 bytes to 4. One byte remains for the chunk index, but only three bytes are kept for the slot. Three bytes allows up to 16.77 million slots in a single chunk. For a chunk configured with an 8-byte fixedKeySize, this would imply 550MB+ per chunk, with 255 total chunks allowed, for over 130GB of data in a single segment. As segments can scale themselves, this shrink does not introduce any real impediment to the maximum size of a db. However, it does save at least 2 bytes per entry: one for the table slot, and one for each memory pool slot.
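A back-of-the-envelope check of the numbers quoted above. The 33-byte slot size assumes the layout described earlier in this PR (4-byte next pointer, 5 bytes of key/value sizes, 8-byte fixedKeySize, and 16 bytes of location data); the exact per-slot overhead in the project may differ.

```java
public class PoolCapacity {
    public static void main(String[] args) {
        long slotsPerChunk = 1L << 24;     // 3-byte slot index: 16,777,216 slots
        // Assumed slot layout: 4 (next pointer) + 5 (key/value sizes)
        // + 8 (fixedKeySize) + 16 (fileId, fileOffset, sequenceId) = 33 bytes.
        long slotSize = 4 + 5 + 8 + 16;
        long bytesPerChunk = slotsPerChunk * slotSize;
        long maxChunks = 255;              // chunk 0 marks an empty address
        long bytesPerSegment = bytesPerChunk * maxChunks;
        System.out.println(bytesPerChunk / 1_000_000 + " MB per chunk");      // 553 MB
        System.out.println(bytesPerSegment / 1_000_000_000 + " GB per segment"); // 141 GB
    }
}
```

Under these assumptions, the arithmetic lands on roughly 553MB per chunk and 141GB per segment, consistent with the "550MB+" and "over 130GB" figures above.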
If there are no dependencies between the commits, I would suggest splitting this into multiple PRs with good test coverage and landing the changes step by step.
This change allows for keys to be up to 2047 bytes long for both non-pooled and memory-pool implementations. Performance is moderately improved in the process, and the RAM required for the memory pool is reduced by 2+ bytes per key.
Each individual commit should build and test successfully, this was developed incrementally and tested at each step.
It may be easier to follow by first perusing the commits individually, then looking at the combined diff after that.
Read the full individual commit messages for more detailed information. A few highlights:
In the file format, 'version' is now a number between 0 and 31. 'key size' is a number between 0 and 2047.
For the in-memory data structures, the 5 bytes that used to hold key size and value size are now split with 11 bits for the key (size between 0 and 2047) and 29 bits for the value size (max value size 512MB).
MemoryPoolSegments are now addressed by their slot rather than their offset, which allows one more byte per entry to be removed, leading to 4-byte MemoryPoolAddress values, saving a byte per slot in the master table and a byte per slot in the pool.
The non-pooled implementation supports large keys in a simple way, by allocating a larger entry.
The memory pool implementation uses multiple slots (chained together) to hold larger key data. If keys are equal to or smaller than the fixedKeyLength, these function as before. If keys are larger, slots are used to hold the 'overflowing' portion of the key.

Lastly, the memory pool implementation was refactored a bit to encapsulate the chunk offsets inside of a new MemoryPoolChunk.Slot class, which makes the code a lot easier to read and removes a lot of scope for error, as the raw offsets are no longer passed around loosely between chunks and addresses.

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.