-
Notifications
You must be signed in to change notification settings - Fork 5
Description
Currently, the segment size managed by an allocator is fixed at 1 GiB. This issue proposes 2 ways to allow dynamic segment sizing, configurable at runtime. The goal is to support allocators managing entire disks/vdevs, while still allowing the current 1 GiB segment or even smaller sizes.
Problem
The fixed segment size and implementation limits flexibility in allocator design. For larger vdevs, managing allocation bitmaps for the entire device could be more efficient than handling many smaller segments, especially if/when more powerful allocators, than the current SegmentAllocator are used. However, it's worth noting that for very large vdevs (e.g., 10 TiB), the allocation bitmap itself can become quite large (320 MiB in this case), posing challenges for reading and writing during each generation update. This could also become problematic for even moderate vdev sizes like 1 TiB (32 MiB bitmap). Smaller segment sizes might also be desirable in other situations for finer-grained control.
Proposed Changes (Bitmap-based)
This approach retains the bitmap-based allocators but makes the segment size configurable.
-
Configuration: Introduce a new parameter in
DatabaseConfigurationto specify the allocator segment size. This parameter could accept values like "128", "1024", "8192" to define the size of the segment in Bytes or KiB/MiB. A value of "full" could signal that the allocator manages the entire vdev.
The parameter could be specify the size for the whole storage configuration or just individual disks or layers. -
Allocator Interface: Modify the
Allocatortrait created in feat: Introduce generic allocator interface and new allocator implementations #66 to accept the segment size during initialization.pub trait Allocator: Send + Sync { fn new(bitmap: &[u8], segment_size: Block<u64>) -> Self where Self: Sized; // ... other methods }
-
SegmentId Handling: Remove the implicit tie between SegmentId and disk offset to reflect that a segment may now be a disk. Generalize
SegmentIdto reflect this. -
Bitmap Storage: The way bitmaps are stored probably doesn't have to change at all because
SegmentIds will still be unique. That means that the root tree can properly index them. It has to be investigated how the root tree handles large bitmaps larger than the maximum leaf size. -
Data Management Layer (DML) Changes:
- Update DML to handle the new configuration parameter and pass the segment size to the allocator.
- Modify the allocation and deallocation methods in DML to handle both full vdev allocation and segmented allocation.
-
HandlerChanges: TheallocatorsHashMap can probably stay the same for similar reasons as the bitmap storage.
Challenges
- Error Handling: Implement error handling for invalid segment sizes or configurations.
- Testing: Develop tests to cover different segment sizes and configurations.
Proposed Changes (Serialization-based)
This approach moves away from the fixed bitmap representation, allowing allocators to manage their state using arbitrary, serializable data structures.
-
Configuration: Same as in the bitmap-based approach. Introduce a new parameter in
DatabaseConfigurationto specify the allocator segment size, supporting values like "128", "1024", "8192" (for Bytes or KiB/MiB) and "full". -
Allocator Interface: Modify the
Allocatortrait proposed in feat: Introduce generic allocator interface and new allocator implementations #66 to use serialization for state management.pub trait Allocator: Send + Sync { fn new(segment_size: Block<u64>) -> Self where Self: Sized; fn allocate(&mut self, size: Block<u32>) -> Option<u32>; fn allocate_at(&mut self, size: Block<u32>, offset: u32) -> bool; fn deallocate(&mut self, offset: u32, size: Block<u32>); fn serialize(self) -> SlicedCowBytes; fn deserialize(SlicedCowBytes) -> Result<Self, Errorr> where Self: Sized; fn size(&self) -> Block<u32>; }
The
newfunction now takes asegment_sizeparameter.allocate,allocate_at, anddeallocateoperate on block offsets within a segment.serializeanddeserializemanage the state safed and loaded to/from disk. -
SegmentId Handling: Same as in the bitmap-based approach. Generalize
SegmentIdto support both traditional segments and entire disks. -
Allocator State Storage: Instead of storing bitmaps directly, the root tree will store the serialized allocator state. The
SegmentIdwill continue to be used as the key. TheHandlerwill need to serialize and deserialize allocator states. -
Data Management Layer (DML) Changes:
- Update DML to handle the new configuration parameter and pass the segment size to the allocator during initialization.
- Modify the allocation and deallocation methods to interact with the
Allocatortrait's new methods.
-
HandlerChanges:- The
Handlerwill now need to serialize the allocator's state when storing it in the root tree and deserialize it when retrieving. - The
allocatorsHashMap can probably stay the same.
- The
-
Delayed (De-)allocation Handling: This would be the main work in this approach.
- Because the internal state of allocators could be complex the current approach of directly modifying the bitmaps without the allocator knowing won't work.
- This means that many parts of the allocation and deallocation logic have to be rewritten to be agnostic of the allocator internals.
It's hard to estimate what else would have to be done because the allocation logic is so deeply intertwined with the snapshots, syncing and stretches many parts of the betree crate.
Advantages of Serialization-based Approach
- Flexibility: Allocators are no longer constrained to a bitmap representation, allowing for more sophisticated allocation strategies.
- Extensibility: Easier to add new allocator types with different state representations.
- Potential for Reduced Memory Usage: Some allocators might require less memory to track free space compared to a full bitmap.
Disadvantages of Serialization-based Approach
- Implementation Time: Because large parts of the code have to be changed this would require a lot of rewriting and testing.
- Complexity: Increased complexity due to serialization and deserialization. But maybe also decreased complexity because allocation bitmaps (or other new structures) aren't changed in different parts of the codebase.
- Serialization Overhead: Serializing and deserializing allocator state might introduce overhead compared to the simple bitmap approach. But for allocators using just a bitmap it should be about the same.
- Testing: Testing will be needed to ensure the correctness of the serialization and deserialization logic for each allocator.
Conclusion
Both approaches offer viable paths to dynamic segment sizing. The bitmap-based approach is simpler to implement but less flexible. The serialization-based approach offers greater flexibility and potential optimizations but is harder to realize. The choice between the two depends on the project's priorities and the expected evolution of the allocator implementations.