[Store] feat: introduce tiered backend #1022
Summary of Changes

Hello @YiXR, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the core components for Mooncake Store V2's tiered caching infrastructure. It defines the abstract interfaces and fundamental mechanisms required to manage data across various storage tiers, ensuring efficient data movement and organization. The changes focus on creating a flexible and extensible framework that separates data handling from caching policies, preparing the system for advanced cache management strategies.
Code Review
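This PR introduces the abstract base classes of the tiered backend for Mooncake Store V2 and lays a solid foundation. The overall design is clear: it separates data copying, tier management, and backend logic, which makes it easy to extend.
I found some issues that need attention:
- The Init method in tiered_backend.cpp has a critical problem: it calls a method on a null pointer, which crashes the program.
- A safety check in data_copier.cpp is currently ineffective and may cause problems when new memory types are added in the future.
- The MoveData method in tiered_backend.cpp could handle failed moves more robustly to avoid data inconsistency.
- There is also a minor code-cleanliness suggestion.
Please see the individual review comments.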
    std::unique_ptr<CacheTier> tier;

    if (!tier->Init(this, engine)) {
The concrete CacheTier implementations will be added in a subsequent PR. Until they are available, this part of the code is commented out.
    std::unique_ptr<DataCopier> DataCopierBuilder::Build() const {
        const std::vector<MemoryType> all_types = { MemoryType::DRAM };

        for (const auto& type : all_types) {
            if (type == MemoryType::DRAM) continue;

            if (copy_matrix_.find({type, MemoryType::DRAM}) == copy_matrix_.end()) {
                throw std::logic_error("DataCopierBuilder Error: Missing copy function for type " +
                                       MemoryTypeToString(type) + " TO DRAM.");
            }
            if (copy_matrix_.find({MemoryType::DRAM, type}) == copy_matrix_.end()) {
                throw std::logic_error("DataCopierBuilder Error: Missing copy function for DRAM TO type " +
                                       MemoryTypeToString(type) + ".");
            }
        }

        return std::unique_ptr<DataCopier>(new DataCopier(copy_matrix_));
    }
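The validation logic in Build() is meant to ensure that every memory type has copy functions registered to and from DRAM. However, the all_types vector is hard-coded to contain only MemoryType::DRAM, so the checks in the loop body never execute and the validation is ineffective. If a new memory type is added in the future without the required copy functions, this could lead to runtime errors. To fix this, build the list of types to check dynamically from CopierRegistry.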
Suggested change:

    std::unique_ptr<DataCopier> DataCopierBuilder::Build() const {
        const auto& registry = CopierRegistry::GetInstance();
        for (const auto& reg : registry.GetMemoryTypeRegistrations()) {
            if (reg.type == MemoryType::DRAM) {
                continue;
            }
            if (copy_matrix_.find({reg.type, MemoryType::DRAM}) == copy_matrix_.end()) {
                throw std::logic_error("DataCopierBuilder Error: Missing copy function for type " +
                                       MemoryTypeToString(reg.type) + " TO DRAM.");
            }
            if (copy_matrix_.find({MemoryType::DRAM, reg.type}) == copy_matrix_.end()) {
                throw std::logic_error("DataCopierBuilder Error: Missing copy function for DRAM TO type " +
                                       MemoryTypeToString(reg.type) + ".");
            }
        }
        return std::unique_ptr<DataCopier>(new DataCopier(copy_matrix_));
    }
done
    if (!src_tier->Delete(key)) {
        LOG(ERROR) << "CRITICAL INCONSISTENCY: Moved key '" << key << "' to tier " << dest_tier_id
                   << " but failed to delete from source " << src_tier_id << ".";
    }
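In the MoveData method, if dest_tier->Put() succeeds but the subsequent src_tier->Delete() fails, the system is left in an inconsistent state in which the same key exists in two tiers. The current implementation only logs a "CRITICAL INCONSISTENCY" error and continues, which may cause hard-to-detect bugs later. To make the system more robust, consider implementing a rollback mechanism: for example, if deleting from the source tier fails, try to delete the newly created entry from the destination tier.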
Suggested change:

    if (!src_tier->Delete(key)) {
        LOG(ERROR) << "CRITICAL INCONSISTENCY: Moved key '" << key << "' to tier " << dest_tier_id
                   << " but failed to delete from source " << src_tier_id << ". Attempting rollback.";
        // Attempt to roll back by deleting the key from the destination tier.
        if (!dest_tier->Delete(key)) {
            LOG(FATAL) << "Rollback failed. Data for key '" << key << "' is now duplicated in tiers "
                       << src_tier_id << " and " << dest_tier_id << ". Manual intervention required.";
        }
        // Even if rollback succeeds, the original move operation failed.
        return false;
    }
done
044cd3d to 0d58056
Signed-off-by: Xingrui Yi <[email protected]>
    auto& src_tier = src_it->second;
    auto& dest_tier = dest_it->second;

    std::unique_lock<std::shared_mutex> lock(map_mutex_);
seems this code holds the lock for too long
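One common way to shorten the critical section is to take the map lock only to check and to commit the key-to-tier mapping, and to run the slow copy outside it. The following is a minimal sketch under that assumption; ToyTier, ToyBackend, and TierOf are hypothetical stand-ins and not the PR's CacheTier or TieredBackend.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a cache tier; each tier guards its own storage.
struct ToyTier {
    std::mutex mu;
    std::map<std::string, std::string> data;

    bool Get(const std::string& key, std::string* out) {
        std::lock_guard<std::mutex> guard(mu);
        auto it = data.find(key);
        if (it == data.end()) return false;
        *out = it->second;
        return true;
    }
    bool Put(const std::string& key, std::string value) {
        std::lock_guard<std::mutex> guard(mu);
        data[key] = std::move(value);
        return true;
    }
    bool Delete(const std::string& key) {
        std::lock_guard<std::mutex> guard(mu);
        return data.erase(key) > 0;
    }
};

// Hypothetical backend with two tiers (ids 0 and 1) fixed at construction.
class ToyBackend {
public:
    ToyBackend() {
        tiers_[0] = std::make_shared<ToyTier>();
        tiers_[1] = std::make_shared<ToyTier>();
    }

    bool Put(const std::string& key, std::string value, int tier_id) {
        auto tier = Tier(tier_id);
        if (!tier || !tier->Put(key, std::move(value))) return false;
        std::unique_lock<std::shared_mutex> lock(map_mutex_);
        key_to_tier_[key] = tier_id;
        return true;
    }

    bool MoveData(const std::string& key, int src_id, int dest_id) {
        auto src = Tier(src_id);
        auto dest = Tier(dest_id);
        if (!src || !dest) return false;
        {
            // Step 1: short shared lock, only to confirm where the key lives.
            std::shared_lock<std::shared_mutex> lock(map_mutex_);
            auto it = key_to_tier_.find(key);
            if (it == key_to_tier_.end() || it->second != src_id) return false;
        }
        // Step 2: the slow part (read + copy) runs without the map lock.
        std::string value;
        if (!src->Get(key, &value) || !dest->Put(key, value)) return false;
        {
            // Step 3: short exclusive lock to re-check and commit the mapping.
            std::unique_lock<std::shared_mutex> lock(map_mutex_);
            auto it = key_to_tier_.find(key);
            if (it == key_to_tier_.end() || it->second != src_id) {
                dest->Delete(key);  // the key changed meanwhile: undo the copy
                return false;
            }
            it->second = dest_id;
        }
        // Step 4: drop the now-stale source copy outside the map lock.
        src->Delete(key);
        return true;
    }

    // Returns the tier id holding the key, or -1 if the key is unknown.
    int TierOf(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(map_mutex_);
        auto it = key_to_tier_.find(key);
        return it == key_to_tier_.end() ? -1 : it->second;
    }

private:
    std::shared_ptr<ToyTier> Tier(int id) const {
        auto it = tiers_.find(id);
        return it == tiers_.end() ? nullptr : it->second;
    }

    std::map<int, std::shared_ptr<ToyTier>> tiers_;  // immutable after construction
    mutable std::shared_mutex map_mutex_;            // guards key_to_tier_ only
    std::map<std::string, int> key_to_tier_;
};
```

The trade-off in this sketch is that another thread may touch the key while the copy is in flight, which is why the mapping is re-checked before the commit. Whether that is acceptable depends on the consistency guarantees the real backend needs.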
     * It supports a fallback mechanism via DRAM for any copy paths that are not
     * explicitly registered as a direct path.
     */
    class DataCopier {
can this one support async copy?
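To illustrate what the question is asking for, one simple option is to wrap a synchronous copy in std::async and hand the caller a future. This is only a sketch: SyncCopy and CopyAsync are hypothetical names, plain memcpy stands in for the real copy path, and the PR itself does not define an asynchronous interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <vector>

// Hypothetical synchronous copy, standing in for a DataCopier-style Copy call.
inline bool SyncCopy(const void* src, void* dst, std::size_t size) {
    if (src == nullptr || dst == nullptr) return false;
    std::memcpy(dst, src, size);
    return true;
}

// Hypothetical asynchronous wrapper: runs the copy on another thread and
// returns a future holding the result. The caller must keep both buffers
// alive until the future has been waited on.
inline std::future<bool> CopyAsync(const void* src, void* dst, std::size_t size) {
    return std::async(std::launch::async, SyncCopy, src, dst, size);
}
```

A caller would start the copy, continue with other work, and call get() on the future when the result is needed. A production design would more likely use a thread pool or device streams than one thread per copy.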
     * This struct is used as a generic descriptor for a block of memory, allowing
     * data to be described abstractly regardless of its physical location.
     */
    struct DataSource {
this needs to support vram, dram and ssd?
This is the first PR for the Mooncake Store V3 tiered backend. It introduces the abstract base classes for cache tier management.
#954
CacheTier (Cache Tier)
- mooncake-store/include/tiered_cache/cache_tier.h
- Get(key, data, size): Gets data from this tier.
- Put(key, source): Puts data into this tier. The implementation needs to allocate its own memory and copy the data.
- Delete(key): Deletes data from this tier.
- Contains(key): Checks if data exists in this tier.
- AsDataSource(key): Packages the data in this tier into a DataSource object for subsequent data movement.
- Implementations of CacheTier (e.g., DramCacheTier) interact with TieredBackend, primarily to use common services provided by TieredBackend, such as the DataCopier.

DataCopier (Data Copier)
- mooncake-store/include/tiered_cache/data_copier.h
- Copies data between memory types (MemoryType). This is a core utility class that decouples the data movement logic.
- Built via DataCopierBuilder, allowing for the registration of direct copy functions between different memory types (e.g., DRAM -> VRAM).
- Exposes a Copy interface. When a direct copy path is unavailable, it automatically falls back to using DRAM as an intermediate buffer (e.g., VRAM -> DRAM -> SSD). A new memory type must therefore register copy functions to and from DRAM, which Build() checks. This greatly simplifies the integration of new storage media.
- TieredBackend and CacheTier hold and use a DataCopier instance to execute all data copy operations, whether writing new data or moving data between tiers.

TieredBackend (Tiered Backend)
- mooncake-store/include/tiered_cache/tiered_backend.h
- Manages the CacheTier instances.
- Holds the attributes of each CacheTier, such as a tag list and priority, which are configured via a config file.
- GetTierViews() provides the CacheScheduler with a global view of all CacheTiers, including information like usage, priority, and tag lists, to aid scheduling algorithms.
- Maintains key_to_tier_map_ to quickly locate which CacheTier a key resides in.
- Exposes operations to the Worker: Get, Put, Delete, MoveData.
- The Get() operation uses the key_to_tier_map_ to look up data directly in the corresponding CacheTier.
- The Put() operation writes data to a specified CacheTier.
- The MoveData() operation moves data between two CacheTiers.
- TieredBackend is the bridge between the high-level business logic (Worker) and the underlying storage (CacheTier). It receives instructions and invokes the appropriate CacheTier instances and the DataCopier to complete tasks. It also provides status views (TierView) of all CacheTiers to the upper layers.
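The DRAM fallback described for the DataCopier can be sketched as a lookup in a copy matrix followed by a two-hop copy through a staging buffer. This is an illustration under stated assumptions, not the PR's implementation: MemType, CopyFn, ToyCopier, and Register are hypothetical names, and host memory stands in for every memory type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical memory types; the real MemoryType enum may differ.
enum class MemType { DRAM, VRAM, SSD };

// Hypothetical copy function signature for one (source, destination) pair.
using CopyFn = std::function<bool(const void* src, void* dst, std::size_t size)>;

class ToyCopier {
public:
    // Registers a direct copy path from one memory type to another.
    void Register(MemType from, MemType to, CopyFn fn) {
        matrix_[{from, to}] = std::move(fn);
    }

    // Uses the direct path if one is registered; otherwise stages the data
    // in a temporary DRAM buffer (from -> DRAM -> to).
    bool Copy(MemType from, const void* src, MemType to, void* dst,
              std::size_t size) const {
        auto direct = matrix_.find({from, to});
        if (direct != matrix_.end()) return direct->second(src, dst, size);

        auto to_dram = matrix_.find({from, MemType::DRAM});
        auto from_dram = matrix_.find({MemType::DRAM, to});
        if (to_dram == matrix_.end() || from_dram == matrix_.end()) {
            return false;  // no direct path and no complete path via DRAM
        }
        std::vector<char> staging(size);  // intermediate DRAM buffer
        return to_dram->second(src, staging.data(), size) &&
               from_dram->second(staging.data(), dst, size);
    }

private:
    std::map<std::pair<MemType, MemType>, CopyFn> matrix_;
};
```

With only VRAM -> DRAM and DRAM -> SSD registered, a VRAM -> SSD copy in this sketch takes two hops through the staging buffer, while a pair with no path to or from DRAM is rejected. This is why requiring each new type to supply its DRAM paths is enough to make every pair reachable.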