feat[Store]: Add standalone deployment implementation for Client #1084

YiXR · 2025-11-20T09:18:56Z

Description

Thanks @wanyue-wy @stmatengss @XucSh for helping with architectural design.

Type of Change

Architecture Overview

Split Client into DummyClient and RealClient, which RealClient is an independent process.
Full Compatibility: Fully compatible with the old arch.
Flexible Deployment: After applying the detached deployment patch to Sglang（tmp patch：https://github.com/YiXR/sglang/tree/xinyi-client, the Client detached deployment mode can be enabled via configuration. The non-detached mode can still be chosen.
Zero-Copy Performance: Introduces shared memory to support zero-copy operations between RealClient and DummyClient, ensuring that performance is on par with the non-detached mode.
Resilience: The RealClient continues to run and provide Storage for the Master even after the DummyClient exits.

Code Modification Overview

File Creation/Change Description

Renamed client.cpp & client.h to client_service.cpp & client_service.h.
Added real_client_main.cpp as the main process for the RealClient.
Added dummy_client.cpp & dummy_client.h for the DummyClient.
Renamed pybind_client.cpp & pybind_client.h to real_client.cpp & real_client.h. All instances of class PyClient have been renamed to class RealClient.
Added pyclient.h, which defines a new base class PyClient. PyClient maintains the interface to ensure backward compatibility. Both RealClient & DummyClient inherit from it and implement the corresponding functions.

Key Implementation/Change Details

mooncake-integration/store/store_py.cpp

This is the entry point for the Python compatibility layer, where the decision to use detached deployment is made. The store in store_py is defined as a PyClient base class pointer. If the client detached mode is selected, the store is instantiated as a DummyClient; otherwise, it's instantiated as a RealClient.

mooncake-store/src/dummy_client.cpp

DummyClient Implementation: Implements the DummyClient, which is primarily responsible for connecting to the RealClient and initiating RPC calls. Its high-level interface inherits from PyClient to remain consistent with the old code.
Shared Memory Creation: The DummyClient is responsible for creating the shared memory. During its setup phase, it creates the shared memory and notifies the RealClient to mmap it and register it with the transfer engine.
Shared Memory Details:
a. The shared memory created by the DummyClient provides memory space for the L2 cache (like mem_pool_host in SGLang) and the client's local buffer (held by client_buffer_allocator_).
b. shm_size = local_buffer_size + mem_pool_host_size. The first half of the shared memory is used for the L2 cache, and the second half is for the client's local buffer.
c. During the DummyClient setup phase, the RealClient is notified to register the entire shared memory block with the transfer engine at once. Therefore, there is no need to later call registerLocalMemory separately for the L2 cache.
d. After notifying the RealClient to mmap the shared memory, the DummyClient immediately calls shm_unlink. This removes the shared memory file, preventing any other external processes from mapping it. Since the RealClient has already completed the mmap, it can continue to access this memory. Due to the kernel's automatic reclamation mechanism for shared memory, the memory will be automatically freed when its reference count drops to zero (i.e., when both DummyClient and RealClient unmap it).
Keep-Alive: The DummyClient has a Ping thread to maintain its connection with the RealClient.

mooncake-store/src/real_client.cpp

Code Reuse: The RealClient reuses all the implementation from the original pybind_client (including creating the global_segment, holding the transfer engine, communicating with the master, etc.).
RPC Service: The RealClient registers its internal functions with an RPC service so they can be called by the DummyClient.
Shared Memory Details:
a. When notified by the DummyClient to map the shared memory, the RealClient will mmap it and record both the DummyClient's side VA (Virtual Address) and its own RealClient side VA. This facilitates virtual address translation in subsequent zero-copy interface calls.
b. The RealClient uses the second half of the shared memory to create its client_buffer_allocator_.
Virtual Address Translation: For RPCs involving zero-copy or using the client local buffer for transfers, virtual addresses are passed as parameters/return values. The RealClient, when handling these functions, must use the previously recorded VAs to perform virtual address translation. This ensures that the virtual addresses it uses are valid in its own context, and the virtual addresses it returns to the DummyClient are valid in the DummyClient's context.
Health Check: The RealClient uses a Ping mechanism to continuously monitor the status of the DummyClient. If it detects that the DummyClient is unavailable, it will unmap the corresponding shared memory to ensure timely memory cleanup.

mooncake-store/src/client_buffer.cpp

External Buffer Support: ClientBufferAllocator now supports being constructed with an external buffer. This change is made to accommodate the requirement of creating a client_buffer_allocator_ using the pre-allocated shared memory.

Types
- Bug fix
- New feature
  - Transfer Engine
  - Mooncake Store
  - Mooncake EP
  - Integration
  - P2P Store
  - Python Wheel
- Breaking change
- CI/CD
- Documentation update
- Other

How Has This Been Tested?

To support detached Client mode, the patch described above must be applied to Sglang. (Without the patch, Sglang can still start normally in non-detached mode).

Launching in Detached Mode
a. Start the master:

mooncake_master --enable_http_metadata_server=true --eviction_high_watermark_ratio=0.95

b. Start the RealClient (listens on port 50052 by default):

mooncake_client --global_segment_size=4GB

c. Start Sglang. Set MOONCAKE_MASTER to the RealClient's address and MOONCAKE_STANDALONE_STORAGE=1 to enable standalone client deployment.

MOONCAKE_MASTER=127.0.0.1:50052 \
MOONCAKE_STANDALONE_STORAGE=1 \
python3 -m sglang.launch_server \
    --enable-hierarchical-cache \
    --hicache-storage-backend mooncake \
    --model-path Qwen/Qwen3-0.6B

Checklist

I have performed a self-review of my own code.
I have updated the documentation.
I have added tests to prove my changes are effective.

TODO List

gemini-code-assist · 2025-11-20T09:19:54Z

Summary of Changes

Hello @YiXR, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural enhancement by decoupling the client-side logic into independent processes. This change allows for more flexible deployment scenarios and improves the resilience of the storage system. By leveraging shared memory for inter-process communication, the new design maintains high performance, specifically enabling zero-copy data transfers between the client components. The refactoring also sets the stage for future scalability and robustness improvements.

Highlights

Independent Client Deployment: Introduced a new architecture allowing the client to be deployed independently from the main application, enabling greater flexibility and resilience. This can be configured to use a 'detached deployment mode'.
Zero-Copy Performance with Shared Memory: Implemented shared memory mechanisms to facilitate zero-copy operations between the new 'RealClient' and 'DummyClient' components, ensuring that performance remains on par with non-detached modes.
Enhanced Resilience: The 'RealClient' is designed to continue running and providing storage services to the Master even if the 'DummyClient' process exits, improving overall system stability.
Client Refactoring and New Components: The existing client implementation has been refactored into a base class PyClient, with two concrete implementations: RealClient (the core logic) and DummyClient (a lightweight proxy). Key files like client.cpp and pybind_client.cpp were renamed, and new files such as real_client_main.cpp and dummy_client.cpp were added.
Dynamic Client Instantiation: The Python compatibility layer (store_py.cpp) now dynamically instantiates either a DummyClient or RealClient based on configuration, providing a seamless transition for existing Python interfaces.
Shared Memory Management in RealClient: The RealClient now includes RPC services to manage shared memory regions, handling mapping, unmapping, and registration of buffers within these shared segments, along with virtual address translation for zero-copy transfers.
DummyClient Health Monitoring: The RealClient incorporates a ping mechanism to continuously monitor the status of its associated DummyClient, ensuring timely cleanup of shared memory resources if the DummyClient becomes unresponsive.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant architectural change to support independent deployment of the client, using a DummyClient and RealClient model with shared memory for zero-copy communication. While the overall structure is well-defined, the implementation has several critical issues that need to be addressed. My review identified a logic bug in error handling within the Python bindings, multiple key functions in DummyClient that are left as unimplemented TODOs, and incomplete high-availability logic. Most critically, there's a design flaw in how RealClient monitors and cleans up resources for DummyClient instances, which would fail in a multi-client scenario. I've detailed these points in the specific comments.

gemini-code-assist · 2025-11-20T09:21:53Z

mooncake-store/src/real_client.cpp

+void RealClient::dummy_client_monitor_func() {
+    std::unordered_map<UUID, std::chrono::steady_clock::time_point,
+                       boost::hash<UUID>>
+        client_ttl;
+    while (dummy_client_monitor_running_) {
+        auto now = std::chrono::steady_clock::now();
+
+        // Update the client ttl
+        PodUUID pod_client_id;
+        while (dummy_client_ping_queue_.pop(pod_client_id)) {
+            UUID client_id = {pod_client_id.first, pod_client_id.second};
+            client_ttl[client_id] =
+                now + std::chrono::seconds(dummy_client_live_ttl_sec_);
+        }
+
+        // Find out expired clients
+        std::vector<UUID> expired_clients;
+        for (auto it = client_ttl.begin(); it != client_ttl.end();) {
+            if (it->second < now) {
+                LOG(INFO) << "client_id=" << it->first
+                          << ", action=client_expired";
+                expired_clients.push_back(it->first);
+                it = client_ttl.erase(it);
+            } else {
+                ++it;
+            }
+        }
+
+        // Update the client status to NEED_REMOUNT
+        if (!expired_clients.empty()) {
+            {
+                // Lock client_mutex and segment_mutex
+                std::unique_lock<std::shared_mutex> lock(dummy_client_mutex_);
+                // Unmap shm segments associated with this client
+                // TODO: need to optimize this with client_id
+                unmap_shm_internal();
+            }  // Release the mutex before long-running ClearInvalidHandles and
+               // avoid deadlocks
+        }
+
+        std::this_thread::sleep_for(
+            std::chrono::milliseconds(kDummyClientMonitorSleepMs));
+    }
+}


The dummy_client_monitor_func has a design flaw that prevents it from correctly handling multiple DummyClient instances. When a DummyClient is detected as expired, unmap_shm_internal() is called without any client-specific identifier. This function cleans up the shared memory segment information stored in RealClient members like shm_buffer_, shm_name_, etc. Since these are single-instance members, this action will affect all connected DummyClients, not just the expired one. This can lead to memory corruption, crashes, or data loss if multiple dummy clients are used.

The implementation should be updated to manage shared memory resources on a per-client basis. The // TODO comment on line 1630 also indicates this is an incomplete implementation.

This will be solved later. I'll introduce another patch for multi dummy support.

gemini-code-assist · 2025-11-20T09:21:53Z

mooncake-integration/store/store_py.cpp

+            auto put_result = store_->put_parts(key, values);
            if (!put_result) {
-                return -static_cast<int>(put_result.error());
+                return -static_cast<int>(ErrorCode::INTERNAL_ERROR);
            }


There's a logic error in handling the return value from store_->put_parts. The put_parts function returns 0 on success and a negative error code on failure. The current code if (!put_result) evaluates to true on success (when put_result is 0), causing it to incorrectly return an error. This should be if (put_result != 0).

Additionally, on failure, it returns a generic ErrorCode::INTERNAL_ERROR instead of propagating the specific error code contained in put_result.

Suggested change

auto put_result = store_->put_parts(key, values);

if (!put_result) {

return -static_cast<int>(put_result.error());

return -static_cast<int>(ErrorCode::INTERNAL_ERROR);

}

auto put_result = store_->put_parts(key, values);

if (put_result != 0) {

return put_result;

}

gemini-code-assist · 2025-11-20T09:21:53Z

mooncake-store/src/dummy_client.cpp

+
+std::vector<std::shared_ptr<BufferHandle>> DummyClient::batch_get_buffer(
+    const std::vector<std::string>& keys) {
+    // TODO: implement this function


Several functions in DummyClient are not implemented and are marked with // TODO: implement this function. This includes batch_get_buffer, get_into, put_from, put_from_with_metadata, batch_put_from_multi_buffers, and batch_get_into_multi_buffers. Leaving these as unimplemented stubs in a feature branch can lead to unexpected behavior or runtime errors if they are called. These should be implemented or explicitly marked as unsupported (e.g., by throwing an exception).

These rpc will be implemented in later pr.

gemini-code-assist · 2025-11-20T09:21:53Z

mooncake-store/src/dummy_client.cpp

+            if (ping_fail_count >= max_ping_fail_count) {
+                LOG(ERROR) << "Ping failed " << ping_fail_count
+                           << " times, reconnecting...";
+                // TODO: Need to realize reconnect logic here


The ping_thread_main function contains a // TODO: Need to realize reconnect logic here. This indicates that the high-availability/reconnection logic for the DummyClient is incomplete. Without this logic, the client won't be able to recover from connection failures to the RealClient, which undermines the resilience goals of this change.

This will be solved in later pr.

gemini-code-assist · 2025-11-20T09:21:53Z

mooncake-store/include/dummy_client.h

+    // 新增一个函数用于获取buffer信息（地址和大小）
+    std::tuple<uint64_t, size_t> get_buffer_info(const std::string &key);


This comment is in Chinese, which is inconsistent with the English used in the rest of the codebase. To improve maintainability and ensure all developers can understand the code, it's best to use a single language for comments.

// Get buffer information (address and size) for a key. std::tuple<uint64_t, size_t> get_buffer_info(const std::string &key);

yejj710 · 2025-11-20T12:12:59Z

nice work! Perhaps you could add some docs to introduce the standalone store_client deployment mode. This would make it easier for users to understand how to use it.

Copilot

Pull Request Overview

This pull request implements an independent deployment architecture for the Mooncake Store Client, splitting it into RealClient (independent process) and DummyClient (lightweight proxy). The implementation provides full backward compatibility while enabling detached deployment mode with zero-copy performance through shared memory.

Key Changes:

Split client architecture into RealClient (server) and DummyClient (proxy) with shared memory-based zero-copy communication
Renamed client.cpp/client.h to client_service.cpp/client_service.h and pybind_client to real_client
Added pyclient.h base class for interface compatibility and dummy_client implementation
Introduced mooncake_client binary for standalone RealClient deployment

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
mooncake-store/src/real_client.cpp	Implements RealClient with RPC service registration, shared memory management, and virtual address translation
mooncake-store/src/dummy_client.cpp	Implements DummyClient RPC proxy with shared memory creation and ping-based health checking
mooncake-store/src/real_client_main.cpp	Main entry point for standalone RealClient service
mooncake-store/include/real_client.h	RealClient class definition with shared memory and dummy client monitoring
mooncake-store/include/dummy_client.h	DummyClient class definition with RPC client infrastructure
mooncake-store/include/pyclient.h	Base class defining common interface for both client types
mooncake-store/include/client_service.h	Renamed from client.h, defines core Client service class
mooncake-store/src/client_buffer.cpp	Added external buffer support for shared memory allocation
mooncake-store/include/client_buffer.hpp	Extended to support both heap and shared memory allocation
mooncake-store/include/utils.h	Added utility functions for string-to-byte-size conversion and Python return value conversion
mooncake-integration/store/store_py.cpp	Updated Python bindings to support both RealClient and DummyClient modes
mooncake-store/src/CMakeLists.txt	Added mooncake_client binary target and updated source file references
mooncake-store/tests/*.cpp	Updated test files to use renamed headers and RealClient class

Comments suppressed due to low confidence (7)

mooncake-store/src/real_client.cpp:800

The get_dummy_buffer_internal function returns a buffer handle with adjusted address (buffer_base - shm_addr_offset_). However, the BufferHandle object created by get_buffer will be immediately destroyed after this function returns since it's not stored anywhere. This means the buffer memory will be deallocated while the returned address is still being used by the dummy client, leading to a use-after-free bug. Consider storing the BufferHandle or managing the buffer lifetime differently.
mooncake-store/src/real_client.cpp:660
In unmap_shm_internal, client_->unregisterLocalMemory is called but the return value is not checked. If this operation fails, the function still proceeds to unmap the memory and clear all state, which could lead to inconsistent state. Consider checking the return value and handling errors appropriately.
mooncake-store/src/real_client.cpp:1652
The dummy_client_monitor_thread_ is started but never properly joined or stopped in the destructor or teardown. When RealClient is destroyed, the thread may continue running and access member variables that have been destroyed, leading to undefined behavior. Consider adding proper thread cleanup in the destructor or tearDownAll_internal.
mooncake-store/src/real_client.cpp:603
The map_shm_internal function returns tl::expected<void, ErrorCode> but throws exceptions on error instead of returning error codes. This is inconsistent with the expected return type and breaks the error handling pattern. Consider replacing throw std::runtime_error(...) with return tl::unexpected(ErrorCode::...) to properly propagate errors.
mooncake-store/src/real_client.cpp:306
In the tearDownAll_internal method, shm_name_, shm_addr_offset_, and shm_size_ are cleared but shm_buffer_ is not unmapped or cleared. This can lead to a resource leak. The method should call munmap(shm_buffer_, shm_size_) and set shm_buffer_ = nullptr before clearing the related fields, or should call unmap_shm_internal() which already handles this cleanup.
mooncake-store/src/real_client.cpp:716
In unregister_shm_buffer_internal, after erasing the iterator it at line 708, the function returns it->second at line 716. This is a use-after-erase bug that will result in undefined behavior. The value should be saved before erasing the iterator.
mooncake-store/include/real_client.h:408
The member variable shm_buffer_ at line 408 is not initialized. If used before map_shm_internal is called, it will contain garbage values. Consider initializing it to nullptr to avoid undefined behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-20T15:56:25Z

mooncake-store/src/dummy_client.cpp

+    ping_running_ = true;
+    ping_thread_ = std::thread([this]() mutable { this->ping_thread_main(); });


The ping thread is started before the shared memory setup completes. If shared memory setup fails (lines 139-183), the thread continues running and accessing client_id_ which may cause issues. Consider starting the ping thread after successful shared memory setup, or stopping it in the error handling paths.

Copilot · 2025-11-20T15:56:25Z

mooncake-integration/store/store_py.cpp

                }
+                ResourceTracker::getInstance().registerInstance(
+                    std::dynamic_pointer_cast<PyClient>(self.store_));


[nitpick] The ResourceTracker::getInstance().registerInstance() is called with a dynamically cast pointer, but for DummyClient this will succeed. However, ResourceTracker is designed to call cleanup on PyClient instances which may cause issues if the cleanup logic differs. Consider verifying that ResourceTracker properly handles both RealClient and DummyClient lifecycle management.

Suggested change

}

ResourceTracker::getInstance().registerInstance(

std::dynamic_pointer_cast<PyClient>(self.store_));

ResourceTracker::getInstance().registerInstance(

std::dynamic_pointer_cast<PyClient>(self.store_));

}

It handles，just ignore this comment。

Copilot · 2025-11-20T15:56:25Z

mooncake-store/src/dummy_client.cpp

+
+    for (const auto& result : internal_results) {
+        if (result.has_value()) {
+            results.push_back(result.value());


In batchIsExist, when a result has a value (line 327), it pushes result.value() directly. However, result.value() is a bool, but the return type should be 1 or 0 (or negative for errors) based on the pattern in isExist. This should be result.value() ? 1 : 0 to match the expected return format.

Suggested change

results.push_back(result.value());

results.push_back(result.value() ? 1 : 0);

Copilot · 2025-11-20T15:56:26Z

mooncake-integration/store/store_py.cpp

+            auto put_result = store_->put_parts(key, values);
            if (!put_result) {
-                return -static_cast<int>(put_result.error());
+                return -static_cast<int>(ErrorCode::INTERNAL_ERROR);
            }


The error handling at line 353-355 is inconsistent with the previous implementation. The put_parts method returns an int (0 on success, negative on error), not a tl::expected object, so the check if (!put_result) is incorrect. The code should check if (put_result != 0) and return put_result directly.

YiXR · 2025-11-21T03:38:34Z

nice work! Perhaps you could add some docs to introduce the standalone store_client deployment mode. This would make it easier for users to understand how to use it.

Thx for your comment. I've just added some introduction in doc.

ykwd

Great work! I’ve reviewed part of it, and these are my comments. I’ll finish reviewing the remaining sections as soon as possible.

ykwd · 2025-11-24T04:08:59Z

doc/en/mooncake-store.md

 The `Client` can be used in two modes:
 1. **Embedded mode**: Runs in the same process as the LLM inference program (e.g., a vLLM instance), by being imported as a shared library.
-2. **Standalone mode**: Runs as an independent process.
+2. **Standalone mode**: Runs as an independent process. In this mode, the `Client` is separated into two parts: a **dummy** `Client` and a **real** `Client`: The **real** `Client` is a full-featured implementation that runs as a standalone process and directly communicates with other Mooncake Store components. It handles all RPC communications, memory management, and data transfer operations. The **real** `Client` is typically deployed on nodes that contribute memory to the distributed cache pool; The **dummy** `Client` is a lightweight wrapper that forwards all operations to a local **real** `Client` via RPC calls, which is designed for scenarios where the client needs to be embedded in the same process as the application (such as vLLM), but the actual Mooncake Store operations should be handled by a standalone process. The **dummy** `Client` and the **real** `Client` communicate via RPC calls and shared memory to make sure that Zero-copy transfers are still possible.


Could you also sync the update to docs/source/design/mooncake-store.md?

ykwd · 2025-11-24T04:15:58Z

mooncake-store/src/client_buffer.cpp

-    // Free the aligned allocated memory
-    if (buffer_) {
+    // Free the aligned allocated memory or unmap shared memory
+    if (!is_external_memory_) {


Also need to check buffer_?

XucSh · 2025-11-25T02:58:29Z

mooncake-integration/store/store_py.cpp

        const std::vector<std::string> &keys) {
        const auto kNullString = pybind11::bytes("\\0", 0);
-        if (!store_ || !store_->client_) {
+        if (!store_ || (!use_dummy_client_ && !store_->client_)) {


is there a bug if use_dummy_client_ but client is not initialized,

dummy does not have client_ and it will never use it.

XucSh · 2025-11-25T03:20:52Z

mooncake-store/src/dummy_client.cpp

+        LOG(ERROR) << "Shared memory is not registered";
+        return -1;
+    }
+    if (registered_size_ + local_buffer_size_ + size > shm_size_) {


There needs a annotation. the shm_size is only used for sglang and local buffer, not include global segment

I‘ve changed the "global_buffer" into "mem_pool" in dummy setup to avoid misunderstanding.

XucSh · 2025-11-25T03:25:30Z

mooncake-store/src/dummy_client.cpp

+    easylog::set_min_severity(easylog::Severity::WARN);
+    // Initialize client pools
+    coro_io::client_pool<coro_rpc::coro_rpc_client>::pool_config pool_conf{};
+    const char* value = std::getenv("MC_RPC_PROTOCOL");


there needs a error handler when env is not present. And now only support rdma protocol with coro?

XucSh · 2025-11-25T03:29:00Z

mooncake-store/src/dummy_client.cpp

+
+    shm_name_ = "/dummy_client_shm_" + std::to_string(client_id_.first) + "_" +
+                std::to_string(client_id_.second);
+    shm_size_ = local_buffer_size + global_segment_size;


This value contains both local_buffer_size and global_segment_size. seems conflict with the meaning in alloc_from_mem_pool? And the sglang L2 cache is dismissed?

I‘ve changed the "global_buffer" into "mem_pool" in dummy setup to avoid misunderstanding.

XucSh · 2025-11-25T03:31:01Z

mooncake-store/src/dummy_client.cpp

+
+    // Inform the RealClient about the shared memory
+    ret = register_shm(shm_name_, reinterpret_cast<uint64_t>(shm_base_addr_),
+                       shm_size_, local_buffer_size);


when recovering from a crash, the setup is also called. And the new shmem will be the same as the last time?

A new method of remapping after a crash needs to be realized. In this new method, shm should be mapped from the fd not the name.

XucSh · 2025-11-25T03:33:34Z

mooncake-store/src/dummy_client.cpp

+}
+
+// Dummy only register buffer within the shared memory region
+int DummyClient::register_buffer(void* buffer, size_t size) {


register_buffer and alloc_from_mem_pool can be combined?

alloc_from_mem_pool is a alloc method, if sglang/vllm alloc success, it will use register_buffer to notify.

Signed-off-by: Xingrui Yi <[email protected]>

ykwd

Overall, it looks good to me. Just a few comments

ykwd · 2025-11-24T09:45:24Z

mooncake-integration/store/store_py.cpp

            py::arg("protocol"), py::arg("rdma_devices"),
-            py::arg("master_server_addr"), py::arg("engine") = py::none())
+            py::arg("master_server_addr"), py::arg("engine") = py::none(),
+            py::arg("use_dummy_client") = false)


Would it be better for future extensibility (e.g., supporting new architectures) if we change this parameter to a string-type client_type?

Just add new setup func for dummy.

ykwd · 2025-11-24T09:57:38Z

mooncake-integration/store/store_py.cpp

    }

    pybind11::list batch_get_tensor(const std::vector<std::string> &keys) {
        if (!store_ || !store_->client_) {


For currently unsupported APIs, it would be better to explicitly return an error. Otherwise, the invocation will fail and it might not be easy to debug.

I've just added more log for that.

ykwd · 2025-11-25T08:34:16Z

mooncake-integration/store/store_py.cpp


    pybind11::bytes get(const std::string &key) {
-        if (!store_ || !store_->client_) {
+        if (!store_ || (!use_dummy_client_ && !store_->client_)) {


Would it be possible to abstract this if condition into a separate function and include a comment inside the function explaining why the condition is checked this way? As it stands, the reasoning behind this condition isn’t very intuitive, and some interfaces don’t include the use_dummy_client_ part of the condition. After some time, future maintainers may not understand what this logic is meant for.

Signed-off-by: Xingrui Yi <[email protected]>

gemini-code-assist bot reviewed Nov 20, 2025

View reviewed changes

YiXR force-pushed the split-client branch 2 times, most recently from 891cb46 to 761459d Compare November 20, 2025 11:29

stmatengss requested a review from Copilot November 20, 2025 15:52

Copilot started reviewing on behalf of stmatengss November 20, 2025 15:52 View session

Copilot finished reviewing on behalf of stmatengss November 20, 2025 15:55

Copilot AI reviewed Nov 20, 2025

View reviewed changes

YiXR force-pushed the split-client branch from 761459d to 05f9e15 Compare November 21, 2025 03:16

YiXR changed the title ~~feat[Store]: Add independent deployment implementation for Client~~ feat[Store]: Add standalone deployment implementation for Client Nov 21, 2025

YiXR force-pushed the split-client branch 5 times, most recently from e38dbf1 to 0eb7ec3 Compare November 21, 2025 09:13

stmatengss added the high priority label Nov 24, 2025

ykwd reviewed Nov 24, 2025

View reviewed changes

YiXR force-pushed the split-client branch 8 times, most recently from f2af7c0 to f73b712 Compare November 24, 2025 08:32

XucSh reviewed Nov 25, 2025

View reviewed changes

XucSh self-assigned this Nov 25, 2025

YiXR added 4 commits November 25, 2025 15:05

[Store]: Add independent deployment implementation for Client

d8b7569

Signed-off-by: Xingrui Yi <[email protected]>

[Doc]: mooncake-store: add introduce for client standalone mode

757d212

Signed-off-by: Xingrui Yi <[email protected]>

[CI]: Add dummy client test

751d62b

Signed-off-by: Xingrui Yi <[email protected]>

[Store]: Change dummy client setup into a new func

54784f7

Signed-off-by: Xingrui Yi <[email protected]>

YiXR force-pushed the split-client branch from 1fc3688 to 54784f7 Compare November 25, 2025 07:05

XucSh added the run-ci label Nov 25, 2025

XucSh approved these changes Nov 25, 2025

View reviewed changes

ykwd reviewed Nov 25, 2025

View reviewed changes

YiXR added 2 commits November 25, 2025 17:34

[Store]: Add more log for dummy client

8f3ff99

Signed-off-by: Xingrui Yi <[email protected]>

Merge remote-tracking branch 'upstream/main'

e914a78

XucSh added run-ci and removed run-ci labels Nov 25, 2025

		// 新增一个函数用于获取buffer信息（地址和大小）
		std::tuple<uint64_t, size_t> get_buffer_info(const std::string &key);

		ping_running_ = true;
		ping_thread_ = std::thread([this]() mutable { this->ping_thread_main(); });

	results.push_back(result.value());
	results.push_back(result.value() ? 1 : 0);

feat[Store]: Add standalone deployment implementation for Client #1084

Are you sure you want to change the base?

feat[Store]: Add standalone deployment implementation for Client #1084

Conversation

YiXR commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Architecture Overview

Code Modification Overview

File Creation/Change Description

Key Implementation/Change Details

mooncake-integration/store/store_py.cpp

mooncake-store/src/dummy_client.cpp

mooncake-store/src/real_client.cpp

mooncake-store/src/client_buffer.cpp

How Has This Been Tested?

Checklist

TODO List

Uh oh!

gemini-code-assist bot commented Nov 20, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yejj710 commented Nov 20, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

YiXR commented Nov 20, 2025 •

edited

Loading

YiXR commented Nov 21, 2025 •

edited

Loading

YiXR Nov 25, 2025 •

edited

Loading