Datasystem readme #56
base: main
| ### Installing openYuanrong datasystem | ||
| Recommended: install the prebuilt wheel package with pip | ||
| ```bash | ||
| pip install https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/openyuanrong_datasystem-0.5.0-cp39-cp39-manylinux_2_34_x86_64.whl |
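Wheel packages for multiple Python versions should be provided, along with the path to the open-source code repository and build instructions.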
018f6b8 to 791314b
791314b to 280bbd5
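| openYuanrong datasystem is a distributed caching system that uses the HBM/DRAM/SSD resources of a compute cluster to build a near-compute multi-level cache, improving data access performance for model training and inference, big data, microservices, and similar scenarios. | ||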
|
|
||
| ## Environment requirements | ||
| Operating system: openEuler 22.03 or later |
Do other operating systems not work? Does it have to be openEuler?
| 1. Download the binary package (see [etcd GitHub Releases](https://github.com/etcd-io/etcd/releases)): | ||
| ```bash | ||
| ETCD_VERSION="v3.5.12" | ||
| wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz |
Use a variable for the CPU architecture as well, and cover both x86 and arm.
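One way to act on this comment is sketched below. It assumes the usual etcd release naming, where the asset suffix is `amd64` or `arm64` rather than the `x86_64` or `aarch64` printed by `uname -m`. The helper names `etcd_arch` and `etcd_url` are hypothetical and not part of the PR. The block only prints the URL and leaves the download to the reader.

```bash
# Map a `uname -m` machine name to the suffix used in etcd release file names.
etcd_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the release tarball URL for a given version and machine name.
etcd_url() {
  arch="$(etcd_arch "$2")" || return 1
  echo "https://github.com/etcd-io/etcd/releases/download/$1/etcd-$1-linux-${arch}.tar.gz"
}

ETCD_VERSION="v3.5.12"
ETCD_URL="$(etcd_url "${ETCD_VERSION}" "$(uname -m)")"
echo "${ETCD_URL}"
```

The download step from the diff would then become `wget "${ETCD_URL}"`.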
| Install the prebuilt wheel package with pip: | ||
| If you are using Python 3.9, run: | ||
| ```bash | ||
| pip install https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/release/0.6.0/linux/aarch64/openyuanrong_datasystem-0.6.0-cp39-cp39-manylinux_2_34_aarch64.whl |
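The etcd package above is the x86 build, but the datasystem wheel here is the arm build, so the two cannot be installed on the same machine. Use a variable for the CPU architecture, as with etcd.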
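A minimal sketch of the wheel URL with the architecture factored out follows. Only the `aarch64` URL for 0.6.0 and `cp39` appears in the diff. The sketch assumes that an `x86_64` wheel is published under the same path layout, which would need to be confirmed by the maintainers. `DS_VERSION`, `PY_TAG`, and `wheel_url` are hypothetical names. Unlike the etcd assets, the wheel path uses the `uname -m` spelling directly.

```bash
DS_VERSION="0.6.0"   # version taken from the diff
PY_TAG="cp39"        # Python 3.9 tag taken from the diff

# Build the wheel URL for a machine name as printed by `uname -m`.
# Assumption: x86_64 wheels follow the same layout as the aarch64 one.
wheel_url() {
  echo "https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/release/${DS_VERSION}/linux/$1/openyuanrong_datasystem-${DS_VERSION}-${PY_TAG}-${PY_TAG}-manylinux_2_34_$1.whl"
}

# Print the install command for the current machine instead of running it.
echo "pip install $(wheel_url "$(uname -m)")"
```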
| ``` | ||
|
|
||
| ## Configuring VLLM to use the Yuanrong Connector | ||
| Datasystem integrates with VLLM through ECMooncakeStorageConnector (for EC transfer) and YuanRongConnector (for KVC transfer). |
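Add a paragraph explaining why Mooncake is sufficient for EC and what the underlying principle is. Intuitively, EC should use the Yuanrong connector, just like KVC.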
Pull request overview
This PR adds comprehensive documentation for deploying and using the openYuanrong datasystem, a distributed caching system. The documentation covers deployment of etcd, the datasystem itself, and integration with VLLM.
Key changes:
- Complete quick-start guide for datasystem deployment in Chinese
- Step-by-step instructions for etcd installation and cluster setup
- Integration examples for VLLM with EC and KV connectors
Comments suppressed due to low confidence (1)
docs/deployment/datasystem/run_example.md:43
- Using 0.0.0.0 in the listen and advertise URLs exposes etcd to all network interfaces without authentication. This is insecure for production environments. Consider adding a security warning or demonstrating authentication configuration, especially since the documentation mentions production deployment references.
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://0.0.0.0:2379 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-advertise-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd-single=http://0.0.0.0:2380 &
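| - `--advertise-client-urls`: the client address advertised externally. | ||
| - `--listen-peer-urls`: the address to listen on for traffic between cluster nodes. | ||
| - `--initial-advertise-peer-urls`: the address advertised to other nodes. | ||
| - `--initial-cluster`: the initial node list, in the format 节点名=节点peerURL (node name=node peerURL). |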
Copilot
AI
Dec 11, 2025
Add space between "节点" and "peerURL" for better readability. Should be "节点名=节点 peerURL" instead of "节点名=节点peerURL".
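| - `--initial-cluster`: the initial node list, in the format 节点名=节点peerURL (node name=node peerURL). | |
| - `--initial-cluster`: the initial node list, in the format 节点名=节点 peerURL (node name=node peerURL). |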
| Replace `${ETCD_IP}` with the IP of the node running etcd and `${WORKER_IP_N}` with the IP of node N, then start one server process listening on port 31501 on each node: | ||
| ```bash | ||
| dscli start -w \ | ||
| --worker_address "${WORKER_IP_N}:31501" \ | ||
| --etcd_address "${ETCD_IP}:2379" | ||
| ``` |
Copilot
AI
Dec 11, 2025
The placeholder "${WORKER_IP_N}" with the "N" suffix suggests multiple workers, but the instruction says to start one server process listening on port 31501 on each node. It is unclear whether multiple worker processes should run on the same node with different IPs or whether each physical node runs one worker. Consider clarifying whether N indexes physical nodes or multiple workers per node.
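One reading of the instruction is sketched below, assuming that N indexes physical nodes and that each node runs exactly one worker on port 31501. The addresses are hypothetical documentation-range values, and `worker_cmd` is a helper introduced only for illustration. The `dscli` flags are the ones shown in the diff. The block prints the command for each node rather than running it.

```bash
# Hypothetical addresses; replace them with the real node IPs.
ETCD_IP="192.0.2.1"
WORKER_IPS="192.0.2.11 192.0.2.12"

# Print the dscli command to run on the node with the given IP.
worker_cmd() {
  printf 'dscli start -w --worker_address "%s:31501" --etcd_address "%s:2379"\n' "$1" "${ETCD_IP}"
}

# One worker per node, all registering with the same etcd endpoint.
for ip in ${WORKER_IPS}; do
  worker_cmd "${ip}"
done
```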
| vllm serve Qwen/Qwen3-8B \ | ||
| --ec-transfer-config '{ | ||
| "ec_connector": "ECMooncakeStorageConnector", | ||
| "ec_role": "ec_producer" |
Copilot
AI
Dec 11, 2025
The Prefill-Decoder node configuration is identical to the Encoder node configuration, both using "ec_producer" role. In a 1E1PD architecture, the Prefill-Decoder should typically be a consumer of the Encoder's output. The ec_role should likely be "ec_consumer" instead of "ec_producer".
| "ec_role": "ec_producer" | |
| "ec_role": "ec_consumer" |
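The role split suggested here can be sketched as two configuration strings that differ only in `ec_role`. This assumes, as the comment does, that the Encoder node is the producer and the Prefill-Decoder node is the consumer. `ec_config` is a hypothetical helper, and the JSON keys and connector name are copied from the diff.

```bash
# Emit the --ec-transfer-config JSON for a given role.
ec_config() {
  printf '{"ec_connector": "ECMooncakeStorageConnector", "ec_role": "%s"}\n' "$1"
}

ENCODER_EC_CONFIG="$(ec_config ec_producer)"   # Encoder node
PD_EC_CONFIG="$(ec_config ec_consumer)"        # Prefill-Decoder node

echo "encoder: ${ENCODER_EC_CONFIG}"
echo "prefill-decoder: ${PD_EC_CONFIG}"
```

Each node would then pass its own string, for example `vllm serve Qwen/Qwen3-8B --ec-transfer-config "${PD_EC_CONFIG}"` on the Prefill-Decoder node.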
|
|
||
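| ### Starting the cluster | ||
| > Note: the following is a minimal single-node deployment example. For production environments, see the [official cluster deployment documentation](https://etcd.io/docs/current/op-guide/clustering/). | ||
| 1. Start a single-node etcd cluster on any free ports (e.g. 2379 and 2380 ): |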
Copilot
AI
Dec 11, 2025
Remove the extra space before the closing parenthesis after "2380".
| 1. Start a single-node etcd cluster on any free ports (e.g. 2379 and 2380 ): | |
| 1. Start a single-node etcd cluster on any free ports (e.g. 2379 and 2380): |
| etcd \ | ||
| --name etcd-single \ | ||
| --data-dir /tmp/etcd-data \ | ||
| --listen-client-urls http://0.0.0.0:2379 \ | ||
| --advertise-client-urls http://0.0.0.0:2379 \ | ||
| --listen-peer-urls http://0.0.0.0:2380 \ | ||
| --initial-advertise-peer-urls http://0.0.0.0:2380 \ | ||
| --initial-cluster etcd-single=http://0.0.0.0:2380 & |
Copilot
AI
Dec 11, 2025
The example etcd startup command binds client and peer URLs to http://0.0.0.0 without TLS or authentication, which exposes the key-value store to any host that can reach this machine and can lead to unauthorized reads/writes of cluster metadata and potentially sensitive data. An attacker on the same network could directly interact with etcd on ports 2379/2380 using etcdctl or raw HTTP. For safer defaults, restrict --listen-client-urls/--listen-peer-urls to 127.0.0.1 or a secured interface and document enabling TLS and authentication for non-local or production use.
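The safer default requested here can be sketched as follows, assuming a single-node etcd that is used only by processes on the same host. `ETCD_BIND_IP` and `etcd_args` are hypothetical names introduced for illustration. The flags are the ones from the command under review, with `0.0.0.0` replaced. The block only prints the argument list so it can be inspected before use.

```bash
# Address etcd binds to. 127.0.0.1 keeps the client and peer ports
# reachable from this host only (assumption: local-only usage).
ETCD_BIND_IP="${ETCD_BIND_IP:-127.0.0.1}"

# Print the etcd flags from the reviewed command, one token per line,
# with 0.0.0.0 replaced by the bind address above.
etcd_args() {
  printf '%s\n' \
    --name etcd-single \
    --data-dir /tmp/etcd-data \
    --listen-client-urls "http://${ETCD_BIND_IP}:2379" \
    --advertise-client-urls "http://${ETCD_BIND_IP}:2379" \
    --listen-peer-urls "http://${ETCD_BIND_IP}:2380" \
    --initial-advertise-peer-urls "http://${ETCD_BIND_IP}:2380" \
    --initial-cluster "etcd-single=http://${ETCD_BIND_IP}:2380"
}

etcd_args
```

Running `etcd $(etcd_args) &` would start etcd with these flags. Workers on other nodes cannot reach an etcd bound to 127.0.0.1, so a multi-node deployment would need a routable private address instead, together with TLS and authentication as the comment recommends.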