@yangsonglin13

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test improvements
  • CI/CD improvements

Related Issues

Changes Made

Testing

  • Existing tests pass
  • New tests added (if applicable)
  • Manual testing performed

Test Coverage

Documentation

  • Documentation updated (if needed)
  • Code comments added/updated
  • API documentation updated (if applicable)

Checklist

  • I have read the CONTRIBUTING guidelines
  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have signed off my commits (DCO)

Screenshots/Output

Additional Notes

Reviewer Checklist

  • Code quality and style
  • Test coverage adequate
  • Documentation updated
  • Performance considerations reviewed
  • Security implications considered
  • Breaking changes documented

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 6, 2025
### Install openYuanrong datasystem
Recommended: install the prebuilt wheel package with pip
```bash
pip install https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/openyuanrong_datasystem-0.5.0-cp39-cp39-manylinux_2_34_x86_64.whl
```

Collaborator

Please provide wheel packages for multiple Python versions, along with the path to the open-source repository and build instructions.

Author

openYuanrong datasystem is a distributed caching system. It uses the HBM/DRAM/SSD resources of a compute cluster to build a near-compute multi-level cache, improving data access performance for model training and inference, big data, microservices, and similar scenarios.

## Environment requirements
Operating system: openEuler 22.03 or later

Collaborator

Do other operating systems not work? Does it have to be openEuler?

1. Download the binary package (see [etcd GitHub Releases](https://github.com/etcd-io/etcd/releases)):
```bash
ETCD_VERSION="v3.5.12"
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
```

Collaborator

Use a variable for the CPU architecture as well, and cover both x86 and arm.
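
A minimal sketch of what this request could look like, assuming the architecture is derived from `uname -m`. The helper names `etcd_arch` and `etcd_url` are hypothetical; the `amd64` and `arm64` suffixes follow the naming of the etcd release tarballs. The script only prints the URL and does not download anything.

```bash
# Sketch: pick the etcd release tarball that matches the local CPU architecture.
ETCD_VERSION="v3.5.12"

# Map a machine name (as printed by `uname -m`) to the suffix used in etcd
# release file names. Returns 1 for machine names it does not know.
etcd_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             return 1 ;;
  esac
}

# Build the download URL for a given machine name (hypothetical helper).
etcd_url() {
  arch="$(etcd_arch "$1")" || return 1
  echo "https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-${arch}.tar.gz"
}

# Print the URL for this host, or a diagnostic if the machine is not mapped.
etcd_url "$(uname -m)" || echo "unsupported architecture: $(uname -m)" >&2
```

To download, the printed URL can be passed to `wget`, for example `wget "$(etcd_url "$(uname -m)")"`.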

Install the prebuilt wheel package with pip:
If you use Python 3.9, run:
```bash
pip install https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/release/0.6.0/linux/aarch64/openyuanrong_datasystem-0.6.0-cp39-cp39-manylinux_2_34_aarch64.whl
```

Collaborator

The etcd download above is for x86, while the datasystem wheel here is for arm, so this cannot install as written. Use a variable for the CPU architecture, as with etcd.
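
One possible shape for a parameterized install, sketched under assumptions. The path layout is copied from the aarch64/cp39 URL in the excerpt; the helper `ds_wheel_url` is hypothetical, and whether wheels are actually published for other architectures or Python tags is not confirmed by this thread.

```bash
# Sketch: build the datasystem wheel URL from version, architecture, and Python tag.
# Only the 0.6.0 / aarch64 / cp39 combination appears in the excerpt; any other
# combination is an assumption about what has been published.
DS_BASE="https://openyuanrong.obs.cn-southwest-2.myhuaweicloud.com/release"

# Arguments: version, architecture (as printed by `uname -m`), CPython tag.
ds_wheel_url() {
  echo "${DS_BASE}/$1/linux/$2/openyuanrong_datasystem-$1-$3-$3-manylinux_2_34_$2.whl"
}

# Defaults can be overridden from the environment.
DS_VERSION="${DS_VERSION:-0.6.0}"
DS_ARCH="${DS_ARCH:-$(uname -m)}"
PY_TAG="${PY_TAG:-cp39}"

# Print the URL instead of installing, so the result can be checked first.
ds_wheel_url "${DS_VERSION}" "${DS_ARCH}" "${PY_TAG}"
```

Once the URL is confirmed to exist, it can be installed with `pip install "$(ds_wheel_url 0.6.0 "$(uname -m)" cp39)"`.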

## Configure VLLM to use the Yuanrong Connector
Datasystem integrates with VLLM through ECMooncakeStorageConnector (for EC transfer) and YuanRongConnector (for KVC transfer).

Collaborator

Add a paragraph explaining why Mooncake is sufficient for EC and what the underlying mechanism is. Intuitively, EC should use the Yuanrong connector just as KVC does.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation for deploying and using the openYuanrong datasystem, a distributed caching system. The documentation covers deployment of etcd, the datasystem itself, and integration with VLLM.

Key changes:

  • Complete quick-start guide for datasystem deployment in Chinese
  • Step-by-step instructions for etcd installation and cluster setup
  • Integration examples for VLLM with EC and KV connectors
Comments suppressed due to low confidence (1)

docs/deployment/datasystem/run_example.md:43

  • Using 0.0.0.0 in the listen and advertise URLs exposes etcd to all network interfaces without authentication. This is insecure for production environments. Consider adding a security warning or demonstrating authentication configuration, especially since the documentation mentions production deployment references.
```bash
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://0.0.0.0:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-advertise-peer-urls http://0.0.0.0:2380 \
  --initial-cluster etcd-single=http://0.0.0.0:2380 &
```

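- `--advertise-client-urls`: the client address advertised to clients.
- `--listen-peer-urls`: the address to listen on for traffic between cluster nodes.
- `--initial-advertise-peer-urls`: the peer address advertised to other nodes.
- `--initial-cluster`:初始节点列表,格式:节点名=节点peerURL。 (Initial node list, in the format node name=node peer URL.)
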
Copilot AI Dec 11, 2025

Add space between "节点" and "peerURL" for better readability. Should be "节点名=节点 peerURL" instead of "节点名=节点peerURL".

Suggested change
- `--initial-cluster`:初始节点列表,格式:节点名=节点peerURL
- `--initial-cluster`:初始节点列表,格式:节点名=节点 peerURL

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +91
Replace `${ETCD_IP}` with the IP of the node where etcd runs and `${WORKER_IP_N}` with the IP of node N, then start one server process listening on port 31501 on each node:
```bash
dscli start -w \
--worker_address "${WORKER_IP_N}:31501" \
--etcd_address "${ETCD_IP}:2379"
```

Copilot AI Dec 11, 2025

The placeholder "${WORKER_IP_N}" with "N" suffix suggests multiple workers, but the instruction says "在每个节点启动一个监听端口号为 31501 的服务端进程" (start one server process on each node). It's unclear if multiple worker processes should run on the same node with different IPs or if each physical node runs one worker. Consider clarifying whether N represents different physical nodes or multiple workers per node.
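
One reading of the placeholder, sketched under the assumption that N indexes physical nodes and each node runs exactly one worker. The addresses are hypothetical values from the 192.0.2.0/24 documentation range, `worker_cmd` is a hypothetical helper, and the script only prints the `dscli` commands from the excerpt rather than running them.

```bash
# Sketch of the "one worker per node" reading: N indexes nodes, not processes.
# All IP addresses below are hypothetical placeholders.
ETCD_IP="192.0.2.10"

# Print the dscli command to run on the node that owns the given IP.
worker_cmd() {
  echo "dscli start -w --worker_address \"$1:31501\" --etcd_address \"${ETCD_IP}:2379\""
}

# Every node uses the same port (31501) and differs only in its own IP.
for ip in 192.0.2.11 192.0.2.12; do
  echo "# on node ${ip}:"
  worker_cmd "${ip}"
done
```

Under this reading, running several workers on one node would require distinct ports rather than distinct IPs, which the excerpt does not describe.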

```bash
vllm serve Qwen/Qwen3-8B \
--ec-transfer-config '{
"ec_connector": "ECMooncakeStorageConnector",
"ec_role": "ec_producer"
```

Copilot AI Dec 11, 2025

The Prefill-Decoder node configuration is identical to the Encoder node configuration, both using "ec_producer" role. In a 1E1PD architecture, the Prefill-Decoder should typically be a consumer of the Encoder's output. The ec_role should likely be "ec_consumer" instead of "ec_producer".

Suggested change
"ec_role": "ec_producer"
"ec_role": "ec_consumer"

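
A small sketch of how the two roles in a 1E1PD setup might differ if the suggestion above is adopted. The helper `ec_transfer_config` is hypothetical. The excerpt is truncated, so the real configuration probably carries more keys than the two shown here, and whether `ec_consumer` is the right role for the Prefill-Decoder node is the reviewer's suggestion, not something confirmed in this thread.

```bash
# Sketch: generate the --ec-transfer-config value per role so the Encoder and
# Prefill-Decoder commands cannot silently end up with the same role.
# Only the two keys visible in the excerpt are emitted.
ec_transfer_config() {
  printf '{"ec_connector": "ECMooncakeStorageConnector", "ec_role": "%s"}\n' "$1"
}

# Print both commands for comparison; nothing is launched.
echo "Encoder:         vllm serve Qwen/Qwen3-8B --ec-transfer-config '$(ec_transfer_config ec_producer)'"
echo "Prefill-Decoder: vllm serve Qwen/Qwen3-8B --ec-transfer-config '$(ec_transfer_config ec_consumer)'"
```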

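### Start the cluster
> Tip: the following is a minimal single-node deployment example. For production environments, see the [official cluster deployment documentation](https://etcd.io/docs/current/op-guide/clustering/).
1. 启动单节点 etcd 集群,并设置任意空闲端口(如 2379 和 2380 ): (Start a single-node etcd cluster on any free ports, for example 2379 and 2380.)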

Copilot AI Dec 11, 2025

Remove extra space before the closing parenthesis. The text should read "如 2379 和 2380)" instead of "如 2379 和 2380 )".

Suggested change
1. 启动单节点 etcd 集群,并设置任意空闲端口(如 2379 和 2380 ):
1. 启动单节点 etcd 集群,并设置任意空闲端口(如 2379 和 2380):

Comment on lines +36 to +43
```bash
etcd \
--name etcd-single \
--data-dir /tmp/etcd-data \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://0.0.0.0:2379 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-advertise-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd-single=http://0.0.0.0:2380 &
```

Copilot AI Dec 11, 2025

The example etcd startup command binds client and peer URLs to http://0.0.0.0 without TLS or authentication, which exposes the key-value store to any host that can reach this machine and can lead to unauthorized reads/writes of cluster metadata and potentially sensitive data. An attacker on the same network could directly interact with etcd on ports 2379/2380 using etcdctl or raw HTTP. For safer defaults, restrict --listen-client-urls/--listen-peer-urls to 127.0.0.1 or a secured interface and document enabling TLS and authentication for non-local or production use.
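
A minimal sketch of the safer default described above, assuming a local-only quick start is acceptable. The flags are the same ones used in the excerpt; only the bind address changes. The helper `etcd_args` is hypothetical, and TLS and authentication are not covered here.

```bash
# Sketch: single-node etcd arguments bound to one address (loopback by default).
# Pass another address as the first argument to bind to a specific interface.
etcd_args() {
  host="${1:-127.0.0.1}"
  printf '%s\n' \
    "--name etcd-single" \
    "--data-dir /tmp/etcd-data" \
    "--listen-client-urls http://${host}:2379" \
    "--advertise-client-urls http://${host}:2379" \
    "--listen-peer-urls http://${host}:2380" \
    "--initial-advertise-peer-urls http://${host}:2380" \
    "--initial-cluster etcd-single=http://${host}:2380"
}

# Show the resulting command line without starting etcd.
echo "etcd $(etcd_args | tr '\n' ' ')"
```

To start etcd with these arguments, the output can be expanded unquoted, as in `etcd $(etcd_args) &`. Workers on other machines would then need a reachable address instead of loopback, which is where TLS and authentication become relevant.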

Labels

documentation Improvements or additions to documentation

4 participants