Commit: update README

Signed-off-by: 逍遥 <[email protected]>
xiaoyao committed Dec 3, 2024
1 parent e5abdbd commit 2531886
Showing 2 changed files with 24 additions and 10 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -35,30 +35,38 @@ HAMi is a sandbox and [landscape](https://landscape.cncf.io/?item=orchestration-
HAMi provides device virtualization for several heterogeneous devices including GPU, by supporting device sharing and device resource isolation. For the list of devices supporting device virtualization, see [supported devices](#supported-devices)

### Device sharing

HAMi supports:
- Partial device allocation by specifying device memory.
- A hard limit on streaming multiprocessors.
- Partial device allocation by specifying device core usage.
- Zero changes to existing programs.

<img src="./imgs/example.png" width = "500" />
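
To make these limits concrete, here is a minimal pod manifest that shares a fraction of one GPU. This is a sketch, not HAMi's documented example: the pod name and container image are placeholders, and only the `nvidia.com/*` resource names come from this README (they also appear in the isolation demo below).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo                 # placeholder name
spec:
  containers:
    - name: cuda-app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # any CUDA-capable image
      command: ["sleep", "infinity"]   # keep the pod alive for testing
      resources:
        limits:
          nvidia.com/gpu: 1            # one (shareable) physical GPU
          nvidia.com/gpumem: 3000      # cap device memory at 3000M
          nvidia.com/gpucores: 30      # cap compute at 30% of the card
```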

### Device Resources Isolation

HAMi supports:
- A hard limit on streaming multiprocessors.

A simple demonstration of device isolation: a task with the following resources

```
resources:
  limits:
-   nvidia.com/gpu: 1 # requesting 1 vGPU
-   nvidia.com/gpumem: 3000 # Each vGPU contains 3000m device memory
+   nvidia.com/gpu: 1 # Number of physical GPUs the pod needs
+   nvidia.com/gpumem: 3000 # Each allocated physical GPU provides 3000M device memory to the pod
```

will see only 3G of device memory inside the container:

![img](./imgs/hard_limit.jpg)
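
A quick way to check the cap yourself is a throwaway pod with the same limits that just runs `nvidia-smi`; the total memory it reports should match the 3000M request. This is a sketch, with placeholder pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: check-gpumem                   # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # assumes nvidia-smi is available in-container
      command: ["nvidia-smi"]          # prints the device memory visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 3000      # should show up as ~3G of total memory
```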

> Note:
> 1. **After installing HAMi, the `nvidia.com/gpu` value registered on a node defaults to the "number of vGPUs".**
> 2. **When requesting resources in a pod, `nvidia.com/gpu` refers to the "number of physical GPUs required by the pod".**
> 3. **When sharing a physical GPU, you must also configure `nvidia.com/gpumem` or `nvidia.com/gpucores` to limit each task's resource usage; otherwise multiple tasks cannot be scheduled onto the same physical GPU.**
> 4. The `nvidia.com/gpumem` and `nvidia.com/gpucores` entries in the limits are not resource types; they specify the amount of GPU memory and compute power the pod may use on each physical GPU.
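
As a sketch of note 3 in practice, two pods like the ones below could land on the same physical GPU, because each caps its own memory and compute share (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-task-a                  # placeholder name
spec:
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 3000      # 3000M of the card's memory
          nvidia.com/gpucores: 30      # 30% of the card's compute
---
apiVersion: v1
kind: Pod
metadata:
  name: shared-task-b                  # placeholder name
spec:
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 3000
          nvidia.com/gpucores: 30
```

Together the two pods request 60% of the cores and 6000M of memory, so both fit on one card provided it has more than 6000M available.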


### Supported devices

[![nvidia GPU](https://img.shields.io/badge/Nvidia-GPU-blue)](https://github.com/Project-HAMi/HAMi#preparing-your-gpu-nodes)
18 changes: 12 additions & 6 deletions README_cn.md
@@ -43,29 +43,35 @@ HAMi is a sandbox project of the [Cloud Native Computing Foundation](https://cncf.io/) (CNCF)
HAMi provides device virtualization for several heterogeneous devices, including GPU, by supporting device sharing and device resource isolation. For the list of devices that support device virtualization, see [supported devices](#支持设备)

### Device sharing

HAMi supports:
- Requesting part of a device by specifying device memory
- Hard isolation of compute resources
- Requesting part of a device by specifying a percentage of its compute cores
- Zero changes to existing programs

<img src="./imgs/example.png" width = "500" />

### Device resource isolation

HAMi supports:
- Hard isolation of device resources

A simple demonstration of hard isolation, using an NVIDIA GPU as an example: after a task defined as follows is submitted
```yaml
resources:
  limits:
-   nvidia.com/gpu: 1 # requesting 1 vGPU
-   nvidia.com/gpumem: 3000 # Each vGPU contains 3000m device memory
+   nvidia.com/gpu: 1 # Number of physical GPUs the pod needs
+   nvidia.com/gpumem: 3000 # Each allocated physical GPU provides 3000M device memory to the pod
```
only 3G of device memory will be visible inside the container:
![img](./imgs/hard_limit.jpg)
> Note:
> 1. **After installing HAMi, the `nvidia.com/gpu` value registered on a node defaults to the "number of vGPUs".**
> 2. **When requesting resources in a pod, `nvidia.com/gpu` refers to the "number of physical GPUs the pod requires".**
> 3. **To share a physical GPU, you must also configure `nvidia.com/gpumem` or `nvidia.com/gpucores` to limit each task's resource usage, so that multiple tasks can be scheduled onto the same physical GPU.**
> 4. The `nvidia.com/gpumem` and `nvidia.com/gpucores` entries in the limits are not resource types; they cap the device memory and compute power the pod may use on each physical GPU.
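
Per note 3, either limit on its own is enough to make a pod co-schedulable. Below is a sketch using only `nvidia.com/gpucores` (pod name and image are placeholders; how unset device memory is then treated is left to HAMi's defaults):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: half-core-task                 # placeholder name
spec:
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpucores: 50      # 50% of one physical GPU's compute
```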

## Project architecture

<img src="./imgs/hami-arch.png" width = "600" />
@@ -143,7 +149,7 @@ spec:
      resources:
        limits:
          nvidia.com/gpu: "2" # requesting 2 vGPUs
-         nvidia.com/gpumem: "3000" # each vGPU requests 3000m of device memory (optional, integer)
+         nvidia.com/gpumem: "3000" # each vGPU requests 3000M of device memory (optional, integer)
          nvidia.com/gpucores: "30" # each vGPU gets 30% of the physical GPU's compute power (optional, integer)
```

