Commit 1522cb9

docs(vllm_performance): update website docs (#211)
* docs(website): vllm_actuator
* docs(website): improvements to vllm_performance docs
* docs(backends): Improvements to kuberay/raycluster docs
* docs(website): vllm_performance_full updates
* docs(examples): Update examples for performance-testing-full (vllm)
* docs(vllm_perf): minor fixes
* chore(lint): Stop linter changing list indentation
* docs(ray-backend): updated example service account to enable managing PVCs
* docs(vllm_perf): Update re: pvcs
* chore(docs): fix indentation
* chore(docs): fix indentation
* chore(docs): fix indentation
* chore(docs): include YAML from file
* docs(vllm_perf): update to actuator docs
* docs(vllm_perf): update to example docs
* docs(vllm_perf): update to example yaml
* docs(vllm_performance): Update to new template options; also further doc improvements
* Apply suggestions from code review
  Co-authored-by: Christian Pinto <[email protected]>
  Signed-off-by: Michael Johnston <[email protected]>

---------

Signed-off-by: Michael Johnston <[email protected]>
Co-authored-by: Christian Pinto <[email protected]>
Co-authored-by: Christian Pinto <[email protected]>
1 parent ed6189d commit 1522cb9

12 files changed: +595 −202 lines changed


.secrets.baseline

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
     "files": "requirements.txt|^.secrets.baseline$",
     "lines": null
   },
-  "generated_at": "2025-11-05T16:16:55Z",
+  "generated_at": "2025-11-10T08:32:10Z",
   "plugins_used": [
     {
       "name": "AWSKeyDetector"
@@ -414,7 +414,7 @@
       }
     ]
   },
-  "version": "0.13.1+ibm.64.dss",
+  "version": "0.13.1+ibm.62.dss",
   "word_list": {
     "file": null,
     "hash": null

backend/kuberay/README.md

Lines changed: 50 additions & 73 deletions
@@ -18,21 +18,26 @@ the

 ## Deploying a RayCluster

-> [!WARNING]
+> [!WARNING] Ray version compatibility
 >
-> The `ray` versions must be compatible. For a more in depth guide refer to the
-> [RayCluster configuration](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html)
+> The `ray` version set in KubeRay YAML and the one
+> used in the ray head and worker containers must be compatible.
+> For a more in depth guide refer to the [RayCluster configuration](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html)
 > page.

-!!! note
+We provide [an example set of values](vanilla-ray.yaml) for deploying a
+RayCluster via KubeRay. To deploy it run:
+
+``` commandline
+helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml
+```

-    When running multi-node measurement make sure that
-    all nodes in your multi-node setup have read and write access
-    to your HuggingFace home directory. On Kubernetes with RayCluster,
-    avoid S3-like filesystems as that is known to cause failures
-    in **transformers**. Use a NFS or GPFS-backed PersistentVolumeClaim instead.
+Feel free to customize the example file provided to suit your cluster,
+such as uncommenting GPU-enabled workers.

-### Configuring a Kubernetes ServiceAccount for the RayCluster
+### Enabling ado actuators to create K8s resources
+
+#### Configuring a ServiceAccount for the RayCluster

 The default Kubernetes ServiceAccount created for a RayCluster does not
 have enough permissions for an ado actuator to create Kubernetes resources
@@ -46,46 +51,14 @@ It also provides access to the RayCluster resources.

 <!-- markdownlint-disable-next-line code-block-style -->
 ```yaml
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: ray-deployer
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
-  name: ray-deployer
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: Role
-  name: ray-deployer
-subjects:
-- kind: ServiceAccount
-  name: ray-deployer
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: Role
-metadata:
-  name: ray-deployer
-rules:
-- apiGroups: ["ray.io"]
-  resources:
-  - rayclusters
-  verbs: ["get", "patch"]
-- apiGroups: ["apps"]
-  resources:
-  - pods
-  - deployments
-  verbs: ["get", "create", "delete", "list", "watch", "update"]
-- apiGroups: [""]
-  resources:
-  - services
-  verbs: ["get", "create", "delete", "list", "watch", "update"]
+{% include "./service-account.yaml" %}
 ```

 From the root of the ado project run the below command:

-    kubectl apply -f backend/kuberay/service-account.yaml
+```commandline
+kubectl apply -f backend/kuberay/service-account.yaml
+```

 This will create a ServiceAccount named `ray-deployer`.
 We will reference this name later when
@@ -94,6 +67,19 @@ We will reference this name later when
 More information about ServiceAccount, Role, and RoleBinding objects can be found
 in the [official Kubernetes RBAC documentation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).

+#### Associating a RayCluster with the ServiceAccount
+
+The below command shows how to set the `serviceAccountName` property for head
+and worker nodes.
+
+<!-- markdownlint-disable-next-line code-block-style -->
+```bash
+helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 \
+  --values backend/kuberay/vanilla-ray-service-account.yaml \
+  --set head.serviceAccountName=ray-deployer \
+  --set worker.serviceAccountName=ray-deployer
+```
+
 ### Best Practices for Efficient GPU Resource Utilization

 To maximize the efficiency of your RayCluster and minimize GPU resource
@@ -124,12 +110,13 @@ Recommended worker setup:
 - 4 replicas of a worker with **8 GPUs**

 <!-- markdownlint-disable no-inline-html -->
+
 <details>
 <summary>
 Example: The contents of the additionalWorkerGroups field of a RayCluster
 with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memory
 </summary>
-
+<!-- markdownlint-disable MD046 -->
 ```yaml
 one-A100-80G-gpu-WG:
   replicas: 0
@@ -288,34 +275,24 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memory
 # volumes: ...
 # volumeMounts: ....
 ```
-
+<!-- markdownlint-enable MD046 -->
 </details>
 <!-- markdownlint-enable no-inline-html -->

-!!! note
-
-    Notice that the only variant with a **full-worker** custom resource
-    is the one with 8 GPUs. Some actuators, like SFTTrainer, use this
-    custom resource for measurements that involve reserving an entire GPU node.
-
-We provide [an example set of values](vanilla-ray.yaml) for deploying a
-RayCluster via KubeRay. To deploy it, simply run:
-
-    helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml
-
-In the case the ado operation to be executed requires creating Kubernetes
-resources, the RayCluster to be deployed must be associated with a properly
-configured ServiceAccount like the one described [above](#configuring-a-kubernetes-serviceaccount-for-the-raycluster).
-The below command shows how to set the `serviceAccountName` property for head
-and worker nodes.
+> [!IMPORTANT] full-worker custom resource
+>
+> Notice that the only variant with a **full-worker** custom resource
+> is the one with 8 GPUs. Some actuators, like SFTTrainer, use this
+> custom resource for measurements that involve reserving an entire GPU node.

-<!-- markdownlint-disable-next-line code-block-style -->
-```bash
-helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 \
-  --values backend/kuberay/vanilla-ray-service-account.yaml \
-  --set head.serviceAccountName=ray-deployer \
-  --set worker.serviceAccountName=ray-deployer
-```
+### RayClusters and SFTTrainer

-Feel free to customize the example file provided to suit your cluster,
-such as uncommenting GPU-enabled workers.
+> [!IMPORTANT] HuggingFace home directory
+>
+> If you want to run multi-node measurements with
+> the SFTTrainer actuator make sure that
+> all nodes in your multi-node setup have read and write access
+> to your HuggingFace home directory. On Kubernetes with RayClusters,
+> avoid S3-like filesystems as that is known to cause failures
+> in **transformers**.
+> Use a NFS or GPFS-backed PersistentVolumeClaim instead.
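The note added above recommends backing the shared HuggingFace home directory with an NFS- or GPFS-backed PersistentVolumeClaim for multi-node SFTTrainer runs. As a minimal, hypothetical sketch (the claim name `hf-home`, the storage class `nfs-storage`, and the 500Gi size are assumptions, not part of this commit), such a claim could look like:

```yaml
# Hypothetical sketch of a shared, RWX-capable PVC for the HuggingFace home directory.
# The "nfs-storage" class and 500Gi size are assumptions; use whatever NFS/GPFS-backed
# storage class your cluster actually provides.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hf-home
spec:
  accessModes:
    - ReadWriteMany   # every Ray head/worker pod needs read and write access
  storageClassName: nfs-storage
  resources:
    requests:
      storage: 500Gi
```

The claim would then be mounted into the head and worker pods, for example via the commented `volumes`/`volumeMounts` sections in the worker-group example above, with the HuggingFace home directory pointed at the mount path.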

backend/kuberay/service-account.yaml

Lines changed: 1 addition & 0 deletions
@@ -36,4 +36,5 @@ rules:
 - apiGroups: [""]
   resources:
   - services
+  - persistentvolumeclaims
   verbs: ["get", "create", "delete", "list", "watch", "update"]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Copyright (c) IBM Corporation
+# SPDX-License-Identifier: MIT
+actuatorIdentifier: vllm_performance
+metadata:
+  name: "Test actuator deployment"
+parameters:
+  benchmark_retries: 3
+  hf_token: 'test' # Set if you need to access a gated model
+  image_secret: ''
+  in_cluster: false
+  interpreter: python3
+  max_environments: 1
+  namespace: null # Must set to the namespace to create deployments
+  node_selector: {}
+  retries_timeout: 5
+  verify_ssl: false
@@ -1,7 +1,5 @@
 # Copyright (c) IBM Corporation
 # SPDX-License-Identifier: MIT
-
-sampleStoreIdentifier: 2963a5
 entitySpace:
 - identifier: model
   propertyDomain:
@@ -11,36 +9,26 @@ entitySpace:
   propertyDomain:
     values:
     - quay.io/dataprep1/data-prep-kit/vllm_image:0.1
-- identifier: n_cpus
-  propertyDomain:
-    values: [8]
-- identifier: memory
-  propertyDomain:
-    values: ["128Gi"]
-- identifier: dtype
+- identifier: "number_input_tokens"
   propertyDomain:
-    values: ["auto"]
-- identifier: "num_prompts"
-  propertyDomain:
-    values: [500]
+    values: [1024, 2048, 4096]
 - identifier: "request_rate"
   propertyDomain:
-    values: [-1]
-- identifier: "max_concurrency"
-  propertyDomain:
-    values: [-1]
-- identifier: "gpu_memory_utilization"
+    domainRange: [1,10]
+    interval: 1
+- identifier: n_cpus
   propertyDomain:
-    values: [.9]
-- identifier: "cpu_offload"
+    domainRange: [2,16]
+    interval: 2
+- identifier: memory
   propertyDomain:
-    values: [0]
+    values: ["128Gi", "256Gi"]
 - identifier: "max_batch_tokens"
   propertyDomain:
-    values: [16384]
+    values: [1024, 2048, 4096, 8192, 16384, 32768]
 - identifier: "max_num_seq"
   propertyDomain:
-    values: [256]
+    values: [16,32,64]
 - identifier: "n_gpus"
   propertyDomain:
     values: [1]
@@ -51,4 +39,5 @@ experiments:
 - actuatorIdentifier: vllm_performance
   experimentIdentifier: performance-testing-full
   metadata:
-    description: Parameters for VLLM performance testing
+    description: A space of vllm deployment configurations
+    name: vllm_deployments
