# docs: add k8s guide and Llama-3.2-3b-Instruct example-run #161
## Conversation

Force-pushed from 5723ff7 to f7965aa.

Signed-off-by: sallyom <[email protected]>
Apologies for the late review. We were waiting for our official container image to land; see the comments below about replacing your included Dockerfile with the official image.
```yaml
- name: guidellm
  # TODO: replace this image
  image: quay.io/sallyom/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  args:
    - benchmark
    - --target=$(TARGET)
    - --data=$(DATA)
    - --rate-type=sweep
    - --model=$(MODEL)
    - --output-path=/app/data/llama32-3b.yaml
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80/v1"
    - name: DATA_TYPE
      value: "emulated"
    - name: DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```
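
For context, the `huggingface-secret` referenced by the `secretKeyRef` above has to exist in the Job's namespace. A minimal sketch of creating it, with the secret name and key taken from the manifest and the token value as a placeholder:

```yaml
# Sketch: Secret backing the HF_TOKEN env var in the Job above.
# Replace the placeholder with a real Hugging Face token before applying.
apiVersion: v1
kind: Secret
metadata:
  name: huggingface-secret
type: Opaque
stringData:
  HF_TOKEN: <your-huggingface-token>
```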
We now have our own image. Utilize that and the built-in support for environment variables (see `/deploy` for more details).
Suggested change:

```yaml
- name: guidellm
  image: ghcr.io/neuralmagic/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: GUIDELLM_TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80"
    - name: GUIDELLM_RATE_TYPE
      value: "sweep"
    - name: GUIDELLM_DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: GUIDELLM_MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```
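
As the comment in the manifest notes, mounting the same PVC that stores the model lets GuideLLM read the tokenizer files locally instead of downloading them from Hugging Face, making `HF_TOKEN` unnecessary. A minimal sketch of the extra pieces, where the `model-cache` PVC name, mount path, and model subdirectory are assumptions for illustration (only the `--tokenizer` flag itself comes from the note above):

```yaml
# Fragment to merge into the Job spec above; PVC name and paths are hypothetical.
containers:
  - name: guidellm
    args:
      - benchmark
      - --tokenizer=/cache/models/meta-llama/Llama-3.2-3B-Instruct
    volumeMounts:
      - name: output
        mountPath: /app/data
      - name: model-cache          # same PVC the model server uses
        mountPath: /cache/models
        readOnly: true
volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: model-cache
```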
Also note `--data-type` is deprecated; I removed it in the suggestion. Additionally, `--rate-type` was not configurable, so I added it as an env var.
Drop this file since we now have an official container.
> **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.
Suggested change (delete this line):

```diff
- > **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.
```
> **📝 NOTE:** The HF_TOKEN is passed to the job, but this will not be necessary if you use the same PVC as the one storing your model.
> Guidellm uses the model's tokenizer/processor files in its evaluation. You can pass a path instead with `--tokenizer=/path/to/model`.
> This eliminates the need for Guidellm to download the files from Huggingface.
Suggested change:

```diff
- > This eliminates the need for Guidellm to download the files from Huggingface.
+ > This eliminates the need for GuideLLM to download the files from Hugging Face.
```
@@ -0,0 +1,53 @@

## Run Guidellm with Kubernetes Job
There are a bunch of casing issues with the GuideLLM name. Please fix them to be consistent with the rest of our documentation.
Suggested change:

```diff
- ## Run Guidellm with Kubernetes Job
+ ## Run GuideLLM with Kubernetes Job
```
Also please rebase on main for latest CI. |
I've been running this in Kubernetes with the Job added in this PR; maybe it's useful to document as an example for everyone. I'm also adding a simple `analyze_benchmarks.py` script to analyze the results of a GuideLLM run. Here's an example showing the plots generated by the `analyze_benchmarks.py` script, run with output from a minikube/Llama-3.2-3B-Instruct run.
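
For reference, a minimal sketch of what such an analysis could look like — this assumes the benchmark report is the YAML written to `--output-path` and that each benchmark entry exposes a request rate and a throughput-style metric; the actual `analyze_benchmarks.py` in this PR and the real report schema may differ:

```python
# Sketch only: the field names below (benchmarks, rate, output_tokens_per_second)
# are assumptions about the report schema, not the actual GuideLLM format.
import yaml
import matplotlib.pyplot as plt

with open("llama32-3b.yaml") as f:
    report = yaml.safe_load(f)

# Collect one (rate, throughput) point per benchmark in the sweep.
rates, throughputs = [], []
for bench in report.get("benchmarks", []):
    rates.append(bench["rate"])
    throughputs.append(bench["output_tokens_per_second"])

plt.plot(rates, throughputs, marker="o")
plt.xlabel("Request rate (req/s)")
plt.ylabel("Output tokens/s")
plt.title("GuideLLM sweep: Llama-3.2-3B-Instruct")
plt.savefig("benchmarks.png")
```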