
docs: add k8s guide and Llama-3.2-3b-Instruct example-run #161


Open · sallyom wants to merge 1 commit into main from example-run

Conversation

@sallyom commented May 14, 2025

I've been running GuideLLM in Kubernetes with the Job added in this PR; it may be useful to document as an example for everyone. I'm also adding a simple analyze_benchmarks.py script to analyze the results of a GuideLLM run. Here's an example showing the plots generated by the analyze_benchmarks.py script, run on output from a minikube Llama-3.2-3B-Instruct run.
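For reference, the end-to-end workflow looks roughly like this (the manifest filename and the script's exact command line are illustrative, not pinned down in this PR):

```bash
# Submit the benchmark Job and wait for the sweep to complete.
kubectl apply -f guidellm-job.yaml
kubectl wait --for=condition=complete job/guidellm-job --timeout=2h

# The Job writes its results to the PVC mounted at /app/data
# (llama32-3b.yaml in the manifest). Copy the file out through any pod
# mounting that PVC, then analyze it locally.
python analyze_benchmarks.py llama32-3b.yaml   # illustrative invocation
```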

@sallyom force-pushed the example-run branch 2 times, most recently from 5723ff7 to f7965aa on May 15, 2025
@sjmonson (Collaborator) left a comment:

Apologies for the late review. We were waiting for our official container image to land; see the comments below about replacing your included Dockerfile with the official image.

Comment on lines +17 to +55
```yaml
- name: guidellm
  # TODO: replace this image
  image: quay.io/sallyom/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  args:
    - benchmark
    - --target=$(TARGET)
    - --data=$(DATA)
    - --rate-type=sweep
    - --model=$(MODEL)
    - --output-path=/app/data/llama32-3b.yaml
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80/v1"
    - name: DATA_TYPE
      value: "emulated"
    - name: DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```
Collaborator:

We now have our own image. Use it, along with the built-in support for environment variables (see /deploy for more details).

Suggested change (replacing the block above):

```yaml
- name: guidellm
  image: ghcr.io/neuralmagic/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: GUIDELLM_TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80"
    - name: GUIDELLM_RATE_TYPE
      value: "sweep"
    - name: GUIDELLM_DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: GUIDELLM_MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```

Collaborator:

Also note that `--data-type` is deprecated; I removed it in the suggestion. Additionally, `--rate-type` was not configurable, so I added it as an environment variable.

Collaborator:

Drop this file, since we now have an official container image.

Comment on lines +15 to +16
> **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.

Collaborator:

Suggested change. Replace:

> **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.

with:

> **📝 NOTE:** The HF_TOKEN is passed to the job, but this will not be necessary if you use the same PVC as the one storing your model.
> Guidellm uses the model's tokenizer/processor files in its evaluation. You can pass a path instead with `--tokenizer=/path/to/model`.
> This eliminates the need for Guidellm to download the files from Huggingface.
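As a concrete sketch of what this note describes, the equivalent CLI invocation would look something like the following (the mount path is illustrative; the flags are the ones used elsewhere in this PR):

```bash
# Assumes the model PVC is mounted at /mnt/models (illustrative path); with
# --tokenizer pointing at the model directory, no HF_TOKEN or Hugging Face
# download is needed.
guidellm benchmark \
  --target="http://llm-d-inference-gateway.llm-d.svc.cluster.local:80/v1" \
  --model="meta-llama/Llama-3.2-3B-Instruct" \
  --tokenizer=/mnt/models/Llama-3.2-3B-Instruct \
  --rate-type=sweep \
  --data="prompt_tokens=512,output_tokens=128" \
  --output-path=/app/data/llama32-3b.yaml
```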
Collaborator:

Suggested change. Replace:

> This eliminates the need for Guidellm to download the files from Huggingface.

with:

> This eliminates the need for GuideLLM to download the files from Hugging Face.

@@ -0,0 +1,53 @@
## Run Guidellm with Kubernetes Job
Collaborator:

There are a bunch of casing issues with the GuideLLM name. Please fix them to be consistent with the rest of our documentation.

Suggested change. Replace:

## Run Guidellm with Kubernetes Job

with:

## Run GuideLLM with Kubernetes Job

@sjmonson (Collaborator) commented:

Also, please rebase on main for the latest CI. The Development / build check has an issue and will always fail, but all the others should pass. You can run `ruff check --fix docs/guides` and `ruff format docs/guides` to auto-fix the majority of linting issues.
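Concretely, that cleanup might look like this (the upstream remote name is illustrative):

```bash
# Rebase the PR branch on the latest main, then auto-fix lint issues.
git fetch upstream
git rebase upstream/main
ruff check --fix docs/guides
ruff format docs/guides
```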
