
docs: add k8s guide and Llama-3.2-3b-Instruct example-run #161


Open · sallyom wants to merge 1 commit into main from example-run

Conversation

@sallyom commented May 14, 2025

I've been running GuideLLM in Kubernetes with the Job added in this PR; it may be useful to document as an example for everyone. I'm also adding a simple analyze_benchmarks.py script to analyze the results of a GuideLLM run. Here's an example showing the plots generated by the analyze_benchmarks.py script, run on output from a minikube Llama-3.2-3B-Instruct run.
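For reference, the end-to-end workflow looks roughly like this (the manifest filename and the script's exact command line are illustrative, not pinned down in this PR):

```bash
# Submit the benchmark Job and wait for the sweep to complete.
kubectl apply -f guidellm-job.yaml
kubectl wait --for=condition=complete job/guidellm-job --timeout=2h

# The Job writes its results to the PVC mounted at /app/data
# (llama32-3b.yaml in the manifest). Copy the file out through any pod
# mounting that PVC, then analyze it locally.
python analyze_benchmarks.py llama32-3b.yaml   # illustrative invocation
```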

@sallyom force-pushed the example-run branch 2 times, most recently from 5723ff7 to f7965aa on May 15, 2025
@sjmonson (Collaborator) left a comment:

Apologies for the late review. We were waiting for our official container image to land; see the comments below about replacing your included Dockerfile with the official image.

Comment on lines +17 to +55
```yaml
- name: guidellm
  # TODO: replace this image
  image: quay.io/sallyom/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  args:
    - benchmark
    - --target=$(TARGET)
    - --data=$(DATA)
    - --rate-type=sweep
    - --model=$(MODEL)
    - --output-path=/app/data/llama32-3b.yaml
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80/v1"
    - name: DATA_TYPE
      value: "emulated"
    - name: DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```
Collaborator:

We now have our own image. Use it, along with the built-in support for environment variables (see /deploy for more details).

Suggested change (replacing the block above):

```yaml
- name: guidellm
  image: ghcr.io/neuralmagic/guidellm:latest
  imagePullPolicy: IfNotPresent
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  env:
    # HF_TOKEN is not necessary if you share/use the model PVC. Guidellm needs to access the tokenizer file.
    # You can provide a path to the tokenizer file by passing `--tokenizer=/path/to/model`. If you do not
    # pass the tokenizer path, Guidellm will get the tokenizer file(s) from Huggingface.
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: huggingface-secret
    - name: GUIDELLM_TARGET
      value: "http://llm-d-inference-gateway.llm-d.svc.cluster.local:80"
    - name: GUIDELLM_RATE_TYPE
      value: "sweep"
    - name: GUIDELLM_DATA
      value: "prompt_tokens=512,output_tokens=128"
    - name: GUIDELLM_MODEL
      value: "meta-llama/Llama-3.2-3B-Instruct"
  volumeMounts:
    - name: output
      mountPath: /app/data
```

Collaborator:

Also note that `--data-type` is deprecated; I removed it in the suggestion. Additionally, `--rate-type` was not configurable, so I added it as an environment variable.

Collaborator:

Drop this file, since we now have an official container image.

Comment on lines +15 to +16
> **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.

Collaborator:

Suggested change. Replace:

> **📝 NOTE:** [Dockerfile](./Dockerfile) was used to build the image for the guidellm-job pod.

with:

> **📝 NOTE:** The HF_TOKEN is passed to the job, but this will not be necessary if you use the same PVC as the one storing your model.
> Guidellm uses the model's tokenizer/processor files in its evaluation. You can pass a path instead with `--tokenizer=/path/to/model`.
> This eliminates the need for Guidellm to download the files from Huggingface.
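As a concrete sketch of what this note describes, the equivalent CLI invocation would look something like the following (the mount path is illustrative; the flags are the ones used elsewhere in this PR):

```bash
# Assumes the model PVC is mounted at /mnt/models (illustrative path); with
# --tokenizer pointing at the model directory, no HF_TOKEN or Hugging Face
# download is needed.
guidellm benchmark \
  --target="http://llm-d-inference-gateway.llm-d.svc.cluster.local:80/v1" \
  --model="meta-llama/Llama-3.2-3B-Instruct" \
  --tokenizer=/mnt/models/Llama-3.2-3B-Instruct \
  --rate-type=sweep \
  --data="prompt_tokens=512,output_tokens=128" \
  --output-path=/app/data/llama32-3b.yaml
```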
Collaborator:

Suggested change. Replace:

> This eliminates the need for Guidellm to download the files from Huggingface.

with:

> This eliminates the need for GuideLLM to download the files from Hugging Face.

@@ -0,0 +1,53 @@
## Run Guidellm with Kubernetes Job
Collaborator:

There are a bunch of casing issues with the GuideLLM name. Please fix them to be consistent with the rest of our documentation.

Suggested change. Replace:

## Run Guidellm with Kubernetes Job

with:

## Run GuideLLM with Kubernetes Job

@sjmonson (Collaborator) commented:

Also, please rebase on main for the latest CI. The Development / build check has an issue and will always fail, but all the others should pass. You can run `ruff check --fix docs/guides` and `ruff format docs/guides` to auto-fix the majority of linting issues.
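Concretely, that cleanup might look like this (the upstream remote name is illustrative):

```bash
# Rebase the PR branch on the latest main, then auto-fix lint issues.
git fetch upstream
git rebase upstream/main
ruff check --fix docs/guides
ruff format docs/guides
```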
