
Commit bea94d7

revert bfloat, clarify cloud instance
1 parent 7699226 commit bea94d7

File tree

3 files changed (+3, -7 lines changed)


content/learning-paths/servers-and-cloud-computing/vllm/vllm-run.md

Lines changed: 1 addition & 5 deletions
@@ -31,13 +31,9 @@ To run inference with multiple prompts, you can create a simple Python script to
 Use a text editor to save the Python script below in a file called `batch.py`:
 
 ```python
-import os
 import json
 from vllm import LLM, SamplingParams
 
-# Force CPU-only execution
-os.environ["CUDA_VISIBLE_DEVICES"] = ""
-
 # Sample prompts.
 prompts = [
     "Write a hello world program in C",
@@ -52,7 +48,7 @@ MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # Create an LLM.
-llm = LLM(model=MODEL, dtype="float32", enforce_eager=True, tensor_parallel_size=1)
+llm = LLM(model=MODEL, dtype="bfloat16")
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.

content/learning-paths/servers-and-cloud-computing/vllm/vllm-server.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ OpenAI compatibility means that you can reuse existing software which was design
 Run vLLM with the same `Qwen/Qwen2.5-0.5B-Instruct` model:
 
 ```bash
-python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float32
+python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float16
 ```
 
 The server output displays that it is ready for requests:
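Once the server from this diff is running, clients talk to it through the OpenAI-style completions endpoint. A minimal sketch of building such a request body (the `/v1/completions` path and payload fields follow the OpenAI API schema that vLLM's server mimics; the localhost:8000 address is an assumption based on vLLM's default bind settings, not something stated in this commit):

```python
import json

# OpenAI-style completions request for the vLLM server started above.
# URL assumes vLLM's default host/port (an assumption, adjust as needed).
url = "http://localhost:8000/v1/completions"

# Sampling fields mirror the values used in batch.py in this commit.
payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "prompt": "Write a hello world program in C",
    "max_tokens": 256,
    "temperature": 0.8,
}

# Serialize to the JSON body an HTTP client would POST.
body = json.dumps(payload)
print(body)
```

The resulting JSON can be sent with any HTTP client, for example `curl -X POST -H 'Content-Type: application/json' -d "$body" http://localhost:8000/v1/completions`.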

content/learning-paths/servers-and-cloud-computing/vllm/vllm-setup.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Before you begin
 
-To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage.
+To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage. The instructions have been tested on an AWS Graviton3 m7g.2xlarge instance.
 
 ## What is vLLM?
 
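The requirements added in this commit (8 cores, 16GB of RAM, 50GB of disk) can be sanity-checked on the instance before starting; a quick sketch using standard Linux utilities (output formats assume GNU coreutils/procps as shipped with Ubuntu 24.04):

```shell
# Check the instance against the stated minimums from vllm-setup.md:
# at least 8 cores, 16GB of RAM, and 50GB of disk storage.
nproc                                  # CPU core count
free -g | awk '/^Mem:/ {print $2}'     # total memory, in GiB
df -BG --output=avail / | tail -n 1    # free space on the root filesystem
```

On an m7g.2xlarge the first two commands should report 8 cores and roughly 30 GiB of memory; disk depends on the EBS volume you attach.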
