
Commit bea94d7

revert bfloat, clarify cloud instance
1 parent 7699226 commit bea94d7

File tree

3 files changed (+3, -7 lines changed)


content/learning-paths/servers-and-cloud-computing/vllm/vllm-run.md

Lines changed: 1 addition & 5 deletions
@@ -31,13 +31,9 @@ To run inference with multiple prompts, you can create a simple Python script to
 Use a text editor to save the Python script below in a file called `batch.py`:
 
 ```python
-import os
 import json
 from vllm import LLM, SamplingParams
 
-# Force CPU-only execution
-os.environ["CUDA_VISIBLE_DEVICES"] = ""
-
 # Sample prompts.
 prompts = [
     "Write a hello world program in C",
@@ -52,7 +48,7 @@ MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # Create an LLM.
-llm = LLM(model=MODEL, dtype="float32", enforce_eager=True, tensor_parallel_size=1)
+llm = LLM(model=MODEL, dtype="bfloat16")
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.

content/learning-paths/servers-and-cloud-computing/vllm/vllm-server.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ OpenAI compatibility means that you can reuse existing software which was design
 Run vLLM with the same `Qwen/Qwen2.5-0.5B-Instruct` model:
 
 ```bash
-python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float32
+python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float16
 ```
 
 The server output displays that it is ready for requests:
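Once the server from this diff is running, clients talk to it through the OpenAI-style completions endpoint. A minimal sketch of building such a request body (the `/v1/completions` path and payload fields follow the OpenAI API schema that vLLM's server mimics; the localhost:8000 address is an assumption based on vLLM's default bind settings, not something stated in this commit):

```python
import json

# OpenAI-style completions request for the vLLM server started above.
# URL assumes vLLM's default host/port (an assumption, adjust as needed).
url = "http://localhost:8000/v1/completions"

# Sampling fields mirror the values used in batch.py in this commit.
payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "prompt": "Write a hello world program in C",
    "max_tokens": 256,
    "temperature": 0.8,
}

# Serialize to the JSON body an HTTP client would POST.
body = json.dumps(payload)
print(body)
```

The resulting JSON can be sent with any HTTP client, for example `curl -X POST -H 'Content-Type: application/json' -d "$body" http://localhost:8000/v1/completions`.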

content/learning-paths/servers-and-cloud-computing/vllm/vllm-setup.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Before you begin
 
-To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage.
+To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage. The instructions have been tested on an AWS Graviton3 m7g.2xlarge instance.
 
 ## What is vLLM?
 
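The requirements added in this commit (8 cores, 16GB of RAM, 50GB of disk) can be sanity-checked on the instance before starting; a quick sketch using standard Linux utilities (output formats assume GNU coreutils/procps as shipped with Ubuntu 24.04):

```shell
# Check the instance against the stated minimums from vllm-setup.md:
# at least 8 cores, 16GB of RAM, and 50GB of disk storage.
nproc                                  # CPU core count
free -g | awk '/^Mem:/ {print $2}'     # total memory, in GiB
df -BG --output=avail / | tail -n 1    # free space on the root filesystem
```

On an m7g.2xlarge the first two commands should report 8 cores and roughly 30 GiB of memory; disk depends on the EBS volume you attach.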
