
Commit 5768e66

Document Sync by Tina

committed Feb 19, 2025
1 parent 9b94ac9 commit 5768e66

File tree

3 files changed: +122 -183 lines changed


docs/stable/intro.md

+1 -1

@@ -29,7 +29,7 @@ ServerlessLLM now supports NVIDIA and AMD GPUs, including following hardware:
 ### ServerlessLLM Store
 
 - [Quickstart](./store/quickstart.md)
-- [ROCm Installation(Experimental)](./store/installation_with_rocm.md)
+- [ROCm Quickstart](./store/rocm_quickstart.md)
 
 ### ServerlessLLM CLI
docs/stable/store/installation_with_rocm.md

-182

This file was deleted.
docs/stable/store/rocm_quickstart.md

+121

@@ -0,0 +1,121 @@
---
sidebar_position: 1
---

# ROCm Quick Start

ServerlessLLM Store (`sllm-store`) currently supports the ROCm platform. However, there are no pre-built wheels for ROCm.

Due to an internal bug in ROCm, `sllm-store` may leak GPU memory on ROCm versions earlier than 6.2.0, as noted in this [issue](https://github.com/ROCm/HIP/issues/3580).
1. Clone the repository and enter the `store` directory:

```bash
git clone https://github.com/ServerlessLLM/ServerlessLLM.git
cd ServerlessLLM/sllm_store
```

After that, you may either use the Docker image or build the `sllm-store` wheel from source and install it in your environment.
## Use the Docker image

We provide a Dockerfile with ROCm support. Currently, it is built on the base image `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`.

2. Build the Docker image:

```bash
docker build -t sllm_store_rocm -f Dockerfile.rocm .
```
3. Start the Docker container:

:::tip
If you want to run inference outside the Docker container, you need to publish the port to the host machine, e.g. `-p 8073:8073` (a port-published variant is sketched after the command below). You can also copy the built wheel out of the running container via `docker cp sllm_store_server:/app/dist .`.
:::

```bash
docker run --name sllm_store_server --rm -it \
  --device /dev/kfd --device /dev/dri \
  --security-opt seccomp=unconfined \
  -v $(pwd)/models:/models \
  sllm_store_rocm
```
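If you do want to reach the server from the host, the same command with the port published might look like the following; the only assumption beyond the command above is that the gRPC port stays at its default of 8073:

```bash
# Variant of the run command above with the gRPC port published to the host.
docker run --name sllm_store_server --rm -it \
  --device /dev/kfd --device /dev/dri \
  --security-opt seccomp=unconfined \
  -p 8073:8073 \
  -v $(pwd)/models:/models \
  sllm_store_rocm
```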
Expected output:

```bash
INFO 02-13 04:52:36 cli.py:76] Starting gRPC server
INFO 02-13 04:52:36 server.py:40] StorageServicer: storage_path=/models, mem_pool_size=4294967296, num_thread=4, chunk_size=33554432, registration_required=False
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250213 04:52:36.284631 1 checkpoint_store_hip.cpp:42] Number of GPUs: 1
I20250213 04:52:36.284652 1 checkpoint_store_hip.cpp:44] I/O threads: 4, chunk size: 32MB
I20250213 04:52:36.284659 1 checkpoint_store_hip.cpp:46] Storage path: "/models"
I20250213 04:52:36.284674 1 checkpoint_store_hip.cpp:72] GPU 0 UUID: 61363865-3865-3038-3831-366132376261
I20250213 04:52:36.425267 1 pinned_memory_pool_hip.cpp:30] Creating PinnedMemoryPool with 128 buffers of 33554432 bytes
I20250213 04:52:37.333868 1 checkpoint_store_hip.cpp:84] Memory pool created with 4GB
INFO 02-13 04:52:37 server.py:231] Starting gRPC server on 0.0.0.0:8073
```
After starting the Docker container, you can enter the container and run the following command to test the installation:

```bash
docker exec -it sllm_store_server /bin/bash
```

Try to save and load a transformer model:

```bash
python3 examples/save_transformers_model.py --model_name "facebook/opt-1.3b" --storage_path "/models"
python3 examples/load_transformers_model.py --model_name "facebook/opt-1.3b" --storage_path "/models"
```
Expected output:

```bash
DEBUG 02-13 04:58:09 transformers.py:178] load_dict_non_blocking takes 0.005706787109375 seconds
DEBUG 02-13 04:58:09 transformers.py:189] load config takes 0.0013949871063232422 seconds
DEBUG 02-13 04:58:09 torch.py:137] allocate_cuda_memory takes 0.001325368881225586 seconds
DEBUG 02-13 04:58:09 client.py:72] load_into_gpu: facebook/opt-1.3b, d34e8994-37da-4357-a86c-2205175e3b3f
INFO 02-13 04:58:09 client.py:113] Model loaded: facebook/opt-1.3b, d34e8994-37da-4357-a86c-2205175e3b3f
INFO 02-13 04:58:09 torch.py:160] restore state_dict takes 0.0004620552062988281 seconds
DEBUG 02-13 04:58:09 transformers.py:199] load model takes 0.06779956817626953 seconds
INFO 02-13 04:58:09 client.py:117] confirm_model_loaded: facebook/opt-1.3b, d34e8994-37da-4357-a86c-2205175e3b3f
INFO 02-13 04:58:14 client.py:125] Model loaded
Model loading time: 5.14s
tokenizer_config.json: 100%|██████████| 685/685 [00:00<00:00, 8.26MB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 4.05MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 3.07MB/s]
special_tokens_map.json: 100%|██████████| 441/441 [00:00<00:00, 4.59MB/s]
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/generation/utils.py:1249: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Hello, my dog is cute and I want to give him a good home. I have a
```
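For reference, the two example scripts boil down to one save call and one load call. The sketch below is a minimal, hedged outline assuming the `save_model`/`load_model` helpers in `sllm_store.transformers`; treat the scripts in `examples/` as the authoritative source for argument names.

```python
# Minimal sketch of what the example scripts do. Assumes the
# sllm_store.transformers helpers used by save_transformers_model.py and
# load_transformers_model.py; exact argument names may differ.
import time

import torch
from transformers import AutoModelForCausalLM

from sllm_store.transformers import load_model, save_model

# Save: fetch the model from Hugging Face, then write it in the store's
# loading-optimized format under the storage path.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.float16
)
save_model(model, "/models/facebook/opt-1.3b")

# Load: stream the checkpoint from the store server into GPU memory.
start = time.time()
model = load_model(
    "facebook/opt-1.3b",
    device_map="auto",
    torch_dtype=torch.float16,
    storage_path="/models",
)
print(f"Model loading time: {time.time() - start:.2f}s")
```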
## Build the wheel from source and install

Currently, `pip install .` does not work with ROCm. We suggest you build the `sllm-store` wheel and manually install it in your environment.

1. If a customized PyTorch version is installed, you may need to run the following command to modify the `torch` version in `requirements.txt`:

```bash
python3 using_existing_torch.py
```
2. Build the wheel:

```bash
python setup.py sdist bdist_wheel
```
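Two small follow-ups, offered as suggestions rather than documented steps: checking which PyTorch build is present before step 1, and installing the wheel produced by step 2 (the exact filename under `dist/` depends on the version and your Python/platform tags):

```bash
# A ROCm PyTorch build typically reports a version like "2.3.0+rocm6.2".
python3 -c "import torch; print(torch.__version__)"

# Install the wheel built in step 2; the filename varies by version/platform.
pip install dist/*.whl
```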
## Known issues

1. GPU memory leak in ROCm before version 6.2.0.

   This issue is due to an internal bug in ROCm. After the inference instance has completed, the GPU memory is still occupied and not released. For more information, please refer to this [issue](https://github.com/ROCm/HIP/issues/3580).

2. vLLM v0.5.0.post1 cannot be built on ROCm 6.2.0.

   This issue is due to the ambiguity of a function call in ROCm 6.2.0. You may change vLLM's source code as in this [commit](https://github.com/vllm-project/vllm/commit/9984605412de1171a72d955cfcb954725edd4d6f).
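Since both issues above are gated on the ROCm release, it is worth confirming which one you are running. Two common ways to check (tools and paths vary by installation, so treat these as suggestions):

```bash
# Print the HIP/ROCm version reported by the HIP toolchain.
hipconfig --version

# Or read the version file shipped with most ROCm installs.
cat /opt/rocm/.info/version
```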
