
Commit 83fca9d

Committed Mar 7, 2025
Document Sync by Tina
1 parent f9e914c commit 83fca9d

File tree

1 file changed: +22 -20 lines changed

 

docs/stable/getting_started/slurm_setup.md

+22 -20
@@ -9,7 +9,6 @@ This guide will help you get started with running ServerlessLLM on SLURM cluster
 ## Pre-requisites
 Before you begin, make sure you have checked the following:
 ### Some Tips about Installation
-- Both installation and build require an internet connection. please make sure the port 443 on the job node you want to install is accessible.
 - If 'not enough disk space' is reported when `pip install` on the login node, you can submit it to a job node for execution
 ```shell
 #!/bin/bash
@@ -37,19 +36,9 @@ Before you begin, make sure you have checked the following:
 pip install serverless-llm[worker]
 pip install serverless-llm-store
 ```
-- Importantly, we recommend using [installing with pip](https://serverlessllm.github.io/docs/stable/getting_started/installation#installing-with-pip) if there is no CUDA driver on the node you want to install. If you want to [install from source](https://serverlessllm.github.io/docs/stable/getting_started/installation#installing-from-source), please make sure CUDA driver available on the node you want to install. Here are some commands to check it.
-```shell
-module avail cuda # if you can see some CUDA options, it means CUDA is available, then load the cuda module
-module load cuda-12.x # load specific CUDA version
-# or
-nvidia-smi # if you can see the GPU information, it means CUDA is available
-# or
-which nvcc # if you can see the CUDA compiler's path, it means CUDA is available
-```
-However, we **strongly recommend that you read the documentation for the HPC you are using** to find out how to check if the CUDA driver is available.
 
-### Find nodes with sufficient computing power
-Consult the cluster documentation/administrator or run the following commands in the cluster to find a node with sufficient computing power (Compute Capacity > 7.0) ([Click here to check if your node has sufficient computing power](https://developer.nvidia.com/cuda-gpus#compute)).
+### Command for Querying GPU Resource Information
+Run the following commands in the cluster to check GPU resource information.
 ```shell
 sinfo -O partition,nodelist,gres
 ```
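For per-node detail beyond the partition-level `sinfo` query above, a minimal follow-up sketch using `scontrol` is shown here; the node name is only the example name used later in this guide.

```shell
# Show the GRES (GPU) and memory configuration of a specific node; replace JobNode01 with your node.
scontrol show node JobNode01 | grep -i -E "gres|realmemory"
```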
@@ -75,7 +64,7 @@ compute up 2 down infinite JobNode[16-17]
 ### Job Nodes Setup
 **`srun` Node Selection**
 
-Only one JobNode (with sufficient compute capability) is enough.
+Only one JobNode is enough.
 
 **`sbatch` Node Selection**
 
@@ -98,7 +87,20 @@ srun --partition <your-partition> --nodelist <JobNode> --gres <DEVICE>:1 --pty b
 ```
 This command requests a session on the specified node and provides an interactive shell. `--gres <DEVICE>:1` specifies the GPU device you will use, for example: `--gres gpu:gtx_1060:1`
 
-### Step 2: Prepare multiple windows with `tmux`
+### Step 2: Install from source
+First, please make sure the CUDA driver is available on the node. Here are some commands to check it.
+```shell
+nvidia-smi
+
+which nvcc
+```
+If `nvidia-smi` lists GPU information but `which nvcc` produces no output, use the following commands to load `nvcc`. Here is an example where CUDA is located at `/opt/cuda-12.2.0`:
+```shell
+export PATH=/opt/cuda-12.2.0/bin:$PATH
+export LD_LIBRARY_PATH=/opt/cuda-12.2.0/lib64:$LD_LIBRARY_PATH
+```
+Then follow the [installation guide](./installation.md) to install from source.
+### Step 3: Prepare multiple windows with `tmux`
 Since srun provides a single interactive shell, you can use tmux to create multiple windows. Start a tmux session:
 ```shell
 tmux
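As a quick sanity check after the exports in the new install-from-source step, the sketch below confirms that the CUDA compiler is visible on `PATH`; it assumes CUDA 12.2 is installed at the example location above.

```shell
# Verify the CUDA compiler is now visible and report its version.
which nvcc
nvcc --version   # should report release 12.2 if /opt/cuda-12.2.0/bin was added to PATH
```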
@@ -127,7 +129,7 @@ Once multiple windows are created, you can switch between them using:
 `Ctrl + B``W` (List all windows and select)
 `Ctrl + B`[Number] (Switch to a specific window, e.g., Ctrl + B → 1)
 
-### Step 3: Run ServerlessLLM on the JobNode
+### Step 4: Run ServerlessLLM on the JobNode
 First find ports that are already occupied. Then pick your favourite number from the remaining ports to replace the following placeholder `<PORT>`. For example: `6379`
 
 It should also be said that certain slurm system is a bit slow, **so please be patient and wait for the system to output**.
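To see which ports are already occupied before picking `<PORT>`, one possible check with `ss` is sketched below; `6379` is just the example value from the step above.

```shell
# List TCP ports already in use on this node, then check one candidate explicitly.
ss -tln
ss -tln | grep -q ":6379 " && echo "6379 is taken" || echo "6379 looks free"
```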
@@ -184,7 +186,7 @@ Expected output:
 {"id":"chatcmpl-9f812a40-6b96-4ef9-8584-0b8149892cb9","object":"chat.completion","created":1720021153,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
 ```
 
-### Step 4: Clean up
+### Step 5: Clean up
 To delete a deployed model, use the following command:
 ```shell
 sllm-cli delete facebook/opt-1.3b
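The expected output above is the response to an OpenAI-compatible chat-completion request. The sketch below shows the kind of `curl` call that could produce it, assuming the `/v1/chat/completions` path and the default port `8343`; substitute the `<PORT>` you chose earlier if it differs.

```shell
curl http://127.0.0.1:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is your name?"}
        ]
      }'
```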
@@ -265,7 +267,7 @@ Since the head node does not require a gpu, you can find a low-computing capacit
 Finding available port on JobNode01
 Available port: <avail_port>
 ```
-Remember this <avail_port>, you will use it in Step 4
+Remember this `<avail_port>`, you will use it in Step 4
 
 ### Step 2: Start the Worker Node & Store
 We will start the worker node and store in the same script. Because the server loads the model weights onto the GPU and uses shared GPU memory to pass the pointer to the client. If you submit another script with ```#SBATCH --gpres=gpu:1```, it will be possibly set to use a different GPU, as specified by different ```CUDA_VISIBLE_DEVICES``` settings. Thus, they cannot pass the model weights.
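Because separate `sbatch` submissions may be handed different `CUDA_VISIBLE_DEVICES` values, as the paragraph above explains, a quick check of what a given allocation can see is sketched below; run it inside the allocated job.

```shell
# Inside the allocated job: show which GPU indices this allocation can see.
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi -L   # lists the GPUs visible to this job
```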
@@ -275,7 +277,7 @@ We will start the worker node and store in the same script. Because the server l
 ```shell
 #!/bin/sh
 #SBATCH --partition=your_partition
-#SBATCH --nodelist=JobNode02 # Note JobNode02 should have sufficient compute capacity
+#SBATCH --nodelist=JobNode02
 #SBATCH --gres=gpu:a6000:1 # Specify device on JobNode02
 #SBATCH --job-name=sllm-worker-store
 #SBATCH --output=sllm_worker.out
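A possible way to submit and monitor the combined worker-and-store script is sketched below; the script filename is an assumed example, while `sllm_worker.out` comes from the `#SBATCH --output` line above.

```shell
# Submit the worker+store batch script (filename is an assumed example), then watch it.
sbatch sllm_worker_store.sh
squeue -u $USER                 # check that the job is running on JobNode02
tail -f sllm_worker.out         # follow the worker/store log
```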
@@ -373,7 +375,7 @@ We will start the worker node and store in the same script. Because the server l
 $ conda activate sllm
 (sllm)$ export LLM_SERVER_URL=http://<HEAD_NODE_IP>:8343/
 ```
-- Replace ```<HEAD_NODE_IP>``` with the actual IP address of the head node.
+- Replace `<HEAD_NODE_IP>` with the actual IP address of the head node.
 - Replace ```8343``` with the actual port number (`<avail_port>` in Step1) if you have changed it.
 2. **Deploy a Model Using ```sllm-cli```**
 ```shell

0 commit comments
