[Bug] RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 458752000 and alignment 16 in AlignedAllocator #1405
Comments
We will take a look soon. In the meantime, you can try to increase this value: sglang/python/sglang/global_config.py (line 26 in c33d82a).
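For anyone who wants to try this while the fix is pending, here is a minimal sketch of the override. The attribute name flashinfer_workspace_size is taken from this thread; the import path, the presence of a module-level global_config instance, and the default value are assumptions that may differ across versions.

```python
# Workaround sketch, not an official fix: enlarge the FlashInfer workspace buffer
# that backs temporary allocations such as batch_prefill_tmp_v.
# Assumption: sglang.global_config exposes a module-level `global_config` instance
# with a `flashinfer_workspace_size` attribute (the value at global_config.py line 26).
from sglang.global_config import global_config

print(global_config.flashinfer_workspace_size)       # inspect the current default
global_config.flashinfer_workspace_size = 768 << 20  # e.g. 768 MiB, illustrative value only

# launch_server runs in its own process, so in practice the simplest way to apply
# this is to edit the default directly in sglang/python/sglang/global_config.py
# before starting the server.
```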
Ok, I'll take a look ASAP.
@josephydu Can you try it again with sglang v0.3.1.post3? I ran the same command on 8x H100 and did not find any issues.
Same here. I use 2x A100, sglang v0.3.1.post3, with CUDA graph disabled.
I still got the problem on 8x A100. But when I try to increase
@merrymercy I am also getting the same issue when running Llama 405B FP8 from neuralmagic on 8x H100s.
This is how I launch the server:
python3 -m sglang.launch_server --model /models/neuralmagic-Meta-Llama-3.1-405B-Instruct-FP8/ --tp 8 --disable-radix
and this is the error I get:
"RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 550502400 and alignment 16 in AlignedAllocator"
I get the same error with the following command variations as well:
python3 -m sglang.launch_server --model /models/neuralmagic-Meta-Llama-3.1-405B-Instruct-FP8/ --tp 8 --disable-radix --disable-mla
python3 -m sglang.launch_server --model /models/neuralmagic-Meta-Llama-3.1-405B-Instruct-FP8/ --tp 8 --disable-radix --disable-mla --disable-cuda-graph
python3 -m sglang.launch_server --model /projects/xlab/ZLM/models/neuralmagic-Meta-Llama-3.1-405B-Instruct-FP8/ --tp 8 --disable-radix --mem-fraction-static 0.7
I checked, and this problem does not happen in 0.2.7, but it does in 0.2.14 and onwards. I am not sure about the versions between 0.2.7 and 0.2.14.
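For context, a quick conversion of the allocation sizes reported in this thread; this is plain arithmetic, not sglang code, and the default workspace size it is compared against is an assumption.

```python
# Convert the allocation sizes reported in this issue to MiB.
for name, nbytes in [
    ("batch_prefill_tmp_v (Qwen2-7B repro, issue title)", 458_752_000),
    ("batch_prefill_tmp_v (Llama 405B FP8, 8x H100)", 550_502_400),
]:
    print(f"{name}: {nbytes / (1 << 20):.1f} MiB")

# Prints 437.5 MiB and 525.0 MiB. If the default flashinfer_workspace_size were,
# say, 384 MiB (an assumed figure), either request would overflow it, which is why
# increasing that value is suggested in this thread.
```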
Maybe you can try to increase
@josephydu I think I found a pattern, which may help you in debugging this. On 0.3.0 and up, if I remove "disable-radix-cache", I do not get the error, i.e. the same launch command succeeds without that flag but fails with it.
Changing the size of flashinfer_workspace_size gave me a different issue.
Checklist
Describe the bug
I run the same benchmark script on the following two commits:
old: cb99ba4
new: c33d82a
It fails on the new commit but succeeds on the old commit.
I get the following error output:
Reproduction
server:
python3 -m sglang.launch_server --model-path Qwen/Qwen2-7B --host 127.0.0.1 --port 8080 --mem-fraction-static 0.8 --dp-size 2 --load-balance-method round_robin
benchmark:
python3 -m sglang.bench_serving --backend sglang --host 127.0.0.1 --port 8080 --dataset-name random --tokenizer Qwen/Qwen2-7B --model Qwen/Qwen2-7B --random-output-len 1024 --random-input-len 4096 --random-range-ratio 0.5 --seed 1234 --request-rate 15.7 --num-prompts 200
Environment
I run the script on 8x A100 40G GPUs.