
How to deploy the 33B model with vLLM? #27

Open

laisun opened this issue Nov 10, 2023 · 6 comments

Comments


laisun commented Nov 10, 2023

It throws an error on a single A100 machine (torch 2.0.1, transformers 4.35):
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
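As the traceback itself suggests, rerunning with synchronous kernel launches usually pinpoints the op that actually faulted. A minimal debugging sketch (the model path is a placeholder, not taken from this thread):

```shell
# Force synchronous CUDA kernel launches so the Python stack trace
# points at the kernel that really failed (much slower; debug only).
export CUDA_LAUNCH_BLOCKING=1

# Relaunch the server under the same conditions that crashed.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/33b-model \
    --dtype float16
```

`TORCH_USE_CUDA_DSA` additionally enables device-side assertions, but it only takes effect in a PyTorch build compiled with that flag.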

Contributor

soloice commented Nov 28, 2023

How many A100 cards are in the machine? Try running with CUDA_LAUNCH_BLOCKING=1; where does the error occur then?

@FrankWhh

After deploying, the output is garbled. Has anyone else run into this?

@txy6666yr

> How many A100 cards are in the machine? Try running with CUDA_LAUNCH_BLOCKING=1; where does the error occur then?

Is there a tutorial for deploying with vLLM? Or could someone share the deployment files?
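There is no tutorial linked in this thread, but a basic single-node deployment can be sketched as a launch command plus a test request (the model path and port are placeholders):

```shell
# Start vLLM's OpenAI-compatible API server on port 8000.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/33b-model \
    --port 8000

# Once the server is up, send a completion request to verify it works.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "/path/to/33b-model", "prompt": "Hello", "max_tokens": 64}'
```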

@hyperbolic-c

When deploying with vLLM, how do I load the model across multiple GPUs? Even with CUDA_VISIBLE_DEVICES=0,1, only one card loads the model, which is strange. Thanks.


mklf commented Apr 12, 2024

> When deploying with vLLM, how do I load the model across multiple GPUs? Even with CUDA_VISIBLE_DEVICES=0,1, only one card loads the model, which is strange. Thanks.

Try adding --tp=2 to the launch arguments.

@hyperbolic-c

> When deploying with vLLM, how do I load the model across multiple GPUs? Even with CUDA_VISIBLE_DEVICES=0,1, only one card loads the model, which is strange. Thanks.

> Try adding --tp=2 to the launch arguments.

Thanks, I solved it by setting --tensor-parallel-size > 1.
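For later readers, the working multi-GPU launch above can be sketched as follows (the model path is a placeholder; note that the tensor-parallel size must evenly divide the model's attention-head count):

```shell
# Expose two GPUs and shard the model across them with tensor
# parallelism; vLLM spawns one worker per tensor-parallel rank.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/33b-model \
    --tensor-parallel-size 2
```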
