How can I deploy the 33B model with vLLM? #27
Comments
How many GPUs does your single-node A100 machine have? Try setting CUDA_LAUNCH_BLOCKING=1 — where does the error occur? |
After deploying, my output is garbled — has anyone else run into this? |
Is there a tutorial for deploying with vllm? Or could someone share the files? |
When deploying with vllm, how do you load the model across multiple GPUs? Using |
try adding |
thanks, I have solved it by setting |
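For reference, multi-GPU loading in vLLM is done through tensor parallelism, configured when the engine is started. A minimal sketch, assuming a 4-GPU node and a placeholder checkpoint path (`path/to/33b-model` is not from this thread):

```shell
# Shard the 33B weights across 4 GPUs via tensor parallelism and
# expose an OpenAI-compatible HTTP server.
# "path/to/33b-model" is a placeholder for your local checkpoint directory.
python -m vllm.entrypoints.openai.api_server \
    --model path/to/33b-model \
    --tensor-parallel-size 4
```

The same option is available for offline inference as the `tensor_parallel_size` argument of vLLM's `LLM` class.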
It still errors out. Single-node A100, torch 2.0.1, transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
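As suggested earlier in the thread, forcing synchronous kernel launches makes the reported stack trace point at the kernel that actually faulted, instead of a later unrelated API call. A sketch, where the script name is a placeholder:

```shell
# CUDA kernels launch asynchronously by default, so the Python stack
# trace for an "illegal memory access" often points at the wrong op.
# Setting CUDA_LAUNCH_BLOCKING=1 makes each launch synchronous, so the
# trace stops at the failing kernel (at the cost of slower execution).
# "serve_33b.py" is a placeholder for the actual deployment script.
CUDA_LAUNCH_BLOCKING=1 python serve_33b.py
```

Note that TORCH_USE_CUDA_DSA mentioned in the error is a PyTorch build-time option, not a runtime environment variable, so it only helps if you compile PyTorch yourself.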