Commit 1a8f132

pavanbalaji authored and meta-codesync[bot] committed
Improve error message for unsupported NCCL mem allocator
Summary:
Updated error messages in TorchCommNCCL and TorchCommNCCLX to include the CUDART version number when the NCCL memory allocator is not supported. This provides better diagnostic information to users, making it clearer which CUDA runtime version is causing the incompatibility. The error message now displays "NCCL mem allocator is not supported in CUDART version %d" instead of the generic "not supported in this NCCL version" message.

Reviewed By: siyengar

Differential Revision: D88564927

fbshipit-source-id: 2daf1f8bc6435ba0612af8c25ee5e17ad7a5f65f
1 parent 795248a commit 1a8f132

2 files changed: +6 -2 lines changed

comms/torchcomms/nccl/TorchCommNCCL.cpp

Lines changed: 3 additions & 1 deletion
@@ -1435,7 +1435,9 @@ std::shared_ptr<c10::Allocator> TorchCommNCCL::getMemAllocator() {
   c10::DeviceIndex deviceIdx = device_.index();
   if (!deviceSupportsMulticast(deviceIdx)) {
     TORCH_CHECK(
-        false, "NCCL mem allocator is not supported in this NCCL version");
+        false,
+        "NCCL mem allocator is not supported in CUDART version %d",
+        CUDART_VERSION);
   }
   static std::shared_ptr<c10::cuda::CUDACachingAllocator::CUDAAllocator>
       ncclMemAllocator =
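
For readers interpreting the number that the new message carries: `CUDART_VERSION` is a single integer, and the sketch below assumes the usual encoding of `major * 1000 + minor * 10` (so 12040 stands for CUDA 12.4). The helper name `formatCudartVersion` is hypothetical and is not part of torchcomms or this commit; it only illustrates how the raw value could be read.

```cpp
#include <string>

// Hypothetical helper (not part of torchcomms): converts a
// CUDART_VERSION-style integer into a "major.minor" string.
// Assumes the encoding major * 1000 + minor * 10.
std::string formatCudartVersion(int version) {
  const int major = version / 1000;         // e.g. 12040 / 1000 == 12
  const int minor = (version % 1000) / 10;  // e.g. (12040 % 1000) / 10 == 4
  return std::to_string(major) + "." + std::to_string(minor);
}
```

Under that assumption, `formatCudartVersion(12040)` returns `"12.4"` and `formatCudartVersion(11080)` returns `"11.8"`.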

comms/torchcomms/ncclx/TorchCommNCCLX.cpp

Lines changed: 3 additions & 1 deletion
@@ -1776,7 +1776,9 @@ std::shared_ptr<c10::Allocator> TorchCommNCCLX::getMemAllocator() {
   c10::DeviceIndex deviceIdx = device_.index();
   if (!deviceSupportsMulticast(deviceIdx)) {
     TORCH_CHECK(
-        false, "NCCL mem allocator is not supported in this NCCL version");
+        false,
+        "NCCLX mem allocator is not supported in CUDART version %d",
+        CUDART_VERSION);
   }
   static std::shared_ptr<c10::cuda::CUDACachingAllocator::CUDAAllocator>
       ncclMemAllocator =
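
One point worth checking in review: both hunks pass a printf-style `%d` placeholder followed by `CUDART_VERSION`. The sketch below assumes that `TORCH_CHECK` builds its message by appending each argument in stream style rather than interpreting format specifiers. If that assumption holds, the `%d` would appear literally in the output with the number appended after it. `concatMessage` is a hypothetical stand-in written for this illustration, not the PyTorch implementation.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for stream-style message building (the behavior
// assumed here for TORCH_CHECK): each argument is appended with operator<<,
// and no printf-style substitution takes place.
template <typename... Args>
std::string concatMessage(const Args&... args) {
  std::ostringstream oss;
  (oss << ... << args);  // C++17 fold expression over operator<<
  return oss.str();
}
```

With this stand-in, `concatMessage("CUDART version %d", 12040)` produces `CUDART version %d12040`, while `concatMessage("CUDART version ", 12040)` produces `CUDART version 12040`. If the real macro behaves the same way, dropping the `%d` and ending the literal with a space would give the intended text.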
