Describe the bug
Reductions with the CUDA backend throw a CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES exception, even though there should be enough resources available for the reduction. On my setup, any reduction over more than 640 int elements throws. I've tested this with sum and max as the reduction operator.
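To pin down a threshold like the 640 elements mentioned above on another setup, a bisection over the reduction size may help. The following is only a sketch: `largest_working_size` and the `works` predicate are hypothetical names, not part of SYCL or DPC++, and the search assumes that once a size fails, every larger size fails too, which has not been verified for this bug.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns the largest n in [lo, hi] for which works(n) is true.
// Assumes works(lo) is true and that failures are monotonic in n (an assumption).
std::size_t largest_working_size(std::size_t lo, std::size_t hi,
                                 const std::function<bool(std::size_t)>& works) {
    while (lo < hi) {
        // Round the midpoint up so the loop always makes progress.
        const std::size_t mid = lo + (hi - lo + 1) / 2;
        if (works(mid)) {
            lo = mid;      // mid succeeds: the threshold is at or above mid
        } else {
            hi = mid - 1;  // mid fails: the threshold is below mid
        }
    }
    return lo;
}
```

In practice, `works` would run the reduction for the given size and return false if a sycl::exception is caught. Whether the queue remains usable after such a failed launch is not something this sketch checks.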
To Reproduce
The following code crashes with a CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES exception:
#include <cassert>
#include <numeric>
#include <sycl/sycl.hpp>

int main() {
    constexpr auto REDUCTION_SIZE = 1024;
    auto queue = sycl::queue();
    sycl::buffer<int> valuesBuf{ REDUCTION_SIZE };
    {
        sycl::host_accessor a{ valuesBuf };
        std::iota(a.begin(), a.end(), 0);
    }
    sycl::buffer<int> maxBuf{ 1 };
    queue.submit([&](sycl::handler& cgh) {
        auto inputValues = valuesBuf.get_access<sycl::access_mode::read>(cgh);
        auto maxReduction = sycl::reduction(maxBuf, cgh, sycl::maximum<>());
        cgh.parallel_for(valuesBuf.get_range(), maxReduction,
            [=](sycl::id<1> idx, auto& max) { max.combine(inputValues[idx]); });
    });
    assert(maxBuf.get_host_access()[0] == REDUCTION_SIZE - 1);
}
Output:
PI CUDA ERROR:
Value: 701
Name: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Description: too many resources requested for launch
Function: cuda_piEnqueueKernelLaunch
Source Location: /home/blackmark/dpcpp/llvm/sycl/plugins/cuda/pi_cuda.cpp:3055
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
Aborted (core dumped)
Expected behavior: reductions that fit within the device's hardware resources should complete without crashing.
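Until that is the case, one thing that might be worth trying is launching the reduction with an explicit work-group size instead of a plain range. This rests on an unconfirmed assumption: that the failure is tied to the work-group size the runtime picks for range-based reductions. An nd_range requires the global size to be a multiple of the local size, so the input size has to be padded. The helper name `padded_global_size` below is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: rounds n up to the next multiple of group_size.
// group_size must be greater than zero.
constexpr std::size_t padded_global_size(std::size_t n, std::size_t group_size) {
    return ((n + group_size - 1) / group_size) * group_size;
}
```

The result could be used as the global range of a sycl::nd_range<1> with `group_size` as the local range. The kernel would then take a sycl::nd_item<1> and skip work-items whose global id is not below the real element count. The choice of group size is arbitrary, and whether this avoids the error has not been tested here.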
Environment:
- OS: Ubuntu 22.04
- Target device and vendor: NVIDIA RTX 2070
- DPC++ version: ec34869
- Dependencies version: CUDA 11