[SYCL][CUDA] Reductions with 1024+ elements crash with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES

**Describe the bug**

Reductions with the CUDA backend throw a `CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES` exception, even though there should be enough resources available for the reduction. On my setup, any reduction with more than 640 `int` elements throws. I've tested this with `sum` and `max` as the reduction operators.

**To Reproduce**

The following code crashes with a `CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES` exception:

```cpp
#include <cassert>
#include <numeric>

#include <sycl/sycl.hpp>

int main() {
    constexpr auto REDUCTION_SIZE = 1024;

    auto queue = sycl::queue();

    sycl::buffer<int> valuesBuf{ REDUCTION_SIZE };
    {
        sycl::host_accessor a{ valuesBuf };
        std::iota(a.begin(), a.end(), 0);
    }

    sycl::buffer<int> maxBuf{ 1 };

    queue.submit([&](sycl::handler& cgh) {
        auto inputValues = valuesBuf.get_access<sycl::access_mode::read>(cgh);
        auto maxReduction = sycl::reduction(maxBuf, cgh, sycl::maximum<>());
        cgh.parallel_for(valuesBuf.get_range(), maxReduction,
                         [=](sycl::id<1> idx, auto& max) { max.combine(inputValues[idx]); });
    });

    assert(maxBuf.get_host_access()[0] == REDUCTION_SIZE - 1);
}
```

Output:

```
PI CUDA ERROR:
        Value:           701
        Name:            CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
        Description:     too many resources requested for launch
        Function:        cuda_piEnqueueKernelLaunch
        Source Location: /home/blackmark/dpcpp/llvm/sycl/plugins/cuda/pi_cuda.cpp:3055

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
Aborted (core dumped)
```

**Expected behavior**

A reduction that fits within the available hardware resources should complete without throwing. For the example above, the assertion should pass with a result of `REDUCTION_SIZE - 1`.
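For comparison, the values a working reduction should produce can be computed on the host without SYCL. This is a minimal sketch, assuming the same `0, 1, ..., size - 1` input as the reproducer; `expected_max` and `expected_sum` are hypothetical helper names, not part of the reproducer or of DPC++.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: host-side reference for the `max` reduction over
// the values 0, 1, ..., size - 1 (the same input std::iota produces in
// the reproducer).
int expected_max(std::size_t size) {
    if (size == 0) {
        // Identity of the maximum operation for int.
        return std::numeric_limits<int>::lowest();
    }
    std::vector<int> values(size);
    std::iota(values.begin(), values.end(), 0);
    return *std::max_element(values.begin(), values.end());
}

// Hypothetical helper: host-side reference for the `sum` reduction over
// the same input. Accumulates in long long to avoid overflow for large sizes.
long long expected_sum(std::size_t size) {
    std::vector<int> values(size);
    std::iota(values.begin(), values.end(), 0);
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

Comparing the device result against `expected_max(REDUCTION_SIZE)` is equivalent to the `REDUCTION_SIZE - 1` check in the reproducer, and `expected_sum` gives the matching reference when the operator is swapped for a sum.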

**Environment (please complete the following information):**

- OS: Ubuntu 22.04
- Target device and vendor: NVIDIA RTX 2070
- DPC++ version: ec348692e8d9272171a3204ecf7142c34607c9b3
- Dependencies version: CUDA 11
