[SYCL][CUDA] Reductions with 1024+ elements crash with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES #8414

Open
@BlackMark29A

Describe the bug

Reductions with the CUDA backend throw a CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES exception, even though there should be enough resources available for the reduction. On my setup, any reduction with more than 640 int elements throws. I've tested this with sum and max as the reduction operator.

To Reproduce

The following code crashes with a CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES exception being thrown:

#include <cassert>
#include <numeric>

#include <sycl/sycl.hpp>

int main() {
    constexpr auto REDUCTION_SIZE = 1024;

    auto queue = sycl::queue();

    sycl::buffer<int> valuesBuf{ REDUCTION_SIZE };
    {
        sycl::host_accessor a{ valuesBuf };
        std::iota(a.begin(), a.end(), 0);
    }

    sycl::buffer<int> maxBuf{ 1 };

    queue.submit([&](sycl::handler& cgh) {
        auto inputValues = valuesBuf.get_access<sycl::access_mode::read>(cgh);
        auto maxReduction = sycl::reduction(maxBuf, cgh, sycl::maximum<>());
        cgh.parallel_for(valuesBuf.get_range(), maxReduction,
                         [=](sycl::id<1> idx, auto& max) { max.combine(inputValues[idx]); });
    });

    assert(maxBuf.get_host_access()[0] == REDUCTION_SIZE - 1);
}

Output:

PI CUDA ERROR:
        Value:           701
        Name:            CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
        Description:     too many resources requested for launch
        Function:        cuda_piEnqueueKernelLaunch
        Source Location: /home/blackmark/dpcpp/llvm/sycl/plugins/cuda/pi_cuda.cpp:3055

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
Aborted (core dumped)

Expected behavior: reductions that fit within the available hardware resources should not crash.

Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • Target device and vendor: NVIDIA RTX 2070
  • DPC++ version: ec34869
  • Dependencies version: CUDA 11

Labels

bug (Something isn't working), compiler (Compiler related issue), confirmed, cuda (CUDA back-end), runtime (Runtime library related issue)
