[BUG]: cuda.cooperative passes string dtype parameters through to C++ #3912

Open
1 task done
brycelelbach opened this issue Feb 22, 2025 · 0 comments
Labels
bug: Something isn't working right.
cuda.cooperative: For all items related to the cuda.cooperative Python module

Collaborator

brycelelbach commented Feb 22, 2025

Is this a duplicate?

Type of Bug

Compile-time Error

Component

cuda.cooperative

Describe the bug

If you pass a string as dtype to a cuda.cooperative algorithm (for example cudax.block.store), the string is passed through verbatim to the generated C++ code. Is this intended?

At the very least, we shouldn't pass the string through unchanged for built-in Numba/NumPy dtypes. dtype="int32" should work as it does in NumPy instead of failing with an NVRTC compile error, as it does today.
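
One possible shape for the fix, as a minimal sketch and not the actual cuda.cooperative implementation: normalize whatever the caller passes as dtype to a canonical name before any C++ is generated, and reject anything unknown early. The helper names (normalize_dtype_name, cpp_type_name) and the table are hypothetical, and the C++ spellings in the table are assumptions. The sketch relies only on the fact that NumPy dtypes and Numba types both expose a name attribute such as "int32".

```python
# Hypothetical helpers; these names do not exist in cuda.cooperative.
# The C++ spellings on the right-hand side are assumptions for illustration.
_CPP_TYPE_NAMES = {
    "int8": "::cuda::std::int8_t",
    "int16": "::cuda::std::int16_t",
    "int32": "::cuda::std::int32_t",
    "int64": "::cuda::std::int64_t",
    "uint8": "::cuda::std::uint8_t",
    "uint16": "::cuda::std::uint16_t",
    "uint32": "::cuda::std::uint32_t",
    "uint64": "::cuda::std::uint64_t",
    "float32": "float",
    "float64": "double",
}


def normalize_dtype_name(dtype):
    """Return a canonical dtype name such as "int32".

    Accepts a string, or any object exposing a `name` attribute
    (np.dtype("int32").name and numba.int32.name are both "int32"),
    or a class whose `__name__` is a dtype name.
    """
    if isinstance(dtype, str):
        name = dtype
    else:
        name = getattr(dtype, "name", None) or getattr(dtype, "__name__", None)
    if isinstance(name, str):
        name = name.strip().lower()
    if name not in _CPP_TYPE_NAMES:
        # Fail in Python with a readable message instead of inside NVRTC.
        raise TypeError(f"unsupported dtype: {dtype!r}")
    return name


def cpp_type_name(dtype):
    """Map a user-supplied dtype to the C++ type name to emit."""
    return _CPP_TYPE_NAMES[normalize_dtype_name(dtype)]


print(cpp_type_name("int32"))
```

With something like this in front of code generation, dtype="int32" and dtype=numba.int32 would resolve to the same C++ type, and a typo such as dtype="int33" would raise a TypeError at the call site.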

How to Reproduce

import cuda.cooperative.experimental as cudax
import numba
from pynvjitlink import patch
patch.patch_numba_linker(lto=True)

load = cudax.block.load(dtype=numba.int32, threads_in_block=32, items_per_thread=4, algorithm="striped")
store = cudax.block.store(dtype="int32", threads_in_block=32, items_per_thread=4, algorithm="striped")

@numba.cuda.jit(link=load.files+store.files)
def kernel():
  pass

kernel[1, 32]()

Today this leads to:

Traceback (most recent call last):
  File "/raid/blelbach/dev/sandbox/py/bugs/cuda_cooperative_string_dtype.py", line 7, in <module>
    store = cudax.block.store(dtype="int32", threads_in_block=32, items_per_thread=4, algorithm="striped")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/block/_block_load_store.py", line 204, in store
    for ltoir in specialization.get_lto_ir()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_types.py", line 726, in get_lto_ir
    _, lto_fn = nvrtc.compile(cpp=src, cc=cc, rdc=True, code="lto")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py", line 89, in compile
    return nvrtc_version, compile_impl(
                          ^^^^^^^^^^^^^
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_caching.py", line 46, in cacher
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py", line 55, in compile_impl
    CHECK_NVRTC(err, prog)
  File "/raid/blelbach/dev/cccl/python/cuda_cooperative/cuda/cooperative/experimental/_nvrtc.py", line 17, in CHECK_NVRTC
    raise RuntimeError(f"NVRTC error: {log.decode('ascii')}")
RuntimeError: NVRTC error: code.cu(12): error: identifier "int32" is undefined
              using algorithm_t = cub::BlockStore<int32, 32, 4, ::cub::BLOCK_STORE_STRIPED, 1, 1>;
                                                  ^

code.cu(16): error: incomplete type is not allowed
                  extern "C" __device__ void block_storeint32__32__4____cub__BLOCK_STORE_STRIPED__1__1_Puint8_Pint32_Pint32_alloc(storage_t* param_0, storage_t (&param_1)[4])
                                             ^

code.cu(16): error: identifier "storage_t" is undefined
                  extern "C" __device__ void block_storeint32__32__4____cub__BLOCK_STORE_STRIPED__1__1_Puint8_Pint32_Pint32_alloc(storage_t* param_0, storage_t (&param_1)[4])
                                                                                                                                  ^

code.cu(16): error: identifier "param_0" is undefined
                  extern "C" __device__ void block_storeint32__32__4____cub__BLOCK_STORE_STRIPED__1__1_Puint8_Pint32_Pint32_alloc(storage_t* param_0, storage_t (&param_1)[4])
                                                                                                                                             ^

code.cu(16): error: identifier "param_1" is undefined
                  extern "C" __device__ void block_storeint32__32__4____cub__BLOCK_STORE_STRIPED__1__1_Puint8_Pint32_Pint32_alloc(storage_t* param_0, storage_t (&param_1)[4])
                                                                                                                                                                  ^

code.cu(17): error: expected a ";"
                  {
                  ^

6 errors detected in the compilation of "code.cu".
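
The first error is the root cause; the other five cascade from algorithm_t failing to compile. The sketch below illustrates the mechanism under an assumption: the template string is reconstructed from the NVRTC log above, not copied from _types.py, and render_alias is a hypothetical name. It shows how interpolating the raw string yields exactly the rejected line, while a valid C++ type name in the same slot yields a well-formed alias.

```python
# Hypothetical template, reconstructed from line 12 of code.cu in the log.
# The real code generator in cuda.cooperative may be structured differently.
TEMPLATE = (
    "using algorithm_t = cub::BlockStore<"
    "{dtype}, {threads}, {items}, ::cub::{algorithm}, 1, 1>;"
)


def render_alias(dtype, threads=32, items=4, algorithm="BLOCK_STORE_STRIPED"):
    """Interpolate the dtype text into the C++ alias without any validation."""
    return TEMPLATE.format(
        dtype=dtype, threads=threads, items=items, algorithm=algorithm
    )


# Raw string from the user: "int32" is not a C++ identifier, so NVRTC rejects it.
print(render_alias("int32"))

# A C++ type name in the same slot (the spelling here is an assumption).
print(render_alias("::cuda::std::int32_t"))
```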

Expected behavior

N/A

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@brycelelbach brycelelbach added the bug Something isn't working right. label Feb 22, 2025
@github-project-automation github-project-automation bot moved this to Todo in CCCL Feb 22, 2025
@brycelelbach brycelelbach added the cuda.cooperative For all items related to the cuda.cooperative Python module label Feb 22, 2025
@tpn tpn self-assigned this Feb 24, 2025