Is there a SYCL equivalent of `cudaOccupancyMaxActiveBlocksPerMultiprocessor`? Some use cases are listed below. Thanks.

AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h: result = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: cudart_result = cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h: CUTLASS_TRACE_HOST(" cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags() returned error " << cudaGetErrorString(cudart_result));
AITemplate/3rdparty/cutlass/include/cutlass/gemm/device/base_grouped.h: result =
AITemplate/3rdparty/cub/cub/device/dispatch/dispatch_radix_sort.cuh: if (CubDebug(error = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/3rdparty/cub/cub/util_device.cuh: return CubDebug(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/python/aitemplate/backend/cuda/groupnorm/layer_norm.cuh: cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
AITemplate/python/aitemplate/backend/cuda/layernorm_sigmoid_mul/layer_norm.cuh: cudaError_t err = =
AITemplate/python/aitemplate/backend/cuda/softmax/softmax.cuh: cudaOccupancyMaxActiveBlocksPerMultiprocessor(
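For reference, the usage pattern in all of the files above follows the standard CUDA occupancy-query shape, roughly like this minimal sketch (the kernel name and launch parameters here are hypothetical, not taken from the files listed):

```cpp
// Sketch of the CUDA occupancy query the listed call sites use.
// my_kernel, block_size, and dyn_smem_bytes are placeholder values.
__global__ void my_kernel(float* data);

int query_occupancy() {
  int max_active_blocks = 0;           // result: resident blocks per SM
  int block_size = 256;                // threads per block of the launch
  size_t dyn_smem_bytes = 0;           // dynamic shared memory per block

  cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
      &max_active_blocks, my_kernel, block_size, dyn_smem_bytes);
  if (err != cudaSuccess) {
    return -1;
  }
  // Callers typically multiply this by the SM count to size a
  // persistent-kernel grid or to pick among kernel configurations.
  return max_active_blocks;
}
```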
Hi @jinz2014 . Working on this, I'll ping you when a PR is up. Thank you!
Thanks. This will enable migrating variants of this function in the SYCL compiler.
This has been implemented in:
This should make it possible to get the same information. We're also investigating some SYCLcompat changes to make it easier to port from CUDA.
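As a rough illustration of what porting could look like, the DPC++ `sycl_ext_oneapi_launch_queries` extension exposes a `max_num_work_groups` kernel query. Note this is a sketch under assumptions: the query is experimental, it reports a device-wide work-group limit for a given configuration rather than a per-SM count like CUDA, and the kernel name below is hypothetical:

```cpp
// Sketch: querying launch capacity for a kernel with DPC++'s
// experimental launch-queries extension. Not a drop-in replacement
// for cudaOccupancyMaxActiveBlocksPerMultiprocessor: the returned
// value is for the whole device/queue, not per compute unit.
#include <sycl/sycl.hpp>

class MyKernel;  // hypothetical kernel name

size_t query_max_work_groups(sycl::queue& q) {
  namespace syclex = sycl::ext::oneapi::experimental;

  auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
      q.get_context());
  sycl::kernel k = bundle.get_kernel(sycl::get_kernel_id<MyKernel>());

  // Maximum number of work-groups for a 256-item work-group with
  // no dynamic local memory (0 bytes).
  return k.ext_oneapi_get_info<
      syclex::info::kernel_queue_specific::max_num_work_groups>(
      q, sycl::range<3>{1, 1, 256}, /*local_mem_bytes=*/0);
}
```

To approximate a per-compute-unit figure, callers could divide the result by `device::get_info<sycl::info::device::max_compute_units>()`, though the exact mapping to CUDA's per-SM semantics depends on the backend.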