@studyingeugene studyingeugene commented Dec 11, 2025

I treated this as a small bug fix rather than a feature addition, so I submitted a PR directly without opening an issue first.
Apologies if that goes against the usual workflow; I'll be glad to open an issue if preferred.

Summary

This PR fixes two CUDA compilation issues in the inference extensions:

  1. Removes C++17 structured bindings from kernel launch parameter handling (kernel.cu) to avoid nvcc capture errors. (Related issue: Failed to build C++ code for customized CUDA kernels #88)
  2. Replaces overly-generic comparison operator templates in common.h with type-specific overloads, preventing invalid instantiation with non-vector types (e.g., std::atomic<int>).

These changes work around gaps in nvcc's C++17 support, so the CUDA extensions build in environments where the original code failed to compile.


1. Remove structured bindings from kernel launch code (kernel.cu)

Problem

kernel.cu used structured bindings:

auto [blockDim, gridDim, stream, useVec, biasSafe, N, HW] =
    get_kernel_launch_info<vec_t>(y);

nvcc has incomplete support for capturing structured-binding variables inside lambda functions or kernel-launch expressions, which leads to compilation errors in CUDA extension code. Note that the C++17 standard itself does not allow lambdas to capture structured bindings; this only became valid in C++20.

In my case, nvcc fails with the following error (full log: error_log_1.txt):

error: structured binding cannot be captured

Fix

Structured bindings are replaced with explicit tuple unpacking:

const auto launch_info = get_kernel_launch_info<vec_t>(y);
const dim3& blockDim = std::get<0>(launch_info);
const dim3& gridDim  = std::get<1>(launch_info);
const auto& stream   = std::get<2>(launch_info);
const bool  useVec   = std::get<3>(launch_info);
const bool  biasSafe = std::get<4>(launch_info);
const int   N        = std::get<5>(launch_info);
const int   HW       = std::get<6>(launch_info);

This avoids the nvcc limitation while preserving identical functionality.
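
For illustration only, here is a host-only C++17 sketch of the same unpacking pattern. It is not the code in this PR: Dim3, make_launch_info, and total_threads are hypothetical stand-ins so the example builds without nvcc. It shows that variables unpacked with std::get are ordinary named variables and can be captured by a lambda.

```cpp
#include <cassert>
#include <tuple>

namespace launch_demo {

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without nvcc.
struct Dim3 { unsigned x, y, z; };

// Hypothetical stand-in for get_kernel_launch_info: returns (block, grid, useVec, N).
inline std::tuple<Dim3, Dim3, bool, int> make_launch_info(int n) {
    assert(n > 0);
    const unsigned threads = 256;
    const unsigned blocks = (static_cast<unsigned>(n) + threads - 1) / threads;
    return std::make_tuple(Dim3{threads, 1, 1}, Dim3{blocks, 1, 1}, n % 4 == 0, n);
}

// Unpack the tuple with std::get, then capture the resulting variables in a lambda.
inline unsigned total_threads(int n) {
    const auto info = make_launch_info(n);
    const Dim3& block = std::get<0>(info);
    const Dim3& grid  = std::get<1>(info);
    const int   count = std::get<3>(info);

    // block, grid, and count are ordinary variables, so capturing them is valid C++17.
    auto launch = [&block, &grid, count]() -> unsigned {
        const unsigned total = block.x * grid.x;
        assert(total >= static_cast<unsigned>(count));  // the grid covers every element
        return total;
    };
    return launch();
}

}  // namespace launch_demo
```

With these stand-ins, total_threads(1000) rounds up to 4 blocks of 256 threads. The trade-off is a few extra lines per call site in exchange for code that does not depend on how a given compiler treats captured structured bindings.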

2. Fix comparison operator template in common.h

Problem

common.h defined a generic comparison operator template:

template <typename T1, typename T2>
__forceinline__ __device__ bool4 operator>(const T1& a, const T2& b) {
    return make_vec4(a.x > b, a.y > b, a.z > b, a.w > b);
}

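Because this template matches any pair of operand types, the compiler can select it for a > b expressions involving class types wherever it is visible. nvcc therefore attempted to instantiate it for types that have no .x/.y/.z/.w members, such as:

  • std::atomic<int>
  • other pybind11 internal types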

This produced errors such as the following (full log: error_log_2.txt):

error: class "std::atomic<int>" has no member "x"
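
The selection problem can be reproduced on the host without CUDA. The sketch below is an assumption-laden illustration, not code from common.h: bool4 is a hypothetical stand-in, and __forceinline__ __device__ is dropped. It uses decltype, which only performs overload resolution and does not instantiate the function body, to show that the generic template is chosen for std::atomic<int>.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>
#include <utility>

namespace generic_demo {

// Hypothetical stand-in for the project's bool4 vector type.
struct bool4 { bool x, y, z, w; };

// Same shape as the original generic template, minus the CUDA qualifiers.
template <typename T1, typename T2>
bool4 operator>(const T1& a, const T2& b) {
    return bool4{a.x > b, a.y > b, a.z > b, a.w > b};
}

// decltype is an unevaluated context: the overload is selected, but its body is
// not instantiated, so this compiles even though std::atomic<int> has no .x member.
using AtomicCompare = decltype(std::declval<const std::atomic<int>&>() > 0);

// The template is an exact match, so it wins over the built-in int comparison
// that would need std::atomic<int>'s conversion to int.
static_assert(std::is_same<AtomicCompare, bool4>::value,
              "the generic template is selected for std::atomic<int>");

inline bool generic_template_hijacks_atomic() {
    return std::is_same<AtomicCompare, bool4>::value;
}

}  // namespace generic_demo
```

Actually evaluating such a comparison would instantiate the body and fail with the "no member x" error shown above.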

Fix

The generic template is removed and replaced with explicit overloads for supported vector types:

__forceinline__ __device__ bool4 operator>(const float4& a, const float b) {
    return make_vec4(a.x > b, a.y > b, a.z > b, a.w > b);
}

__forceinline__ __device__ bool4 operator>(const Half4& a, const c10::Half& b) {
    return make_vec4(a.x > b, a.y > b, a.z > b, a.w > b);
}

This prevents invalid instantiation and ensures correct operator behavior.
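
As a rough host-only check of this behavior (a sketch with hypothetical stand-ins for float4 and bool4, and a hypothetical count_greater helper, none of which come from the PR), the following shows that with only a type-specific overload visible, comparing a std::atomic<int> falls back to the built-in comparison while the vector overload still works element-wise.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>
#include <utility>

namespace overload_demo {

// Hypothetical host-side stand-ins for CUDA's float4 and the project's bool4.
struct float4 { float x, y, z, w; };
struct bool4 { bool x, y, z, w; };

// Type-specific overload in the style of the fix, minus the CUDA qualifiers.
inline bool4 operator>(const float4& a, const float b) {
    return bool4{a.x > b, a.y > b, a.z > b, a.w > b};
}

// No catch-all template is visible, so std::atomic<int> converts to int and the
// built-in comparison is used, yielding a plain bool.
using AtomicCompare = decltype(std::declval<const std::atomic<int>&>() > 0);
static_assert(std::is_same<AtomicCompare, bool>::value,
              "std::atomic<int> uses the built-in comparison");

// Hypothetical helper: counts how many lanes of v exceed the threshold.
inline int count_greater(const float4& v, float threshold) {
    const bool4 m = v > threshold;
    return int(m.x) + int(m.y) + int(m.z) + int(m.w);
}

}  // namespace overload_demo
```

The cost of explicit overloads is that each new vector type needs its own operator, which seems acceptable here since only float4 and Half4 are used.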

Safety

  • No functional or numerical logic was changed.
  • Kernel launch behavior is identical (same block/grid dimensions, streams, flags).
  • The operator overload fix only eliminates unintended template matches.
  • Execution results (encode/decode paths) match prior behavior.

Testing

Closing

I appreciate your time reviewing my PR. Thanks!

@studyingeugene

@microsoft-github-policy-service agree
