Header-only CUDA/HIP library that streamlines working with vector types and low-precision floating-point types (16-bit and 8-bit) on GPUs
performance cpp gpu cuda kernel-tuner hip vectorization floating-point half-precision mixed-precision low-precision bfloat16 header-only-library reduced-precision
Updated Jan 27, 2025 - C++