
sync : ggml #12732

Merged: 5 commits merged into master from sync-ggml-25-04-03 on Apr 7, 2025
Conversation

ggerganov (Member)

No description provided.

github-actions bot added the script and ggml labels on Apr 3, 2025
ggerganov (Member, Author)

@cmdr2 Could you take a look at the arm build failures?

cmdr2 (Contributor) commented Apr 3, 2025

@ggerganov Sure, taking a look

cmdr2 (Contributor) commented Apr 3, 2025

tl;dr - Maybe a difference in the strictness of the C++ compiler (vs compiling the C file)?

Interestingly, this isn't new behavior. Previous CI runs for this runner also raised this (as a warning), i.e. without this PR's changes - https://github.com/ggml-org/llama.cpp/actions/runs/14237462102/job/39899497124#step:9:87

src/ggml-cpu/ggml-cpu.c:1586:21: warning: incompatible pointer types assigning to 'const __fp16 *' from 'const ggml_fp16_internal_t *' (aka 'const unsigned short *') [-Wincompatible-pointer-types]
 1586 |             ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);

But the CI runner now raises this as an error rather than just a warning, now that these lines are part of C++.

src/ggml-cpu\vec.h:118:25: error: incompatible pointer types assigning to 'const __fp16 *' from 'const ggml_fp16_internal_t *' (aka 'const unsigned short *')
  118 |                 ax[j] = GGML_F16_VEC_LOAD(x[k] + i + j*GGML_F16_EPR, j);

Maybe a difference in the strictness of the C++ compiler (vs compiling the C file)?
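
For context, a minimal illustration (stand-in types, not the actual ggml code) of that strictness difference: C permits assignments between incompatible pointer types with only a diagnostic, while C++ rejects them outright, so the same line that was merely a warning when compiled as part of ggml-cpu.c becomes a hard error once it is compiled as C++.

// Minimal illustration of the C vs. C++ behavior (types are stand-ins).
void example(void) {
    unsigned short bits = 0;
    const unsigned short *src = &bits;       // stand-in for const ggml_fp16_internal_t *
    // const float *dst = src;               // C: -Wincompatible-pointer-types warning
                                             // C++: hard error, no implicit conversion
    const float *dst = (const float *)src;   // explicit cast: accepted by both
    (void)dst;
}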

Continuing to investigate.

cmdr2 (Contributor) commented Apr 3, 2025

tl;dr - This warning is justified, and worth fixing anyway. Rather than trying to coerce the compiler into letting this through, it's probably worth investigating this (erstwhile) warning.

Details:
On MSVC (or LLVM with MSVC), the code defines ggml_fp16_internal_t as uint16_t (unsigned short) -

#include <arm_neon.h>
#ifdef _MSC_VER
typedef uint16_t ggml_fp16_internal_t;
#define ggml_vld1q_u32(w,x,y,z) { ((w) + ((uint64_t)(x) << 32)), ((y) + ((uint64_t)(z) << 32)) }
#else
typedef __fp16 ggml_fp16_internal_t;
#define ggml_vld1q_u32(w,x,y,z) { (w), (x), (y), (z) }
#endif // _MSC_VER

It very clearly DOES NOT define ggml_fp16_internal_t as __fp16.

So I'm not sure where __fp16 comes into the picture (as mentioned in the error). Does the arm_neon.h implementation actually require __fp16 as the arg type? The stack trace below is interesting:

D:/a/llama.cpp/llama.cpp/ggml/src/ggml-cpu\vec.h:118:25: error: incompatible pointer types assigning to 'const __fp16 *' from 'const ggml_fp16_internal_t *' (aka 'const unsigned short *')
  118 |                 ax[j] = GGML_F16_VEC_LOAD(x[k] + i + j*GGML_F16_EPR, j);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
D:/a/llama.cpp/llama.cpp/ggml/src/ggml-cpu\simd-mappings.h:127:41: note: expanded from macro 'GGML_F16_VEC_LOAD'
  127 |     #define GGML_F16_VEC_LOAD(p, i)     GGML_F32Cx4_LOAD(p)
      |                                         ^~~~~~~~~~~~~~~~~~~
D:/a/llama.cpp/llama.cpp/ggml/src/ggml-cpu\simd-mappings.h:117:60: note: expanded from macro 'GGML_F32Cx4_LOAD'
  117 |     #define GGML_F32Cx4_LOAD(x)      vcvt_f32_f16(vld1_f16((const ggml_fp16_internal_t *)(x)))
      |                                                   ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\Program Files\LLVM\lib\clang\18\include\arm_neon.h:39354:47: note: expanded from macro 'vld1_f16'
 39354 |   __ret = (float16x4_t) __builtin_neon_vld1_v(__p0, 8); \
       |                                               ^~~~

Will look at the arm header file. Continuing to investigate..

cmdr2 (Contributor) commented Apr 3, 2025

Interesting, I see a note in our code:

// 16-bit float
// on Arm, we use __fp16
// on x86, we use uint16_t

But we're not running on x86, yet we're using uint16_t, simply because the current implementation checks for MSVC and defines it as uint16_t.

@ggerganov @slaren Shouldn't we also explicitly check for x86, along with the msvc check? Before defining ggml_fp16_internal_t as uint16_t?

cmdr2 (Contributor) commented Apr 3, 2025

I'm a bit confused now. Arm neon isn't x86. So why does it set ggml_fp16_internal_t to uint16_t inside the #if defined(__ARM_NEON) block, if it is MSVC?

#if defined(__ARM_NEON)
// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
//
// $ ln -sfn /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/include/arm_neon.h ./src/
//
#include <arm_neon.h>
#ifdef _MSC_VER
typedef uint16_t ggml_fp16_internal_t;
#define ggml_vld1q_u32(w,x,y,z) { ((w) + ((uint64_t)(x) << 32)), ((y) + ((uint64_t)(z) << 32)) }
#else
typedef __fp16 ggml_fp16_internal_t;

We're also doing this in ggml-impl.h:

#if defined(__ARM_NEON)
#if defined(_MSC_VER) || (defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)
typedef uint16_t ggml_fp16_internal_t;
#else
typedef __fp16 ggml_fp16_internal_t;
#endif
#endif

ggerganov (Member, Author)

> I'm a bit confused now. Arm neon isn't x86. So why does it set ggml_fp16_internal_t to uint16_t inside the #if defined(__ARM_NEON) block, if it is MSVC?

I'm not really sure either. I think the history of this is #5404 and the references therein.

It appears that MSVC on Arm64 had some issues (maybe it still has them) and this required the hacky ggml_vld1q_u32 macro, which in turn leads to using uint16_t instead of the native __fp16.
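
A hedged aside on what that macro does (an assumption drawn from the macro definition quoted above, not from the MSVC headers): instead of brace-initializing four 32-bit lanes directly, the MSVC variant packs each pair of 32-bit values into one 64-bit half of the vector, for example:

#include <cstdint>
#include <cstdio>

// Illustration of the lane packing performed by the MSVC ggml_vld1q_u32 variant.
int main() {
    uint32_t w = 1, x = 2, y = 3, z = 4;
    uint64_t lo = (uint64_t)(w) + ((uint64_t)(x) << 32);  // lanes 0 and 1
    uint64_t hi = (uint64_t)(y) + ((uint64_t)(z) << 32);  // lanes 2 and 3
    printf("lo = 0x%016llx, hi = 0x%016llx\n",
           (unsigned long long) lo, (unsigned long long) hi);
    return 0;
}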

cmdr2 (Contributor) commented Apr 4, 2025

Thanks, digging further in that direction..

cmdr2 (Contributor) commented Apr 4, 2025

Submitted a PR for this - ggml-org/ggml#1176

Yeah, it looks like we need to do things in uint16_t for MSVC + Arm NEON. Traced it back to the original PR for ARM64 support on Windows - b52b29a

So maybe the solution is to just tell the compiler that it's okay (by using reinterpret_cast if we're in the C++ compiler)? I've proposed that change in my PR.
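
A minimal sketch of what such a compiler-dependent cast could look like (hypothetical macro name; the actual change proposed in ggml-org/ggml#1176 may differ):

// Hypothetical helper: keep the plain cast for the C compiler and make the
// reinterpretation explicit when the header is compiled as C++.
#ifdef __cplusplus
    #define GGML_CAST_FP16_PTR(p) reinterpret_cast<const __fp16 *>(p)
#else
    #define GGML_CAST_FP16_PTR(p) ((const __fp16 *)(p))
#endif

// The load macro could then route through the helper (illustrative only):
// #define GGML_F32Cx4_LOAD(x)  vcvt_f32_f16(vld1_f16(GGML_CAST_FP16_PTR(x)))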

I have no idea about ARM or SIMD code, so please feel free to suggest alternatives :) Thanks!

ggerganov force-pushed the sync-ggml-25-04-03 branch 3 times, most recently from 8339981 to 6232cee on April 7, 2025 09:32
github-actions bot added the Nvidia GPU label on Apr 7, 2025
cmdr2 (Contributor) commented Apr 7, 2025

@ggerganov What if we keep the arm header include the same (i.e. if not MUSA), but continue allowing the fp32 conversion function declarations for MUSA?

Basically, bringing back these lines: e638450#diff-1f56ac82eed1293d4aa7c35aef0bc19e831cdb24dcb6af43582143936eb7eae4L19-L25

And removing && !defined(__MUSACC__) from this line (to match what used to happen previously): e638450#diff-1f56ac82eed1293d4aa7c35aef0bc19e831cdb24dcb6af43582143936eb7eae4R313

Thanks

cmdr2 (Contributor) commented Apr 7, 2025

Or maybe that's completely unrelated to the CI failure?

slaren (Member) commented Apr 7, 2025

Yes, it is due to the CUDA BF16 change.

ggerganov (Member, Author)

The MUSA build is simple to fix, but I am not able to fix the HIP build. I think HIP does not support cublasGemmEx with BF16 type.

@JohannesGaessler Do you have suggestions how to fix?

JohannesGaessler (Collaborator)

I pushed a fix for the HIP compilation failure. From what I can tell the problem is that the HIP header does not define __bfloat162float which is used in the CUDA code to do the type conversion. However, both the HIP and CUDA headers define the float() operator which in the case of CUDA is just a wrapper for __bfloat162float. More generally, I think the code can be simplified by just always explicitly casting the source data to float. For float <-> half and float <-> nv_bfloat16 that should be a no-op and for half <-> nv_bfloat16 that is what the code was already doing.
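
A hedged C++ sketch of that simplification (stand-in template; in the real code these are CUDA/HIP device conversions and the types are __half and nv_bfloat16, both of which convert to float):

// Illustrative only: route every conversion through float, relying on the
// float() conversion the half/bfloat16 types provide. float <-> half and
// float <-> bfloat16 need one conversion each; half <-> bfloat16 goes through
// float, which is what the code was already doing.
template <typename dst_t, typename src_t>
dst_t convert_via_float(src_t x) {
    return dst_t(float(x));
}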

I will not be able to test the code on actual AMD hardware for a few more hours.

JohannesGaessler (Collaborator)

Sorry, I missed the error messages about the HIP types. In hipblas.h the compute types are defined as follows:

#define HIPBLAS_R_16F HIP_R_16F
#define HIPBLAS_R_32F HIP_R_32F
#define HIPBLAS_R_64F HIP_R_64F
#define HIPBLAS_C_16F HIP_C_16F
#define HIPBLAS_C_32F HIP_C_32F
#define HIPBLAS_C_64F HIP_C_64F
#define HIPBLAS_R_8I HIP_R_8I
#define HIPBLAS_R_8U HIP_R_8U
#define HIPBLAS_R_32I HIP_R_32I
#define HIPBLAS_R_32U HIP_R_32U
#define HIPBLAS_C_8I HIP_C_8I
#define HIPBLAS_C_8U HIP_C_8U
#define HIPBLAS_C_32I HIP_C_32I
#define HIPBLAS_C_32U HIP_C_32U
#define HIPBLAS_R_16B HIP_R_16BF
#define HIPBLAS_C_16B HIP_C_16BF

It also lists the following types as supported for GEMM:

  |   aType    |   bType    |   cType    |     computeType     |
  | ---------- | ---------- | ---------- | ------------------- |
  | HIP_R_16F  | HIP_R_16F  | HIP_R_16F  | HIPBLAS_COMPUTE_16F |
  | HIP_R_16F  | HIP_R_16F  | HIP_R_16F  | HIPBLAS_COMPUTE_32F |
  | HIP_R_16F  | HIP_R_16F  | HIP_R_32F  | HIPBLAS_COMPUTE_32F |
  | HIP_R_16BF | HIP_R_16BF | HIP_R_16BF | HIPBLAS_COMPUTE_32F |
  | HIP_R_16BF | HIP_R_16BF | HIP_R_32F  | HIPBLAS_COMPUTE_32F |
  | HIP_R_32F  | HIP_R_32F  | HIP_R_32F  | HIPBLAS_COMPUTE_32F |
  | HIP_R_64F  | HIP_R_64F  | HIP_R_64F  | HIPBLAS_COMPUTE_64F |
  | HIP_R_8I   | HIP_R_8I   | HIP_R_32I  | HIPBLAS_COMPUTE_32I |
  | HIP_C_32F  | HIP_C_32F  | HIP_C_32F  | HIPBLAS_COMPUTE_32F |
  | HIP_C_64F  | HIP_C_64F  | HIP_C_64F  | HIPBLAS_COMPUTE_64F |

So it should be possible to use the code for HIP by just changing the vendor headers in ggml.
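
A hedged sketch of the kind of vendor-header mapping being described (the actual file and macro set in ggml may differ): the shared CUDA code keeps using the CUDA_R_* names, and a HIP vendor header maps them onto the hipBLAS spellings, where the bf16 type is HIPBLAS_R_16B rather than HIPBLAS_R_16BF.

// Illustrative mapping (left: names used by the common CUDA code,
// right: names defined by hipblas.h as listed above):
#define CUDA_R_16F   HIPBLAS_R_16F
#define CUDA_R_32F   HIPBLAS_R_32F
#define CUDA_R_16BF  HIPBLAS_R_16B   // note: hipBLAS spells it 16B, not 16BF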

ggerganov (Member, Author)

I had this change, which fixes MUSA: 5ef588b

But HIP still fails like this:

https://github.com/ggml-org/llama.cpp/actions/runs/14306928482/job/40092833687#step:6:128

[ 17%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/im2col.cu.o
/__w/llama.cpp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1217:50: error: use of undeclared identifier 'HIPBLAS_R_16BF'
                    &alpha_f32,  src0_ptr,       CUDA_R_16BF, ne00,

Not sure why the HIPBLAS_R_16BF identifier is not recognized after my change.

JohannesGaessler (Collaborator)

Because they're calling the type HIPBLAS_R_16B instead of HIPBLAS_R_16BF.

ggerganov (Member, Author)

Ok, let me push a fix now.

ggerganov force-pushed the sync-ggml-25-04-03 branch 2 times, most recently from b65ac0b to 709fa72 on April 7, 2025 12:35
cmdr2 and others added 2 commits April 7, 2025 15:38
… (ggml/1167)

* cpu: refactor SIMD mappings and vectorized op functions into separate files

* Fix warning for ggml_float to float

* Fix warnings

* cpu: move all the operations (except mul_mat) to a separate c++ file

* fix whitespace

* Update ggml/src/ggml-cpu/vec.h

Co-authored-by: Diego Devesa <[email protected]>

* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp

* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously

---------

Co-authored-by: Diego Devesa <[email protected]>
* add bf16 support

* use convert_from_bf16_cuda instead of convert_unary_cuda for f32

* revert 7ec5085

* move functionality into convert_unary with constexpr
* ggml : simplify Arm fp16 CPU logic

ggml-ci

* cont : bring back CUDA/MUSA checks

ggml-ci
ggml-ci
ggerganov force-pushed the sync-ggml-25-04-03 branch from 709fa72 to 92d7d4d on April 7, 2025 12:39
ggerganov merged commit 1a1ab7e into master on Apr 7, 2025
56 of 59 checks passed
ggerganov deleted the sync-ggml-25-04-03 branch on April 7, 2025 15:44
zhengjun-xing pushed a commit to zhengjun-xing/llama.cpp that referenced this pull request Apr 10, 2025
Fix ggml-org#12732:
* Remove incorrect inclusion of "arm_neon.h" for CUDA versions ≥ 12
zhengjun-xing added a commit to zhengjun-xing/llama.cpp that referenced this pull request Apr 10, 2025
Fix ggml-org#12732:
* Remove incorrect inclusion of "arm_neon.h" for CUDA versions ≥ 12