cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

cmdr2 · 2025-04-04T07:59:09Z

This PR attempts to fix the CI build failure in ggml-org/llama.cpp#12732

The underlying arm functions actually expect __fp16 *, so this PR attempts to only match the expected function signature. E.g. ARM reference doc for vld1q_f16() and vst1_f16().

It does not affect the actual computation. The computation (before this function is called) will still be performed as uint16_t on MSVC+arm neon. For e.g. the fp32-to-fp16 code will continue to pack the data in a uint16_t -

ggml/src/ggml-impl.h

Lines 334 to 338 in dddef73

    
           static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) { 
        
               ggml_fp16_t res; 
        
               ggml_fp16_internal_t tmp = f; 
        
               memcpy(&res, &tmp, sizeof(ggml_fp16_t)); 
        
               return res;

And the bitshifting hack from ggml-org/llama.cpp#5404 will continue being used (in ggml-cpu-quants.c) -

ggml/src/ggml-cpu/ggml-cpu-impl.h

Line 91 in dddef73

    
           #define ggml_vld1q_u32(w,x,y,z) { ((w) + ((uint64_t)(x) << 32)), ((y) + ((uint64_t)(z) << 32)) }

I don't think this will affect the runtime performance, since reinterpret_cast is compile-time.

Given that I've never dealt with ARM code before, please let me know if this needs any changes. Thanks! :)

cmdr2 · 2025-04-04T08:00:32Z

The runner on ggml-ci passes, but of course that doesn't run the arm+msvc scenario - 94cbdad

ggerganov · 2025-04-04T08:16:03Z

The change looks OK to me - let's see how the CI in llama.cpp goes.

The computation (before this function is called) will still be performed as uint16_t on MSVC+arm neon. For e.g. the fp32-to-fp16 code will continue to pack the data in a uint16_t -

I think the code that you referenced is executed on Arm without MSVC because of the !defined(_MSC_VER) on line 322. Do you agree?

ggerganov · 2025-04-04T08:47:07Z

The CI has passed.

However, I took a more detailed look at this code now, and I think there is a simpler solutions: ggml-org/llama.cpp@1b07edf

The CUDA checks seem like something that was needed in the past, before declaring ggml_fp16_t to by uint16_t. I don't think these are relevant any more, so we can remove them.
The MSVC ggml_vld1q_u32 hack does not actually require typedef uint16_t ggml_fp16_internal_t; as I initially thought, because this instruction works with uint32_t / uint64_t arguments. So I don't see a reason not to always typedef __fp16 ggml_fp16_internal_t;, regardless if MSVC is used or not.

Waiting for the llama.cpp CI to pass for this alternative solution (https://github.com/ggml-org/llama.cpp/actions/runs/14261396774) and if it works, will proceed with it.

cmdr2 · 2025-04-04T08:56:25Z

I think the code that you referenced is executed on Arm without MSVC because of the !defined(_MSC_VER) on line 322. Do you agree?

Ah yeah, I missed that. You're right, and I agree with the suggestion, because ggml_fp16_internal_t isn't actually used anywhere else.

In your WIP commit, are you missing a !defined(__MSC_VER) in the #if condition? ggml-org/llama.cpp@1b07edf#diff-1f56ac82eed1293d4aa7c35aef0bc19e831cdb24dcb6af43582143936eb7eae4L488

#endif // defined(__ARM_NEON) && (!defined(__MSC_VER)

cmdr2 · 2025-04-04T08:59:13Z

And if ggml_fp16_internal_t isn't actually used anywhere, why not simplify and use __fp16 directly in the WIP commit?

ggerganov · 2025-04-04T09:21:36Z

In your WIP commit, are you missing a !defined(__MSC_VER) in the #if condition?

The MSVC-specific hack for ggml_vld1q_u32 does not seem related to the way the FP16 data is handled, so I don't see a need to distinguish MSVC vs no-MSVC Arm builds anymore in ggml-impl.h.

cmdr2 · 2025-04-04T09:30:15Z

Right, but previously the ggml_compute_fp16_to_fp32 function would use the pure reference implementation for MSVC+Neon. But after this change, wouldn't MSVC+Neon also attempt to use this implementation (with __fp16)? Will that work?

cmdr2 · 2025-04-04T12:12:59Z

Closing this in favor of #1177

cpu: fix compile errors on arm+msvc (windows) for pointer casts

64639f5

cmdr2 mentioned this pull request Apr 4, 2025

sync : ggml ggml-org/llama.cpp#12732

Merged

ggerganov mentioned this pull request Apr 4, 2025

ggml : simplify Arm fp16 CPU logic #1177

Merged

cmdr2 closed this Apr 4, 2025

cmdr2 deleted the fix-arm-neon-msvc-fp16 branch April 11, 2025 05:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

Uh oh!

cmdr2 commented Apr 4, 2025 •

edited

Loading

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025 •

edited

Loading

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

Uh oh!

	static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
	ggml_fp16_t res;
	ggml_fp16_internal_t tmp = f;
	memcpy(&res, &tmp, sizeof(ggml_fp16_t));
	return res;

cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

Uh oh!

Conversation

cmdr2 commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

ggerganov commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

cmdr2 commented Apr 4, 2025

Uh oh!

Uh oh!

cmdr2 commented Apr 4, 2025 •

edited

Loading

cmdr2 commented Apr 4, 2025 •

edited

Loading