Skip to content

cpu: fix compile errors on arm+msvc (windows) for pointer casts #1176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

cmdr2
Copy link
Collaborator

@cmdr2 cmdr2 commented Apr 4, 2025

This PR attempts to fix the CI build failure in ggml-org/llama.cpp#12732

The underlying arm functions actually expect __fp16 *, so this PR attempts to only match the expected function signature. E.g. ARM reference doc for vld1q_f16() and vst1_f16().

It does not affect the actual computation. The computation (before this function is called) will still be performed as uint16_t on MSVC+arm neon. For e.g. the fp32-to-fp16 code will continue to pack the data in a uint16_t -

ggml/src/ggml-impl.h

Lines 334 to 338 in dddef73

static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
ggml_fp16_t res;
ggml_fp16_internal_t tmp = f;
memcpy(&res, &tmp, sizeof(ggml_fp16_t));
return res;

And the bitshifting hack from ggml-org/llama.cpp#5404 will continue being used (in ggml-cpu-quants.c) -

#define ggml_vld1q_u32(w,x,y,z) { ((w) + ((uint64_t)(x) << 32)), ((y) + ((uint64_t)(z) << 32)) }

I don't think this will affect the runtime performance, since reinterpret_cast is compile-time.

Given that I've never dealt with ARM code before, please let me know if this needs any changes. Thanks! :)

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Apr 4, 2025

The runner on ggml-ci passes, but of course that doesn't run the arm+msvc scenario - 94cbdad

@ggerganov
Copy link
Member

The change looks OK to me - let's see how the CI in llama.cpp goes.

The computation (before this function is called) will still be performed as uint16_t on MSVC+arm neon. For e.g. the fp32-to-fp16 code will continue to pack the data in a uint16_t -

I think the code that you referenced is executed on Arm without MSVC because of the !defined(_MSC_VER) on line 322. Do you agree?

@ggerganov
Copy link
Member

The CI has passed.

However, I took a more detailed look at this code now, and I think there is a simpler solutions: ggml-org/llama.cpp@1b07edf

  • The CUDA checks seem like something that was needed in the past, before declaring ggml_fp16_t to by uint16_t. I don't think these are relevant any more, so we can remove them.
  • The MSVC ggml_vld1q_u32 hack does not actually require typedef uint16_t ggml_fp16_internal_t; as I initially thought, because this instruction works with uint32_t / uint64_t arguments. So I don't see a reason not to always typedef __fp16 ggml_fp16_internal_t;, regardless if MSVC is used or not.

Waiting for the llama.cpp CI to pass for this alternative solution (https://github.com/ggml-org/llama.cpp/actions/runs/14261396774) and if it works, will proceed with it.

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Apr 4, 2025

I think the code that you referenced is executed on Arm without MSVC because of the !defined(_MSC_VER) on line 322. Do you agree?

Ah yeah, I missed that. You're right, and I agree with the suggestion, because ggml_fp16_internal_t isn't actually used anywhere else.

In your WIP commit, are you missing a !defined(__MSC_VER) in the #if condition? ggml-org/llama.cpp@1b07edf#diff-1f56ac82eed1293d4aa7c35aef0bc19e831cdb24dcb6af43582143936eb7eae4L488

#endif // defined(__ARM_NEON) && (!defined(__MSC_VER)

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Apr 4, 2025

And if ggml_fp16_internal_t isn't actually used anywhere, why not simplify and use __fp16 directly in the WIP commit?

@ggerganov
Copy link
Member

In your WIP commit, are you missing a !defined(__MSC_VER) in the #if condition?

The MSVC-specific hack for ggml_vld1q_u32 does not seem related to the way the FP16 data is handled, so I don't see a need to distinguish MSVC vs no-MSVC Arm builds anymore in ggml-impl.h.

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Apr 4, 2025

Right, but previously the ggml_compute_fp16_to_fp32 function would use the pure reference implementation for MSVC+Neon. But after this change, wouldn't MSVC+Neon also attempt to use this implementation (with __fp16)? Will that work?

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Apr 4, 2025

Closing this in favor of #1177

@cmdr2 cmdr2 closed this Apr 4, 2025
@cmdr2 cmdr2 deleted the fix-arm-neon-msvc-fp16 branch April 11, 2025 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants