-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SYCL: Add fp16 type support to unary op kernels #12788
base: master
Are you sure you want to change the base?
Conversation
I find there is no UT cases for FP16 be opened. |
I think there are UT cases that are present in test backend ops which was disabled at that time by me (in #12201 ) |
OK, I suggest enabling them and use them test this PR. |
Already enabled and tested. |
It seems that in actual inference of a fp16 model(gemma 2 2B F16 in this case), the intermediate hidden embeddings are converted to fp32:
So, there is no way to test the numerical stability of the fp16 operations with the exception of test-backend-ops:
I am marking this PR "ready for review" for now to get some comments from others. |
ggml/src/ggml-sycl/element_wise.cpp
Outdated
const sycl::nd_item<3> &item_ct1) { | ||
const int i = item_ct1.get_local_range(2) * item_ct1.get_group(2) + | ||
item_ct1.get_local_id(2); | ||
|
||
if (i >= k) { | ||
return; | ||
} | ||
dst[i] = x[i] / (1.0f + sycl::native::exp(-x[i])); | ||
dst[i] = x[i] / (to_T<T>(1.0f) + sycl::native::exp(-x[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl::exp
over sycl::native::exp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask why? Please note that I am not the author of the original code.
ggml/src/ggml-sycl/element_wise.cpp
Outdated
const sycl::nd_item<3> &item_ct1) { | ||
const int i = item_ct1.get_local_range(2) * item_ct1.get_group(2) + | ||
item_ct1.get_local_id(2); | ||
|
||
if (i >= k) { | ||
return; | ||
} | ||
dst[i] = 1.0f / (1.0f + sycl::native::exp(-x[i])); | ||
dst[i] = 1.0f / (to_T<T>(1.0f) + sycl::native::exp(-x[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl::exp
over sycl::native::exp
?
ggml/src/ggml-sycl/element_wise.cpp
Outdated
const sycl::nd_item<3> &item_ct1) { | ||
const int i = item_ct1.get_local_range(2) * item_ct1.get_group(2) + | ||
item_ct1.get_local_id(2); | ||
|
||
if (i >= k) { | ||
return; | ||
} | ||
dst[i] = sycl::fmin(1.0f, sycl::fmax(0.0f, (x[i] + 3.0f) / 6.0f)); | ||
dst[i] = sycl::fmin(to_T<T>(1.0f), sycl::fmax(to_T<T>(0.0f), (x[i] + to_T<T>(3.0f)) / to_T<T>(6.0f))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't hard sigmoid max(0, min(1, ((x + 1) / 2 ))
?
see: https://arxiv.org/pdf/1511.00363v3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although not the original author, but this is correct too I think.
Here we are doing
hard_sigmoid(x) = clip(((x+3)/6), 0, 1).
There are probably better ways to do this.
Need to disable fp16 support on devices which does not support fp16 in hardware.
Either we do this by checking if the build is compiled with GGML_SYCL_F16 compile flag and disable it in
device_supports_op
function or we add info about current hardware features and check using a function.Need proper testing..