The current implementation of `glm_vec4_normalize` uses `_mm_rsqrt_ps`. While its relative error (<= 1.5*2^-12) is small, when used in more complicated functions it can lead to inconsistent results as the imprecision is amplified.
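To make the precision gap concrete, here is a minimal single-lane sketch (not GLM code; the helper names are invented for illustration) comparing the hardware approximation against a correctly rounded `1/sqrt`:

```cpp
#include <immintrin.h>
#include <cmath>

// Hypothetical helpers for illustration; not part of GLM.
// Hardware reciprocal square root approximation
// (documented relative error <= 1.5 * 2^-12).
inline float rsqrt_approx(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// Full-precision alternative: sqrt and divide are correctly
// rounded per IEEE-754, so the error stays within a few ulps.
inline float rsqrt_precise(float x)
{
    return 1.0f / std::sqrt(x);
}
```

On typical SSE hardware the approximate result can differ from the precise one by up to roughly 3.7e-4 relative error, and that error compounds across the repeated normalizations inside a factorisation.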
For example, modify test/gtx/gtx_matrix_factorisation.cpp to operate on floats instead of doubles: replace `glm::dmat` with `glm::mat` and reduce both `T const epsilon = ...` values to 1E-3; the accumulated error then exceeds the tolerance.
On instruction sets that support it, I see little issue with replacing `_mm_rsqrt_ps` with `_mm_div_ps` + `_mm_sqrt_ps` (or `vdivq_f32` + `vsqrtq_f32` on A64). Newton iterations are not as much of a win as they once were on modern processors.
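A sketch of what the proposed change could look like. This assumes the general shape of `glm_vec4_normalize` (dot product, then scale by the inverse length); the actual GLM source may differ:

```cpp
#include <immintrin.h>

// Sketch of the suggested full-precision normalize: instead of
// _mm_rsqrt_ps, take the square root of the dot product and divide.
static inline __m128 vec4_normalize_precise(__m128 v)
{
    __m128 d = _mm_mul_ps(v, v); // x*x, y*y, z*z, w*w
    // Horizontal add so every lane holds the full dot product.
    __m128 t = _mm_add_ps(d, _mm_shuffle_ps(d, d, _MM_SHUFFLE(2, 3, 0, 1)));
    __m128 dot = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(0, 1, 2, 3)));
    return _mm_div_ps(v, _mm_sqrt_ps(dot)); // v / |v|, correctly rounded
}
```

Both `_mm_sqrt_ps` and `_mm_div_ps` are correctly rounded, so the only remaining error is the final rounding of each lane.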
Would you say, @gottfriedleibniz, that limiting the `_mm_rsqrt_ps` implementation to a lowp version is good enough, and using the `_mm_div_ps` + `_mm_sqrt_ps` implementation for mediump and highp?
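That split could be sketched roughly like this. GLM's real dispatch goes through its qualifier template parameter; the enum and function names below are invented for illustration:

```cpp
#include <immintrin.h>

// Hypothetical precision-qualified dispatch; not GLM's actual API.
enum precision { lowp, mediump, highp };

template <precision P> __m128 inversesqrt_ps(__m128 x);

// lowp: keep the fast hardware approximation.
template <> inline __m128 inversesqrt_ps<lowp>(__m128 x)
{
    return _mm_rsqrt_ps(x);
}

// mediump/highp: full-precision divide + sqrt.
template <> inline __m128 inversesqrt_ps<mediump>(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}

template <> inline __m128 inversesqrt_ps<highp>(__m128 x)
{
    return inversesqrt_ps<mediump>(x);
}
```

Callers that opted into lowp keep the old speed, while the default qualifiers get consistent results.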