The current implementation of `glm_vec4_normalize` uses `_mm_rsqrt_ps`. While its relative error (<= 1.5*2^-12) is small, when used in more complicated functions it can lead to inconsistent results as the imprecision is amplified.
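To make the precision gap concrete, here is a minimal single-lane sketch (not GLM code; the helper names are invented for illustration) comparing the hardware approximation against a correctly rounded `1/sqrt`:

```cpp
#include <immintrin.h>
#include <cmath>

// Hypothetical helpers for illustration; not part of GLM.
// Hardware reciprocal square root approximation
// (documented relative error <= 1.5 * 2^-12).
inline float rsqrt_approx(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// Full-precision alternative: sqrt and divide are correctly
// rounded per IEEE-754, so the error stays within a few ulps.
inline float rsqrt_precise(float x)
{
    return 1.0f / std::sqrt(x);
}
```

On typical SSE hardware the approximate result can differ from the precise one by up to roughly 3.7e-4 relative error, and that error compounds across the repeated normalizations inside a factorisation.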
For example, modify test/gtx/gtx_matrix_factorisation.cpp to operate on floats instead of doubles: replace `glm::dmat` with `glm::mat` and reduce both `T const epsilon = ...` values to 1E-3; the accumulated error then exceeds the tolerance.
On instruction sets that support it, I see little issue with replacing `_mm_rsqrt_ps` with `_mm_div_ps` + `_mm_sqrt_ps` (or `vdivq_f32` + `vsqrtq_f32` on A64). Newton iterations are not as much of a win as they once were on modern processors.
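A sketch of what the proposed change could look like. This assumes the general shape of `glm_vec4_normalize` (dot product, then scale by the inverse length); the actual GLM source may differ:

```cpp
#include <immintrin.h>

// Sketch of the suggested full-precision normalize: instead of
// _mm_rsqrt_ps, take the square root of the dot product and divide.
static inline __m128 vec4_normalize_precise(__m128 v)
{
    __m128 d = _mm_mul_ps(v, v); // x*x, y*y, z*z, w*w
    // Horizontal add so every lane holds the full dot product.
    __m128 t = _mm_add_ps(d, _mm_shuffle_ps(d, d, _MM_SHUFFLE(2, 3, 0, 1)));
    __m128 dot = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(0, 1, 2, 3)));
    return _mm_div_ps(v, _mm_sqrt_ps(dot)); // v / |v|, correctly rounded
}
```

Both `_mm_sqrt_ps` and `_mm_div_ps` are correctly rounded, so the only remaining error is the final rounding of each lane.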
Would you say, @gottfriedleibniz, that limiting the `_mm_rsqrt_ps` implementation to a lowp version is good enough, and using the `_mm_div_ps` + `_mm_sqrt_ps` implementation for mediump and highp?
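That split could be sketched roughly like this. GLM's real dispatch goes through its qualifier template parameter; the enum and function names below are invented for illustration:

```cpp
#include <immintrin.h>

// Hypothetical precision-qualified dispatch; not GLM's actual API.
enum precision { lowp, mediump, highp };

template <precision P> __m128 inversesqrt_ps(__m128 x);

// lowp: keep the fast hardware approximation.
template <> inline __m128 inversesqrt_ps<lowp>(__m128 x)
{
    return _mm_rsqrt_ps(x);
}

// mediump/highp: full-precision divide + sqrt.
template <> inline __m128 inversesqrt_ps<mediump>(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}

template <> inline __m128 inversesqrt_ps<highp>(__m128 x)
{
    return inversesqrt_ps<mediump>(x);
}
```

Callers that opted into lowp keep the old speed, while the default qualifiers get consistent results.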