Fix simde_math_fpclass functions so they work. #1277

Syonyk · 2025-02-06T20:24:59Z

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended. The 32-bit version had a wrong constant for exponent mask as well.

I don't know where to put tests for things like this... here's my test code.

#include <stdio.h>
#include <stdlib.h>


#include "simde/simde/simde-common.h"
#include "simde/simde/simde-math.h"



void main()
{
	simde_float64 sNaNf64 = simde_uint64_as_float64(0x7FF0000000000001);
	simde_float64 denormalf64 = simde_uint64_as_float64(0x0000000000000001);

	simde_float32 sNaNf32 = simde_uint32_as_float32(0x7F800001);
	simde_float32 denormalf32 = simde_uint32_as_float32(0x00000001);

	if (simde_math_fpclassf(SIMDE_MATH_NAN, 0xff) != SIMDE_MATH_FP_QNAN)
	{
		printf("32QNAN test failed\n");
	}

	if (simde_math_fpclassf(SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_PZERO)
	{
		printf("32PZERO test failed\n");
	}

	if (simde_math_fpclassf(-SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_NZERO)
	{
		printf("32NZERO test failed\n");
	}

	if (simde_math_fpclassf(SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_PINF)
	{
		printf("32PINF test failed, %f\n", SIMDE_MATH_INFINITY);
	}

	if (simde_math_fpclassf(-SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_NINF)
	{
		printf("32NINF test failed\n");
	}

	if (simde_math_fpclassf(denormalf32, 0xff) != SIMDE_MATH_FP_DENORMAL)
	{
		printf("32DENORMAL test failed\n");
	}

	if (simde_math_fpclassf(-SIMDE_FLOAT64_C(42.23), 0xff) != SIMDE_MATH_FP_NEGATIVE)
	{
		printf("32NEGATIVE test failed\n");
	}

	if (simde_math_fpclassf(sNaNf32, 0xff) != SIMDE_MATH_FP_SNAN)
	{
		printf("32SNAN test failed\n");
	}

	// Test that it doesn't report if we tell it not to...
	if (simde_math_fpclassf(sNaNf32, ~SIMDE_MATH_FP_SNAN) == SIMDE_MATH_FP_SNAN)
	{
		printf("32SNAN test failed\n");
	}


	if (simde_math_fpclass(SIMDE_MATH_NAN, 0xff) != SIMDE_MATH_FP_QNAN)
	{
		printf("64QNAN test failed\n");
	}

	if (simde_math_fpclass(SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_PZERO)
	{
		printf("64PZERO test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_NZERO)
	{
		printf("64NZERO test failed\n");
	}

	if (simde_math_fpclass(SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_PINF)
	{
		printf("64PINF test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_NINF)
	{
		printf("64NINF test failed\n");
	}

	if (simde_math_fpclass(denormalf64, 0xff) != SIMDE_MATH_FP_DENORMAL)
	{
		printf("64PINF test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_FLOAT64_C(42.23), 0xff) != SIMDE_MATH_FP_NEGATIVE)
	{
		printf("64NEGATIVE test failed\n");
	}

	if (simde_math_fpclass(sNaNf64, 0xff) != SIMDE_MATH_FP_SNAN)
	{
		printf("64SNAN test failed\n");
	}

	if (simde_math_fpclassf(sNaNf64, ~SIMDE_MATH_FP_SNAN) == SIMDE_MATH_FP_SNAN)
	{
		printf("64SNAN test failed\n");
	}
}

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended. The 32-bit version had a wrong constant for exponent mask as well.

mr-c · 2025-02-06T21:06:34Z

Thank you, @Syonyk ! Yeah, this was my code. Mea culpa.

This function is used in

simde/simde/x86/avx512/fpclass.h

Line 27 in 1995014

#if !defined(SIMDE_X86_AVX512_FPCLASS_H)

Can you think of corresponding test cases that could be added to trigger your fixes?

https://github.com/simd-everywhere/simde/blob/1995014c5990566f41cf78daa2f281b20898a6d6/test/x86/avx512/fpclass.c

If there is an issue here, then the float16 version I wrote is probably also suspect: https://github.com/simd-everywhere/simde/blame/1995014c5990566f41cf78daa2f281b20898a6d6/simde/simde-f16.h

SGSSGene · 2025-02-09T13:30:51Z

Wow that is embarrassing! @Syonyk , thank you for catching this.

I think we shifted the wrong value. Instead of e.g. : (imm8 >> 2) & Nzero_res it should have been imm8 & ( Nzero_res << 2)

Here is a full example of it: https://godbolt.org/z/r61c4x3sc
It produces slightly more compact code than using ?:....but is it correct? I am not sure.

@mr-c why didn't we catch it in our unittests?

mr-c · 2025-02-10T07:51:01Z

@mr-c why didn't we catch it in our unittests?

That is a good question. I see that neither Augustus nor Agrippina have AVX512FP16, and the GCC version installed there (12.x) is too old to know about _mm512_fpclass_pd_mask and _mm512_fpclass_ph_mask apparently.

And the tests are not in the normal style; we don't read in a matrix of numbers but do a run-time comparison. So it is likely that the logic error was repeated. I think it would be better if we can find a machine with a new-enough compiler and CPU and re-generate the tests, but write the results out using unsigned ints so we can verify the bit patterns.

simde/test/x86/avx512/fpclass.c

Lines 35 to 49 in 1995014

    
           simde_float32 sNaNf = simde_uint32_as_float32(0xff800011), 
        
                         denormalf = simde_uint32_as_float32(0x007fffff); 
        
           simde__m256 a = simde_mm256_set_ps( 
        
               SIMDE_MATH_NANF        , SIMDE_FLOAT32_C(0.0), 
        
               -SIMDE_FLOAT32_C(0.0)  , SIMDE_MATH_INFINITYF, 
        
               -SIMDE_MATH_INFINITYF  , denormalf, 
        
               -SIMDE_FLOAT32_C(42.23), sNaNf); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x01), 128); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x02), 64); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x04), 32); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x08), 16); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x10), 8); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x20), 4); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x40), 2); 
        
           simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x80), 1);

SGSSGene · 2025-02-10T21:07:17Z

_mm512_fpclass_ph_mask

For gcc14 you want to check the folder /groupo/ag_abi/software/bin @eseiler is managing uptodate compiler tools there.

Instruction _mm512_fpclass_ph_mask requires avx512fp16 non of our servers/computers have it,
but _mm512_fpclass_ps_mask and _mm512_fpclass_pd_mask requires avx512dq which our server do have.

mr-c · 2025-02-11T16:35:34Z

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended.

Hey @Syonyk ; @SGSSGene and I looked at this together today. simde_math_fpclassf(float v, const int imm8) and friends are supposed to return a boolean 1 if any of the number features specified in the imm8 input are present, otherwise a 0. This is unlike the fpclassify(float) function from C99/C++11 which don't take a mask value. Also unlike those standard functions, the simde_math_fpclass* functions report more categories, as they exist only to implement the _mm{256,512,}{_mask}_fpclass_{pd,ph,ps,sd,sh}_mask functions.

If you have a need for a simde_math_fpclassify(float) style function in SIMDe which returned the full bitmask of discovered numerical types for a particular float value, then that could be added.

It does look like I made an error in the ExpMask, thank you for finding that! I will be adding some documentation and more/better test cases soon.

Syonyk · 2025-02-11T16:46:51Z

Ah, okay. They indeed do that, if that's the intended behavior. I'd assumed imm8 was just gating what was checked for performance reasons (as they're inline functions, so a compile time constant should allow some optimization), since any non-zero value in the result will evaluate as true (0x40 is just as true as 0x1, in "if" statements).

Sorry, feel free to close, I'll rework my code to use them as intended...

It's also worth mentioning that the negative check should probably just check the top bit. You can have entirely valid -0, -nan, -inf, etc.

mr-c · 2025-02-11T17:02:43Z

It's also worth mentioning that the negative check should probably just check the top bit. You can have entirely valid -0, -nan, -inf, etc.

Here's the psuedocode for CheckFPClassDP() from the Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (page 2332 of https://cdrdv2.intel.com/v1/dl/getContent/671200 a.k.a https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html), which is what was used to implement simde_math_fpclass(); I think the key line is FinNeg_res := NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite

/* Start checking the source operand for special type *//
NegNum := tsrc[63];
IF (tsrc [62:52]=07FFh) Then ExpAllOnes := 1; FI;
IF (tsrc [62:52]=0h) Then ExpAllZeros := 1;
IF (ExpAllZeros AND MXCSR.DAZ) Then
    MantAllZeros := 1;
ELSIF ( tsrc[51:0]=0h) Then
    MantAllZeros := 1;
FI;
ZeroNumber := ExpAllZeros AND MantAllZeros
SignalingBit := tsrc[51];
sNaN_res := ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
qNaN_res := ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
Pzero_res := NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
Nzero_res := NegNum AND ExpAllZeros AND MantAllZeros; // -0
PInf_res := NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
NInf_res := NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
Denorm_res := ExpAllZeros AND NOT(MantAllZeros); // denorm
FinNeg_res := NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
bResult = ( imm8[0] AND qNaN_res ) OR (imm8[1] AND Pzero_res ) OR
          ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
          ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
          ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
Return bResult;
} //* end of CheckFPClassDP() *//

Syonyk · 2025-02-11T17:31:53Z

Ah, that it is "negative finite" makes sense. I'm over in ARM spaces dealing with a different set of problems. I didn't realize this was fully mapping onto some x86 functions.

mr-c · 2025-02-11T17:33:01Z

Ah, that it is "negative finite" makes sense. I'm over in ARM spaces dealing with a different set of problems. I didn't realize this was fully mapping onto some x86 functions.

Yeah, hence the need for me to add some doc strings to explain better

SGSSGene · 2025-02-12T10:23:49Z

Slightly random suggestion, for "unrelated" code.
We can write the implementation of

simde/simde/x86/avx512/fpclass.h

Line 38 in ef361ba

simde_mm256_fpclass_ps_mask(simde__m256 a, int imm8)

differently:

  for (size_t i = 0 ; i < (sizeof(a_.f32) / sizeof(a_.f32[0])) ; i++) {
    r |= simde_math_fpclassf(a_.f32[i], imm8) ? (UINT8_C(1) << i) : 0;
  }

If we are guaranteed that simde_math_fpclassf collapses down to a single bit, we could also write:

  for (size_t i = 0 ; i < (sizeof(a_.f32) / sizeof(a_.f32[0])) ; i++) {
    r |= (simde_math_fpclassf(a_.f32[i], imm8) << i);
  }

Fix simde_math_fpclass functions so they work.

30e64b5

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended. The 32-bit version had a wrong constant for exponent mask as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix simde_math_fpclass functions so they work. #1277

Fix simde_math_fpclass functions so they work. #1277

Syonyk commented Feb 6, 2025

mr-c commented Feb 6, 2025

SGSSGene commented Feb 9, 2025

mr-c commented Feb 10, 2025

SGSSGene commented Feb 10, 2025

mr-c commented Feb 11, 2025 •

edited

Loading

Syonyk commented Feb 11, 2025

mr-c commented Feb 11, 2025

Syonyk commented Feb 11, 2025

mr-c commented Feb 11, 2025

SGSSGene commented Feb 12, 2025

Fix simde_math_fpclass functions so they work. #1277

Are you sure you want to change the base?

Fix simde_math_fpclass functions so they work. #1277

Conversation

Syonyk commented Feb 6, 2025

mr-c commented Feb 6, 2025

SGSSGene commented Feb 9, 2025

mr-c commented Feb 10, 2025

SGSSGene commented Feb 10, 2025

mr-c commented Feb 11, 2025 • edited Loading

Syonyk commented Feb 11, 2025

mr-c commented Feb 11, 2025

Syonyk commented Feb 11, 2025

mr-c commented Feb 11, 2025

SGSSGene commented Feb 12, 2025

mr-c commented Feb 11, 2025 •

edited

Loading