Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix simde_math_fpclass functions so they work. #1277

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Syonyk
Copy link
Contributor

@Syonyk Syonyk commented Feb 6, 2025

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended. The 32-bit version had a wrong constant for exponent mask as well.

I don't know where to put tests for things like this... here's my test code.

#include <stdio.h>
#include <stdlib.h>


#include "simde/simde/simde-common.h"
#include "simde/simde/simde-math.h"



void main()
{
	simde_float64 sNaNf64 = simde_uint64_as_float64(0x7FF0000000000001);
	simde_float64 denormalf64 = simde_uint64_as_float64(0x0000000000000001);

	simde_float32 sNaNf32 = simde_uint32_as_float32(0x7F800001);
	simde_float32 denormalf32 = simde_uint32_as_float32(0x00000001);

	if (simde_math_fpclassf(SIMDE_MATH_NAN, 0xff) != SIMDE_MATH_FP_QNAN)
	{
		printf("32QNAN test failed\n");
	}

	if (simde_math_fpclassf(SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_PZERO)
	{
		printf("32PZERO test failed\n");
	}

	if (simde_math_fpclassf(-SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_NZERO)
	{
		printf("32NZERO test failed\n");
	}

	if (simde_math_fpclassf(SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_PINF)
	{
		printf("32PINF test failed, %f\n", SIMDE_MATH_INFINITY);
	}

	if (simde_math_fpclassf(-SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_NINF)
	{
		printf("32NINF test failed\n");
	}

	if (simde_math_fpclassf(denormalf32, 0xff) != SIMDE_MATH_FP_DENORMAL)
	{
		printf("32DENORMAL test failed\n");
	}

	if (simde_math_fpclassf(-SIMDE_FLOAT64_C(42.23), 0xff) != SIMDE_MATH_FP_NEGATIVE)
	{
		printf("32NEGATIVE test failed\n");
	}

	if (simde_math_fpclassf(sNaNf32, 0xff) != SIMDE_MATH_FP_SNAN)
	{
		printf("32SNAN test failed\n");
	}

	// Test that it doesn't report if we tell it not to...
	if (simde_math_fpclassf(sNaNf32, ~SIMDE_MATH_FP_SNAN) == SIMDE_MATH_FP_SNAN)
	{
		printf("32SNAN test failed\n");
	}


	if (simde_math_fpclass(SIMDE_MATH_NAN, 0xff) != SIMDE_MATH_FP_QNAN)
	{
		printf("64QNAN test failed\n");
	}

	if (simde_math_fpclass(SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_PZERO)
	{
		printf("64PZERO test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_FLOAT32_C(0.0), 0xff) != SIMDE_MATH_FP_NZERO)
	{
		printf("64NZERO test failed\n");
	}

	if (simde_math_fpclass(SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_PINF)
	{
		printf("64PINF test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_MATH_INFINITY, 0xff) != SIMDE_MATH_FP_NINF)
	{
		printf("64NINF test failed\n");
	}

	if (simde_math_fpclass(denormalf64, 0xff) != SIMDE_MATH_FP_DENORMAL)
	{
		printf("64PINF test failed\n");
	}

	if (simde_math_fpclass(-SIMDE_FLOAT64_C(42.23), 0xff) != SIMDE_MATH_FP_NEGATIVE)
	{
		printf("64NEGATIVE test failed\n");
	}

	if (simde_math_fpclass(sNaNf64, 0xff) != SIMDE_MATH_FP_SNAN)
	{
		printf("64SNAN test failed\n");
	}

	if (simde_math_fpclassf(sNaNf64, ~SIMDE_MATH_FP_SNAN) == SIMDE_MATH_FP_SNAN)
	{
		printf("64SNAN test failed\n");
	}
}

Functions were simply mashing all results into the low bit, instead of
spreading them out across the 8 bit result as intended.  The 32-bit
version had a wrong constant for exponent mask as well.
@mr-c
Copy link
Collaborator

mr-c commented Feb 6, 2025

Thank you, @Syonyk ! Yeah, this was my code. Mea culpa.

This function is used in

#if !defined(SIMDE_X86_AVX512_FPCLASS_H)

Can you think of corresponding test cases that could be added to trigger your fixes?

https://github.com/simd-everywhere/simde/blob/1995014c5990566f41cf78daa2f281b20898a6d6/test/x86/avx512/fpclass.c

If there is an issue here, then the float16 version I wrote is probably also suspect: https://github.com/simd-everywhere/simde/blame/1995014c5990566f41cf78daa2f281b20898a6d6/simde/simde-f16.h

@SGSSGene
Copy link
Contributor

SGSSGene commented Feb 9, 2025

Wow that is embarrassing! @Syonyk , thank you for catching this.

I think we shifted the wrong value. Instead of e.g. : (imm8 >> 2) & Nzero_res it should have been imm8 & ( Nzero_res << 2)

Here is a full example of it: https://godbolt.org/z/r61c4x3sc
It produces slightly more compact code than using ?:....but is it correct? I am not sure.

@mr-c why didn't we catch it in our unittests?

@mr-c
Copy link
Collaborator

mr-c commented Feb 10, 2025

@mr-c why didn't we catch it in our unittests?

That is a good question. I see that neither Augustus nor Agrippina have AVX512FP16, and the GCC version installed there (12.x) is too old to know about _mm512_fpclass_pd_mask and _mm512_fpclass_ph_mask apparently.

And the tests are not in the normal style; we don't read in a matrix of numbers but do a run-time comparison. So it is likely that the logic error was repeated. I think it would be better if we can find a machine with a new-enough compiler and CPU and re-generate the tests, but write the results out using unsigned ints so we can verify the bit patterns.

simde_float32 sNaNf = simde_uint32_as_float32(0xff800011),
denormalf = simde_uint32_as_float32(0x007fffff);
simde__m256 a = simde_mm256_set_ps(
SIMDE_MATH_NANF , SIMDE_FLOAT32_C(0.0),
-SIMDE_FLOAT32_C(0.0) , SIMDE_MATH_INFINITYF,
-SIMDE_MATH_INFINITYF , denormalf,
-SIMDE_FLOAT32_C(42.23), sNaNf);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x01), 128);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x02), 64);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x04), 32);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x08), 16);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x10), 8);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x20), 4);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x40), 2);
simde_assert_equal_mmask8(simde_mm256_fpclass_ps_mask(a, 0x80), 1);

@SGSSGene
Copy link
Contributor

_mm512_fpclass_ph_mask

For gcc14 you want to check the folder /groupo/ag_abi/software/bin @eseiler is managing uptodate compiler tools there.

Instruction _mm512_fpclass_ph_mask requires avx512fp16 non of our servers/computers have it,
but _mm512_fpclass_ps_mask and _mm512_fpclass_pd_mask requires avx512dq which our server do have.

@mr-c
Copy link
Collaborator

mr-c commented Feb 11, 2025

Functions were simply mashing all results into the low bit, instead of spreading them out across the 8 bit result as intended.

Hey @Syonyk ; @SGSSGene and I looked at this together today. simde_math_fpclassf(float v, const int imm8) and friends are supposed to return a boolean 1 if any of the number features specified in the imm8 input are present, otherwise a 0. This is unlike the fpclassify(float) function from C99/C++11 which don't take a mask value. Also unlike those standard functions, the simde_math_fpclass* functions report more categories, as they exist only to implement the _mm{256,512,}{_mask}_fpclass_{pd,ph,ps,sd,sh}_mask functions.

If you have a need for a simde_math_fpclassify(float) style function in SIMDe which returned the full bitmask of discovered numerical types for a particular float value, then that could be added.

It does look like I made an error in the ExpMask, thank you for finding that! I will be adding some documentation and more/better test cases soon.

@Syonyk
Copy link
Contributor Author

Syonyk commented Feb 11, 2025

Ah, okay. They indeed do that, if that's the intended behavior. I'd assumed imm8 was just gating what was checked for performance reasons (as they're inline functions, so a compile time constant should allow some optimization), since any non-zero value in the result will evaluate as true (0x40 is just as true as 0x1, in "if" statements).

Sorry, feel free to close, I'll rework my code to use them as intended...

It's also worth mentioning that the negative check should probably just check the top bit. You can have entirely valid -0, -nan, -inf, etc.

@mr-c
Copy link
Collaborator

mr-c commented Feb 11, 2025

It's also worth mentioning that the negative check should probably just check the top bit. You can have entirely valid -0, -nan, -inf, etc.

Here's the psuedocode for CheckFPClassDP() from the Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (page 2332 of https://cdrdv2.intel.com/v1/dl/getContent/671200 a.k.a https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html), which is what was used to implement simde_math_fpclass(); I think the key line is FinNeg_res := NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite

/* Start checking the source operand for special type *//
NegNum := tsrc[63];
IF (tsrc [62:52]=07FFh) Then ExpAllOnes := 1; FI;
IF (tsrc [62:52]=0h) Then ExpAllZeros := 1;
IF (ExpAllZeros AND MXCSR.DAZ) Then
    MantAllZeros := 1;
ELSIF ( tsrc[51:0]=0h) Then
    MantAllZeros := 1;
FI;
ZeroNumber := ExpAllZeros AND MantAllZeros
SignalingBit := tsrc[51];
sNaN_res := ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
qNaN_res := ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
Pzero_res := NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
Nzero_res := NegNum AND ExpAllZeros AND MantAllZeros; // -0
PInf_res := NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
NInf_res := NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
Denorm_res := ExpAllZeros AND NOT(MantAllZeros); // denorm
FinNeg_res := NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
bResult = ( imm8[0] AND qNaN_res ) OR (imm8[1] AND Pzero_res ) OR
          ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
          ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
          ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
Return bResult;
} //* end of CheckFPClassDP() *//

@Syonyk
Copy link
Contributor Author

Syonyk commented Feb 11, 2025

Ah, that it is "negative finite" makes sense. I'm over in ARM spaces dealing with a different set of problems. I didn't realize this was fully mapping onto some x86 functions.

@mr-c
Copy link
Collaborator

mr-c commented Feb 11, 2025

Ah, that it is "negative finite" makes sense. I'm over in ARM spaces dealing with a different set of problems. I didn't realize this was fully mapping onto some x86 functions.

Yeah, hence the need for me to add some doc strings to explain better

@SGSSGene
Copy link
Contributor

Slightly random suggestion, for "unrelated" code.
We can write the implementation of

simde_mm256_fpclass_ps_mask(simde__m256 a, int imm8)
differently:

  for (size_t i = 0 ; i < (sizeof(a_.f32) / sizeof(a_.f32[0])) ; i++) {
    r |= simde_math_fpclassf(a_.f32[i], imm8) ? (UINT8_C(1) << i) : 0;
  }

If we are guaranteed that simde_math_fpclassf collapses down to a single bit, we could also write:

  for (size_t i = 0 ; i < (sizeof(a_.f32) / sizeof(a_.f32[0])) ; i++) {
    r |= (simde_math_fpclassf(a_.f32[i], imm8) << i);
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants