Apologies for the unreadable autogenerated C code, but hopefully that doesn't matter much for the issue in question.
The code here generates assembly which contains within it this excerpt:
That's 9 registers moved forwards and then later back, none of which are used in the code between (and no, there are no jumps to the middle of this).
The C code does include an `__asm__` mutating a `__m256i` (for the purpose of preventing merging simple shuffles into `vpermd`s to reduce register pressure), but that's in a different place and is assigned to `ymm7`, which does not feature in the problematic excerpt (`vmovapd` is added within `OFENCE_V` just to demonstrate this). Nevertheless, replacing `OFENCE_V` with `#define OFENCE_V(X) X` gets rid of the problem (perhaps by chance).
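
For readers unfamiliar with this kind of fence, here is a minimal sketch of what a macro like `OFENCE_V` might look like. The actual definition is not shown in this report, so the macro body, the temporary `ofence_tmp_`, and the helper `fenced_shuffle_lane0` are all assumptions made for illustration. The idea is to pass the vector through an asm statement that claims to modify it, so the compiler cannot combine the shuffles on either side into a single cross-lane permute.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical reconstruction, not the definition from the report.
   The "+x" constraint tells the compiler the value is read and written
   in a vector register, so it must treat the result as opaque. */
#define OFENCE_V(X) __extension__({                        \
    __m256i ofence_tmp_ = (X);                             \
    __asm__("vmovapd %0, %0" : "+x"(ofence_tmp_));         \
    ofence_tmp_; })

/* Illustrative helper (assumed name): two in-lane shuffles with a fence
   between them. The target attribute lets this compile without -mavx2,
   but it still needs an AVX2-capable CPU at run time. */
__attribute__((target("avx2")))
int32_t fenced_shuffle_lane0(void)
{
    __m256i v = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);

    /* Swap adjacent 32-bit elements within each 128-bit lane. */
    __m256i a = OFENCE_V(_mm256_shuffle_epi32(v, 0xB1));

    /* Swap the 64-bit halves within each 128-bit lane. */
    __m256i b = _mm256_shuffle_epi32(a, 0x4E);

    int32_t out[8];
    _mm256_storeu_si256((__m256i *)out, b);
    return out[0];
}
```

The fence does not change the computed value, so swapping in the identity form `#define OFENCE_V(X) X` should only affect what the optimizer is allowed to do, not the result.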