
Excessive moving of SIMD registers on x86-64 #81391

Open
dzaima opened this issue Feb 11, 2024 · 1 comment

dzaima commented Feb 11, 2024

Apologies for the unreadable autogenerated C code, but hopefully that doesn't matter much for the issue in question.

The code here generates assembly that contains this excerpt:

        vmovdqu ymmword ptr [rsp + 112], ymm2   # 32-byte Spill
        vmovdqa ymm2, ymm14
        vmovdqa xmm14, xmm1
        vmovdqa ymm1, ymm9
        vmovdqa ymm9, ymm8
        vmovdqa ymm8, ymm6
        vmovdqa ymm6, ymm5
        vmovdqa ymm5, ymm3
        vmovdqa xmm3, xmm12
        vmovq   xmm12, qword ptr [rbx + rax + 8] # xmm12 = mem[0],zero
        vmovdqa ymm4, ymm15
        vmovdqa ymm15, ymm13
        vmovd   xmm13, edx
        vpinsrd xmm13, xmm13, ecx, 1
        vpmaxsd xmm12, xmm12, xmm13
        vmovq   xmm13, qword ptr [rbx + rax]    # xmm13 = mem[0],zero
        vpbroadcastd    xmm0, dword ptr [rip + .LCPI0_17] # xmm0 = [1,1,1,1]
        vpinsrd xmm0, xmm0, ecx, 0
        vpaddd  xmm0, xmm13, xmm0
        vmovdqa ymm13, ymm15
        vmovdqa ymm15, ymm4
        vpunpcklqdq     xmm0, xmm0, xmm12       # xmm0 = xmm0[0],xmm12[0]
        vmovdqa xmm12, xmm3
        vmovdqa ymm3, ymm5
        vmovdqa ymm5, ymm6
        vmovdqa ymm6, ymm8
        vmovdqa ymm8, ymm9
        vmovdqa ymm9, ymm1
        vmovdqa xmm1, xmm14
        vmovdqa ymm14, ymm2
        vmovdqu ymm2, ymmword ptr [rsp + 112]   # 32-byte Reload

That's 9 registers moved forwards and then later back, none of which are used in the code in between (and no, there are no jumps into the middle of this).

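For comparison, a hand-written version of the same computation needs none of that shuffling. Assuming, as the excerpt itself suggests, that ymm4 is a dead scratch register and that the [rsp + 112] spill slot is free, renaming the clobbered temporaries xmm12/xmm13 to xmm2/xmm4 leaves every other register untouched (a sketch, not necessarily what an optimal register allocation would pick):

        vmovdqu ymmword ptr [rsp + 112], ymm2   # 32-byte Spill, as before
        vmovq   xmm2, qword ptr [rbx + rax + 8] # xmm2 = mem[0],zero
        vmovd   xmm4, edx
        vpinsrd xmm4, xmm4, ecx, 1
        vpmaxsd xmm2, xmm2, xmm4
        vmovq   xmm4, qword ptr [rbx + rax]     # xmm4 = mem[0],zero
        vpbroadcastd    xmm0, dword ptr [rip + .LCPI0_17] # xmm0 = [1,1,1,1]
        vpinsrd xmm0, xmm0, ecx, 0
        vpaddd  xmm0, xmm4, xmm0
        vpunpcklqdq     xmm0, xmm0, xmm2        # xmm0 = xmm0[0],xmm2[0]
        vmovdqu ymm2, ymmword ptr [rsp + 112]   # 32-byte Reload

That's the same 11 useful instructions as in the excerpt, minus the 20 vmovdqa round-trip moves.
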
The C code does include an __asm__ statement mutating a __m256i (to prevent simple shuffles from being merged into vpermds, thereby reducing register pressure), but that's in a different place, and its operand is assigned to ymm7, which does not feature in the problematic excerpt (a vmovapd is added within OFENCE_V just to demonstrate this). Nevertheless, replacing OFENCE_V with #define OFENCE_V(X) X gets rid of the problem (perhaps by chance).

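For context, a minimal sketch of what such a fence can look like, assuming a GNU-style statement expression; the actual OFENCE_V in the autogenerated code reportedly also contains the vmovapd marker mentioned above, which is omitted here:

    #include <immintrin.h>

    /* Hypothetical optimization fence over a __m256i: the empty __asm__
     * claims to read and modify X in a vector register ("+x"), so the
     * compiler cannot merge shuffles across it into a vpermd. */
    #define OFENCE_V(X) __extension__({    \
        __m256i ofence_v_ = (X);           \
        __asm__("" : "+x"(ofence_v_));     \
        ofence_v_;                         \
    })

    /* The identity replacement mentioned above, which makes the
     * excessive moves disappear (perhaps by chance): */
    /* #define OFENCE_V(X) (X) */
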
llvmbot commented Feb 11, 2024

@llvm/issue-subscribers-backend-x86

