[X86] Constant hoisting results in suboptimal top bit clearing #111323

Closed
dzaima opened this issue Oct 7, 2024 · 2 comments

dzaima commented Oct 7, 2024

The code:

#include<stdint.h>
void foo(uint64_t* use, uint64_t x, uint64_t y) {
    use[0] = x & 0xffffffffffff;
    if (y==0) return;
    use[1] = y & 0xffffffffffff;
}

emits with -O3 -march=haswell:

foo:
        mov     al, 48
        bzhi    rax, rsi, rax
        mov     qword ptr [rdi], rax
        test    rdx, rdx
        je      .LBB0_2
        movabs  rax, 281474976710655
        and     rdx, rax
        mov     qword ptr [rdi + 8], rdx
.LBB0_2:
        ret

This uses both bzhi and movabs+and for the same mask, which is worse than consistently picking either approach.

https://godbolt.org/z/c3jfc3Eoe
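
For reference, a minimal sketch of why the two sequences are interchangeable here. `bzhi_model` and `mask_and` are hypothetical helper names, not anything from LLVM. `bzhi_model` is a portable model of BZHI as I understand its documented semantics (only the low 8 bits of the index are used, and an index above 63 leaves the source unchanged), so it needs no BMI2 hardware:

```c
#include <stdint.h>

/* Portable model of BZHI (assumed semantics): keep bits [n-1:0] of src
   and clear everything from bit n upwards. */
static uint64_t bzhi_model(uint64_t src, uint64_t index) {
    unsigned n = (unsigned)(index & 0xff);   /* only the low 8 bits count */
    if (n > 63) return src;                  /* nothing to clear */
    return src & ((UINT64_C(1) << n) - 1);   /* n == 0 yields 0 */
}

/* The movabs+and form: AND with the materialized 48-bit mask. */
static uint64_t mask_and(uint64_t src) {
    return src & UINT64_C(281474976710655);  /* 0xffffffffffff == 2^48 - 1 */
}
```

With an index of 48, `bzhi_model(v, 48)` and `mask_and(v)` return the same value for every `v`, so either form could serve both stores in `foo`.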

The final LLVM IR is:

define dso_local void @foo(ptr nocapture noundef writeonly %use, i64 noundef %x, i64 noundef %y) local_unnamed_addr {
entry:
  %const = bitcast i64 281474976710655 to i64
  %and = and i64 %x, %const
  store i64 %and, ptr %use, align 8
  %cmp = icmp eq i64 %y, 0
  br i1 %cmp, label %return, label %if.end

if.end:                                           ; preds = %entry
  %and1 = and i64 %y, %const
  %arrayidx2 = getelementptr inbounds i8, ptr %use, i64 8
  store i64 %and1, ptr %arrayidx2, align 8
  br label %return

return:                                           ; preds = %entry, %if.end
  ret void
}

(another issue where constant hoisting similarly messes with constant-dependent backend optimizations)

llvmbot commented Oct 7, 2024

@llvm/issue-subscribers-backend-x86

RKSimon self-assigned this Oct 7, 2024

RKSimon commented Oct 7, 2024

BMI1-only targets also fail to use BEXTR here.
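
As a rough illustration of the BEXTR alternative, here is a portable model of the instruction, assuming the usual BMI1 encoding of the control operand (start bit in bits 7:0, field length in bits 15:8). `bextr_model` and `bextr_control` are hypothetical names used only for this sketch:

```c
#include <stdint.h>

/* Build a BEXTR control word (assumed layout): start in bits 7:0, length in bits 15:8. */
static uint64_t bextr_control(unsigned start, unsigned len) {
    return ((uint64_t)(len & 0xff) << 8) | (uint64_t)(start & 0xff);
}

/* Portable model of BEXTR: extract `len` bits of src beginning at bit `start`. */
static uint64_t bextr_model(uint64_t src, uint64_t control) {
    unsigned start = (unsigned)(control & 0xff);
    unsigned len = (unsigned)((control >> 8) & 0xff);
    if (start > 63 || len == 0) return 0;         /* empty field */
    uint64_t shifted = src >> start;
    if (len > 63) return shifted;                 /* field covers all remaining bits */
    return shifted & ((UINT64_C(1) << len) - 1);  /* keep the low `len` bits */
}
```

Under these assumptions, `bextr_model(v, bextr_control(0, 48))` clears the top 16 bits of `v`, the same result as `v & 0xffffffffffff`, with a 16-bit control value (0x3000) instead of a 64-bit immediate.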

RKSimon added a commit that referenced this issue Oct 7, 2024
Kyvangka1610 added a commit to Kyvangka1610/llvm-project that referenced this issue Oct 7, 2024
* commit 'FETCH_HEAD':
  [X86] getIntImmCostInst - pull out repeated Imm.getBitWidth() calls. NFC.
  [X86] Add test coverage for llvm#111323
  [Driver] Use empty multilib file in another test (llvm#111352)
  [clang][OpenMP][test] Use x86_64-linux-gnu triple for test referencing avx512f feature (llvm#111337)
  [doc] Fix Kaleidoscope tutorial chapter 3 code snippet and full listing discrepancies (llvm#111289)
  [Flang][OpenMP] Improve entry block argument creation and binding (llvm#110267)
  [x86] combineMul - handle 0/-1 KnownBits cases before MUL_IMM logic (REAPPLIED)
  [llvm-dis] Fix non-deterministic disassembly across multiple inputs (llvm#110988)
  [lldb][test] TestDataFormatterLibcxxOptionalSimulator.py: change order of ifdefs
  [lldb][test] Add libcxx-simulators test for std::optional (llvm#111133)
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat. (REAPPLIED)
  Reland "[lldb][test] TestDataFormatterLibcxxStringSimulator.py: add new padding layout" (llvm#111123)
  Revert "[x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat."
  update_test_checks: fix a simple regression  (llvm#111347)
  [LegalizeVectorTypes] Always widen fabs (llvm#111298)
  [lsan] Make ReportUnsuspendedThreads return bool also for Fuchsia
  [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (llvm#111121)
  [bazel] port 9144fed
  [SystemZ] Remove inlining threshold multiplier. (llvm#106058)
  [LegalizeVectorTypes] When widening don't check for libcalls if promoted (llvm#111297)
  [clang][Driver] Improve multilib custom error reporting (llvm#110804)
  [clang][Driver] Rename "FatalError" key to "Error" in multilib.yaml (llvm#110804)
  [LLVM][Maintainers] Update release managers (llvm#111164)
  [Clang][Driver] Add option to provide path for multilib's YAML config file (llvm#109640)
  [LoopVectorize] Remove redundant code in emitSCEVChecks (llvm#111132)
  [AMDGPU] Only emit SCOPE_SYS global_wb (llvm#110636)
  [ELF] Change Ctx::target to unique_ptr (llvm#111260)
  [ELF] Pass Ctx & to some free functions
  [RISCV] Only disassemble fcvtmod.w.d if the rounding mode is rtz. (llvm#111308)
  [Clang] Remove the special-casing for RequiresExprBodyDecl in BuildResolvedCallExpr() after fd87d76 (llvm#111277)
  [ELF] Pass Ctx & to InputFile
  [clang-format] Add AlignFunctionDeclarations to AlignConsecutiveDeclarations (llvm#108241)
  [AMDGPU] Support preloading hidden kernel arguments (llvm#98861)
  [ELF] Move static nextGroupId isInGroup to LinkerDriver
  [clangd] Add ArgumentLists config option under Completion (llvm#111322)
  [ELF] Pass Ctx & to SyntheticSections
  [ELF] Pass Ctx & to Symbols
  [ELF] Pass Ctx & to Symbols
  [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC
  [clang-tidy] Avoid capturing a local variable in a static lambda in UseRangesCheck (llvm#111282)
  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (llvm#106431)
  [clangd] Simplify ternary expressions with std::optional::value_or (NFC) (llvm#111309)
  [libc++][format][2/3] Optimizes c-string arguments. (llvm#101805)
  [RISCV] Combine RVBUnary and RVKUnary into classes that are more similar to ALU(W)_r(r/i). NFC (llvm#111279)
  [ELF] Pass Ctx & to InputFiles
  [libc] GPU RPC interface: add return value to `rpc_host_call` (llvm#111288)

Signed-off-by: kyvangka1610
RKSimon closed this as completed in 8b6e1dc Oct 7, 2024