[X86] Constant hoisting results in suboptimal top bit clearing #111323
@llvm/issue-subscribers-backend-x86 Author: dzaima (dzaima)
The code:

#include <stdint.h>
void foo(uint64_t* use, uint64_t x, uint64_t y) {
    use[0] = x & 0xffffffffffff;
    if (y == 0) return;
    use[1] = y & 0xffffffffffff;
}

emits, with -O3 -march=haswell:

foo:
    mov al, 48
    bzhi rax, rsi, rax
    mov qword ptr [rdi], rax
    test rdx, rdx
    je .LBB0_2
    movabs rax, 281474976710655
    and rdx, rax
    mov qword ptr [rdi + 8], rdx
.LBB0_2:
    ret

It ends up using both bzhi and movabs+and, which is worse than consistently picking either approach: https://godbolt.org/z/c3jfc3Eoe

The final LLVM IR is:

define dso_local void @foo(ptr nocapture noundef writeonly %use, i64 noundef %x, i64 noundef %y) local_unnamed_addr {
entry:
  %const = bitcast i64 281474976710655 to i64
  %and = and i64 %x, %const
  store i64 %and, ptr %use, align 8
  %cmp = icmp eq i64 %y, 0
  br i1 %cmp, label %return, label %if.end

if.end:                                           ; preds = %entry
  %and1 = and i64 %y, %const
  %arrayidx2 = getelementptr inbounds i8, ptr %use, i64 8
  store i64 %and1, ptr %arrayidx2, align 8
  br label %return

return:                                           ; preds = %entry, %if.end
  ret void
}

(There is another issue where constant hoisting similarly interferes with constant-dependent backend optimizations.)
BMI1-only targets fail to use BEXTR as well.
RKSimon added a commit that referenced this issue on Oct 7, 2024.