Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessarily large constant created from reordering add and shift #123239

Closed
dzaima opened this issue Jan 16, 2025 · 16 comments · Fixed by #126448
Closed

Unnecessarily large constant created from reordering add and shift #123239

dzaima opened this issue Jan 16, 2025 · 16 comments · Fixed by #126448
Assignees
Labels
backend:X86 good first issue https://github.com/llvm/llvm-project/contribute missed-optimization

Comments

@dzaima
Copy link

dzaima commented Jan 16, 2025

https://godbolt.org/z/xoKf6bnTb

The code:

#include<stdint.h>
#include<stdbool.h>
bool foo(uint64_t x) {
  uint16_t tag = x>>48;
  return tag>=0b1111111111110010 && tag<=0b1111111111110100;
}

with -O3 as of clang 19 (and still in trunk) compiles to:

foo:
        movabs  rax, 3940649673949184
        add     rax, rdi
        shr     rax, 48
        cmp     eax, 3
        setb    al
        ret

whereas 18.0 did this, which is strictly better (i.e. is the exact same set of instructions, just in a different order and without movabs):

foo:
        shr     rdi, 48
        add     edi, -65522
        cmp     edi, 3
        setb    al
        ret
@RKSimon
Copy link
Collaborator

RKSimon commented Jan 18, 2025

trunk:

define i1 @foo(i64 %x) {
entry:
  %0 = add i64 %x, 3940649673949184
  %1 = icmp ult i64 %0, 844424930131968
  ret i1 %1
}

vs 17.x:

define i1 @foo(i64 %x) {
entry:
  %shr = lshr i64 %x, 48
  %conv = trunc i64 %shr to i32
  %0 = add nsw i32 %conv, -65522
  %1 = icmp ult i32 %0, 3
  ret i1 %1
}

IIRC this has been reported before

@dzaima
Copy link
Author

dzaima commented Jan 18, 2025

I've previously reported similar (but unrelated to an extent I'd think) #62145 and #111323 in case those might be what you're thinking of; otherwise neither movabs shr cmp nor lshr add icmp trunc turn up anything

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 19, 2025

We lower to:

Optimized legalized selection DAG: %bb.0 'foo:entry'
SelectionDAG has 16 nodes:
  t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i64 = add t2, Constant:i64<3940649673949184>
          t16: i64 = srl t4, Constant:i8<48>
        t22: i32 = truncate t16
      t24: i32,i32 = X86ISD::SUB t22, Constant:i32<3>
    t26: i8 = X86ISD::SETCC TargetConstant:i8<2>, t24:1
  t11: ch,glue = CopyToReg t0, Register:i8 $al, t26
  t12: ch = X86ISD::RET_GLUE t11, TargetConstant:i32<0>, Register:i8 $al, t11:1

It should be possible to fold some (truncate (srl (add X, C1), C2)) patterns to (add (truncate (srl X, C2)), C1') - probably as a x86 specific fold initially.

  • Confirm the alive2 pattern for the constraints on C1 and C2 for the (truncate (srl (add X, C1), C2)) -> (add (truncate (srl X, C2)), C1') fold
  • Add x86 test coverage for some i64 srl+add cases
  • Add x86 fold - either to combineTruncatedArithmetic (we have a TODO suggesting i64 handling) or it might have to be combineX86SetCC/combineSetCCEFLAGS.

@RKSimon RKSimon added good first issue https://github.com/llvm/llvm-project/contribute backend:X86 labels Jan 19, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 19, 2025

@llvm/issue-subscribers-backend-x86

Author: dzaima (dzaima)

https://godbolt.org/z/xoKf6bnTb

The code:

#include&lt;stdint.h&gt;
#include&lt;stdbool.h&gt;
bool foo(uint64_t x) {
  uint16_t tag = x&gt;&gt;48;
  return tag&gt;=0b1111111111110010 &amp;&amp; tag&lt;=0b1111111111110100;
}

with -O3 as of clang 19 (and still in trunk) compiles to:

foo:
        movabs  rax, 3940649673949184
        add     rax, rdi
        shr     rax, 48
        cmp     eax, 3
        setb    al
        ret

whereas 18.0 did this, which is strictly better (i.e. is the exact same set of instructions, just in a different order and without movabs):

foo:
        shr     rdi, 48
        add     edi, -65522
        cmp     edi, 3
        setb    al
        ret

@llvmbot
Copy link
Member

llvmbot commented Jan 19, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@llvmbot
Copy link
Member

llvmbot commented Jan 19, 2025

@llvm/issue-subscribers-good-first-issue

Author: dzaima (dzaima)

https://godbolt.org/z/xoKf6bnTb

The code:

#include&lt;stdint.h&gt;
#include&lt;stdbool.h&gt;
bool foo(uint64_t x) {
  uint16_t tag = x&gt;&gt;48;
  return tag&gt;=0b1111111111110010 &amp;&amp; tag&lt;=0b1111111111110100;
}

with -O3 as of clang 19 (and still in trunk) compiles to:

foo:
        movabs  rax, 3940649673949184
        add     rax, rdi
        shr     rax, 48
        cmp     eax, 3
        setb    al
        ret

whereas 18.0 did this, which is strictly better (i.e. is the exact same set of instructions, just in a different order and without movabs):

foo:
        shr     rdi, 48
        add     edi, -65522
        cmp     edi, 3
        setb    al
        ret

@souvik150
Copy link

Hey can I pick this up? Im new to the project and just looking for some beginner friendly issues

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 19, 2025

Sure. Are you familiar with Alive2? The first step is to determine the constraints on the truncation fold.

@souvik150
Copy link

Nope. Let me then start by getting familiar with Alive2. I'll spend today looking at that then, thanks!

@souvik150
Copy link

souvik150 commented Jan 20, 2025

So as I first step I will have to convert the C code into IR using latest clang and version 18 and then compare the IR using alive2? @RKSimon

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 20, 2025

You can use the IR here: #123239 (comment)

@souvik150
Copy link

You can use the IR here: #123239 (comment)

Cool thanks!

@souvik150
Copy link

souvik150 commented Jan 22, 2025

$ALIVE2_HOME/alive2/build/alive-tv old.ll new.ll

define i1 @foo(i64 %x) {
entry:
  %shr = lshr i64 %x, 48
  %conv = trunc i64 %shr to i32
  %#0 = add nsw i32 %conv, 4294901774
  %#1 = icmp ult i32 %#0, 3
  ret i1 %#1
}
=>
define i1 @foo(i64 %x) {
entry:
  %#0 = add i64 %x, 3940649673949184
  %#1 = icmp ult i64 %#0, 844424930131968
  ret i1 %#1
}
Transformation seems to be correct!

Summary:
  1 correct transformations
  0 incorrect transformations
  0 failed-to-prove transformations
  0 Alive2 errors

Hey @RKSimon this is the findings I got. Sorry for the delayed response.

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 29, 2025

@souvik150 instcombine will fold to the canonical i64 IR - which is what alive2 has done for you above.

https://alive2.llvm.org/ce/z/hU6D28 shows the reverse fold - but what we need to do is work out what bounds the addition and comparison constants need for us to perform the truncated add + comparison in a general case.

@joaotgouveia
Copy link
Contributor

Hi. Is it ok for me to pick this up, or is it still being worked on? @RKSimon @souvik150

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 10, 2025

ping @souvik150

phoebewang pushed a commit that referenced this issue Feb 21, 2025
…uncate (srl X, C2)), C1') (#126448)

Addresses the poor codegen identified in #123239 and a few extra cases.
This transformation is correct for `eq`
(https://alive2.llvm.org/ce/z/qZhwtT), `ne`
(https://alive2.llvm.org/ce/z/6gsmNz), `ult`
(https://alive2.llvm.org/ce/z/xip_td) and `ugt`
(https://alive2.llvm.org/ce/z/39XQkX).

Fixes #123239
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 good first issue https://github.com/llvm/llvm-project/contribute missed-optimization
Projects
None yet
7 participants