Crossgen2 comparisons are failing in coreclr-outerloop runs #111972
Tagging subscribers to this area: @hoyosjs
/cc @dotnet/jit-contrib
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@steveisok is this known to be a codegen issue? Any details? The logs are not very enlightening.
Yes, it looks like bad non-deterministic codegen from x64-hosted x86-targeting crossgen2 and other similar configuration pairs. Here are the steps to investigate the failure:
Execution failed with a python error for me. I may have too old or too new python on my machine, not sure. In any case, I have ignored the error since it produced the R2R binaries to look at.
Size: 304 bytes
vs. Size: 307 bytes
There are other similar diffs where an emitted instruction is doubled for some reason.
Thanks Jan, will take a look.
Interestingly enough, I think this may be the issue I fixed in #112020. We were losing assertions depending on whether we were using long or short bit vectors, and the short size depends on the host arch. Should know soon.
That may have cut down on some of the diffs, but there are still some. Another culprit seems to be coming from this bit of code: runtime/src/coreclr/jit/importercalls.cpp Lines 2990 to 3006 in fa0f65c
We get different address values depending on the host:
and this changes our address mode formation:
;; x64 host
IN0044: 0000F3 jae SHORT G_M19140_IG17
recordRelocation: 000002975123C2C6 (rw: 000002975123C2C6) => 400000000052FFB8, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0045: 0000F5 mov edx, (reloc 0x400000000052ffb8)
; byrRegs +[edx]
IN0046: 0000FA movsx ecx, word ptr [edx+2*ebx]
;; x86 host
IN0045: 0000F5 add ebx, ebx
recordRelocation: 2D1B8E48 (rw: 2D1B8E48) => 4052FF88, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0046: 0000F7 mov edx, (reloc 0x4052ff88)
; byrRegs +[edx]
IN0047: 0000FC add edx, ebx
IN0048: 0000FE movsx ecx, word ptr [edx]
@EgorBo ring any bells? Seems odd we'd depend on the size of a relocatable handle to pick an address mode.
hm.. not really, that code hasn't changed since 2022
I think the issue may be here: On an x86 host the constant fits in 32 bits and is relocatable so we bail out without forming an address mode. On an x64 host the constant doesn't fit and we skip down further in the method and see I am going to change this to always skip down if
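To illustrate the host sensitivity described above, here is a minimal sketch, not the JIT's actual code. The names `fitsInInt32` and `bailsOutEarly` are hypothetical stand-ins, and the two handle values are the ones from the relocation lines in the dumps earlier in the thread. It only shows how a "fits in 32 bits" test on a handle value gives different answers for the same method on different hosts.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a "fits in a signed 32-bit immediate" check.
bool fitsInInt32(int64_t value)
{
    return value >= std::numeric_limits<int32_t>::min() &&
           value <= std::numeric_limits<int32_t>::max();
}

// Hypothetical model of the host-sensitive decision: bail out of address
// mode formation when the constant is relocatable AND its numeric value
// fits in 32 bits. The numeric value of a handle differs between hosts,
// so the decision differs too.
bool bailsOutEarly(int64_t handleValue, bool isRelocatable)
{
    return isRelocatable && fitsInInt32(handleValue);
}

// Handle values taken from the relocation lines in the two dumps.
const int64_t kX64HostHandle = 0x400000000052FFB8LL;
const int64_t kX86HostHandle = 0x4052FF88LL;
```

With these inputs, `bailsOutEarly(kX86HostHandle, true)` is true while `bailsOutEarly(kX64HostHandle, true)` is false, which matches the two code shapes in the dumps. A decision keyed only on whether the constant is relocatable, not on its host-specific value, would not diverge this way.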
That resolved some of the diffs, but there are more. Will keep looking.
We are seeing different PGO data for some value probes, also some parsing or dumping issues with the PGO schema. The edge count data and class histograms are consistent however.
Also the x64 hosted version looks odd, we have duplicated entries in the table.
For PGO handle histograms the jit host fixes the entries to be host pointer sized, but value histogram entries are always the schema declared size. I've updated the likely value computation to adapt. This fixes a lot more of the diffs, but some still remain; it looks like an occasional inlining difference and some dec/sub peepholes.
Dec/Add thing looks like loop down-counting:
Looks like this needs to be runtime/src/coreclr/jit/scev.cpp Line 1330 in e51af40
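As a hedged illustration of why a host-sized -1 can cause diffs (this is not the code at the referenced scev.cpp line, and both helper names below are hypothetical): if -1 is built in a host-pointer-sized integer and then stored in a wider constant without sign extension, the stored bits depend on the host. Building it from the width of the target value's type does not.

```cpp
#include <cstdint>
#include <type_traits>

// Simulates building -1 in a host-pointer-sized signed integer (HostInt)
// and then storing its bit pattern, zero-extended, in a 64-bit slot.
template <typename HostInt>
uint64_t hostSizedMinusOne()
{
    using HostUInt = typename std::make_unsigned<HostInt>::type;
    return static_cast<uint64_t>(static_cast<HostUInt>(HostInt(-1)));
}

// Host-independent alternative (an assumption for illustration): the
// all-ones pattern for a value that is `targetBits` wide, whatever the
// host pointer size is.
uint64_t targetSizedMinusOne(unsigned targetBits)
{
    return targetBits >= 64 ? ~uint64_t(0) : ((uint64_t(1) << targetBits) - 1);
}
```

Here `hostSizedMinusOne<int32_t>()` yields `0xFFFFFFFF` and `hostSizedMinusOne<int64_t>()` yields `0xFFFFFFFFFFFFFFFF`, so a later comparison against the step constant can succeed on one host and fail on the other. `targetSizedMinusOne(32)` gives the same answer on both.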
Ok, that cleared up another set of diffs. But there is still more (maybe just one more). Here's one where a different inlining decision is made:
runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs Lines 2962 to 2964 in 6f3f675
Since This seems to be happening at the host layer, the jit is not modifying these flag bits. |
This was a recent bug fix: https://github.com/dotnet/runtime/pull/111308/files#diff-132a77bcd3f74cf0e0b04fbccda246c97c91e40562d78cb01fff61cf69403573R1120 . Does your x64 host have this fix?
I am building everything locally so both hosts are built from the same sources. But my crossgen2 does seem to be older bits ...
Yeah, that looks like the problem; I was using an older crossgen2.
Fix a couple of issues that were causing cross-crossgen tests to fail:
* address mode formation was sensitive to the size of a constant handle
* value histogram processing was always using 64 bit value sizes
* transformation for down-counted loops was using host-sized -1
Fixes the jit-related issues in #111972
The arm to arm Linux, arm64 to arm64 OSX, and the x86 to x86 Windows comparison legs are failing. This was noticed after the change in #111881 was run to correct infrastructure issues.
Example build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=932682&view=results
arm to arm Linux