Fix incorrect UQRSHL implementation. #1272
Open
+3,958
−283
The UQRSHL implementation was incorrect in several ways and did not match hardware. The shift checks for saturation were carried over unmodified from the signed version, leading to incorrect behavior at (previously untested) edge cases.
Further, for the 32-bit and 64-bit versions, the rounding math was incorrect for a shift of -32 or -64 and did not match hardware. The core issue is that the rounding computation overflows the native 32-bit or 64-bit type at that limit. This edge case is now special-cased to produce the correct result.
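For reviewers, here is a minimal sketch of the 64-bit lane behavior described above. It is not the code in this PR: the function name `uqrshl_u64` and its signature are hypothetical, and it assumes the usual architectural definition, in which the shift amount is a signed byte and the rounding constant is added at unbounded precision before shifting. A naive `(x + (1ULL << 63)) >> 64` cannot be written in a 64-bit type, so the -64 case is handled separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model for one 64-bit UQRSHL lane.
 * shift > 0: left shift with unsigned saturation.
 * shift < 0: rounding right shift by -shift.
 * *saturated is set when the result was clamped to UINT64_MAX. */
static uint64_t uqrshl_u64(uint64_t x, int8_t shift, bool *saturated) {
    *saturated = false;

    if (shift >= 0) {
        if (shift >= 64) {
            /* Every bit is shifted out: any nonzero input saturates. */
            if (x != 0) {
                *saturated = true;
                return UINT64_MAX;
            }
            return 0;
        }
        /* Unsigned bound check: x << shift must fit in 64 bits. */
        if (x > (UINT64_MAX >> shift)) {
            *saturated = true;
            return UINT64_MAX;
        }
        return x << shift;
    }

    int n = -(int)shift; /* right-shift distance, 1..128 */

    if (n > 64) {
        /* x + 2^(n-1) is always below 2^n, so the result is 0. */
        return 0;
    }
    if (n == 64) {
        /* (x + 2^63) >> 64 is 1 exactly when bit 63 of x is set. */
        return x >> 63;
    }
    /* 1..63: add the rounding bit after shifting to avoid overflowing x. */
    return (x >> n) + ((x >> (n - 1)) & 1u);
}
```

Under these assumptions, `uqrshl_u64(UINT64_MAX, -64, &sat)` yields 1 rather than 0, which is the kind of edge case a wrapped rounding add gets wrong. The 32-bit case has the same shape with 32 and 31 in place of 64 and 63.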
The tests have been substantially expanded to exercise these changes. The new unsigned tests should fail without the modified code and pass with it. Additional signed tests were added to cover the same edge cases, though no incorrect behavior was observed there.
Test cases were generated on a Google C4a ARMv8.4 machine.