Remove 128-bit limit on Vector<T> size for ARM64 #129852
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -370,4 +370,7 @@ | |
|
|
||
| #define REG_UNKBASE REG_R19 | ||
| #define RBM_UNKBASE RBM_R19 | ||
|
|
||
| #define MAX_SVE_REGSIZE_BYTES 256 | ||
|
|
||
| // clang-format on | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1174,6 +1174,10 @@ MethodTableBuilder::CopyParentVtable() | |
| } | ||
| } | ||
|
|
||
| #ifdef TARGET_ARM64 | ||
| extern "C" uint64_t GetSveLengthFromOS(); | ||
| #endif | ||
|
|
||
| //******************************************************************************* | ||
| // Determine if this is the special SIMD type System.Numerics.Vector<T>, whose | ||
| // size is determined dynamically based on the hardware and the presence of JIT | ||
|
|
@@ -1186,7 +1190,7 @@ BOOL MethodTableBuilder::CheckIfSIMDAndUpdateSize() | |
| { | ||
| STANDARD_VM_CONTRACT; | ||
|
|
||
| #if defined(TARGET_X86) || defined(TARGET_AMD64) | ||
| #if defined(TARGET_X86) || defined(TARGET_AMD64) || defined(TARGET_ARM64) | ||
| if (!bmtProp->fIsIntrinsicType) | ||
| return false; | ||
|
|
||
|
|
@@ -1205,6 +1209,7 @@ BOOL MethodTableBuilder::CheckIfSIMDAndUpdateSize() | |
| CORJIT_FLAGS CPUCompileFlags = ExecutionManager::GetEEJitManager()->GetCPUCompileFlags(); | ||
| uint32_t numInstanceFieldBytes = 16; | ||
|
|
||
| #if defined(TARGET_X86) || defined(TARGET_AMD64) | ||
| if (CPUCompileFlags.IsSet(InstructionSet_VectorT512)) | ||
| { | ||
| numInstanceFieldBytes = 64; | ||
|
|
@@ -1213,13 +1218,19 @@ BOOL MethodTableBuilder::CheckIfSIMDAndUpdateSize() | |
| { | ||
| numInstanceFieldBytes = 32; | ||
| } | ||
| #elif defined(TARGET_ARM64) | ||
| if (CPUCompileFlags.IsSet(InstructionSet_VectorT)) | ||
| { | ||
| numInstanceFieldBytes = (uint32_t) GetSveLengthFromOS(); | ||
| } | ||
|
Comment on lines +1222 to +1225

Member
This is "correct" because we'll rather have

Contributor Author
Yes, I was thinking it's going to be
At some point we will need to decide if we prefer AdvSimd or SVE when the

Member
Correct, that should generally be an error scenario and effectively a bug in the ISA detection logic in the VM, but I wanted to make sure it was being persisted here and wasn't something "unique" for the scalable scenario.
👍 |
||
| #endif | ||
|
|
||
| if (numInstanceFieldBytes != 16) | ||
| { | ||
| bmtFP->NumInstanceFieldBytes = numInstanceFieldBytes; | ||
| return true; | ||
| } | ||
| #endif // TARGET_X86 || TARGET_AMD64 | ||
| #endif // TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
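For readers following the `CheckIfSIMDAndUpdateSize` hunk above, the size selection on ARM64 can be sketched in isolation roughly as follows. This is only an illustration, not the VM code: `VectorTInputs` and `SelectVectorTSize` are hypothetical names, and the sketch assumes the OS query reports the vector length in bytes, as the direct assignment to `numInstanceFieldBytes` in the diff suggests.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two inputs the VM consults; not a runtime type.
struct VectorTInputs
{
    bool     vectorTFlagSet;  // models CPUCompileFlags.IsSet(InstructionSet_VectorT)
    uint32_t sveLengthBytes;  // models the value returned by GetSveLengthFromOS()
};

// Mirrors the shape of the ARM64 branch in the hunk: start from the 16-byte
// default and report an override only when the selected size differs from it.
inline bool SelectVectorTSize(const VectorTInputs& in, uint32_t* sizeBytesOut)
{
    uint32_t numInstanceFieldBytes = 16;

    if (in.vectorTFlagSet)
    {
        numInstanceFieldBytes = in.sveLengthBytes;
    }

    if (numInstanceFieldBytes != 16)
    {
        *sizeBytesOut = numInstanceFieldBytes;
        return true;
    }
    return false; // caller keeps the default 16-byte layout
}
```

Note that a 128-bit SVE implementation takes the same path as the AdvSimd-only case: the function returns false and the size is left untouched.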
This is going to potentially cause light up for a lot of unintended structs, which can hurt startup perf.
Won't SVE, in most scenarios, rather be "size unknown", and in isolated scenarios a JIT (but not AOT or pre-JIT) environment may be able to explicitly query the true size and optimize a few things (like frame layout)?
Yes agreed, we could query the actual size here in JIT mode, and use this as the upper bound. We can also filter sizes that are powers of 2 in bits, which could help with AOT as well.
As I've added the primitive for it, I could just do this in this PR.
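A rough sketch of the filter described above, assuming sizes are expressed in bytes (a power of two in bits is also a power of two in bytes for any whole-byte size). The helper name `SizeMightBeScalableVector` and the constant names are hypothetical; 256 mirrors `MAX_SVE_REGSIZE_BYTES` from the diff, and the caller may pass a tighter upper bound when the actual vector length is known.

```cpp
#include <cstddef>

const size_t kMinVectorBytes = 16;   // 128-bit, the smallest vector width
const size_t kMaxSveBytes    = 256;  // mirrors MAX_SVE_REGSIZE_BYTES in the diff

// Hypothetical pre-filter: returns true if a struct of this size could still
// be Vector<T>. upperBoundBytes is the conservative maximum by default, or the
// queried vector length when a JIT environment knows it.
inline bool SizeMightBeScalableVector(size_t sizeBytes,
                                      size_t upperBoundBytes = kMaxSveBytes)
{
    if (sizeBytes < kMinVectorBytes || sizeBytes > upperBoundBytes)
    {
        return false;
    }
    // A power of two has exactly one bit set.
    return (sizeBytes & (sizeBytes - 1)) == 0;
}
```

With the default bound this accepts only 16, 32, 64, 128 and 256 bytes, so a 48-byte struct is rejected before any pattern matching is attempted.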
Sorry, please ignore this comment; I got confused with another patch I'm preparing. I will add the optimization to that patch instead, which adds a primitive to read the VL from `Vector<T>` metadata.

There's a circular dependency between querying the size of `Vector<T>` and calling `structSizeMightRepresentSIMDType`. `Vector<T>` needs to have been seen by the JIT to query the size, but the JIT will typically not pattern match for `Vector<T>` (in `getBaseTypeAndSizeOfSIMDType`) until `structSizeMightRepresentSIMDType` is true.

I can't find a way to try and look for a class handle by name, and I'm assuming this is by design? So to make this optimization happen, I think I'd need to cache a `Vector<T>` handle when it's found, and have some sort of ready state to check whether it's been seen yet. Then switch to the optimal maximum bound when it's available.
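To make the caching idea above concrete, here is a minimal sketch of what such a ready state might look like. Everything in it is hypothetical: `ClassHandle` is a stand-in for the JIT's class handle type, `VectorTSizeBound` and its members are invented for illustration, and thread safety is deliberately ignored.

```cpp
#include <cstddef>

// Hypothetical stand-in for the JIT's opaque class handle type.
typedef const void* ClassHandle;

class VectorTSizeBound
{
public:
    explicit VectorTSizeBound(size_t conservativeMaxBytes)
        : m_conservativeMaxBytes(conservativeMaxBytes)
        , m_handle(nullptr)
        , m_actualBytes(0)
        , m_ready(false)
    {
    }

    // Called the first time pattern matching identifies Vector<T>.
    // Later calls are ignored so the cached handle and size stay stable.
    void OnVectorTSeen(ClassHandle handle, size_t actualBytes)
    {
        if (!m_ready)
        {
            m_handle      = handle;
            m_actualBytes = actualBytes;
            m_ready       = true;
        }
    }

    bool        IsReady() const { return m_ready; }
    ClassHandle Handle() const { return m_handle; }

    // Conservative bound until Vector<T> has been seen, exact size afterwards.
    size_t MaxBytes() const
    {
        return m_ready ? m_actualBytes : m_conservativeMaxBytes;
    }

    // Size pre-filter that tightens automatically once the cache is ready.
    bool SizeMightRepresentVector(size_t sizeBytes) const
    {
        return sizeBytes >= 16 && sizeBytes <= MaxBytes();
    }

private:
    size_t      m_conservativeMaxBytes;
    ClassHandle m_handle;
    size_t      m_actualBytes;
    bool        m_ready;
};
```

Until `OnVectorTSeen` is called the filter admits everything up to the conservative maximum, which sidesteps the circular dependency: the first `Vector<T>` is still found under the wide bound, and every later check uses the exact size.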