Remove 128-bit limit on Vector<T> size for ARM64 #129852
If InstructionSet_VectorT is available, set the class instance size to the process SVE vector length. Increase the maximum bound in structSizeMightRepresentSIMDType to allow the JIT to detect this when the ISA is present.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
#ifdef FEATURE_SIMD
    return (structSize >= getMinVectorByteLength()) && (structSize <= getMaxVectorByteLength());
#ifdef TARGET_ARM64
    const uint32_t max = compExactlyDependsOn(InstructionSet_VectorT) ? MAX_SVE_REGSIZE_BYTES : FP_REGSIZE_BYTES;
This could cause light-up for a lot of unintended structs, which can hurt startup perf.
Won't SVE, in most scenarios, rather be "size unknown"? Only in isolated scenarios may a JIT (but not AOT or pre-JIT) environment be able to explicitly query the true size and optimize a few things (like frame layout).
Yes, agreed. We could query the actual size here in JIT mode and use it as the upper bound. We could also accept only sizes that are powers of 2 in bits, which could help with AOT as well.
As I've added the primitive for it, I could just do this in this PR.
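The two ideas above (using the queried vector length as the upper bound in JIT mode, and accepting only power-of-2 sizes otherwise) could be sketched roughly as follows. This is a standalone illustration, not the JIT's code: `sizeMightBeSimd`, the constants, and the `knownVectorTBytes` parameter are hypothetical names, and the 8-byte minimum is an assumption. The 256-byte maximum corresponds to the 2048-bit architectural SVE limit.

```cpp
#include <cstdint>

// Hypothetical constants for illustration only; the real JIT defines its own.
constexpr uint32_t kMinVectorBytes = 8;   // assumed smallest SIMD struct size
constexpr uint32_t kAdvSimdBytes   = 16;  // 128-bit AdvSimd register
constexpr uint32_t kMaxSveBytes    = 256; // 2048-bit architectural SVE maximum

// True when v has exactly one bit set.
constexpr bool isPow2(uint32_t v)
{
    return (v != 0) && ((v & (v - 1)) == 0);
}

// hasScalableVectorT: stands in for "InstructionSet_VectorT is available".
// knownVectorTBytes:  the queried vector length in bytes, or 0 when it is
//                     unknown (for example, when compiling ahead of time).
bool sizeMightBeSimd(uint32_t structSize, bool hasScalableVectorT, uint32_t knownVectorTBytes)
{
    // Pick the tightest upper bound that is safe for the current mode.
    uint32_t maxBytes = kAdvSimdBytes;
    if (hasScalableVectorT)
    {
        maxBytes = (knownVectorTBytes != 0) ? knownVectorTBytes : kMaxSveBytes;
    }

    if ((structSize < kMinVectorBytes) || (structSize > maxBytes))
    {
        return false;
    }

    // Sizes up to 128 bits keep the existing behaviour; above that, only
    // power-of-2 sizes can be a scalable vector, so filter the rest out.
    return (structSize <= kAdvSimdBytes) || isPow2(structSize);
}
```

With the length unknown, a 48-byte struct is rejected by the power-of-2 filter while a 32-byte struct is still considered; with a known 32-byte length, a 64-byte struct is rejected by the upper bound.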
> As I've added the primitive for it, I could just do this in this PR.

Sorry, please ignore this comment; I confused it with another patch I'm preparing. I will add the optimization to that patch instead, which adds a primitive to read the VL from Vector<T> metadata.
There's a circular dependency between querying the size of Vector<T> and calling structSizeMightRepresentSIMDType. Vector<T> needs to have been seen by the JIT to query the size, but the JIT will typically not pattern match for Vector<T> (in getBaseTypeAndSizeOfSIMDType) until structSizeMightRepresentSIMDType is true.
I can't find a way to look up a class handle by name, and I'm assuming this is by design? So to make this optimization happen, I think I'd need to cache a Vector<T> handle when it's found, and have some sort of ready state to check whether it's been seen yet. Then switch to the optimal maximum bound once it's available.
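A minimal sketch of the caching idea described above, assuming a simple per-process cache. `VectorTSizeCache`, `ClassHandle`, and the method names are hypothetical and do not exist in the JIT; `ClassHandle` merely stands in for an opaque class handle.

```cpp
#include <cstdint>

// Stand-in for an opaque class handle (hypothetical, for illustration only).
using ClassHandle = const void*;

// Remembers the first Vector<T> handle the compiler sees, together with its size.
struct VectorTSizeCache
{
    ClassHandle handle    = nullptr;
    uint32_t    sizeBytes = 0;

    // "Ready" means Vector<T> has been seen at least once.
    bool isReady() const
    {
        return handle != nullptr;
    }

    // Record the handle and size the first time Vector<T> is recognised.
    // Later calls are ignored, since the size is fixed for the process.
    void record(ClassHandle h, uint32_t bytes)
    {
        if (!isReady())
        {
            handle    = h;
            sizeBytes = bytes;
        }
    }

    // Use the conservative bound until the real size is known, then the exact one.
    uint32_t maxBound(uint32_t conservativeMax) const
    {
        return isReady() ? sizeBytes : conservativeMax;
    }
};
```

Until `record` has been called, `maxBound` returns the conservative maximum, so the first Vector<T> can still be pattern matched; after that, the tighter bound applies. A real implementation would also need to consider thread safety, which this sketch ignores.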
if (CPUCompileFlags.IsSet(InstructionSet_VectorT))
{
    numInstanceFieldBytes = (uint32_t) GetSveLengthFromOS();
}
This is "correct" because we'd instead have InstructionSet_VectorT128 if we have AdvSimd without SVE, correct?
Yes, I was thinking it's going to be InstructionSet_VectorT128 XOR InstructionSet_VectorT, never both enabled at the same time. So InstructionSet_VectorT is only serviced by SVE and will not be available when SVE is not there.
At some point we will need to decide whether we prefer AdvSimd or SVE when the VL == 128 bits. This is dependent on micro-architecture, but we may be able to find another criterion for picking one in general.
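The mutually exclusive selection described above could be sketched like this. The enum, function names, and the rule "prefer SVE whenever it is present" are assumptions for illustration; in particular, the sketch does not settle the open question of which ISA to prefer when the VL is 128 bits.

```cpp
#include <cstdint>

// Exactly one of these is selected, so the two can never be enabled together.
enum class VectorTIsa
{
    VectorT128,      // fixed 128-bit Vector<T>, serviced by AdvSimd
    VectorTScalable  // scalable Vector<T>, serviced only by SVE
};

// Assumption: choose the scalable ISA whenever SVE is present.
VectorTIsa selectVectorTIsa(bool hasSve)
{
    return hasSve ? VectorTIsa::VectorTScalable : VectorTIsa::VectorT128;
}

// Instance size of Vector<T> in bytes for the selected ISA.
// sveLengthBytes is the process SVE vector length, however it was obtained.
uint32_t vectorTInstanceSize(VectorTIsa isa, uint32_t sveLengthBytes)
{
    return (isa == VectorTIsa::VectorTScalable) ? sveLengthBytes : 16;
}
```

Modelling the choice as a single enum value rather than two independent flags makes the "both enabled" error state unrepresentable.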
> never both enabled at the same time.
Correct, that should generally be an error scenario and effectively a bug in the ISA detection logic in the VM, but I wanted to make sure it was being persisted here and wasn't something "unique" for the scalable scenario.
> At some point we will need to decide if we prefer AdvSimd or SVE when the VL == 128 bits. This is dependent on micro-architecture, but we can find a different reason to pick one generally.
👍