Summary
After we enabled cDAC default-on for net11 (#5874), SOSMethodTests.Reflection started failing intermittently on Linux/macOS in CI for the net11 triage dump iteration. The same test passes for net8/9/10 and for non-triage (heap) dumps. Across the public and internal diagnostics pipelines I counted ~12 failures in the last 7 days, all with the same shape.
Failing leg: lldb + dotnet-dump-create-triage style minidump (DOTNET_DbgMiniDumpType=3) on net11 preview, x64.
Symptom
SOS !ClrStack on the crashing thread produces only:
OS Thread Id: 0x274bc (0)
Child SP IP Call Site
00007FF7B9EDBE40 00007ff81604aada [SoftwareExceptionFrame: 00007ff7b9edbe40]
<failed>
Stack Walk failed. Reported stack incomplete.
The walk yields exactly one frame (the SEF) and then aborts. The test asserts on later frames (Reflection.MethodBaseInvoker.InvokeWithNoArgs, etc.) and fails.
Root cause
The cDAC's R2R MethodDesc lookup in
src/native/managed/cdac/.../ExecutionManagerCore.ReadyToRunJitManager.GetMethodDescForRuntimeFunction
is missing the AMD64/x86 "odd entry point" fast path that the legacy DAC has in
src/coreclr/vm/readytoruninfo.cpp (ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage):
#if defined(TARGET_AMD64) || defined(TARGET_X86)
// A normal method entry point is always 8 byte aligned, but a funclet can start
// at an odd address. Since PtrHashMap can't handle odd pointers, check for this
// case and return NULL.
if ((entryPoint & 0x1) != 0)
return NULL;
#endif
TADDR val = m_entryPointToMethodDescMap.LookupValueByUniqueKey(PCODEToPINSTR(entryPoint));
(The comment is misleading — PtrHashMap can handle odd keys; they just always return INVALIDENTRY. The bail is purely a perf optimization that skips a known-useless probe.)
What that means for triage dumps
The producer-side DAC's EnumMem for triage dumps drives stack walking through the same JitCodeToMethodInfo path, which calls GetMethodDescForEntryPoint. For odd entry points (funclets), the legacy DAC returns NULL without ever probing the hashmap. The associated bucket pages are therefore never read by the producer and never enumerated into the triage dump.
The cDAC consumer, lacking the fast path, does probe the hashmap for odd entry points. The hash lands in a bucket page that the producer never enumerated → unmapped in the dump → VirtualReadException reading the bucket → exception propagates up through StackWalk_1.Next → ClrDataStackWalk.MoveNextLegacyVisible → SOS reports <failed>.
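To make the producer/consumer asymmetry concrete, here is a minimal C++ model. It is a sketch, not the runtime's or the cDAC's code: `Dump`, `BucketAddrFor`, `ProducerLookup`, and `ConsumerLookup` are hypothetical names, and the bucket base, bucket count, 64-byte stride, 4 KiB page size, and `key >> 2` hash are illustrative assumptions.

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical model of a triage dump: the set of 4 KiB pages the producer read.
struct Dump {
    std::set<std::uint64_t> pages;

    void Capture(std::uint64_t addr) { pages.insert(addr >> 12); }

    // Models a consumer read; throws if the page was never captured.
    void Read(std::uint64_t addr) const {
        if (pages.count(addr >> 12) == 0)
            throw std::runtime_error("read of page missing from dump");
    }
};

// Hypothetical first-probe address. Base, bucket count, stride, and hash
// are illustrative assumptions, not taken from the runtime sources.
inline std::uint64_t BucketAddrFor(std::uint64_t key) {
    constexpr std::uint64_t base = 0x7fa3d2022640ULL;
    constexpr std::uint64_t bucketCount = 431;
    constexpr std::uint64_t stride = 64;
    return base + ((key >> 2) % bucketCount) * stride;
}

// Producer (legacy DAC behavior): odd entry points bail before any probe,
// so no bucket page is captured on their behalf.
inline void ProducerLookup(Dump& dump, std::uint64_t entryPoint) {
    if ((entryPoint & 0x1) != 0)
        return;
    dump.Capture(BucketAddrFor(entryPoint));
}

// Consumer: with the bail it reports "no MethodDesc" (false) for odd entry
// points; without it, it probes and may hit a page that is not in the dump.
inline bool ConsumerLookup(const Dump& dump, std::uint64_t entryPoint, bool withBail) {
    if (withBail && (entryPoint & 0x1) != 0)
        return false;
    dump.Read(BucketAddrFor(entryPoint));
    return true;
}
```

With an empty `Dump`, `ProducerLookup(dump, 0x107a8ec2d)` captures nothing. `ConsumerLookup(dump, 0x107a8ec2d, false)` then throws, while passing `true` returns `false` without reading any memory.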
Concrete trace from a captured failing dump
SOS.ReflectionTest.Triage.dmp from public build 1483142 (macOS x64, net11 preview):
[cdac] IsManaged(ip=0x107a8ec4d) ← recovered post-SEF managed IP
[cdac] HashMap.GetValue map=0x107704588
key=0x107a8ec2d ← funclet entryPoint, low bit SET
size=431
buckets=0x7fa3d2022640
seed=1105869579 incr=309
[cdac] probe i=0 slot=297 bucketAddr=0x7fa3d2027080
[cdac] ClrDataStackWalk.Next() EXCEPTION:
VirtualReadException: Failed to read pointer at 0x7fa3d2027080
at HashMapLookup.GetValue
at PtrHashMapLookup.GetValue
at ReadyToRunJitManager.GetMethodDescForRuntimeFunction
at ReadyToRunJitManager.AdjustRuntimeFunctionToMethodStart
at ReadyToRunJitManager.GetMethodInfo
at ExecutionManagerCore.GetCodeBlockHandle
at StackWalk_1.IsManaged
at StackWalk_1.UpdateState
at StackWalk_1.Next
at ClrDataStackWalk.MoveNextLegacyVisible
Bucket array page map in the dump:
| Page | Status |
| --- | --- |
| 0x...22xxx | ✓ in dump |
| 0x...23xxx | ✓ in dump |
| 0x...24xxx | ✗ missing |
| 0x...25xxx | ✓ in dump |
| 0x...26xxx | ✓ in dump |
| 0x...27xxx | ✗ missing (this is where slot 297 lands) |
| 0x...28xxx | ✗ missing |
| 0x...29xxx | ✗ missing |
The producer's normal hashmap probes for non-funclet IPs touched pages 22/23/25/26, so those were captured. Page 27 was never touched because the producer's only odd-entryPoint lookup (for the funclet) returned early through the fast path before probing the hashmap.
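The numbers in the trace can be cross-checked with a little arithmetic. The sketch below assumes `seed = key >> 2`, `incr = 1 + ((key >> 5) + 1) % (size - 1)`, a first probe at `seed % size`, a 64-byte bucket stride, and 4 KiB pages. None of these are stated in the trace; they are assumptions that happen to reproduce every value it prints. All function and constant names are hypothetical.

```cpp
#include <cstdint>

// Values copied from the [cdac] trace.
constexpr std::uint64_t kKey     = 0x107a8ec2dULL;     // funclet entry point, low bit set
constexpr std::uint32_t kSize    = 431;                // bucket count
constexpr std::uint64_t kBuckets = 0x7fa3d2022640ULL;  // bucket array base

// Assumptions, not stated in the trace.
constexpr std::uint64_t kBucketStride = 64;  // bytes per bucket
constexpr std::uint64_t kPageShift    = 12;  // 4 KiB pages

// Assumed hash: the seed drops the low two bits of the key.
constexpr std::uint32_t Seed(std::uint64_t key) {
    return static_cast<std::uint32_t>(key >> 2);
}

// Assumed probe increment, always in the range [1, size - 1].
constexpr std::uint32_t Incr(std::uint64_t key, std::uint32_t size) {
    return static_cast<std::uint32_t>(1 + (((key >> 5) + 1) % (size - 1)));
}

// Slot of the first probe (i = 0).
constexpr std::uint32_t FirstSlot(std::uint64_t key, std::uint32_t size) {
    return Seed(key) % size;
}

// Address of the bucket for a given slot.
constexpr std::uint64_t BucketAddr(std::uint64_t base, std::uint32_t slot) {
    return base + slot * kBucketStride;
}

// Page number containing an address.
constexpr std::uint64_t PageOf(std::uint64_t addr) {
    return addr >> kPageShift;
}

// Each check compares a computed value against the trace.
static_assert((kKey & 0x1) != 0, "key has the low bit set");
static_assert(Seed(kKey) == 1105869579u, "seed matches trace");
static_assert(Incr(kKey, kSize) == 309u, "incr matches trace");
static_assert(FirstSlot(kKey, kSize) == 297u, "slot matches trace");
static_assert(BucketAddr(kBuckets, 297) == 0x7fa3d2027080ULL, "bucketAddr matches trace");
static_assert(PageOf(0x7fa3d2027080ULL) == 0x7fa3d2027ULL, "probe lands on page 27");
```

The `static_assert`s pass, so under these assumptions the first probe for the odd key lands on page `0x7fa3d2027`, which is the `0x...27xxx` row marked missing in the page map.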
Fix
Add the same AMD64/x86 odd-pointer bail to the cDAC's GetMethodDescForRuntimeFunction. Runtime PR: dotnet/runtime# (will link).
Validated locally with the captured failing dump:
| Build | Result |
| --- | --- |
| Baseline cDAC (no fix) | dies after 1 frame |
| Fixed cDAC | full 8-frame managed stack with file/line numbers |
After the runtime PR merges and rides into the diagnostics package via the standard flow, the failing Reflection tests should go green on net11.