[cDAC] Enumerate R2R hashmap pages into triage dumps for odd entry points#129931

Merged: max-charlamb merged 6 commits into dotnet:main from max-charlamb:max-charlamb/cdac-r2r-odd-entrypoint on Jun 28, 2026

@max-charlamb max-charlamb commented Jun 27, 2026

Member

Summary

Fix the cDAC's R2R stack walk aborting a frame past a SoftwareExceptionFrame when reading macOS triage minidumps. The root cause is an odd-entry-point optimization in the legacy DAC that skips a hashmap probe, so the bucket pages that probe would touch are never enumerated into the triage dump -- and the cDAC consumer then faults reading those not-in-dump pages.

Tracking issue: dotnet/diagnostics#5910 -- full repro, captured failure trace, and dump-file page-map evidence are there.

Root cause

ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage looks up an entry point in the EntryPointToMethodDescMap hashmap. It has an x64/x86 fast path that returns early when the entry point is odd:

#if defined(TARGET_AMD64) || defined(TARGET_X86)
    // A normal method entry point is always 8 byte aligned, but a funclet can start at an odd address.
    if ((entryPoint & 0x1) != 0)
        return NULL;
#endif

This is purely a performance optimization: the map only ever contains true method entry points (8-byte aligned on x64/x86, per the code comment above), so a lookup for an odd (funclet) address is always a miss. (PtrHashMap handles odd keys fine -- they're just never present.)
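The claim that the fast path never changes a lookup result can be checked with a small model. This is only a sketch: `EntryPointMap`, `LookupSlow`, `LookupFast`, `FastPathIsEquivalent`, and `MakeAlignedMap` are hypothetical names, and a `std::unordered_map` stands in for the runtime's PtrHashMap.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for EntryPointToMethodDescMap: entry point -> MethodDesc id.
// The id 0 plays the role of NULL (no MethodDesc found).
using EntryPointMap = std::unordered_map<std::uintptr_t, int>;

// Full lookup: always probes the map.
inline int LookupSlow(const EntryPointMap& map, std::uintptr_t entryPoint)
{
    auto it = map.find(entryPoint);
    return it == map.end() ? 0 : it->second;
}

// Fast-path lookup: skips the probe when the low bit is set.
inline int LookupFast(const EntryPointMap& map, std::uintptr_t entryPoint)
{
    if ((entryPoint & 0x1) != 0)
        return 0;
    return LookupSlow(map, entryPoint);
}

// Returns true when both lookups agree for every address in [begin, end).
inline bool FastPathIsEquivalent(const EntryPointMap& map,
                                 std::uintptr_t begin, std::uintptr_t end)
{
    for (std::uintptr_t p = begin; p < end; ++p)
    {
        if (LookupSlow(map, p) != LookupFast(map, p))
            return false;
    }
    return true;
}

// A map whose keys are all 8-byte aligned, as the code comment describes.
inline EntryPointMap MakeAlignedMap()
{
    return { {0x1000, 1}, {0x1008, 2}, {0x1040, 3} };
}
```

Because every key is even, the two lookups return the same value for every address; they differ only in whether the map is touched, which is the side effect the rest of this description is about.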

The problem is the side effect in DAC builds. When createdump enumerates memory for a triage minidump (DOTNET_DbgMiniDumpType=3), it drives stack walking through this same path. The fast path makes it return without ever probing the hashmap for odd (funclet) entry points, so the bucket pages that probe would touch are never read and never captured into the dump.

The cDAC consumer, walking the same stack later, probes the hashmap for that odd entry point, lands in a bucket page absent from the dump, throws VirtualReadException, and aborts the walk -- SOS reports <failed> one frame past the SoftwareExceptionFrame.
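The producer/consumer mismatch described above can be modeled in a few lines. This is a simplified sketch, not createdump or cDAC code: `Dump`, `BucketPage`, `ProducerLookup`, `ConsumerLookup`, and `WalkSucceeds` are hypothetical, the bucket hash is invented, and a `std::runtime_error` stands in for VirtualReadException.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>

constexpr std::uintptr_t kPageSize = 0x1000;

// Hypothetical model of a triage dump: the set of page addresses captured.
struct Dump
{
    std::set<std::uintptr_t> pages;
};

// Invented bucket placement, standing in for the PtrHashMap hash.
inline std::uintptr_t BucketPage(std::uintptr_t key)
{
    return 0x700000 + (key % 4) * kPageSize;
}

// Producer: every page it reads while walking is recorded into the dump.
inline void ProducerLookup(Dump& dump, std::uintptr_t entryPoint, bool oddFastPath)
{
    if (oddFastPath && (entryPoint & 0x1) != 0)
        return; // bail: the bucket page is never touched, so never captured
    dump.pages.insert(BucketPage(entryPoint));
}

// Consumer: always probes, and can only read pages present in the dump.
inline void ConsumerLookup(const Dump& dump, std::uintptr_t entryPoint)
{
    if (dump.pages.count(BucketPage(entryPoint)) == 0)
        throw std::runtime_error("virtual read failed: page not in dump");
}

// Runs producer then consumer for one entry point; false means the walk aborted.
inline bool WalkSucceeds(bool producerHasFastPath, std::uintptr_t entryPoint)
{
    Dump dump;
    ProducerLookup(dump, entryPoint, producerHasFastPath);
    try
    {
        ConsumerLookup(dump, entryPoint);
        return true;
    }
    catch (const std::runtime_error&)
    {
        return false;
    }
}
```

In this model `WalkSucceeds(true, 0x1235)` fails because the producer bailed on the odd address, while the same address succeeds once the producer's fast path is disabled.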

Fix

Make the producer enumerate the pages instead of mirroring the bail on the consumer:

  • src/coreclr/vm/readytoruninfo.cpp: gate the odd-entry-point optimization with !defined(DACCESS_COMPILE). The live runtime keeps the perf win; DAC builds perform the lookup so the bucket pages are enumerated into triage minidumps. Also removed the inaccurate "PtrHashMap can't handle odd pointers" comment.
  • cDAC ExecutionManagerCore.ReadyToRunJitManager: removed the bail entirely. With the DAC enumerating the pages, the consumer reads them naturally.

This is a root-cause fix (complete dump enumeration) rather than masking the symptom on the consumer side, and it benefits the legacy DAC consumer too, not just the cDAC.
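A minimal sketch of the gating idea follows. It is not the merged diff: the template parameter `IsDacBuild` stands in for the `DACCESS_COMPILE` preprocessor symbol so both configurations can be exercised in one program, and `LookupEntryPoint`, `ProbesFor`, and the probe counter are hypothetical. The preprocessor spelling shown in the comment is an assumption about how the gate might be written.

```cpp
#include <cassert>
#include <cstdint>

// Models one compilation of the lookup. In the runtime the choice is made by
// the preprocessor, presumably along the lines of (exact form is an assumption):
//   #if (defined(TARGET_AMD64) || defined(TARGET_X86)) && !defined(DACCESS_COMPILE)
// probeCount is incremented wherever the real code would probe the hashmap.
template <bool IsDacBuild>
const void* LookupEntryPoint(std::uintptr_t entryPoint, int& probeCount)
{
    // Live runtime only: skip the guaranteed-miss probe for odd addresses.
    if (!IsDacBuild && (entryPoint & 0x1) != 0)
        return nullptr;

    // DAC builds always reach this point, so the bucket pages get read
    // (and, during createdump's enumeration, captured into the dump).
    ++probeCount;
    return nullptr; // the sketch's map is empty, so every probe is a miss
}

// Number of probes performed for one lookup in the chosen configuration.
inline int ProbesFor(bool dacBuild, std::uintptr_t entryPoint)
{
    int probes = 0;
    if (dacBuild)
        LookupEntryPoint<true>(entryPoint, probes);
    else
        LookupEntryPoint<false>(entryPoint, probes);
    return probes;
}
```

For an odd address such as `0x1235`, the non-DAC configuration performs no probe and the DAC configuration performs one; the returned value is the same miss in both, so only the set of pages touched changes.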

Validation

  • Native build: rebuilt coreclr.dll (non-DAC) and mscordaccore.dll (DAC); both compilations of readytoruninfo.cpp succeed with 0 warnings / 0 errors, confirming the #if change is valid in both configurations.
  • cDAC unit tests (Microsoft.Diagnostics.DataContractReader.Tests): all green.
  • End to end: the failing diagnostics CI tests create fresh triage dumps, so once this change rides in via the runtime -> diagnostics flow, the dumps will contain the previously-missing bucket pages and the cDAC walk completes. (Dumps captured by an older, still-bailing DAC won't contain those pages; this fix applies to newly created dumps.)

Note

This PR description was drafted with assistance from GitHub Copilot.

Mirror the legacy DAC's fast path in
ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage
(src/coreclr/vm/readytoruninfo.cpp:371-376): on AMD64 and x86 a normal
method entry point is always 8 byte aligned, but a funclet can start at
an odd address. The legacy DAC bails without probing the
EntryPointToMethodDescMap PtrHashMap for these.

The cDAC was probing the hashmap for odd entry points. That hash lands
in bucket slots the producer-side DAC walker (which has the fast-path
bail) never touches and therefore never enumerates into a triage
minidump. The consumer-side cDAC probe then faults reading the
not-in-dump bucket page, throws VirtualReadException, and aborts the
whole stack walk a single frame past a SoftwareExceptionFrame.

Manifests as SOS '!ClrStack' producing only:
    [SoftwareExceptionFrame: ...]
    <failed>
    Stack Walk failed. Reported stack incomplete.
on net11 macOS triage dumps where cDAC defaults on (dotnet/diagnostics
PR dotnet#5874). Diagnosed via dotnet/diagnostics issue #<TBD>.

Validated on the failing CI dump (SOS.ReflectionTest.Triage.dmp from
public build 1483142, macOS x64): pre-fix dies after one frame; with
fix produces the full eight-frame managed stack with file/line
numbers, matching the legacy DAC's output.

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the cDAC ReadyToRun MethodDesc lookup to short-circuit on x86/x64 when the computed entry point is odd (low bit set), returning TargetPointer.Null rather than probing EntryPointToMethodDescMap. This aligns the cDAC behavior with the legacy DAC’s fast-path and avoids doing an unnecessary hash map lookup for entry points that won’t be present as keys.

Changes:

  • Add an architecture-gated odd-entry-point bail-out in GetMethodDescForRuntimeFunction (x86/x64 only).
  • Minor formatting-only adjustment in the ReadyToRun major-version → GCInfo-version switch expression (no behavioral change).

@dotnet-policy-service

Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Max Charlamb and others added 2 commits June 27, 2026 21:26
The odd-entry-point bail added to GetMethodDescForRuntimeFunction calls
Target.Contracts.RuntimeInfo.GetTargetArchitecture(), but the test target
built by CreateTarget only registered IExecutionManager and a mock
IPlatformMetadata. Every R2R test then threw NotImplementedException
('Contract IRuntimeInfo is not supported by the target').

Register a mock IRuntimeInfo that reports X64/X86 based on the test
architecture's bitness, keeping the suite green while exercising the new
x86/x64 branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb marked this pull request as ready for review June 28, 2026 01:31
Copilot AI review requested due to automatic review settings June 28, 2026 01:31

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@max-charlamb max-charlamb changed the title from "[cDAC] Bail R2R MethodDesc lookup for odd entry points on AMD64/x86" to "[cDAC] Bail R2R MethodDesc lookup for odd entry points" Jun 28, 2026
Per review feedback, fix the root cause instead of mirroring the bail in the
cDAC. The odd-entry-point check in ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage
is a perf optimization that skips a guaranteed-miss hashmap lookup for funclet
addresses. In DAC builds it had the side effect of skipping the probe during
createdump's memory enumeration, so the bucket pages were never captured into
triage minidumps and the cDAC consumer faulted reading them.

Gate the optimization with !DACCESS_COMPILE so the DAC performs the lookup and
enumerates those pages, and remove the cDAC bail entirely so it reads naturally.
Also delete the inaccurate 'PtrHashMap can't handle odd pointers' comment - the
map handles odd keys fine, they are simply never present.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 28, 2026 04:33
@max-charlamb max-charlamb changed the title from "[cDAC] Bail R2R MethodDesc lookup for odd entry points" to "[cDAC] Enumerate R2R hashmap pages into triage dumps for odd entry points" Jun 28, 2026

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/readytoruninfo.cpp
The fix lives entirely in the runtime: gating the odd-entry-point optimization
with !DACCESS_COMPILE makes the DAC enumerate the bucket pages into triage
dumps, so the cDAC needs no change and reads them naturally. Restore the file
to its main state, including the unrelated switch-expression whitespace.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb merged commit 33db161 into dotnet:main Jun 28, 2026
111 checks passed
@max-charlamb max-charlamb deleted the max-charlamb/cdac-r2r-odd-entrypoint branch June 28, 2026 15:04

3 participants