[Metal] Add ray tracing pipeline, SBT, and DispatchRays bring-up #1281

Merged: EmilioLaiso merged 3 commits into llvm:main from Traverse-Research:rt-pso-metal on Jun 24, 2026

Conversation

@MarijnS95 MarijnS95 commented Jun 3, 2026

Collaborator

Depends on #1275

Summary

Last backend in the PSO RT bring-up stack. DXR-style ray tracing reaches Metal through metal_irconverter: each RT entry point is lowered from DXIL to a Metal IR function, raygen is emitted as a kernel (IRRayGenerationCompilationKernel) so it can be dispatched directly, and miss / closest-hit / any-hit / intersection / callable functions are emitted as visible functions and pulled into a MTLVisibleFunctionTable.

Fills in the three virtuals the foundation PR left stubbed on Metal:

  • MTLDevice::createPipelineRT compiles every Shaders[] entry against a single IRRayTracingPipelineConfiguration (max attribute / recursion budget from the YAML RTConfig), builds one MTL::Library per entry, hands the raygen function to the compute pipeline as the kernel, and registers the rest as LinkedFunctions. The freshly-built pipeline then mints a MTLVisibleFunctionTable and resolves each callable function's handle into a slot index that the SBT builder reuses. setMaxCallStackDepth(MaxTraceRecursionDepth) is called so that nested TraceRay calls actually unwind (the default depth of 1 silently drops the second trace).
  • MTLDevice::createShaderBindingTable lays the four SBT regions out via the shared computeSBTLayout helper sized for IRShaderIdentifier records, looks up each region entry's ShaderName in the pipeline's name → IRShaderIdentifier map, and memcpys the records into a shared-storage MTL::Buffer the runtime dereferences at dispatch.
  • MTLComputeEncoder::dispatchRays binds the raygen pipeline and runs dispatchThreads(Width, Height, Depth) on the encoder. The caller (createRayTracingCommands in MTLDevice) builds the per-dispatch IRDispatchRaysArgument struct (SBT region addresses + sizes, GRS / ResDescHeap GPU pointers, visible / intersection function table resourceIDs), parks it in a shared MTL::Buffer kept alive on the command buffer's KeepAlive list, and binds it at kIRRayDispatchArgumentsBindPoint so callees reached via TraceRay() inherit the same dispatch state through that pointer.
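The region layout step in createShaderBindingTable can be pictured with a small sketch. This is not the shared computeSBTLayout helper, whose signature is not shown in this PR; `RegionLayout`, `alignUp`, `layoutRegions`, and the record size and alignment values passed in are all hypothetical names and assumptions used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one SBT region (raygen, miss, hit group, callable).
struct RegionLayout {
  uint64_t Offset; // byte offset of the region inside the SBT buffer
  uint64_t Stride; // bytes between consecutive records
  uint64_t Size;   // total bytes in the region (0 when the region is empty)
};

constexpr uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  return (Value + Alignment - 1) / Alignment * Alignment;
}

// Packs the four regions back to back. RecordSize would be the size of one
// shader identifier record; both alignments are assumptions of this sketch.
inline std::array<RegionLayout, 4>
layoutRegions(const std::array<uint32_t, 4> &RecordCounts, uint64_t RecordSize,
              uint64_t RecordAlign, uint64_t RegionAlign) {
  assert(RecordAlign != 0 && RegionAlign != 0);
  std::array<RegionLayout, 4> Regions{};
  const uint64_t Stride = alignUp(RecordSize, RecordAlign);
  uint64_t Cursor = 0;
  for (size_t I = 0; I < Regions.size(); ++I) {
    Cursor = alignUp(Cursor, RegionAlign); // each region starts aligned
    Regions[I] = {Cursor, Stride, Stride * RecordCounts[I]};
    Cursor += Regions[I].Size;
  }
  return Regions;
}
```

With one raygen record, two miss records, two hit groups, and no callables, the callable region comes out zero-sized, which is the shape a raygen-only pipeline relies on.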

Plumbs the existing executeProgram RT branch on Metal the same way the VK / DX backends already do (validate Shaders / SBT / RTConfig, build RayTracingPipelineCreateDesc from the YAML pipeline, create PSO, build SBT, record commands), and adds the raytracing-pipeline lit feature on Metal so test/Feature/RT/raygen-roundtrip.test drops Metal from its XFAIL list and passes natively on Apple Silicon.

This bring-up only handles Triangle hit groups whose only member is a ClosestHit shader. Any-hit, intersection, and procedural shaders and local root signatures land in follow-ups; createPipelineRT returns a clear unsupported error for those shapes instead of silently producing wrong output.
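A minimal sketch of that kind of up-front rejection is shown below. The `HitGroupDesc` struct, the `HitGroupType` enum, and `checkHitGroupSupported` are hypothetical and are not the types used by createPipelineRT; the real code presumably reports failures through the project's own error type rather than `std::optional<std::string>`.

```cpp
#include <optional>
#include <string>

// Hypothetical description of one hit group, for illustration only.
enum class HitGroupType { Triangles, Procedural };

struct HitGroupDesc {
  std::string Name;
  HitGroupType Type = HitGroupType::Triangles;
  std::string ClosestHit;   // empty when the group has no closest-hit shader
  std::string AnyHit;       // empty when absent
  std::string Intersection; // empty when absent
  bool HasLocalRootSignature = false;
};

// Returns an error message for shapes the bring-up does not handle, or
// std::nullopt when the group is a triangle group with only a closest-hit.
inline std::optional<std::string>
checkHitGroupSupported(const HitGroupDesc &G) {
  const std::string Prefix = "hit group '" + G.Name + "': ";
  if (G.Type != HitGroupType::Triangles)
    return Prefix + "procedural hit groups are not supported yet";
  if (!G.AnyHit.empty())
    return Prefix + "any-hit shaders are not supported yet";
  if (!G.Intersection.empty())
    return Prefix + "intersection shaders are not supported yet";
  if (G.HasLocalRootSignature)
    return Prefix + "local root signatures are not supported yet";
  if (G.ClosestHit.empty())
    return Prefix + "a ClosestHit shader is required";
  return std::nullopt;
}
```

Failing early with a named hit group keeps an unsupported YAML pipeline from reaching the SBT builder at all.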

Test plan

Local on an NVIDIA RTX 3060:

  • Linux Vulkan (native offloader)
  • Linux D3D12 (Wine + vkd3d-proton + cross-compiled offloader.exe)
  • Windows Vulkan (native offloader.exe)
  • Windows D3D12 (native offloader.exe)

CI (RT-capable runners):

  • windows-nvidia D3D12 (RaytracingTier 1.2)
  • windows-intel VK (VK_KHR_ray_tracing_pipeline)
  • macOS Metal (supportsRaytracing)

DXR-style ray tracing reaches Metal through metal_irconverter: each RT
entry point is lowered from DXIL to a Metal IR function, raygen is
emitted as a kernel (IRRayGenerationCompilationKernel) so it can be
dispatched directly, and miss / closest-hit / any-hit / intersection /
callable functions are emitted as visible functions and pulled into a
MTLVisibleFunctionTable.

Implements the three virtuals the foundation PR left stubbed on Metal:

  • MTLDevice::createPipelineRT compiles every Shaders[] entry against a
    single IRRayTracingPipelineConfiguration (max attribute/recursion
    from the YAML RTConfig), builds one MTL::Library per entry, hands
    the raygen function to the compute pipeline as the kernel, and
    registers the rest as LinkedFunctions. The freshly-built pipeline
    then mints a MTLVisibleFunctionTable and resolves each callable
    function's handle into a slot index that the SBT builder reuses.

  • MTLDevice::createShaderBindingTable lays the four SBT regions out
    via the shared computeSBTLayout helper sized for IRShaderIdentifier
    records, looks up each region entry's ShaderName in the pipeline's
    name → IRShaderIdentifier map, and memcpys the records into a
    shared-storage MTL::Buffer the runtime will dereference at dispatch.

  • MTLComputeEncoder::dispatchRays binds the raygen pipeline and runs
    dispatchThreads(Width, Height, Depth) on the encoder. The caller
    (createRayTracingCommands) is responsible for binding the global
    descriptor heap, top-level argument buffer, IRDispatchRaysArgument
    (slot 3), and marking the SBT buffer + function tables resident.

The IRDispatchRaysArgument struct is built per-dispatch in
createRayTracingCommands: SBT region addresses + sizes (read off the
MTLShaderBindingTable), GRS / ResDescHeap GPU pointers, and the
visible / intersection function table resourceIDs. It's parked in a
shared MTL::Buffer kept alive on the command buffer's KeepAlive list
and bound at kIRRayDispatchArgumentsBindPoint so callees reached via
TraceRay() inherit the same dispatch state through that pointer.

Plumbs the existing executeProgram RT branch on Metal the same way the
VK / DX backends already do (validate Shaders / SBT / RTConfig, build
RayTracingPipelineCreateDesc from the YAML pipeline, create PSO, build
SBT, record commands), and adds the raytracing-pipeline lit feature
on Metal so test/Feature/RT/raygen-roundtrip.test drops Metal from its
XFAIL list and passes natively on Apple Silicon (the 0xBEEF payload
roundtrip matches the DX / VK references, verified locally on
macOS 15 / metal-irconverter 3.1.1).

This PR1 bring-up only handles Triangle hit groups whose only member
is a ClosestHit shader — any-hit / intersection / procedural / local
root signatures land in follow-ups; createPipelineRT now returns a
clear unsupported error for those shapes instead of silently producing
wrong output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EmilioLaiso EmilioLaiso marked this pull request as ready for review June 22, 2026 12:45
@EmilioLaiso EmilioLaiso merged commit f4e85ca into llvm:main Jun 24, 2026
21 of 26 checks passed
EmilioLaiso pushed a commit that referenced this pull request Jun 26, 2026
Depends on #1281

## Summary

Four small PSO raytracing tests stacked on top of #1275, each isolating
one shader-observable surface from the 👍 list in #1268. Same shape as
the inline-RT batch already in flight in #1271 / #1272 / #1274 / #1276 —
one `.test` file per behavior, single-purpose shader, exact buffer
comparison.

- `dispatch-rays-index.test` — 4x1x1 dispatch, raygen writes
`DispatchRaysIndex().x` into `Output[index]`. Confirms the dispatch grid
plumbs through to the per-lane system value with no BLAS / TLAS / hit
groups in play (RT-pipeline-only, no AS binding).
- `dispatch-rays-dimensions.test` — 2x3x1 dispatch, raygen packs the
constant `DispatchRaysDimensions()` into one uint per lane. Confirms
every lane sees the host-side `{W, H, D}` even when more than one
dimension is > 1.
- `miss-shader-index.test` — two miss shaders writing distinct sentinels
(0xAA / 0xBB). 2-lane dispatch picks `MissShaderIndex` 0 and 1
respectively; rays start far enough from the geometry that every ray
misses. Verifies the SBT miss region's per-record routing.
- `ray-contribution-to-hit-group-index.test` — two hit groups with
distinct closest-hit shaders (0xA1 / 0xB2). 2-lane dispatch picks
`RayContributionToHitGroupIndex` 0 and 1, every ray hits the same
triangle. Verifies the SBT hit-group region's per-record routing.

The first two have no AS / Miss / HitGroup in their pipeline at all —
just a raygen + a UAV — which doubles as a regression check for the
minimum viable RT pipeline shape (one raygen group, zero-sized miss /
hit / callable SBT regions). The latter two reuse the single-triangle
BLAS / TLAS from `raygen-roundtrip.test`.
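
The per-record routing that the miss and hit-group tests exercise follows the DXR shader table addressing rule, sketched below as plain arithmetic. `TableRange` and both function names are hypothetical; how each backend's runtime performs the lookup internally is not shown in this PR.

```cpp
#include <cstdint>

// Hypothetical view of one shader table region: start address plus stride.
struct TableRange {
  uint64_t StartAddress;
  uint64_t StrideInBytes;
};

// Miss record selected by the MissShaderIndex argument of TraceRay().
constexpr uint64_t missRecordAddress(TableRange Miss,
                                     uint32_t MissShaderIndex) {
  return Miss.StartAddress + Miss.StrideInBytes * MissShaderIndex;
}

// Hit-group record selected by the TraceRay() arguments combined with the
// geometry index in the BLAS and the instance's contribution from the TLAS.
constexpr uint64_t hitGroupRecordAddress(
    TableRange HitGroups, uint32_t RayContributionToHitGroupIndex,
    uint32_t MultiplierForGeometryContributionToHitGroupIndex,
    uint32_t GeometryContributionToHitGroupIndex,
    uint32_t InstanceContributionToHitGroupIndex) {
  const uint64_t RecordIndex =
      uint64_t(RayContributionToHitGroupIndex) +
      uint64_t(MultiplierForGeometryContributionToHitGroupIndex) *
          GeometryContributionToHitGroupIndex +
      InstanceContributionToHitGroupIndex;
  return HitGroups.StartAddress + HitGroups.StrideInBytes * RecordIndex;
}
```

With a single triangle and a single instance whose contribution is assumed to be 0, only `RayContributionToHitGroupIndex` moves the lookup, which is why two lanes with values 0 and 1 are enough to tell the two records apart.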

All four tests are `# REQUIRES: raytracing-pipeline` with `# XFAIL:
Clang` — `clang-dxc` doesn't yet lower `[shader("…")]` entry points to
either DXIL libraries or SPIR-V. With the Metal RT bring-up in #1281
rebased underneath this branch, all four pass natively on Apple Silicon
and `Metal` is dropped from the XFAIL list.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit that referenced this pull request Jun 26, 2026
…ldRay (#1278)

Depends on #1281

## Summary

Three small PSO raytracing tests stacked on #1275, each isolating one
shader-observable closest-hit system value from #1268's 👍 list. Same
shape as the prior batch in #1277 — one `.test` file per behavior,
single-purpose shader, exact buffer comparison.

- `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires at
a clearly-interior point of the single triangle so the closest-hit
shader reports a known
`BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are
picked from the inside of the triangle to avoid the watertight-traversal
edge-rule lottery you hit at edge midpoints / vertices (the first cut of
this test used `midpoint(v0, v1)` and one lane silently missed on both
backends).
- `closest-hit-primitive-index.test` — three triangles tiled at x = -3,
0, +3 in a single BLAS. 3-lane dispatch fires straight down at each
triangle's centroid; the closest-hit reports `PrimitiveIndex()` and must
match the lane index 0..2.
- `closest-hit-world-ray.test` — 2-lane dispatch with rays from
different z heights (1.0 and 2.0). Closest-hit packs
`WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()`
through the payload; raygen flattens the float3 into a 6-element Float32
buffer. Verifies the system values match the raygen-side `RayDesc` and
that t is correctly computed by the traversal.
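
The expected barycentrics for an interior hit point can be cross-checked on the CPU with a sketch like the one below. It assumes the triangle lies in a plane perpendicular to the ray so the problem reduces to 2D, and it uses the DXR convention that the hit point equals `v0 + u * (v1 - v0) + v * (v2 - v0)`. `Vec2`, `Barycentrics`, `barycentricsOf`, `isStrictlyInside`, and the margin value are hypothetical and not taken from the tests.

```cpp
#include <cassert>

struct Vec2 {
  double X, Y;
};

struct Barycentrics {
  double U, V; // weights of v1 and v2; the weight of v0 is 1 - U - V
};

constexpr double cross2(Vec2 A, Vec2 B) { return A.X * B.Y - A.Y * B.X; }

// Solves P = V0 + U * (V1 - V0) + V * (V2 - V0) for (U, V).
inline Barycentrics barycentricsOf(Vec2 P, Vec2 V0, Vec2 V1, Vec2 V2) {
  const Vec2 E1{V1.X - V0.X, V1.Y - V0.Y};
  const Vec2 E2{V2.X - V0.X, V2.Y - V0.Y};
  const Vec2 D{P.X - V0.X, P.Y - V0.Y};
  const double Area = cross2(E1, E2);
  assert(Area != 0.0 && "degenerate triangle");
  return {cross2(D, E2) / Area, cross2(E1, D) / Area};
}

// True only for points away from every edge and vertex, which sidesteps
// the watertight edge rules mentioned above.
inline bool isStrictlyInside(Barycentrics B, double Margin = 1e-3) {
  return B.U > Margin && B.V > Margin && (B.U + B.V) < 1.0 - Margin;
}
```

For the triangle (0,0), (1,0), (0,1), the point (0.25, 0.25) yields (u, v) = (0.25, 0.25) and passes the interior check, while the edge midpoint (0.5, 0) yields v = 0 and is rejected.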

All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` —
`clang-dxc` doesn't yet lower `[shader("…")]` entry points. With the
Metal RT bring-up in #1281 rebased underneath this branch, all three
pass natively on Apple Silicon and `Metal` is dropped from the XFAIL
list.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>