Metal: Stable argument buffers; GPU rendering crashes; visionOS exports #111976

stuartcarnie · 2025-10-24T00:07:38Z

Supersedes #110683

Note

Some of the changes were moving code from Metal 3-specific files into a common file, to allow for reuse when adding Metal 4 support.

Summary

The PR addresses the following bugs and regressions:

- Rendering artefacts and GPU crashes for all Apple Silicon
  - Reproducible on iOS and visionOS due to lower memory specs
  - Supersedes Metal: fix incorrect usage of useResources: APIs #110683
- Unable to export and bake shaders for visionOS
- Incorrect Metal Shader Language and OS feature targeting in shader baker
- Performance regression using UMA buffers

The PR adds the following improvements and optimisations:

- Reduce memory and CPU usage by generating stable argument buffer bindings for shaders
- Uses lodBias on 26.0+ OSs and removes the warning "Metal does not support LOD bias for samplers."
- Uses GPU encoded MTLEvent rather than callbacks to handle frame synchronisation on supported OSs
- Adds support for debugPrintfEXT in Metal – which is propagated through. See this for more info

Details

FIX: Rendering artefacts and GPU crashes

The usage of useResources:count:usage:stages: and useResources:count:usage: was previously misunderstood: the implementation assumed it was sufficient to make all resources resident before calling endEncoding on the MTLCommandEncoder. The documentation is clear that resources used by subsequent draw calls must be made resident before encoding the draw or dispatch command:

You can make multiple resources resident (available in GPU memory) for the remaining duration of the render pass by calling this method. Call the method before encoding draw calls that may access the elements of resources through an argument buffer. The method ensures each resource is in a format that’s compatible with the shaders that depend on it.

Note

Stable argument buffers reduced the complexity and CPU resources required to manage this data.

FIX: Unable to export and bake shaders for visionOS

visionOS was omitted from the shader baking export, so no shaders were baked and Godot would generate errors.

FIX: Incorrect Metal Shader Language and OS feature targeting

When Metal shaders are generated from SPIR-V and available features determined, only two variables were considered:

Minimum GPU, and Metal language version

However, the minimum OS target version must also be considered, as certain APIs and Metal language features may be unavailable. Improved the Metal shader container to capture all three to determine the available features and what shader features should be generated.

The shader features are then passed to the RenderingDeviceDriverMetal to ensure it only uses the features specified in the generated shader.

Note

In a future PR, we will add support for baking multiple shader versions, so that the target system can choose the best available based on the OS and GPU.

FIX: Performance regression using UMA buffers

Metal does not use argument buffers when a uniform set contains a UMA buffer, which is the case for all canvas 2D rendering. With the previous implementation, all slots were updated each time a uniform set changed. For 2D rendering, where a texture changes frequently, this resulted in costly calls to the Metal command encoder to encode all slots, even if only the texture, and possibly the sampler, had changed. This update tracks which slots have changed, so only the minimal Metal binding calls are executed. This should improve performance across the board for all devices using direct / slot binding in Metal.
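
The slot caching described above can be sketched in plain C++ (rather than Objective-C++, so it compiles standalone). This is only an illustration of the idea, not the BindingCache added by this PR: SlotBindingCache, ResourceHandle, slots_to_bind and the slot limit are all hypothetical names and values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Metal resource handle (texture, sampler, buffer).
using ResourceHandle = std::uint64_t;

// Remembers the last resource bound to each slot so that unchanged slots
// are not re-encoded when a uniform set changes.
class SlotBindingCache {
public:
	// Arbitrary limit chosen for this sketch.
	static constexpr std::size_t kMaxSlots = 32;

	// Returns true if the slot must be re-bound on the encoder,
	// false if the cached binding already matches.
	bool update(std::size_t slot, ResourceHandle handle) {
		if (slot >= kMaxSlots) {
			return true; // Outside the cached range: always bind.
		}
		if (valid_[slot] && bound_[slot] == handle) {
			return false;
		}
		bound_[slot] = handle;
		valid_[slot] = true;
		return true;
	}

	// Forget all bindings, e.g. when a new command encoder is created.
	void clear() { valid_.fill(false); }

private:
	std::array<ResourceHandle, kMaxSlots> bound_{};
	std::array<bool, kMaxSlots> valid_{};
};

// Returns the slots that actually need a bind call for this uniform set.
std::vector<std::size_t> slots_to_bind(SlotBindingCache &cache, const std::vector<ResourceHandle> &set) {
	std::vector<std::size_t> dirty;
	for (std::size_t slot = 0; slot < set.size(); ++slot) {
		if (cache.update(slot, set[slot])) {
			dirty.push_back(slot);
		}
	}
	return dirty;
}
```

In this sketch, binding a set and then binding the same set with only the texture in slot 1 replaced yields a single dirty slot, which corresponds to one encoder call instead of one per slot.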

IMPROVEMENT: Reduce memory and CPU usage

The changes to use stable argument buffer bindings mean that Metal shaders generated from SPIR-V now produce consistent argument buffer layouts across shader versions and pipeline stages, by using the information from the RenderingShaderContainer. This class has had some improvements to include additional reflected data that is passed to the device-specific shader containers.

By ensuring the argument buffer layout is consistent, we no longer have to generate an argument buffer per shader version and stage, which in some cases avoids calculating and laying out 100s of argument buffers per shader variant. This was happening for every material in the Bistro demo, which has 100s of materials, and resulted in unique argument buffers for every shader material.
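
One way to picture a stable layout is to derive argument buffer indices from the full declaration of the uniform set rather than from whatever a particular shader version or stage happens to use. The following is a minimal sketch under that assumption; ReflectedUniform, ArgumentLayout and build_stable_layout are hypothetical names, and the actual layout rules in the PR may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reflected uniform: its binding number and how many
// argument buffer entries it occupies (for example, an array length).
struct ReflectedUniform {
	std::uint32_t binding;
	std::uint32_t entry_count;
};

// Maps binding number -> first index inside the argument buffer.
using ArgumentLayout = std::map<std::uint32_t, std::uint32_t>;

// Assigns indices in ascending binding order, so the result depends only on
// the declared contents of the set, not on reflection order or on which
// bindings a given variant or stage actually reads.
ArgumentLayout build_stable_layout(std::vector<ReflectedUniform> set_uniforms) {
	std::sort(set_uniforms.begin(), set_uniforms.end(),
			[](const ReflectedUniform &a, const ReflectedUniform &b) {
				return a.binding < b.binding;
			});
	ArgumentLayout layout;
	std::uint32_t next_index = 0;
	for (const ReflectedUniform &u : set_uniforms) {
		layout[u.binding] = next_index;
		next_index += u.entry_count;
	}
	return layout;
}
```

Because every variant sees the same index for a given binding, a single argument buffer encoded for the uniform set can be shared by all of them.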

Important

These changes are also preparation for adding Metal 4 support in the future

These changes yielded small improvements across the board for the Godot reflection benchmark.

- 4.5 gm is the current 4.5.1 version
- 4.6 dev args disabled is when argument buffers are disabled, and slot or direct binding is used
- 4.6 dev args enabled is when argument buffers are enabled

FPS

Godot Version	Description	FPS Mean	FPS Median	FPS 5% Low	FPS 99% High
4.5	gm	195	185	128	380
4.6	dev args disabled	197	186	130	383
4.6	dev args enabled	199	189	132	392

GPU times

Godot Version	Description	Frames	GPU Time Mean (ms)	GPU Time Median (ms)	GPU Time 99% (ms)
4.5	gm	4995	1.95	1.89	3.77
4.6	dev args disabled	4995	1.94	1.91	3.20
4.6	dev args enabled	4996	1.87	1.79	3.02

Memory improvements

Savings of about 1MB with fewer argument buffer allocations

Godot Version	Description	GPU Memory Mean (MB)	Process Memory Mean (MB)	Process Memory Max (MB)
4.5	gm	788	1,585	1,590
4.6	dev args disabled	787	1,529	1,536
4.6	dev args enabled	787	1,521	1,526

stuartcarnie · 2025-10-24T00:58:24Z

drivers/metal/metal_device_profile.cpp

Device Profile is now keyed by platform (macOS, iOS, etc), GPU and minimum OS version. This ensures that when generating or baking the shader, it selects the correct features based on the target OS also.
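
As a rough illustration of keying a profile by all three values, the sketch below caches derived features per (platform, GPU, minimum OS) key. It is not the MetalDeviceProfile implementation: ProfileKey, OsVersion, Features, derive_features and ProfileCache are hypothetical, and the only feature rule shown is the one stated in the summary (lodBias on 26.0+ OSs).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

enum class Platform { macOS, iOS, visionOS };
enum class Gpu { Apple7, Apple8, Apple9 };

struct OsVersion {
	std::uint32_t major_version = 0;
	std::uint32_t minor_version = 0;
	bool operator<(const OsVersion &o) const {
		return std::tie(major_version, minor_version) < std::tie(o.major_version, o.minor_version);
	}
};

// All three values take part in the key, so two targets that differ only
// in minimum OS version resolve to different profiles.
struct ProfileKey {
	Platform platform;
	Gpu gpu;
	OsVersion min_os;
	bool operator<(const ProfileKey &o) const {
		return std::tie(platform, gpu, min_os) < std::tie(o.platform, o.gpu, o.min_os);
	}
};

struct Features {
	bool sampler_lod_bias = false;
};

// Hypothetical feature derivation: only the lodBias rule from the summary.
Features derive_features(const ProfileKey &key) {
	Features f;
	f.sampler_lod_bias = !(key.min_os < OsVersion{ 26, 0 });
	return f;
}

class ProfileCache {
public:
	// Returns the cached features for the key, deriving them on first use.
	const Features &get(const ProfileKey &key) {
		auto it = cache_.find(key);
		if (it == cache_.end()) {
			it = cache_.emplace(key, derive_features(key)).first;
		}
		return it->second;
	}
	std::size_t size() const { return cache_.size(); }

private:
	std::map<ProfileKey, Features> cache_;
};
```

With this shape, an iOS target with a minimum OS of 18.0 and one with 26.0 on the same GPU get separate entries, and only the latter reports LOD bias support.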

stuartcarnie · 2025-10-24T01:01:22Z

drivers/metal/metal_objects.h

+/*! Track resource and ensure they are resident prior to dispatch or draw commands.
+ *
+ * The primary purpose of this data structure is to track all the resources that must be made resident prior
+ * to issuing the next dispatch or draw command. It aggregates all resources used from argument buffers.
+ *
+ * As an optimization, this data structure also tracks previous usage for resources, so that
+ * it may avoid binding them again in later commands if the resource is already resident and its usage flagged.
+ */
+struct API_AVAILABLE(macos(11.0), ios(14.0), tvos(14.0)) ResourceTracker {


Fixes GPU corruption / crashes by tracking resource usage and ensuring resources are resident prior to each command (draw, dispatch, etc.)
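
The behaviour described in the comment above can be sketched in plain C++ as follows. This is a simplified model and not the ResourceTracker from the PR: ResidencyTracker, PendingResource and the usage flags are hypothetical, and the real structure issues useResources: calls on the encoder where this sketch merely returns a list.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a Metal resource handle.
using ResourceHandle = std::uint64_t;

// Hypothetical usage flags, loosely modelled on read / write usage.
enum Usage : std::uint32_t { USAGE_READ = 1, USAGE_WRITE = 2 };

struct PendingResource {
	ResourceHandle handle;
	std::uint32_t usage;
};

class ResidencyTracker {
public:
	// Record that the next draw or dispatch accesses `handle` with `usage`.
	// Usage bits that were already made resident in this pass are skipped.
	void add(ResourceHandle handle, std::uint32_t usage) {
		auto it = resident_.find(handle);
		const std::uint32_t already = (it == resident_.end()) ? 0u : it->second;
		const std::uint32_t missing = usage & ~already;
		if (missing != 0) {
			pending_[handle] |= missing;
		}
	}

	// Call before encoding each draw or dispatch. Returns the resources that
	// still have to be made resident and marks them as resident.
	std::vector<PendingResource> flush() {
		std::vector<PendingResource> out;
		out.reserve(pending_.size());
		for (const auto &entry : pending_) {
			resident_[entry.first] |= entry.second;
			out.push_back(PendingResource{ entry.first, entry.second });
		}
		pending_.clear();
		return out;
	}

	// Residency only lasts for the current pass, so reset with each new encoder.
	void reset() {
		resident_.clear();
		pending_.clear();
	}

private:
	std::unordered_map<ResourceHandle, std::uint32_t> resident_;
	std::unordered_map<ResourceHandle, std::uint32_t> pending_;
};
```

The key point is that flush() runs before every command rather than once before endEncoding, while the resident_ map keeps repeated uses of the same resource from producing redundant calls.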

stuartcarnie · 2025-10-24T01:02:05Z

drivers/metal/metal_objects.h

+	void resolve_texture(RDD::TextureID p_src_texture, RDD::TextureLayout p_src_texture_layout, uint32_t p_src_layer, uint32_t p_src_mipmap, RDD::TextureID p_dst_texture, RDD::TextureLayout p_dst_texture_layout, uint32_t p_dst_layer, uint32_t p_dst_mipmap);
+	void clear_color_texture(RDD::TextureID p_texture, RDD::TextureLayout p_texture_layout, const Color &p_color, const RDD::TextureSubresourceRange &p_subresources);
+	void clear_buffer(RDD::BufferID p_buffer, uint64_t p_offset, uint64_t p_size);
+	void copy_buffer(RDD::BufferID p_src_buffer, RDD::BufferID p_dst_buffer, VectorView<RDD::BufferCopyRegion> p_regions);
+	void copy_texture(RDD::TextureID p_src_texture, RDD::TextureID p_dst_texture, VectorView<RDD::TextureCopyRegion> p_regions);
+	void copy_buffer_to_texture(RDD::BufferID p_src_buffer, RDD::TextureID p_dst_texture, VectorView<RDD::BufferTextureCopyRegion> p_regions);
+	void copy_texture_to_buffer(RDD::TextureID p_src_texture, RDD::BufferID p_dst_buffer, VectorView<RDD::BufferTextureCopyRegion> p_regions);


Moved the implementation of these from the RenderingDeviceDriverMetal into MDCommandBuffer, for consistency

stuartcarnie · 2025-10-24T01:06:18Z

drivers/metal/metal_objects.h


 public:
 	uint32_t index;
+	id<MTLBuffer> arg_buffer = nil;


Now we have a single argument buffer per uniform set vs 100s or more

stuartcarnie · 2025-10-24T01:07:25Z

drivers/metal/metal_objects.mm

 	return blit.encoder;
 }

+_FORCE_INLINE_ static MTLSize mipmapLevelSizeFromTexture(id<MTLTexture> p_tex, NSUInteger p_level) {


The following block was moved from the RenderingDeviceDriverMetal into here, to be consistent with the other functions.

stuartcarnie · 2025-10-24T09:18:14Z

drivers/metal/rendering_shader_container_metal.mm

+	switch (device_profile->platform) {
+		case MetalDeviceProfile::Platform::macOS: {
+			parts.push_back("-mtargetos=macos" + device_profile->min_os_version.to_compiler_os_version());
+			break;
+		}
+		case MetalDeviceProfile::Platform::iOS: {
+			parts.push_back("-mtargetos=ios" + device_profile->min_os_version.to_compiler_os_version());
+			break;
+		}
+		case MetalDeviceProfile::Platform::visionOS: {
+			parts.push_back("-mtargetos=xros" + device_profile->min_os_version.to_compiler_os_version());
+			break;


We need to account for visionOS when generating Metal binaries

stuartcarnie · 2025-10-24T09:25:25Z

servers/rendering/rendering_shader_container.h


+	typedef LocalVector<ReflectUniform> ReflectDescriptorSet;
+
+	struct ReflectShader {


We define the reflect objects in the Shader Container, so that data flows outwards from the Shader Container. This allows us to evolve the reflected data that is passed to the driver-specific shader containers.

Further, the ReflectShader type is passed to the driver-specific implementations to inspect the reflected SPIR-V.

Previously we were traversing the reflected SPIR-V and constructing RDD::ShaderReflection, which is used by the drivers and RenderingDriver. We were also using ShaderReflection to construct the internal state of the RenderingShaderContainer and also constructing the ShaderReflection from the internal state. We wanted to add more metadata to ShaderReflection, so Metal could build stable bindings, but that would mean changing ShaderReflection.

stuartcarnie · 2025-10-24T09:28:00Z

editor/shader/shader_baker/shader_baker_export_plugin_platform_metal.cpp

+	} else if (os_name == U"visionOS") {
+		min_os_version = (String)p_preset->get("application/min_visionos_version");
+		profile = MetalDeviceProfile::get_profile(MetalDeviceProfile::Platform::visionOS, MetalDeviceProfile::GPU::Apple8, min_os_version);


Ensure we can bake shaders for visionOS

stuartcarnie · 2025-10-24T19:47:42Z

❤️ Thanks for the feedback, @AThousandShips – will incorporate all your changes!

stuartcarnie · 2025-10-24T21:36:12Z

Thanks @AThousandShips – all your feedback has been incorporated

stuartcarnie · 2025-10-26T21:36:26Z

drivers/metal/metal_objects.h

 	MDRenderPass(Vector<MDAttachment> &p_attachments, Vector<MDSubpass> &p_subpasses);
 };

+struct BindingCache {


The BindingCache is used to avoid redundant binding calls to a MTLCommandEncoder

stuartcarnie · 2025-10-26T21:37:38Z

drivers/metal/metal_objects.h

-class API_AVAILABLE(macos(11.0), ios(14.0), tvos(14.0)) DynamicOffsets {
-	uint32_t data;
-
-public:
-	_FORCE_INLINE_ uint32_t get_frame_index(const DynamicOffsetLayout &p_layout) const {
-		return data;
-	}
-};
-


Removed dead code from #111183

stuartcarnie · 2025-10-26T21:38:28Z

drivers/metal/metal_objects.h

+// A type used to encode resources directly to a MTLCommandEncoder
+struct DirectEncoder {


This allows us to greatly simplify the direct binding code, by unifying MTLRenderCommandEncoder and MTLComputeCommandEncoder binding and caching.
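
To illustrate the unification only (caching is left out), here is a minimal plain C++ sketch. The fake encoders simply log calls and stand in for the render encoder, which has per-stage setters, and the compute encoder, which has a single setter. Everything here, including this DirectEncoder, is a hypothetical model and not the type added by the PR.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Metal resource handle.
using ResourceHandle = std::uint64_t;

enum class Stage { Vertex, Fragment, Compute };

// Fake render encoder: one setter per stage, calls are logged as text.
struct FakeRenderEncoder {
	std::vector<std::string> log;
	void set_vertex_texture(ResourceHandle h, std::uint32_t i) {
		log.push_back("vertex texture " + std::to_string(h) + " @" + std::to_string(i));
	}
	void set_fragment_texture(ResourceHandle h, std::uint32_t i) {
		log.push_back("fragment texture " + std::to_string(h) + " @" + std::to_string(i));
	}
};

// Fake compute encoder: a single setter.
struct FakeComputeEncoder {
	std::vector<std::string> log;
	void set_texture(ResourceHandle h, std::uint32_t i) {
		log.push_back("compute texture " + std::to_string(h) + " @" + std::to_string(i));
	}
};

// Presents one binding interface over either encoder kind, so calling code
// does not need separate render and compute paths.
class DirectEncoder {
public:
	explicit DirectEncoder(FakeRenderEncoder *r) : render_(r) {}
	explicit DirectEncoder(FakeComputeEncoder *c) : compute_(c) {}

	void set_texture(Stage stage, ResourceHandle h, std::uint32_t index) {
		if (compute_ != nullptr) {
			compute_->set_texture(h, index);
			return;
		}
		if (stage == Stage::Vertex) {
			render_->set_vertex_texture(h, index);
		} else if (stage == Stage::Fragment) {
			render_->set_fragment_texture(h, index);
		}
		// A compute-stage binding on a render encoder is ignored in this sketch.
	}

private:
	FakeRenderEncoder *render_ = nullptr;
	FakeComputeEncoder *compute_ = nullptr;
};
```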

clayjohn

Let's go ahead with this.

I still have a bit of reservation about the duplication between RDC and ShaderContainer that this introduced. But I understand your rationale for it and can't think of a better option. I don't want to block this work due to my hesitation since it is most likely a result of my lack of familiarity with the ShaderContainer code.

So to move this forward I suggest that we merge this as-is. Then, when Dario is back from vacation, I will ask him to take a look as well and point out if there are any potential issues, or perhaps a better way to avoid the duplication that neither of us are seeing.

Repiteo · 2025-10-28T15:18:15Z

Thanks!

stuartcarnie · 2025-10-28T19:44:31Z

I still have a bit of reservation about the duplication between RDC and ShaderContainer that this introduced. But I understand your rationale for it and can't think of a better option. I don't want to block this work due to my hesitation since it is most likely a result of my lack of familiarity with the ShaderContainer code.

Thanks @clayjohn – and I agree.

I will spend some time looking at how this could be improved as a more targeted PR that doesn't have as many broad changes. I realise this turned into a large change, which wasn't my intention or at all ideal, but there were many stones unturned…

stuartcarnie force-pushed the metal_stable_bindings branch 9 times, most recently from 752e821 to 130c7c5 Compare October 24, 2025 04:27

stuartcarnie marked this pull request as ready for review October 24, 2025 09:28

stuartcarnie requested review from a team as code owners October 24, 2025 09:28

AThousandShips reviewed Oct 24, 2025

stuartcarnie mentioned this pull request Oct 24, 2025

Metal: fix incorrect usage of useResources: APIs #110683

Closed

Calinou added enhancement platform:macos topic:rendering topic:porting topic:export labels Oct 24, 2025

Calinou added this to the 4.x milestone Oct 24, 2025

stuartcarnie force-pushed the metal_stable_bindings branch 6 times, most recently from 02afa1b to 14fa0a2 Compare October 24, 2025 21:27

stuartcarnie force-pushed the metal_stable_bindings branch from 14fa0a2 to 7660797 Compare October 24, 2025 21:35

stuartcarnie force-pushed the metal_stable_bindings branch from 7660797 to 1f183b1 Compare October 26, 2025 20:12

stuartcarnie force-pushed the metal_stable_bindings branch from 1f183b1 to efb8003 Compare October 27, 2025 00:14

clayjohn reviewed Oct 27, 2025

stuartcarnie mentioned this pull request Oct 27, 2025

Performance regression in 4.4 on Android after introducing batching (GPU bottleneck) #104194

Open

Metal: Stable argument buffers; GPU rendering crashes; visionOS exports

97c17ae

Supersedes godotengine#110683

stuartcarnie force-pushed the metal_stable_bindings branch from efb8003 to 97c17ae Compare October 27, 2025 21:45

clayjohn approved these changes Oct 28, 2025

clayjohn added bug and removed enhancement labels Oct 28, 2025

clayjohn modified the milestones: 4.x, 4.6 Oct 28, 2025

AThousandShips approved these changes Oct 28, 2025

Repiteo merged commit 8bae34a into godotengine:master Oct 28, 2025
20 checks passed

stuartcarnie deleted the metal_stable_bindings branch October 28, 2025 18:36

bruvzg mentioned this pull request Oct 30, 2025

[macOS/iOS] Fix build with Xcode older than 26. #112185

Open

