Conversation

@alexgu754

Description

Added SDL_WaitCondition calls in METAL_WaitForSwapchain and AcquireSwapchainTexture to wait for a callback from CVDisplayLink and for presentation of previous frames. A conventional game loop like while(!quit) { sample_input(); present(); } quickly fills the CAMetalLayer with four frames, and from then on everything sits behind a four-frame queue.
The difference is especially stark in imgui (please try the imgui demo with this patch), but remember to call SDL_WaitForGPUSwapchain or SDL_WaitAndAcquireGPUSwapchainTexture before sampling input.

Existing Issue(s)

issue #14456

@slouken added this to the 3.6.0 milestone Nov 27, 2025
@thatcosmonaut
Collaborator

thatcosmonaut commented Nov 27, 2025

This isn't correct behavior. You don't always want to wait for presentation. Sometimes you actually want queued frames to maximize GPU utilization to improve frame rate at the cost of latency.

We already have SDL_SetGPUAllowedFramesInFlight and SDL_WaitAndAcquireGPUSwapchainTexture. When allowed frames are set to 1, and wait and acquire is called, the implementation waits until the presentation command buffer is finished. Under that condition do you observe latency? If so, and the DisplayLink usage improves that behavior, is it possible to integrate this with the allowed frames in flight setting?

@alexgu754
Author

alexgu754 commented Nov 27, 2025

No, setting allowed frames to one does not solve the issue. [commandBuffer addCompletedHandler:] is called when all GPU work encoded into that command buffer has finished executing on the GPU (i.e. instantly), not when the frame is presented. So the current implementation does not actually enforce a frame limit of any kind. So I guess yes, it could be a variable; I just found 2 to give 'perfect' mouse responsiveness.
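To make the completion-versus-presentation argument concrete, here is a small toy model in plain C. It is only a sketch under stated assumptions, not SDL's or Metal's actual behavior: the swapchain depth of 4 is taken from the "four frames" figure in the description, the GPU is assumed to finish each frame within the same vsync tick, and `steady_state_latency` and `DRAWABLE_CAPACITY` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed swapchain depth, borrowed from the "four frames" figure in the PR description. */
#define DRAWABLE_CAPACITY 4

/*
 * Toy model: returns the input-to-display latency (in vsync ticks) of the
 * last frame shown after `ticks` vsync intervals.
 *   allowed            - frames-in-flight limit
 *   release_on_present - true:  a slot is freed when the frame is displayed
 *                        false: a slot is freed when the GPU work completes
 */
int steady_state_latency(int allowed, bool release_on_present, int ticks)
{
    int stamps[DRAWABLE_CAPACITY]; /* tick at which each queued frame sampled input */
    int head = 0, queued = 0, in_flight = 0, latency = -1;

    assert(allowed >= 1);

    for (int t = 0; t < ticks; t++) {
        /* vsync: the display consumes the oldest queued frame */
        if (queued > 0) {
            latency = t - stamps[head];
            head = (head + 1) % DRAWABLE_CAPACITY;
            queued--;
            if (release_on_present) {
                in_flight--;
            }
        }

        /* app loop: sample input and present as often as the limits allow */
        while (in_flight < allowed && queued < DRAWABLE_CAPACITY) {
            stamps[(head + queued) % DRAWABLE_CAPACITY] = t;
            queued++;
            in_flight++;
            if (!release_on_present) {
                in_flight--; /* model assumption: GPU work completes right away */
            }
        }
    }
    return latency;
}
```

In this model, a limit of 1 that is released on completion still ends up DRAWABLE_CAPACITY ticks behind, because the limit never blocks; the same limit released on presentation stays one tick behind. Real drivers and compositors add their own buffering, so the absolute numbers are illustrative only.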

Screen.Recording.2025-11-27.at.21.58.18.mov

I'll fix the compile errors tomorrow with workarounds for older macOS versions. Also, CVDisplayLink currently forces vsync; I forgot to add the if statement that checks whether vsync is turned off.

@alexgu754
Author

alexgu754 commented Nov 28, 2025

Also, it currently busy-loops while waiting for the command buffer to complete, which is absolutely ridiculous.

@alexgu754
Author

alexgu754 commented Nov 28, 2025

I have a theory that Apple decoupled presentation from command buffer completion after macOS 10.15, when they added this API, because older Metal sample projects and many GitHub repos today use an '_inflightSemaphore' pattern that counts command buffer completions and waits on them. But that no longer has this meaning, if it ever did. It's the worst of both worlds: the busy loop forces the GPU command buffer to complete, but you still overflow the swapchain. It's latency without the performance. Someone should test this.

added @available guards to allow it to compile on older versions, made use of renderer->allowedFramesInFlight instead of making it a constant
oops, accidentally left in allowedFramesInFlight = 1
made display link toggleable with SDL.gpu.device.create.metal.use_display_link props id in device creation
left in preliminary code accidentally
@alexgu754
Author

Trying to keep the PR minimal for now and limited to macOS. The busy-loop wait on SDL_GPUFence really has to be replaced with a semaphore, and allowedFramesInFlight should be enforced on other platforms like iPhone, because right now it does literally nothing.
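As a rough illustration of what replacing the busy loop with a blocking wait could look like, here is a minimal counting gate built on POSIX threads. This is a sketch and not the code in this PR: `FrameGate` and its functions are hypothetical names, and a real SDL implementation would more likely use SDL's own mutex and condition primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate that limits how many frames may be in flight at once. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  slot_freed;
    unsigned        in_flight;
    unsigned        allowed;
} FrameGate;

void frame_gate_init(FrameGate *g, unsigned allowed)
{
    assert(allowed >= 1);
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->slot_freed, NULL);
    g->in_flight = 0;
    g->allowed = allowed;
}

/* Blocks (sleeping, not spinning) until a frame slot is free, then takes it. */
void frame_gate_acquire(FrameGate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->in_flight >= g->allowed) {
        pthread_cond_wait(&g->slot_freed, &g->lock);
    }
    g->in_flight++;
    pthread_mutex_unlock(&g->lock);
}

/* Non-blocking variant: returns false if the limit is already reached. */
bool frame_gate_try_acquire(FrameGate *g)
{
    bool ok;
    pthread_mutex_lock(&g->lock);
    ok = g->in_flight < g->allowed;
    if (ok) {
        g->in_flight++;
    }
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Frees one slot and wakes a waiter; intended to be called from whatever
 * callback signals that a frame has actually been presented. */
void frame_gate_release(FrameGate *g)
{
    pthread_mutex_lock(&g->lock);
    if (g->in_flight > 0) {
        g->in_flight--;
    }
    pthread_cond_signal(&g->slot_freed);
    pthread_mutex_unlock(&g->lock);
}

void frame_gate_destroy(FrameGate *g)
{
    pthread_cond_destroy(&g->slot_freed);
    pthread_mutex_destroy(&g->lock);
}
```

The render thread would call frame_gate_acquire before sampling input, and the presentation callback would call frame_gate_release. Which event triggers the release is the design choice under discussion here: releasing on command buffer completion would not bound the swapchain queue.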

need to wait for addPresentedHandler calls in cleanup, since they still fire even after SDL_Metal_DestroyView is called

3 participants