-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numerous segfaults/asserts on mesa 23.2 with ANV_VIDEO_DECODE #41
Comments
Thank you for your feedback and suggestions for app improvements, Kurt! |
Sure im happy to submit fixes for 1,2 which are are correct. But 3,4 seem like they require more understanding of the pipeline that I don't have (why is dbp_full() called when it is, why are states not set at that time, or what is the synchronization supposed to be between present and decode in the non-fifo case). The fixes of disabling non-fifo modes and reducing dpb size to 15 are definitely not correct at least. |
Hi Kurt, what is the MESA driver that you are using - what version/distro and HW (Intel, AMD)? About 4.
How many DPB slots are allocated for your video content? Also, if the decode queue keeps running and the display does not consume the frames, the decoder will eventually run out of images. |
ANV_VIDEO_DECODE=1 env var can be used to enable the vulkan video decode queues on intel's anv driver. This is with the as of yet unreleased 23.2, testing was done on master branch from about a week back. The rest of the system was standard arch linux.
16, its H264 content.
I don't think the synchronization is a fault for this particular issue, since it occurs without presentation running (and presentation is on a 120hz display versus 30hz content). The dpb_full() check is used to reclaim slots during |
Hi Kurt, |
This is the reproducing encoding: https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/1080/Big_Buck_Bunny_1080_10s_1MB.mp4 Sorry, but on master it seems the code still crashes:
With |
Sorry, I've missed one YUV conversion mapping location. It is fixed now. |
I don't have a setup to try the Intel's anv MESA driver. I'll make an enquiry to Intel to see if I can find somebody who can look at these two issue. |
@zlatinski I think the image count calcs are wrong. caps.maxImageCount can legally be 0, so you shouldn't limit things to max if it's 0. |
I also think the mailbox present might be wrong, but I'm not sure how best to fix it, I'm a bit rusty here. It looks like you acquire all the images here in mailbox mode, which is fine, since only one image will be waiting for the display, but the code has no throttling. I'd suggest just using FIFO for vsync mode for now |
OK, thanks, Dave. Fixed that. |
The code actually waits on the display images to be flipped in Shell::AcquireBackBuffer() after obtaining the next image.
|
you are relying on acquireNextImage blocking though once you've allocated 3 images but not presented any of them I'm not sure what the exact semantics of MAILBOX are here, but we aren't blocking on acquireNextImage at all, so you acquires a bunch of images before ever getting a chance to present them, so run out and hit the assert |
Oho, Now I see where the issue is. Thanks, @airlied! I can add code to monitor the queue level on the display swap side and wait on the way in when it becomes close to empty. The alternative would be to increase the number of swapchain images, but I don't think we want to do that for resource usage and presentation latency reasons. |
I tried this with my
The copy command was recorded inside a pool created with this queue family,
Note it doesn't advertise transfer support (I thought this might have been an implicit requirement of the decode queue, but was missing it in the spec). Anyway, the anv driver is expecting compute capability on a queue that is not graphics. Should the sample app check for queue requirements better here, and maybe use a separate queue for copies on this hardware? |
+1 on fixing all the validation errors. There's no way a v1.0 shipped version of anything should have any vulkan warnings, let alone errors. I've fixed a few myself (missing pnext struct pointer in one case), but I actually have to disable the VK validation layer when I enable Vulkan Video decoding in my game engine to make sense of anything. I still have a way more fundamental problem, namely that I'm using HLSL (via DXC -> SPV compilation) for all my ray tracing shaders and I still can't get the immutable 420 / ycbcr decoding sampler to actually decode it to RGB inline. All I see is red = one channel. Instead I have to use a single GLSL fullscreen shader to decode it to an extra copy surface then use its texture from my RTX (HLSL-based) shaders. That extra copy is irritating and unfortunate, although OTOH computing the mips are actually useful (I'm using vulkan video to decode 120 FPS HDR torch flame videos from Embergen, for my RPG and using those to light up the scene with full GI / multiple bounces etc). I'd love to know if anyone else out there has managed to compile an HLSL shader that can decode 420 video properly from Vulkan Video sources. Every one I've come across, including this sample, is GLSL -> glsllangvalidator.exe ----> SPV pipeline. |
Since I'm stuck on something else related to the samples app, I made a start on this here: #48 WIP. |
Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to nvpro-samples#41. Signed-off-by: Benjamin Cheng <[email protected]>
I can also reproduce |
Fixes for the display swapchain MAILBOX issue from someone who has the HW setup are welcome. |
Can you please try the display again at 1d9eb74? |
Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to nvpro-samples#41. Signed-off-by: Benjamin Cheng <[email protected]>
Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to nvpro-samples#41. Signed-off-by: Benjamin Cheng <[email protected]>
Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to #41. Signed-off-by: Benjamin Cheng <[email protected]>
Intel hardware has some initial support for vulkan video decoding but this application appears to fail in numerous ways when attempting to run it on linux+mesa. Most of these appear to be bugs in the sample and not
Trying to get the sample working it seems the fixes/workarounds required were
MapMemory
when usinggenerateColorPatternRgba888
invk_video_decoder/libs/VkCodecUtils/VulkanVideoUtils.cpp
ImageObject::FillImageWithPattern
, becausevkFillYuv.fillVkImage
also attempts map. Mapping twice is invalid by spec and fails on mesa.Shell::AcquireBackBuffer
whereassert(acquireBuf != nullptr);
attempts to check if the backbuffer queue was empty. So you get more reasonable crashes.vk_video_decoder/libs/VkShell/Shell.cpp
Shell::ResizeSwapchain
, the code appears to depend on AcquireNextImage blocking so if other modes are present it will overrun the backbuffer queue and hang or crash. (there is avsync
option in the config but this doesnt appear to affect decode presentation).vk_video_decoder/libs/NvVideoParser/include/VulkanH264Decoder.h
dpb_full()
since the newly allocated reference frame hasstate == 0
it is not counted here despite being counted in the dpb's accounting. The next decode step will then assert that the DPB is out of slots and crash when it attempts to allocate another reference frame.That should cover everything I needed to get the sample decoder running successfully. It should be noted that incorrect handling of ycbcr sampler attachment, synchronization of cmdbufs, image ownership transitions (due to separate video and present queues), and more result in a large amount of validator noise but the frames presented appear more or less correct.
Thanks for writing up the sample as well it was convenient to have something to test the new functionality with.
The text was updated successfully, but these errors were encountered: