![VulkanMod](https://media.forgecdn.net/avatars/thumbnails/561/294/256/256/637913373178716740.png)
[RADV] GPU resets when vsync is off
Etaash-mathamsetty opened this issue · 8 comments
Running the game with RADV_DEBUG=hang, but the Vulkan code is extra good and still causes GPU resets on my Ryzen 7 4700U + iGPU. Not sure how to debug this, since the system becomes unusable when the GPU resets.
Frame ordering is also still incorrect with VSYNC disabled
I can reproduce it using a RX580 with Mesa 22.3.2 (git-a09d5e2747). I have two logs but they point to two different errors:
latest.log
2023-01-31-1.log
Here is vulkaninfo: vulkaninfo.txt
I know nothing about Linux or the RADV drivers; however, I can think of two possible theories that may be related:
- There is a very similar issue currently open on the RADV repo, which suggests that QueuePresent on RADV may in fact be broken
  - One of the logs also crashes with VK_ERROR_SURFACE_LOST_KHR, which suggests problems with the Swapchain/Surface
  - RADV reports maxImageCount and minImageCount as 0 and 3 respectively, which doesn't seem to make any sense and may also be breaking the Swapchain/Present code in the mod
  - My GTX 1080 Ti on Windows 10 has Max and Min of 8 and 2 images, which seems to work fine mostly (fullscreen V-Sync is broken, which can be fixed by using a separate Transfer Queue)
- The mod by default uses a separate Queue to present frames, which AFAIK isn't a common setup anymore (most just use the same Graphics Queue for Present), and which I suspect may also be affecting Swapchain/Present behaviour:
  - What might be worth trying is using RADV_DEBUG=nocompute, nofastclears, or syncshaders to affect present behaviour, and/or disabling the 2nd queue to check whether presenting on the separate queue is affecting the crash in any way
  - The mod isn't using Compute features or a separate DMA transfer Queue for Async Compute/Transfers, so this shouldn't cause any issues
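To make the "same queue for Graphics and Present" test concrete, here is a minimal sketch of picking a single queue family that supports both. It is plain Java with no Vulkan binding: the `queueFlags` array stands in for the `queueFlags` field of each `VkQueueFamilyProperties`, and `presentSupport` stands in for the per-family result of `vkGetPhysicalDeviceSurfaceSupportKHR`. The class and method names are hypothetical and are not taken from the mod.

```java
final class QueueFamilyPicker {
    // Value of VK_QUEUE_GRAPHICS_BIT in the Vulkan headers.
    static final int VK_QUEUE_GRAPHICS_BIT = 0x1;

    /**
     * Returns the index of the first queue family that supports both
     * graphics and presentation, or -1 if no such family exists.
     * Both arrays are indexed by queue family index.
     */
    static int pickGraphicsPresentFamily(int[] queueFlags, boolean[] presentSupport) {
        int count = Math.min(queueFlags.length, presentSupport.length);
        for (int i = 0; i < count; i++) {
            boolean graphics = (queueFlags[i] & VK_QUEUE_GRAPHICS_BIT) != 0;
            if (graphics && presentSupport[i]) {
                return i;
            }
        }
        return -1; // caller would fall back to separate graphics/present families
    }
}
```

If this returns a valid index, a single queue from that family can be used for both submit and present, which removes cross-queue synchronisation as a variable when testing the crash.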
As mentioned I will likely be unhelpful/mostly useless with this as I've very little experience with AMD GPUs and Linux in general, so apologies if this does absolutely nothing/even makes the issue worse.
> RADV reports maxImageCount and minImageCount as 0 and 3 respectively, which doesn't seem to make any sense and may also be breaking the Swapchain/Present code in the mod
The first part is not true, but the second part is: the code doesn't respect the number of swapchain images actually created and just assumes that the min image count passed in is the image count returned. That was originally the main reason for the crashes. Since my GPU resets with the new versions of VulkanMod, I really can't debug or test further, nor do I really want to; I would rather just rewrite this whole mod to respect the spec more (or even use zink, which currently works better, has better performance :/, and has shader compatibility).
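As a rough illustration of the point about respecting the created image count: the Vulkan spec lets the implementation create more images than the requested `minImageCount`, so per-image resources should be sized from the count reported by `vkGetSwapchainImagesKHR`, not from the request. The sketch below is plain Java with no Vulkan binding; `SwapchainFrames` and its `IntSupplier` parameter are hypothetical stand-ins (the supplier represents the count query), not the mod's actual code.

```java
import java.util.function.IntSupplier;

final class SwapchainFrames {
    // One placeholder handle slot per swapchain image (e.g. fences or command buffers).
    private final long[] perImageHandles;

    private SwapchainFrames(int imageCount) {
        this.perImageHandles = new long[imageCount];
    }

    /**
     * Sizes per-image resources from the count the driver actually reports.
     * actualImageCountQuery stands in for the count returned by
     * vkGetSwapchainImagesKHR after swapchain creation.
     */
    static SwapchainFrames create(int requestedMinImageCount, IntSupplier actualImageCountQuery) {
        int actual = actualImageCountQuery.getAsInt();
        if (actual < requestedMinImageCount) {
            // A successfully created swapchain should never have fewer images than requested.
            throw new IllegalStateException("driver reported fewer images than requested");
        }
        return new SwapchainFrames(actual);
    }

    int imageCount() {
        return perImageHandles.length;
    }
}
```

With a request of 3 and a driver that creates 4 images, `imageCount()` is 4, so an image index of 3 returned by acquire stays in bounds.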
Apologies if I was mistaken about imageCount; I wasn't 100% sure whether a maxCount of 0 was intended or a driver bug. I got that information from the VulkanInfo file linked above and from GPUInfo reports that also use RADV with an RX 580, RX 6950 XT, and RX 6800.
(The MaxImageCount of 0 occurs in multiple machines with RADV and not just here)
If this issue doesn't occur on other Vulkan App/Utilities (e.g. vkCube.exe), that strengthens the theory that the Swapchain handling the mod uses may not be ideal.
The swapchain and mod code is based on a Java demo which only specifies the minImageCount if Max is greater than 0. That isn't the case on RADV, as you mentioned, which causes the minCount to be specified as 0 and not 3.
I also tried to force the minImage value to zero; however, at least on Windows and Nvidia I couldn't reproduce the issue, as Nvidia seem to add idiot-proofing to their drivers to override certain illegal values. I also can't test this properly, as I do not have an AMD card to test with RADV drivers on Linux.
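For reference, a minimal sketch of an image-count selection that treats `maxImageCount == 0` as "no upper limit", which is how the Vulkan spec defines that value in `VkSurfaceCapabilitiesKHR`. Requesting one image more than the minimum is a common convention, not something taken from the mod, and the class name is hypothetical.

```java
final class SwapchainImageCount {
    /**
     * Picks the minImageCount to pass at swapchain creation.
     * maxImageCount == 0 means the surface imposes no upper limit,
     * so the clamp is applied only when maxImageCount is positive.
     */
    static int choose(int minImageCount, int maxImageCount) {
        int desired = minImageCount + 1; // one extra image: an assumption, not the mod's policy
        if (maxImageCount > 0 && desired > maxImageCount) {
            desired = maxImageCount;
        }
        return desired;
    }
}
```

With the values reported in this thread, `choose(3, 0)` gives 4 on RADV and `choose(2, 8)` gives 3 on the GTX 1080 Ti, so the request is never 0 on either driver.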
Unfortunately I never figured out how to handle Presenting properly on Vulkan as many people seem to mess it up, so I'm not sure if the mod handles it correctly or otherwise. (I use a different setup for swapchain images but that's in C++ with much simpler code than what this mod uses)
AFAIK there is no concept of Swapchain order as vkAcquireNextImageKHR is not guaranteed to give a deterministic order for frames as that seems to be up to the OS/GPU Driver to handle.
vkAcquireNextImageKHR is also technically not blocked by vkDeviceWaitIdle AFAIK so it may be a very extreme edge case specific to only RADV and on certain Distros.
I still think disabling the Compute Queue on RADV may be worth investigating, to force VulkanMod to use the same queue for Graphics and Present and so rule out or confirm whether it's an odd queue synchronisation issue.
My theory for this derives from a V-Sync stutter issue with Nvidia in fullscreen which was fixed by using a separate Transfer Queue for uploads, however my hardware is old and I use an outdated Nvidia driver, so it is likely unrelated to this particular bug.
The only other method I can think of is removing the infinite -1 timeout on vkAcquireNextImageKHR (which limits the max frames in flight to 1 AFAIK) and replacing it with a finite value (e.g. 10000 ms, expressed in nanoseconds as the API expects) to disable the blocking behaviour.
However IIRC xCollateral mentioned that removing the -1 timeout didn't work well with their system so it likely won't help here much if at all either.
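If someone does want to try a finite timeout, the sketch below shows the two pieces involved: converting milliseconds to the nanoseconds that `vkAcquireNextImageKHR` takes, and deciding what to do with the extra result codes a finite timeout can produce. It is plain Java with no Vulkan binding. The integer constants are the `VkResult` values from the Vulkan headers, while the `AcquireOutcome` class, its `Action` enum, and the chosen reactions are assumptions for illustration only.

```java
import java.util.concurrent.TimeUnit;

final class AcquireOutcome {
    // VkResult values as defined in the Vulkan headers.
    static final int VK_SUCCESS = 0;
    static final int VK_NOT_READY = 1;
    static final int VK_TIMEOUT = 2;
    static final int VK_SUBOPTIMAL_KHR = 1000001003;
    static final int VK_ERROR_OUT_OF_DATE_KHR = -1000001004;

    enum Action { RENDER, SKIP_FRAME, RECREATE_SWAPCHAIN, FAIL }

    /** vkAcquireNextImageKHR takes its timeout in nanoseconds. */
    static long timeoutNanos(long millis) {
        return TimeUnit.MILLISECONDS.toNanos(millis);
    }

    /** Maps an acquire result to a hypothetical per-frame action. */
    static Action classify(int vkResult) {
        switch (vkResult) {
            case VK_SUCCESS:
            case VK_SUBOPTIMAL_KHR:        // an image was still acquired
                return Action.RENDER;
            case VK_NOT_READY:
            case VK_TIMEOUT:               // no image yet: try again next frame
                return Action.SKIP_FRAME;
            case VK_ERROR_OUT_OF_DATE_KHR:
                return Action.RECREATE_SWAPCHAIN;
            default:                       // e.g. VK_ERROR_SURFACE_LOST_KHR
                return Action.FAIL;
        }
    }
}
```

The main cost of a finite timeout is that the render loop then has to cope with frames where no image was acquired, which may be part of why it did not work well in earlier attempts.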
Unfortunately, I didn't see this issue until I had already posted a duplicate with much more debug information: #213