
GPU driver hang

Question asked by dutta on Jul 21, 2019
Latest reply on Jul 30, 2019 by dutta

I have a Vulkan issue that only seems to occur on my GPU. I have also tested it on an R9 Nano and an Nvidia 1060, and it works fine on both. The validation layers only produce errors related to texture layouts, which worked previously. This is my issue:

 

Running in an optimized build, the GPU hangs after a couple of seconds of running and fails to recover. Each frame is identical to the previous one; nothing in the scene changes. Once, the GPU did recover, but both screens turned purple. The time before the hang varies, but I have never managed to run for more than a minute. Synchronizing each frame by inserting and immediately waiting for a fence on all queues hurts performance but does not remove the hang. In a very few cases, the GPU did manage to recover, and vkAcquireNextImageKHR returned VK_ERROR_DEVICE_LOST, but I could not find any earlier command returning this error. I also noticed that when I waited on a fence that had never been submitted, vkWaitForFences returned VK_TIMEOUT, which does not seem to match the specification, but this is probably unrelated. I am sure I am doing something wrong, but without any recovery from the GPU, and without the validation layers telling me anything useful, it is impossible to know where to look.
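
To pin down which call is the first to report the device loss, every call that returns a result code can be routed through a small tracker. The following is only a minimal sketch under stated assumptions: `Result` is a stand-in enum whose values mirror the corresponding VkResult codes, so that it compiles without the Vulkan SDK; `FirstFailure`, `TRACK_RESULT`, and the `Fake...` functions are hypothetical names for illustration and are not part of Vulkan or of the linked code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for VkResult (assumption: used only so the sketch builds without
// the Vulkan headers). The numeric values mirror VK_SUCCESS, VK_TIMEOUT and
// VK_ERROR_DEVICE_LOST.
enum Result : int { SUCCESS = 0, TIMEOUT = 2, ERROR_DEVICE_LOST = -4 };

// Remembers the first call that returned an error code. Vulkan error codes
// are negative; non-negative codes such as a timeout are status codes and
// are deliberately not recorded here.
struct FirstFailure {
    bool seen = false;
    int code = 0;
    std::string call;
    int line = 0;

    int Check(int result, const char* expr, int srcLine) {
        assert(expr != nullptr);
        if (result < 0 && !seen) {
            seen = true;
            code = result;
            call = expr;
            line = srcLine;
            std::fprintf(stderr, "first failing call: %s (line %d) returned %d\n",
                         expr, srcLine, result);
        }
        return result;  // pass the code through so callers can still branch on it
    }
};

// Records the stringified expression and source line of the wrapped call.
#define TRACK_RESULT(tracker, expr) \
    (tracker).Check(static_cast<int>(expr), #expr, __LINE__)

// Hypothetical stand-ins for real submit/wait/acquire calls.
inline Result FakeQueueSubmit() { return SUCCESS; }
inline Result FakeWaitForFences() { return TIMEOUT; }
inline Result FakeAcquireNextImage() { return ERROR_DEVICE_LOST; }

// Simulates one frame in which the acquire call is the first to fail.
inline FirstFailure SimulateFrame() {
    FirstFailure tracker;
    TRACK_RESULT(tracker, FakeQueueSubmit());
    TRACK_RESULT(tracker, FakeWaitForFences());     // status code, not an error
    TRACK_RESULT(tracker, FakeAcquireNextImage());  // first error is recorded here
    TRACK_RESULT(tracker, FakeQueueSubmit());       // later calls do not overwrite it
    return tracker;
}
```

In a real renderer the same macro would wrap the actual calls that return a VkResult, so that the first negative result is logged together with its call site even when a later call is the one that visibly fails.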

 

Running in a debug build, however, produces no such issues. All submissions are done on a single thread, and I can't see how I could be racing against the GPU, since I am waiting for the GPU to finish after each submit and present. My hardware is an RX 480, and the driver version is 19.7.2, running on Windows 10 build 1903. The issue is hard to reduce to a minimal repro, but the code can be found at https://github.com/gscept/nebula/tree/master/code/render/coregraphics/vk, and the relevant files should be vkgraphicsdevice.cc and vksubcontexthandler.cc.
