This is loosely related to an old discussion about PCSX2's poor OpenGL performance, which was rudely archived with no resolution, as seen here: https://community.amd.com/t5/opengl-vulkan/dreadful-opengl-performance/td-p/212252/
However, recently one of our guys has been working on a Vulkan renderer and has been investigating poor performance on AMD when using a "pipeline barrier from COLOR_ATTACHMENT -> FRAGMENT SHADER, but the memory flags are COLOR_ATTACHMENT -> INPUT_ATTACHMENT, which is framebuffer-local (VK_DEPENDENCY_BY_REGION_BIT)", as he is doing what is described here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#renderpass-feedbackloo...
What he noticed was that, regardless of which settings he passed, the driver was doing a full barrier and invalidating everything, which probably causes a write-back of everything to VRAM, and this is VERY slow. Nvidia doesn't suffer from this problem.
Now I don't know that OpenGL is doing the same thing, but performance is even worse there when using barriers, so I wouldn't be surprised if it was the same.
The code for the barrier is as follows:
static void ColorBufferBarrier(GSTexture* rt)
{
    const VkImageMemoryBarrier barrier = {VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, nullptr,
        VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT, VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
        VK_IMAGE_LAYOUT_GENERAL, VK_IMAGE_LAYOUT_GENERAL, VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
        static_cast<GSTextureVK*>(rt)->GetTexture().GetImage(), {VK_IMAGE_ASPECT_COLOR_BIT, 0u, 1u, 0u, 1u}};
    vkCmdPipelineBarrier(g_vulkan_context->GetCurrentCommandBuffer(),
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        VK_DEPENDENCY_BY_REGION_BIT, 0, nullptr, 0, nullptr, 1, &barrier);
}
No response, AMD? An acknowledgement would be nice. I figured 4 months might be enough for at least a "thanks for the report".
Thanks.
Hi @refractionpcsx2 ,
Sorry for this delayed response. I have informed the Vulkan team about this issue. I will let you know once I get any feedback from them.
Thanks.
Thank you, I appreciate it 🙂
Hi,
I've opened up a ticket to track this issue to get someone to look at this. Thanks for reporting.
Owen
Hi refractionpcsx2,
Would it be possible to provide an executable with source code to reproduce this issue so it's easier on our end?
Thanks,
Owen
Testcase:
https://drive.google.com/file/d/1IRvZaVo55ljhh4PPoUPhEfs762psrleV/view?usp=share_link
Source code:
https://github.com/PCSX2/pcsx2
To run the testcase, simply drag the DRAGME.gs.xz file onto the executable pcsx2-qtx64.exe.
Hey,
Sorry for the delay. The investigation is ongoing; it looks like VK_DEPENDENCY_BY_REGION_BIT won't do anything on dGPUs. It seems to be mostly used on mobile GPUs with tile-based renderers. We're wondering whether PCSX2 is still using this method, and what environment showed this issue? A vulkaninfo dump would be useful.
Thanks,
Owen
Hi there!
So sorry for the delay in getting back to you; I must have missed the email. We just happened to be talking about this on our Discord today.
We do indeed still use this bit, and as far as we're aware both Intel and Nvidia benefit from it being set, not just tile-based iGPUs. It was one of the main reasons Vulkan was faster than OpenGL.
This is a comparison on an Nvidia 2070 Super with Need for Speed Carbon (which does a lot of barriers):
OGL:
https://media.discordapp.net/attachments/612095738712817665/1200441378476408942/image.png
VK:
https://media.discordapp.net/attachments/612095738712817665/1200441390258212884/image.png
If you would like, I can provide this sample in a package. It's actually what we call a GS dump, which is a recording of the instructions sent to the PS2's GPU, so you can run it on your end with a preconfigured PCSX2 (no BIOS or game required, so no copyrighted materials). There might be a few draws in there though :D. We also have a "debug device" option in the emulator for annotating the draws in tools such as RenderDoc 😄