Attached are two SPIR-V modules (I have more that reproduce this, in case it becomes necessary) that, when used to create a graphics pipeline, cause vkCreateGraphicsPipelines to fail with a generic "VK_ERROR_OUT_OF_HOST_MEMORY".
These modules succeed validation via spirv-val and work with non-AMD hardware and drivers. On AMD however the aforementioned issue occurs.
Also interesting is that while this was discovered on Windows, I have users reporting that it works correctly with the open source amdvlk driver on Linux, but fails there too with the proprietary amdgpu-pro one.
I've attached a simple reproducer application which loads one of the modules and attempts to create a pipeline. On AMD it throws a "failed to create graphics pipeline!" exception. (repro_1.exe is built to load the repro_1.spv module, and repro_2.exe the repro_2.spv one, source code is in main.cpp and just a slightly modified copy of the vulkan-tutorial.com code for reproduction purposes)
Hi crosire, I can't reproduce the error on an NVIDIA RTX 2060. I will check on an AMD RX 560 later.
Please pay attention to the Vulkan validation layer messages during the vkCmdDraw call:
validation layer: vkCmdDraw(): VkPipeline 0xc6ac6a000000000f defined with VkPipelineLayout 0xb5f68b000000000e is not compatible for maximum set statically used 1 with bound descriptor sets, last bound with VkPipelineLayout 0x0 The Vulkan spec states: For each set n that is statically used by the VkPipeline bound to the pipeline bind point used by this command, a descriptor set must have been bound to n at the same pipeline bind point, with a VkPipelineLayout that is compatible for set n, with the VkPipelineLayout used to create the current VkPipeline, as described in Pipeline Layout Compatibility (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdDraw-None-02697)
validation layer: VkPipeline 0xc6ac6a000000000f uses set #0 but that set is not bound.
validation layer: VkPipeline 0xc6ac6a000000000f uses set #1 but that set is not bound.
As noted in the description, the attached reproducer is only there to reproduce pipeline creation. It is not expected to work past that. I only hacked it together to reproduce the actual bug, so that there is something to debug against, since the actual application (ReShade) this occurs in is not really suitable for that purpose. And ReShade does not throw any validation errors or warnings. So this is not about those errors. It is about the driver failing pipeline creation with the attached SPIR-V modules before it gets anywhere.
In ReShade I was able to work around this by modifying the SPIR-V modules to only contain a single entry point at runtime (I have a function which parses SPIR-V and spits out new separate SPIR-V modules for every entry point). That works and the driver swallows it. Using the full module does not, but it works on non-AMD drivers. And since the SPIR-V spec allows multiple entry points in a SPIR-V module, I'm pretty confident that this is a driver bug. I would prefer not having to do the workaround. Code for that is here, in case it helps: reshade/driver_bugs.hpp at bad7be47ab6357ffb9374f848c195efa17025236 · crosire/reshade · GitHub
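For readers who want to try the same workaround, here is a minimal sketch of the idea. It is not the ReShade code linked above: the function name keep_single_entry_point and its structure are hypothetical, and it assumes the module is already loaded as a vector of 32-bit words in host byte order. It keeps only the OpEntryPoint instructions with the requested name, drops the OpExecutionMode and OpExecutionModeId instructions that belong to the removed entry points, and leaves the rest of the module untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Opcodes and header size as listed in the SPIR-V specification.
constexpr uint32_t kOpEntryPoint = 15;
constexpr uint32_t kOpExecutionMode = 16;
constexpr uint32_t kOpExecutionModeId = 331;
constexpr size_t kHeaderWords = 5;

// Decodes a nul-terminated literal string (four bytes per word, lowest byte first).
static std::string decode_string(const uint32_t *words, size_t count)
{
    std::string s;
    for (size_t i = 0; i < count; ++i)
        for (int shift = 0; shift < 32; shift += 8)
        {
            const char c = static_cast<char>((words[i] >> shift) & 0xFF);
            if (c == '\0')
                return s;
            s += c;
        }
    return s;
}

// Hypothetical helper: returns a copy of 'spirv' that only declares the entry
// point(s) called 'name'. Matching is by name only, not by execution model.
std::vector<uint32_t> keep_single_entry_point(const std::vector<uint32_t> &spirv, const std::string &name)
{
    if (spirv.size() < kHeaderWords)
        return spirv;

    // Pass 1: remember the function IDs of the entry points that stay.
    std::unordered_set<uint32_t> kept_ids;
    for (size_t i = kHeaderWords; i < spirv.size();)
    {
        const uint32_t op = spirv[i] & 0xFFFF, len = spirv[i] >> 16;
        if (len == 0 || i + len > spirv.size())
            return spirv; // Malformed module, leave it untouched
        if (op == kOpEntryPoint && len >= 4 && decode_string(&spirv[i + 3], len - 3) == name)
            kept_ids.insert(spirv[i + 2]);
        i += len;
    }

    // Pass 2: copy everything except the other entry points and their execution modes.
    std::vector<uint32_t> out(spirv.begin(), spirv.begin() + kHeaderWords);
    for (size_t i = kHeaderWords; i < spirv.size();)
    {
        const uint32_t op = spirv[i] & 0xFFFF, len = spirv[i] >> 16;
        bool drop = false;
        if (op == kOpEntryPoint && len >= 4)
            drop = decode_string(&spirv[i + 3], len - 3) != name;
        else if ((op == kOpExecutionMode || op == kOpExecutionModeId) && len >= 3)
            drop = kept_ids.count(spirv[i + 1]) == 0;
        if (!drop)
            out.insert(out.end(), spirv.begin() + i, spirv.begin() + i + len);
        i += len;
    }
    return out;
}
```

The result would be passed to vkCreateShaderModule in place of the full module. Functions and interface variables that are no longer reachable stay in the output, which is valid SPIR-V, just larger than necessary.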
crosire, I can reproduce the error on an AMD RX 560 with driver 26.20.15019.15019 (Adrenalin-2020-Edition-20.2.2-Feb28).
I also see no Vulkan validation layer errors, so I assume the pipeline creation code itself has no errors.
Maybe try generating the SPIR-V file with different parameters, for example a different SPIR-V version, or with optimizations switched on or off.
Did you try running this test on an Intel GPU?
The error does not reproduce on either NVIDIA or Intel in my tests. Also note that, as mentioned in the original post, it apparently does not reproduce with the amdvlk driver either (I did not test this myself). Only the amdgpu-pro and Windows drivers have this issue. It works perfectly everywhere else. I don't know what else I can do to demonstrate that this is very clearly a driver bug.
Especially since the exact same SPIR-V works just fine if all but one entry point are removed (with the rest of the module untouched, as noted before). Yet the SPIR-V spec clearly states that multiple entry points are valid: SPIR-V Specification ("Module: A single unit of SPIR-V. It can contain multiple entry points, but only one set of capabilities").
I cannot change the SPIR-V generation parameters much. These modules are generated with a custom compiler (ReShade FXC), not glslang. But it's easy to verify that this compiler and glslang otherwise produce similar results. Unfortunately there is no other SPIR-V compiler that I know of that supports outputting multiple entry points, which I suspect is the reason this driver bug occurs in the first place (since this was simply never tested before).
Thanks for looking into it!
The SPIR-V does not define a function multiple times though. repro_1.spv, for instance, defines the following function IDs, none of which occurs more than once: %8, %49, %96, %122, %156, %183, %235, %243, %391, %535, %713, %892, %926, %943, %961, %1031, %1102, %1148, %1462, %1509, %1567, %1619, %1672, %1725, %1766, %1909, %2045, %2304, %2459, %2481, %2506, %2542, %2561, %2583, %2602, %2610, %2675, %2690, %2723, %2744, %2756, %2776, %2790, %2812, %2828, %2847
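A check like the one above can be automated. The following is only a sketch under a few assumptions: the module is loaded as a vector of 32-bit words in host byte order, and count_function_definitions is a made-up name rather than part of any existing tool. It walks the instruction stream and counts how often each result ID is defined by an OpFunction instruction (opcode 54, with the result ID in the third word).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: maps each OpFunction result ID to its number of definitions.
std::map<uint32_t, int> count_function_definitions(const std::vector<uint32_t> &spirv)
{
    constexpr uint32_t kOpFunction = 54; // Opcode from the SPIR-V specification
    constexpr size_t kHeaderWords = 5;   // Magic, version, generator, bound, schema

    std::map<uint32_t, int> counts;
    for (size_t i = kHeaderWords; i < spirv.size();)
    {
        const uint32_t op = spirv[i] & 0xFFFF; // Low 16 bits: opcode
        const uint32_t len = spirv[i] >> 16;   // High 16 bits: word count
        if (len == 0 || i + len > spirv.size())
            break; // Malformed instruction, stop scanning
        if (op == kOpFunction && len >= 5)
            ++counts[spirv[i + 2]]; // Words: opcode, result type, result ID, ...
        i += len;
    }
    return counts;
}
```

Any entry with a count greater than one would be a function defined more than once.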
So the only way I can explain that error would be that the compiler complains about the fact that there are multiple "OpEntryPoint" instructions for an execution model. They point to different functions. E.g. in case of repro_1.spv:
OpEntryPoint Vertex %2723 "F__PostProcessVS" %gl_VertexIndex %gl_Position %2734
OpEntryPoint Fragment %2744 "F__SMAADepthLinearizationPS" %gl_FragCoord %2751 %2755
OpEntryPoint Vertex %2756 "F__SMAAEdgeDetectionWrapVS" %gl_VertexIndex_0 %gl_Position_0 %2764 %2766
OpEntryPoint Fragment %2776 "F__SMAAEdgeDetectionWrapPS" %gl_FragCoord_0 %2782 %2785 %2789
OpEntryPoint Vertex %2790 "F__SMAABlendingWeightCalculationWrapVS" %gl_VertexIndex_1 %gl_Position_1 %2798 %2800 %2802
OpEntryPoint Fragment %2812 "F__SMAABlendingWeightCalculationWrapPS" %gl_FragCoord_1 %2818 %2821 %2824 %2827
OpEntryPoint Vertex %2828 "F__SMAANeighborhoodBlendingWrapVS" %gl_VertexIndex_2 %gl_Position_2 %2836 %2838
OpEntryPoint Fragment %2847 "F__SMAANeighborhoodBlendingWrapPS" %gl_FragCoord_2 %2853 %2856 %2859
But this should work: the SPIR-V spec only notes that each entry point is limited to a single execution model, it does not forbid multiple entry points from being defined for an execution model (SPIR-V Specification). To me this still sounds like the compiler is not handling this case correctly.
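To double-check a module against this reading of the spec, the entry points can be listed straight from the binary. The sketch below is an illustration only: the EntryPoint struct and both function names are hypothetical, and the module is assumed to be loaded as 32-bit words in host byte order. As far as I read the OpEntryPoint description, the only uniqueness requirement is that no two entry points share both the same execution model and the same name, so that is what the second function checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct EntryPoint
{
    uint32_t execution_model; // 0 = Vertex, 4 = Fragment, 5 = GLCompute, ...
    uint32_t function_id;
    std::string name;
};

// Hypothetical helper: collects all OpEntryPoint instructions of a module.
std::vector<EntryPoint> list_entry_points(const std::vector<uint32_t> &spirv)
{
    constexpr uint32_t kOpEntryPoint = 15; // Opcode from the SPIR-V specification
    std::vector<EntryPoint> result;
    for (size_t i = 5; i < spirv.size();) // Skip the five header words
    {
        const uint32_t op = spirv[i] & 0xFFFF, len = spirv[i] >> 16;
        if (len == 0 || i + len > spirv.size())
            break; // Malformed instruction, stop scanning
        if (op == kOpEntryPoint && len >= 4)
        {
            EntryPoint ep{spirv[i + 1], spirv[i + 2], {}};
            // The name is a nul-terminated string packed lowest byte first.
            bool done = false;
            for (size_t w = i + 3; w < i + len && !done; ++w)
                for (int shift = 0; shift < 32; shift += 8)
                {
                    const char c = static_cast<char>((spirv[w] >> shift) & 0xFF);
                    if (c == '\0') { done = true; break; }
                    ep.name += c;
                }
            result.push_back(ep);
        }
        i += len;
    }
    return result;
}

// Returns true if two entry points share both execution model and name.
bool has_conflicting_entry_points(const std::vector<EntryPoint> &entry_points)
{
    std::set<std::pair<uint32_t, std::string>> seen;
    for (const EntryPoint &ep : entry_points)
        if (!seen.emplace(ep.execution_model, ep.name).second)
            return true;
    return false;
}
```

For the eight entry points listed above, every name is distinct, so such a check would be expected to report no conflict even though four of them share the Vertex model and four the Fragment model.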