4 Replies Latest reply on Sep 7, 2016 12:04 PM by invertedantimatter

    Transfer queue copy operations


      I'm attempting to copy staged data from a VkBuffer or VkImage bound to host-visible memory into a VkImage in device-local memory.

      vkCmdCopyBuffer appears to work with any queue family.

      However, I've only been able to get vkCmdCopyBufferToImage and vkCmdCopyImage to work on a queue family with graphics enabled.

      The spec reads:

      The VkCommandPool that commandBuffer was allocated from must support transfer, graphics, or compute operations.


      VK_PIPELINE_STAGE_TRANSFER_BIT: Execution of copy commands. This includes the operations resulting from all transfer commands. The set of transfer commands comprises vkCmdCopyBuffer, vkCmdCopyImage, vkCmdBlitImage, vkCmdCopyBufferToImage, vkCmdCopyImageToBuffer, vkCmdUpdateBuffer, vkCmdFillBuffer, vkCmdClearColorImage, vkCmdClearDepthStencilImage, vkCmdResolveImage, and vkCmdCopyQueryPoolResults.

      This suggests to me that transfer queue families should support these commands as well.
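      For reference, here is a minimal sketch of how a transfer-only queue family might be picked. It is plain C with no Vulkan headers: the flag constants mirror the VkQueueFlagBits values, and find_transfer_only_family is a hypothetical helper name, not part of the API. In a real application the flags would come from VkQueueFamilyProperties::queueFlags as returned by vkGetPhysicalDeviceQueueFamilyProperties.

```c
#include <stddef.h>
#include <stdint.h>

/* These values mirror VkQueueFlagBits in the Vulkan headers. */
enum {
    QUEUE_GRAPHICS_BIT = 0x1,
    QUEUE_COMPUTE_BIT  = 0x2,
    QUEUE_TRANSFER_BIT = 0x4
};

/*
 * Hypothetical helper: returns the index of the first queue family that
 * advertises transfer support but neither graphics nor compute, or -1 if
 * no such family exists.
 *
 * queue_flags[i] is assumed to hold the queueFlags of family i.
 */
int find_transfer_only_family(const uint32_t *queue_flags, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        uint32_t flags = queue_flags[i];
        int has_transfer = (flags & QUEUE_TRANSFER_BIT) != 0;
        int has_gfx_or_compute =
            (flags & (QUEUE_GRAPHICS_BIT | QUEUE_COMPUTE_BIT)) != 0;

        if (has_transfer && !has_gfx_or_compute) {
            return (int)i;
        }
    }
    return -1;
}
```

      A command pool created for the returned family index is the kind of pool the copy commands above are being recorded into when they fail.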

      The obvious workaround is to use vkCmdCopyBuffer (host-to-GPU copy) on the transfer queue family, followed by vkCmdCopyBufferToImage (local GPU memory copy) on the graphics queue family. This avoids stalling the graphics queue, since local memory copies are faster than PCI-e transfers.

      Is this expected behavior?

      • Hardware: AMD GPU (R9 Nano)
      • Requested API: 1.0.13
      • Radeon driver: 16.4.2
      • Windows 10
        • Re: Transfer queue copy operations

          Thank you for your report. This does not seem to be legitimate driver behavior. Would you mind providing an example application which reproduces this issue? Modifying the SDK's cube application to do the job would be just fine.

            • Re: Transfer queue copy operations

              Attached is a simple application which displays a 24-bit 512x512 BMP file using one of four methods, selected by the variables enable_transfer_queue and use_buffer_copy:


              enable_transfer_queue makes use of the first available queue family for transfer operations only.

              use_buffer_copy adds an intermediate buffer to move data from host memory to device memory.


              Possible transfer configurations:

              * Buffer (host-visible) ===> Image (device-local)

              * Buffer (host-visible) ===> Buffer (device-local) ===> Image (device-local)


              Because the minimum image transfer granularity on my transfer queue family is (8, 8, 8), I transfer a 3D image with a depth of 8 and simply blit the 1st layer to the swap chain.
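              To illustrate the granularity constraint, here is a small sketch of the check as I read it from the spec's description of minImageTransferGranularity: each offset component must be a multiple of the granularity, and each extent component must either be a multiple of it or reach the edge of the image. A granularity of 0 on an axis is treated as "whole image only". The Extent3 type and both function names are hypothetical stand-ins written in plain C, not Vulkan types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VkOffset3D / VkExtent3D (unsigned for brevity). */
typedef struct {
    uint32_t x, y, z;
} Extent3;

/* Checks one axis of a copy region against the transfer granularity. */
static bool axis_ok(uint32_t offset, uint32_t extent,
                    uint32_t image_size, uint32_t gran)
{
    if (gran == 0) {
        /* Granularity 0: only the full image dimension may be copied. */
        return offset == 0 && extent == image_size;
    }
    if (offset % gran != 0) {
        return false;
    }
    /* Extent must be a granularity multiple or end at the image edge. */
    return (extent % gran == 0) || (offset + extent == image_size);
}

/* Returns true if the region satisfies the granularity rule on all axes. */
bool region_matches_granularity(Extent3 offset, Extent3 extent,
                                Extent3 image, Extent3 gran)
{
    return axis_ok(offset.x, extent.x, image.x, gran.x) &&
           axis_ok(offset.y, extent.y, image.y, gran.y) &&
           axis_ok(offset.z, extent.z, image.z, gran.z);
}
```

              With a granularity of (8, 8, 8) and a 512x512x8 image, a full-image region passes this check, while a sub-region starting at x = 4 does not.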


              The problem is that the combination enable_transfer_queue=true and use_buffer_copy=false does not work.

                • Re: Transfer queue copy operations

                  OK, so this one is an issue we recently fixed. I'm not sure if the necessary changes have already gone out, but what I can certainly tell you is that the driver you are using is outdated :-) Please check whether you can reproduce the problem with the latest driver version (APU)

                    • Re: Transfer queue copy operations

                      I get the same behavior.


                      I've been using [Radeon Software Crimson Edition 16.7.3 Driver Version 16.30.2311] for the past week to no avail.


                      I upgraded to [Radeon Software Crimson Edition 16.8.3 Driver Version 16.30.2511] today to re-test the issue.


                      Driver properties:

                      Vulkan Device Properties:

                      amdvlk64.dll properties:

                      * MD5: be12504d74301bcc97f5ebeec17ce6b9