5 Replies Latest reply on Mar 12, 2014 2:38 AM by amd_support

    Asynchronous DMA and Computation

    cjb80

      I haven't found much recent information on this subject on the forum.  I saw that asynchronous DMA and kernel execution were supported as of v2.7, so I am unsure how relevant the older (i.e., ~1 year old) posts on this subject are.

       

      To perform asynchronous reads, writes and kernel execution, do I need three command queues with APP v2.9, or can I do this with one (out-of-order?) command queue?

       

      Are out-of-order command queues supported on AMD GPUs at this point?

       

      Thanks,

       

      Chris

        • Re: Asynchronous DMA and Computation
          prao

          Hi Chris,

           

          Asynchronous DMA and computation can be achieved through separate command queues. You can check the APP SDK 2.9 sample AsyncDataTransfer that demonstrates this.

          Out-of-order command queues are not yet supported on AMD GPUs.

           

          Regards

          Pradeep

            • Re: Asynchronous DMA and Computation
              cjb80

              OK, just to be clear, are multiple command queues the only way to perform asynchronous transfers and computation then?

               

              Thanks,

               

              Chris

                • Re: Asynchronous DMA and Computation
                  nou

                  The AMD implementation supports concurrent execution. That means that if you enqueue three kernels on a single queue and there are no dependencies between them, they can execute concurrently. So it is not strictly in order.

                    • Re: Asynchronous DMA and Computation
                      cjb80

                      It would seem then that AMD supports out of order queues...?

                       

                      Here is what I am currently doing:  I have one queue that I set to be out of order.  I then issue several read and write commands and one kernel execution command.  I then wait for all of them to complete using clFinish(). There is no data dependency between the reads, writes, and kernel execution. Based on what prao has said and comments in other forum posts, it would appear that these operations would happen serially. Is this correct?

                       

                      Thanks,

                       

                      Chris

                        • Re: Asynchronous DMA and Computation
                          amd_support

                          Hi Chris,

                           

                          1. As of now, AMD GPUs do not support out-of-order queues. To check whether your device supports out-of-order queues, look at the clinfo output.

                          2. By default, all clEnqueue* commands (read/write/execution) are asynchronous with respect to the host, but they execute serially on the device.

                          3. To execute commands asynchronously on the device, you need at least 2 command queues. Whether data transfer actually overlaps with device computation, however, depends on whether your device supports at least 2 hardware command queues.

                          4. According to section 5.5.6 of the "AMD Accelerated Parallel Processing OpenCL Programming Guide-rev-2.7" book, Southern Islands and later devices support at least 2 hardware command queues.
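
The clinfo check in point 1 comes down to reading one bit. The device's queue properties are a 64-bit bitfield (the value `clGetDeviceInfo` returns for `CL_DEVICE_QUEUE_PROPERTIES`), and out-of-order support is bit 0 of it. The following is a minimal sketch in plain C that decodes such a value without calling OpenCL. The function and macro names are hypothetical; only the two bit values are taken from `cl.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names. The bit values mirror cl.h, where
   CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE is (1 << 0) and
   CL_QUEUE_PROFILING_ENABLE is (1 << 1). */
#define QUEUE_OUT_OF_ORDER_BIT (UINT64_C(1) << 0)
#define QUEUE_PROFILING_BIT    (UINT64_C(1) << 1)

/* props stands in for the cl_command_queue_properties bitfield that
   clGetDeviceInfo reports for CL_DEVICE_QUEUE_PROPERTIES. */
bool supports_out_of_order(uint64_t props) {
    return (props & QUEUE_OUT_OF_ORDER_BIT) != 0;
}

bool supports_profiling(uint64_t props) {
    return (props & QUEUE_PROFILING_BIT) != 0;
}
```

A device that reports only the profiling bit (value 2) would make `supports_out_of_order` return false, which is the situation described in point 1.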

                           

                          Regarding your last post, you are correct: it seems that these operations would execute serially.
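
To get a rough feel for what serial execution costs compared with overlapping transfers and computation across two queues, here is a back-of-the-envelope model in plain C. It does not call OpenCL and uses no measured data. It assumes the work is split into equal chunks, that each chunk needs an upload, a kernel run and a download, and that all transfers share one DMA engine while kernels run on a separate compute engine. The struct and function names are hypothetical, and the overlapped figure is a best-case lower bound, not a prediction.

```c
#include <assert.h>

/* Hypothetical per-chunk costs, all in the same arbitrary time unit. */
typedef struct {
    double upload;    /* host-to-device write */
    double kernel;    /* kernel execution */
    double download;  /* device-to-host read */
} chunk_cost;

/* One in-order queue: every command waits for the previous one. */
double serial_time(chunk_cost c, int n_chunks) {
    assert(n_chunks > 0);
    return n_chunks * (c.upload + c.kernel + c.download);
}

/* Best-case bound with one transfer engine and one compute engine.
   Compute-bound case: the first upload and the last download cannot be
   hidden, and the kernels run back to back in between.
   Transfer-bound case: the DMA engine is busy for all uploads and downloads. */
double overlapped_lower_bound(chunk_cost c, int n_chunks) {
    assert(n_chunks > 0);
    double compute_bound = c.upload + n_chunks * c.kernel + c.download;
    double transfer_bound = n_chunks * (c.upload + c.download);
    return compute_bound > transfer_bound ? compute_bound : transfer_bound;
}
```

With a single chunk the two functions return the same value, which matches the case of one kernel and its transfers having nothing to overlap with. The gap only opens up once the work is split into several independent chunks.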

                           

                          Thanks,

                          AMD_Support