Asynchronous DMA + Kernel Execution using AMD GPUs

Hi all,

We recently worked on this code to showcase asynchronous DMA + kernel execution on AMD GPUs. Please go through it and give us your feedback. We hope it helps a lot of developers and students achieve better performance on AMD hardware.

Courtesy: Andryeyev German


