How does a heterogeneous system architecture help increase performance in deep learning applications?
I have this hopefully simple question stated in the title. Here is the context:
As of now, most deep learning runs on NVIDIA GPUs, both because of the GPUs' ability to parallelize the mathematical operations these tasks require AND because the proper software infrastructure (i.e., CUDA/cuDNN) is in place. AMD has long talked up the advantages of a heterogeneous system architecture (HSA) for general applications. For instance, HSA can partially eliminate the need to copy data from CPU-attached memory to GPU-attached memory. This is potentially especially efficient with APUs, which contain an integrated GPU sharing physical memory with the CPU.
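To make the copy question concrete, here is a hedged CUDA-style sketch (the HIP equivalents `hipMemcpy`/`hipMallocManaged` look nearly identical) contrasting the explicit host-to-device copies of a discrete GPU with a unified allocation of the kind an HSA-style APU can exploit. The kernel and sizes are illustrative stand-ins, not anyone's actual deep learning code, and compiling it requires a CUDA toolkit:

```cpp
#include <cuda_runtime.h>
#include <cstdlib>

// Trivial stand-in for a real deep learning operation.
__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Discrete-GPU path: data crosses the PCIe bus in both directions.
    float *host = (float *)malloc(bytes);
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   // explicit copy in
    scale<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   // explicit copy out

    // Unified/HSA-style path: one allocation visible to both CPU and GPU.
    // On an APU with shared physical memory this can avoid the copies above;
    // on a discrete GPU the runtime still migrates pages behind the scenes.
    float *shared;
    cudaMallocManaged(&shared, bytes);
    scale<<<(n + 255) / 256, 256>>>(shared, n);
    cudaDeviceSynchronize();   // after this, the CPU can read `shared` directly

    cudaFree(dev);
    cudaFree(shared);
    free(host);
    return 0;
}
```

The second path is why the copy only "partially" disappears: the programming model removes the explicit `cudaMemcpy` calls, but whether a physical transfer still happens depends on whether the CPU and GPU actually share memory.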
I just bought a Ryzen CPU. Even if I had an AMD discrete GPU (which I don't), how would this approach actually improve performance for deep learning applications? In this particular case, it appears that the need to copy data from system memory to the GPU still persists.
How do ROCm/MIOpen differ from CUDA/cuDNN with regard to the HSA approach?
Message was edited by: Anthony Le: added the point that the software infrastructure also matters.