I'm working on a paper about signal processing on GPUs. My preliminary research suggests that when the input vectors/matrices arrive at about 100 MB/s, it becomes unprofitable to run the calculations on the GPU, even though the algorithm parallelizes well. The problem is the bandwidth of the PCIe interface and the need to copy data from CPU memory to GPU memory. As far as I understand, if I build my system around an AMD APU with the HSA architecture, I should be able to avoid this bottleneck and get back the roughly 10x speedup over the CPU version of my application.
Could you please tell me how well this will work in practice? If it really is that simple, I assume this program would run much faster on a Kaveri APU than on a high-end R9 GPU. Am I correct?
Thanks in advance for any replies.