It would be nice to have a deep dive into how the memory controllers on Polaris are designed. It would also be nice to be able to give
the compiler hints to place certain buffers on the same memory controller.
In my app, when I compress an image, I allocate a number of buffers for that image, and having these buffers assigned to the same
controller (if that is what is going on) seems to give a huge performance boost (around 100% faster).
Any advice on how buffer allocation order can affect performance?
I also read in the AMD OpenCL best practices guide that the cards have two DMA engines, and that command queues are assigned to one
engine or the other based on the order in which they are created: the first queue goes to engine 1, the second to engine 2, the third to engine 1, and so on.
I suppose the same logic applies to the memory controllers on the card?