Hi!
part1: You can use SSE instructions to copy on the CPU side, which is much faster, but there is a better way anyway: pinned memory.
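For reference, here is a rough sketch of what an SSE copy on the CPU side could look like. It is only an illustration, not code from the SDK: the name sse_copy is made up, and it assumes an x86 CPU with SSE2, 16-byte aligned pointers and a size that is a multiple of 16. It uses streaming (non-temporal) stores, which usually suit buffers that the CPU will not read back soon.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `bytes` from src to dst with 16-byte SSE2
   loads and non-temporal stores.
   Assumptions: both pointers are 16-byte aligned, bytes % 16 == 0. */
static void sse_copy(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));

    _mm_sfence();  /* order the streaming stores before returning */
}
```

Whether this actually beats a plain memcpy depends on the destination memory and the compiler, so measure it on your own setup.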
part2: Memory allocated with ResAllocLocal resides in GPU RAM, and memory allocated with ResAllocRemote resides in system RAM. If the CPU wants to use Local memory, the data is transferred over the PCIe bus first. The same goes for the GPU accessing Remote (CPU-side) memory.
There is a third kind of allocation, and I guess it's the best one: pinned memory.
- Allocated with calResCreate1D or calResCreate2D (it's an extension)
- You have to provide the system memory for it (watch out for 4K alignment!!!)
- No need to use calResMap. This memory is synchronised transparently between GPU and CPU. (I guess the AMD driver is actively cooperating with the OS's memory paging.)
- And surprisingly, this is the fastest way of transferring memory that I have tried.
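About the 4K alignment: a minimal sketch of how you could get a suitably aligned system buffer before handing it to calResCreate1D/2D. The names PAGE_ALIGN, round_up_to_page and alloc_pinned_candidate are made up for this example, and it assumes a C11 runtime with aligned_alloc (on MSVC you would need _aligned_malloc and _aligned_free instead). Rounding the size up to a whole page is my own precaution here, not something the CAL docs are quoted on. The CAL call itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_ALIGN 4096u  /* the 4K alignment mentioned above */

/* Round a byte count up to the next multiple of PAGE_ALIGN. */
static size_t round_up_to_page(size_t bytes)
{
    return (bytes + PAGE_ALIGN - 1) & ~(size_t)(PAGE_ALIGN - 1);
}

/* Hypothetical helper: allocate a zeroed, 4K-aligned buffer that can be
   passed as the user-supplied memory for a pinned resource.
   Returns NULL on failure; release it with free(). */
static void *alloc_pinned_candidate(size_t bytes)
{
    size_t padded = round_up_to_page(bytes);
    void *p = aligned_alloc(PAGE_ALIGN, padded);  /* C11 */

    if (p != NULL)
        memset(p, 0, padded);

    assert(p == NULL || ((uintptr_t)p % PAGE_ALIGN) == 0);
    return p;
}
```

Keep the buffer alive for as long as the resource created on top of it exists, and free it only after the resource has been released.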
part3: You can copy memory from Local to Local while the CPU does whatever you want. Copies between Local and Remote are also done asynchronously, by the DMA controller. You can also queue memoryTransfer and runProgram events.
Hope it helps.