I need to pass a few arrays, each smaller than about 128 kB, to the GPU.
Because the CPU only writes these arrays and never reads them, I think two approaches are possible:
1) Create the buffers on the GPU with the AMD_PERSISTENT flag. Map them to the host, write directly to them, unmap them, and use them on the GPU until the next cycle.
2) Create one buffer in GPU memory and one in pinned host memory (ALLOC_HOST_PTR flag). Map the pinned buffer, write to it, then use WriteBuffer to transfer the data from the pinned buffer to GPU memory. At this point in my app it is almost impossible to overlap kernel execution with the memory transfer, so the DMA advantage of this second approach will most likely be lost anyway.
So the question is: if overlap with kernel execution is not possible in either case, which approach gives the fastest data transfer with the smallest overall overhead?